---
license: apache-2.0
tags:
  - diffusion
  - dream
  - gguf
  - cpu-inference
  - diffuse-cpp
language:
  - en
base_model: Dream-org/Dream-v0-Instruct-7B
pipeline_tag: text-generation
---

# Dream-v0-Instruct-7B-GGUF

GGUF quantizations of [Dream-org/Dream-v0-Instruct-7B](https://huggingface.co/Dream-org/Dream-v0-Instruct-7B) for use with [diffuse-cpp](https://github.com/iafiscal1212/diffuse-cpp), the first C++ inference engine for Diffusion Language Models.

Dream is a masked diffusion language model based on the Qwen2.5-7B backbone with Grouped Query Attention (GQA). It generates all tokens in parallel through iterative refinement, excelling at math and factual tasks.

**Dream correctly solves 15 x 23 = 345 in just 2 denoising steps at 21.6 tok/s — 2.5x faster than llama.cpp.**

## Available Quantizations

| File | Type | Size | Description |
|------|------|------|-------------|
| `dream-7b-f16.gguf` | F16 | ~15 GB | Full precision, best quality |
| `dream-7b-q8_0.gguf` | Q8_0 | ~8.2 GB | 8-bit quantization, near-lossless |
| `dream-7b-q4km.gguf` | Q4_K_M | ~5.0 GB | 4-bit mixed, best speed/quality ratio |

**Recommended:** Q4_K_M for most users.

## Quick Start

```bash
# Download
huggingface-cli download diffuse-cpp/Dream-v0-Instruct-7B-GGUF dream-7b-q4km.gguf --local-dir .

# Build diffuse-cpp (v0.2.0+)
git clone --recursive https://github.com/iafiscal1212/diffuse-cpp.git
cd diffuse-cpp
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc)

# Run
./build/diffuse-cli -m ../dream-7b-q4km.gguf \
    --tokens "151644,8948,198,2610,525,264,10950,17847,13,151645,198,151644,872,198,3838,374,220,868,1303,220,1419,30,151645,198,151644,77091,198" \
    -n 64 -s 16 -t 12 --remasking entropy_exit
```
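The `--tokens` argument takes a comma-separated list of pre-tokenized IDs rather than raw text; the sequence in the Quick Start follows Qwen's ChatML chat template. A minimal sketch of that template (`chatml_prompt` is an illustrative helper, not part of diffuse-cpp; real IDs must come from the Qwen2.5 tokenizer):

```python
# Dream inherits Qwen's ChatML chat template. This helper reproduces the
# template's text shape only; the IDs passed to --tokens must be produced
# by the Qwen2.5 tokenizer (e.g. transformers' AutoTokenizer).
def chatml_prompt(user_msg: str,
                  system_msg: str = "You are a helpful assistant.") -> str:
    """Format a single-turn conversation in ChatML."""
    return (
        f"<|im_start|>system\n{system_msg}<|im_end|>\n"
        f"<|im_start|>user\n{user_msg}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = chatml_prompt("What is 15 x 23?")
print(prompt)
```

With the `transformers` tokenizer for Dream-v0-Instruct-7B, `tokenizer(prompt)["input_ids"]` (or `tokenizer.apply_chat_template(...)` on the messages directly) yields the ID list to pass to `--tokens`.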

## Performance

Benchmarked on AMD EPYC 4465P 12-Core, Q4_K_M, entropy_exit + inter-step cache, B=64:

| Prompt | tok/s | Steps | vs llama.cpp |
|--------|-------|-------|-------------|
| Capital of France? | **21.6** | 2 | 2.5x |
| 15 x 23? | **21.6** | 2 | 2.5x |
| Translate to French | 14.3 | 6 | 1.7x |
| Translate to Spanish | 13.2 | 10 | 1.6x |
| Python is_prime() | 8.2 | 7 | 1.0x |
| Why sky blue? | 4.9 | 16 | 0.6x |
| List planets | 4.9 | 16 | 0.6x |
| Poem about ocean | 4.5 | 16 | 0.5x |
| **Average** | **11.6** | | **1.4x** |

- Dream excels at **math and code** (converges in 2-7 steps)
- 5 of 8 prompts match or beat llama.cpp (8.51 tok/s baseline)
- llama.cpp baseline: Qwen2.5-7B-Instruct, Q4_K_M, same hardware
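
The Average row can be reproduced directly from the table (a quick sanity check, not part of the benchmark harness):

```python
# tok/s values copied from the table above; llama.cpp baseline from the notes.
toks = [21.6, 21.6, 14.3, 13.2, 8.2, 4.9, 4.9, 4.5]
baseline = 8.51

avg = sum(toks) / len(toks)      # mean throughput across the 8 prompts
speedup = avg / baseline         # relative to llama.cpp on the same hardware
print(f"{avg:.1f} tok/s, {speedup:.1f}x vs llama.cpp")
```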

## Dream vs LLaDA

| Task | Dream-7B | LLaDA-8B |
|----------|----------|----------|
| Math/Arithmetic | 21.6 tok/s (2 steps) | 6.0 tok/s (16 steps) |
| Code generation | 8.2 tok/s (7 steps) | 4.5 tok/s (15 steps) |
| Translation | 13-14 tok/s | 23-28 tok/s |
| Creative writing | 4.5 tok/s | 5.0 tok/s |

**Use Dream for math, code, and factual tasks; use LLaDA for translation and conversation.**
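
That guidance can be encoded as a simple task router (a sketch only: the task labels are illustrative, and the LLaDA file name is an assumption, not a file in this repo):

```python
# Illustrative routing based on the benchmark comparison above.
DREAM = "dream-7b-q4km.gguf"
LLADA = "llada-8b-q4km.gguf"   # hypothetical LLaDA GGUF file name

MODEL_FOR_TASK = {
    "math": DREAM, "code": DREAM, "factual": DREAM,
    "translation": LLADA, "conversation": LLADA,
}

def pick_model(task: str) -> str:
    """Return the GGUF file best suited to a task; default to Dream."""
    return MODEL_FOR_TASK.get(task, DREAM)

print(pick_model("math"), pick_model("translation"))
```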

## Model Details

- **Architecture:** Qwen2.5-7B backbone with bidirectional attention
- **Parameters:** 7.62B
- **Layers:** 28
- **Hidden size:** 3584
- **Attention:** GQA (28 query / 4 KV heads)
- **FFN:** SwiGLU, intermediate 18944
- **Vocabulary:** 152,064 tokens
- **RoPE theta:** 1,000,000
- **Mask token ID:** 151666
- **QKV biases:** Yes (kept at F32 in all quantizations)
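
These dimensions account for the stated 7.62B parameters. A back-of-the-envelope check, assuming an untied LM head as in Qwen2.5-7B:

```python
# Parameter count from the dimensions listed above (sanity check only).
hidden, layers, inter, vocab = 3584, 28, 18944, 152064
kv_heads, head_dim = 4, 128

kv_dim = kv_heads * head_dim                 # 512 (GQA: 4 KV heads)
attn = hidden * hidden + hidden              # Q projection + bias
attn += 2 * (hidden * kv_dim + kv_dim)       # K, V projections + biases
attn += hidden * hidden                      # O projection (no bias)
ffn = 3 * hidden * inter                     # SwiGLU: gate, up, down
norms = 2 * hidden                           # two RMSNorms per layer

per_layer = attn + ffn + norms
# + token embedding, untied LM head, final norm
total = layers * per_layer + 2 * vocab * hidden + hidden
print(f"{total / 1e9:.2f}B")                 # ~7.62B
```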

## Conversion Details

339 tensors (255 weights + 84 QKV biases). Converted with `convert-dream.py` from diffuse-cpp.
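
The tensor count follows from the architecture (an illustrative breakdown, not the converter's actual bookkeeping):

```python
# Tensor count for the layout described above.
layers = 28
weights_per_layer = 9      # Q, K, V, O projections; gate/up/down; 2 RMSNorms
global_weights = 3         # token embedding, LM head, final norm
qkv_biases = 3 * layers    # one bias each for Q, K, V per layer (kept at F32)

weights = layers * weights_per_layer + global_weights
total = weights + qkv_biases
print(weights, qkv_biases, total)   # 255 84 339
```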

## Citation

```bibtex
@software{diffuse_cpp_2026,
  title={diffuse-cpp: High-Performance Inference for Diffusion Language Models},
  author={Carmen Esteban},
  year={2026},
  url={https://github.com/iafiscal1212/diffuse-cpp}
}
```

## License

Apache 2.0