---
license: apache-2.0
tags:
- diffusion
- dream
- gguf
- cpu-inference
- diffuse-cpp
language:
- en
base_model: Dream-org/Dream-v0-Instruct-7B
pipeline_tag: text-generation
---
# Dream-v0-Instruct-7B-GGUF
GGUF quantizations of [Dream-org/Dream-v0-Instruct-7B](https://huggingface.co/Dream-org/Dream-v0-Instruct-7B) for use with [diffuse-cpp](https://github.com/iafiscal1212/diffuse-cpp), the first C++ inference engine for Diffusion Language Models.
Dream is a masked diffusion language model built on the Qwen2.5-7B backbone with Grouped Query Attention (GQA). Instead of decoding left to right, it refines a fully masked sequence over iterative denoising steps, generating tokens in parallel, and excels at math and factual tasks.
**Dream correctly solves 15 x 23 = 345 in just 2 denoising steps at 21.6 tok/s — 2.5x faster than llama.cpp.**
## Available Quantizations
| File | Type | Size | Description |
|------|------|------|-------------|
| `dream-7b-f16.gguf` | F16 | ~15 GB | Full precision, best quality |
| `dream-7b-q8_0.gguf` | Q8_0 | ~8.2 GB | 8-bit quantization, near-lossless |
| `dream-7b-q4km.gguf` | Q4_K_M | ~5.0 GB | 4-bit mixed, best speed/quality ratio |
**Recommended:** Q4_K_M for most users.
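For intuition about what the quantization types trade away, here is a hedged sketch of the Q8_0 scheme: weights are split into blocks of 32 values, each stored as one per-block scale plus 32 signed 8-bit integers. This mirrors the format's idea for illustration; the real GGUF code packs the scale as fp16.

```python
# Sketch of Q8_0 block quantization: 32 values per block, scale = amax/127,
# values rounded to int8. Illustrative only; not the packed GGUF layout.
BLOCK = 32

def q8_0_quantize(xs):
    blocks = []
    for i in range(0, len(xs), BLOCK):
        chunk = xs[i:i + BLOCK]
        amax = max(abs(x) for x in chunk) or 1.0
        d = amax / 127.0                                  # per-block scale
        q = [max(-127, min(127, round(x / d))) for x in chunk]
        blocks.append((d, q))
    return blocks

def q8_0_dequantize(blocks):
    return [d * v for d, q in blocks for v in q]

weights = [0.5, -1.0, 0.25, 0.75] * 8                     # one 32-value block
restored = q8_0_dequantize(q8_0_quantize(weights))
```

The round trip loses at most half a quantization step per value, which is why Q8_0 is described as near-lossless; Q4_K_M accepts larger error in exchange for the smaller file.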
## Quick Start
```bash
# Download
huggingface-cli download diffuse-cpp/Dream-v0-Instruct-7B-GGUF dream-7b-q4km.gguf --local-dir .
# Build diffuse-cpp (v0.2.0+)
git clone --recursive https://github.com/iafiscal1212/diffuse-cpp.git
cd diffuse-cpp
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc)
# Run
./build/diffuse-cli -m ../dream-7b-q4km.gguf \
--tokens "151644,8948,198,2610,525,264,10950,17847,13,151645,198,151644,872,198,3838,374,220,868,1303,220,1419,30,151645,198,151644,77091,198" \
-n 64 -s 16 -t 12 --remasking entropy_exit
```
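The `--tokens` list in the Quick Start is a pre-tokenized Qwen2.5 ChatML conversation (151644 is `<|im_start|>`, 151645 is `<|im_end|>`), which appears to encode the 15 x 23 benchmark question. This sketch rebuilds the equivalent prompt text; to produce the actual IDs you would run the string through the Qwen2.5 tokenizer (e.g. `transformers`' `AutoTokenizer`, not shown here).

```python
# Rebuild the ChatML prompt that the --tokens IDs encode. The system/user
# strings are an assumption from decoding the Qwen2.5 vocabulary.
def chatml_prompt(system, user):
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = chatml_prompt("You are a helpful assistant.", "What is 15 x 23?")
```

The trailing `<|im_start|>assistant\n` leaves the assistant turn open, so the masked positions that follow it are what the model denoises into an answer.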
## Performance
Benchmarked on an AMD EPYC 4465P (12 cores) with Q4_K_M, entropy_exit remasking + inter-step cache, B=64:
| Prompt | tok/s | Steps | vs llama.cpp |
|--------|-------|-------|-------------|
| Capital of France? | **21.6** | 2 | 2.5x |
| 15 x 23? | **21.6** | 2 | 2.5x |
| Translate to French | 14.3 | 6 | 1.7x |
| Translate to Spanish | 13.2 | 10 | 1.6x |
| Python is_prime() | 8.2 | 7 | 1.0x |
| Why sky blue? | 4.9 | 16 | 0.6x |
| List planets | 4.9 | 16 | 0.6x |
| Poem about ocean | 4.5 | 16 | 0.5x |
| **Average** | **11.6** | | **1.4x** |
- Dream excels at **math and code** (converges in 2-7 steps)
- 5 of 8 prompts match or beat llama.cpp (8.51 tok/s baseline)
- llama.cpp baseline: Qwen2.5-7B-Instruct, Q4_K_M, same hardware
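The spread above is driven by how quickly each prompt converges, which is what `--remasking entropy_exit` exploits. Here is a hedged sketch of the general idea behind entropy-based remasking with early exit (diffuse-cpp's exact criterion may differ): commit positions whose predicted distribution has low entropy, keep uncertain positions masked, and stop once every position is confident.

```python
# Sketch of entropy-based remasking / early exit. Each position's predicted
# distribution is scored by Shannon entropy: low entropy -> commit the
# argmax token; high entropy -> stay masked for another denoising step.
import math

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_exit_step(position_probs, threshold=0.5):
    """Return per-position decisions (token or None) and an exit flag."""
    decisions, all_confident = [], True
    for probs in position_probs:
        if entropy(probs) <= threshold:
            decisions.append(max(range(len(probs)), key=probs.__getitem__))
        else:
            decisions.append(None)     # remask: model is still uncertain here
            all_confident = False
    return decisions, all_confident

# Two positions: one peaked (confident), one near-uniform (uncertain).
probs = [[0.97, 0.01, 0.01, 0.01], [0.3, 0.3, 0.2, 0.2]]
decisions, done = entropy_exit_step(probs)
```

Arithmetic answers produce sharply peaked distributions almost immediately, so the loop exits after 2 steps; open-ended prose keeps entropy high and runs to the full step budget, matching the table.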
## Dream vs LLaDA
| Strength | Dream-7B | LLaDA-8B |
|----------|----------|----------|
| Math/Arithmetic | 21.6 tok/s (2 steps) | 6.0 tok/s (16 steps) |
| Code generation | 8.2 tok/s (7 steps) | 4.5 tok/s (15 steps) |
| Translation | 13-14 tok/s | 23-28 tok/s |
| Creative writing | 4.5 tok/s | 5.0 tok/s |
**Use Dream for math, code, and factual tasks; use LLaDA for translation and conversation.**
## Model Details
- **Architecture:** Qwen2.5-7B backbone with bidirectional attention
- **Parameters:** 7.62B
- **Layers:** 28
- **Hidden size:** 3584
- **Attention:** GQA (28 query / 4 KV heads)
- **FFN:** SwiGLU, intermediate 18944
- **Vocabulary:** 152,064 tokens
- **RoPE theta:** 1,000,000
- **Mask token ID:** 151666
- **QKV biases:** Yes (kept at F32 in all quantizations)
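The GQA configuration above implies a fixed grouping: with 28 query heads and 4 KV heads, each KV head is shared by 28 // 4 = 7 consecutive query heads, so the K/V projections are 7x smaller than in full multi-head attention. A minimal sketch of that head mapping:

```python
# GQA head mapping for Dream's 28 query / 4 KV heads: query head q attends
# using KV head q // 7, so each KV head serves 7 query heads.
N_Q_HEADS, N_KV_HEADS = 28, 4
GROUP = N_Q_HEADS // N_KV_HEADS  # 7 query heads per KV head

def kv_head_for(q_head):
    return q_head // GROUP

mapping = [kv_head_for(q) for q in range(N_Q_HEADS)]
```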
## Conversion Details
339 tensors (255 weights + 84 QKV biases). Converted with `convert-dream.py` from diffuse-cpp.
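One plausible breakdown of that tensor count, assuming a standard Qwen2.5-style layout (not confirmed against the converter's output): each of the 28 layers has 9 weight tensors (attention q/k/v/o, FFN gate/up/down, two norms) plus 3 global tensors (token embedding, final norm, output head), and 3 QKV bias tensors per layer.

```python
# Hypothetical accounting for the 339-tensor figure; the per-layer layout
# is an assumption about the Qwen2.5-style architecture, not converter output.
LAYERS = 28

weights = LAYERS * 9 + 3        # 9 weight tensors/layer + 3 global tensors
qkv_biases = LAYERS * 3         # q, k, v bias per layer
total = weights + qkv_biases
```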
## Citation
```bibtex
@software{diffuse_cpp_2026,
title={diffuse-cpp: High-Performance Inference for Diffusion Language Models},
author={Carmen Esteban},
year={2026},
url={https://github.com/iafiscal1212/diffuse-cpp}
}
```
## License
Apache 2.0