Update README: correct GGUF sizes + vocab size note
README.md
CHANGED
@@ -54,10 +54,12 @@ LoRA adapter merged into BF16 weights via `merge_and_unload()`. Exported as shar
 | File | Size | Description |
 |------|------|-------------|
 | `model-0000X-of-00004.safetensors` | ~15GB | Merged BF16 weights (full precision) |
-| `ghostshell-4b-Q4_K_M.gguf` | ~
-| `ghostshell-4b-Q8_0.gguf` | ~
+| `ghostshell-4b-Q4_K_M.gguf` | ~5.0GB | Q4_K_M quantization — recommended for most use |
+| `ghostshell-4b-Q8_0.gguf` | ~7.5GB | Q8_0 quantization — near-lossless, for power users |
 
-**Recommended**: `ghostshell-4b-Q4_K_M.gguf` for llama.cpp, Ollama, LM Studio, or any GGUF-compatible runtime.
+**Recommended**: `ghostshell-4b-Q4_K_M.gguf` for llama.cpp, Ollama, LM Studio, or any GGUF-compatible runtime.
+
+> **Note on file sizes**: These GGUFs are larger than a typical 4B model because Gemma 3 uses a 262,144-token vocabulary. The embedding/output weight tensors (which stay in higher precision) account for ~2–3GB of the total. The transformer layers themselves are fully quantized. Expect ~6–8GB VRAM for Q4_K_M, ~10–12GB for Q8_0.
 
 ---
 
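Two quick sketches to ground the numbers in the new note. First, a back-of-the-envelope check of the "~2–3GB in embeddings" figure; the 2,560 hidden dimension is an assumption taken from Gemma 3 4B's published config, not something this commit states.

```python
vocab_size = 262_144          # from the note above
hidden_size = 2_560           # ASSUMED: Gemma 3 4B config, not stated in this README

params = vocab_size * hidden_size        # one vocab-sized weight matrix
bf16_gib = params * 2 / 1024**3          # BF16: 2 bytes per weight
q8_gib = params * 1.0625 / 1024**3       # Q8_0: 34 bytes per block of 32 weights

print(f"{params / 1e6:.0f}M params per matrix")
print(f"BF16: {bf16_gib:.2f} GiB   Q8_0: {q8_gib:.2f} GiB")
# Two such tensors (token embedding + output head) held near 16-bit
# come to ~2.5 GiB, consistent with the note's ~2-3GB figure.
```

Second, a minimal sketch of loading the recommended Q4_K_M file via llama-cpp-python (a binding for llama.cpp, one of the runtimes named above). The filename matches the table; `n_ctx` and `n_gpu_layers` are illustrative choices, not values the README prescribes.

```python
from llama_cpp import Llama  # pip install llama-cpp-python

llm = Llama(
    model_path="ghostshell-4b-Q4_K_M.gguf",  # filename from the table above
    n_ctx=4096,         # illustrative context window
    n_gpu_layers=-1,    # offload every layer; budget ~6-8GB VRAM per the note
)
out = llm("Summarize what a GGUF file is in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```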