DuoNeural committed
Commit 02ade1c · verified · Parent: b37c878

Update README: correct GGUF sizes + vocab size note

Files changed (1):
  1. README.md +5 -3
README.md CHANGED
@@ -54,10 +54,12 @@ LoRA adapter merged into BF16 weights via `merge_and_unload()`. Exported as shar
  | File | Size | Description |
  |------|------|-------------|
  | `model-0000X-of-00004.safetensors` | ~15GB | Merged BF16 weights (full precision) |
- | `ghostshell-4b-Q4_K_M.gguf` | ~2.5GB | Q4_K_M quantization — recommended for most use |
- | `ghostshell-4b-Q8_0.gguf` | ~4.5GB | Q8_0 quantization — near-lossless, for power users |
+ | `ghostshell-4b-Q4_K_M.gguf` | ~5.0GB | Q4_K_M quantization — recommended for most use |
+ | `ghostshell-4b-Q8_0.gguf` | ~7.5GB | Q8_0 quantization — near-lossless, for power users |

- **Recommended**: `ghostshell-4b-Q4_K_M.gguf` for llama.cpp, Ollama, LM Studio, or any GGUF-compatible runtime. Runs on 6GB VRAM, handles well on CPU with 8GB RAM.
+ **Recommended**: `ghostshell-4b-Q4_K_M.gguf` for llama.cpp, Ollama, LM Studio, or any GGUF-compatible runtime.
+
+ > **Note on file sizes**: These GGUFs are larger than a typical 4B model because Gemma 3 uses a 262,144-token vocabulary. The embedding/output weight tensors (which stay in higher precision) account for ~2–3GB of the total. The transformer layers themselves are fully quantized. Expect ~6–8GB VRAM for Q4_K_M, ~10–12GB for Q8_0.
 
  ---
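
The hunk header above references the export pipeline: the LoRA adapter was merged into the BF16 base weights with PEFT's `merge_and_unload()` and written out as sharded safetensors. A minimal sketch of that step, assuming a Gemma 3 4B base checkpoint and a hypothetical adapter path (neither is pinned down by this diff):

```python
# Sketch of the merge-and-export step the diff header describes.
# The base model ID and adapter path are assumptions, not artifacts
# recorded in this commit.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-4b-it",            # assumed base checkpoint
    torch_dtype=torch.bfloat16,        # keep weights in BF16 for the merge
)
model = PeftModel.from_pretrained(base, "path/to/ghostshell-lora")  # hypothetical adapter
merged = model.merge_and_unload()      # fold LoRA deltas into the base weights

# Writes model-0000X-of-0000N.safetensors shards like those listed in the table.
merged.save_pretrained(
    "ghostshell-4b-merged",
    safe_serialization=True,
    max_shard_size="5GB",
)
```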
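
For the recommended Q4_K_M file, any of the listed runtimes works; as a quick smoke test from Python, `llama-cpp-python` (one GGUF-compatible runtime) can load it directly. The local path and generation parameters below are illustrative:

```python
# Smoke-test the Q4_K_M GGUF with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="ghostshell-4b-Q4_K_M.gguf",  # assumed local download
    n_ctx=4096,                              # context window, adjust to taste
    n_gpu_layers=-1,                         # offload all layers if a GPU is present
)

out = llm.create_completion("Q: What does Q4_K_M mean?\nA:", max_tokens=64)
print(out["choices"][0]["text"])
```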
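
The new size note can be sanity-checked with back-of-envelope arithmetic. The vocabulary size comes from the note itself; the 2560 embedding width is an assumption about the model config, and counting two matrices assumes the input embedding and output head are stored separately, as the note implies:

```python
# Rough check of the "~2-3GB for embedding/output weights" figure.
vocab_size = 262_144     # from the note above
hidden_size = 2_560      # assumed embedding width for the 4B model

params_per_matrix = vocab_size * hidden_size   # ~671M weights
gib_at_f16 = params_per_matrix * 2 / 2**30     # 2 bytes per weight

print(f"one matrix at F16:       {gib_at_f16:.2f} GiB")       # ~1.25 GiB
print(f"embedding + output head: {2 * gib_at_f16:.2f} GiB")   # ~2.5 GiB
```

Roughly 2.5 GiB for the pair sits inside the ~2–3GB range the note quotes, which also explains why the Q4_K_M file is about double the size that naive 4-bit math over 4B parameters would predict.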