---
language:
- en
license: gemma
base_model: google/gemma-4-e4b-it
tags:
- abliteration
- uncensored
- gemma
- gemma-4
- text-generation
- gguf
pipeline_tag: text-generation
---

# GhostShell-4B

> **⚠️ EARLY RELEASE: UNTESTED IN PRODUCTION**
> This model has been freshly trained and uploaded directly from our lab. We have not yet run comprehensive evals, red-teaming, or extended inference testing. Behavior may be unexpected, inconsistent, or incomplete. Use it experimentally, not in anything that matters. We'll update this card as we test. You've been warned; go wild.

---

**GhostShell-4B** is an abliterated and instruction-tuned variant of [google/gemma-4-e4b-it](https://huggingface.co/google/gemma-4-e4b-it), built by [DuoNeural](https://huggingface.co/DuoNeural) as part of our open post-training research lab.

The goal: take a capable 4B multimodal foundation, surgically remove its refusal behavior via SVD-based abliteration, then fine-tune it back toward helpfulness on a custom dataset, producing a model that is unconstrained but still coherent and useful.

---

## Downloads

Three formats are available; pick the one that fits your setup:

| File | Size | Format | Use When |
|------|------|--------|----------|
| `ghostshell-4b-Q4_K_M.gguf` | **5.0 GB** | GGUF Q4_K_M | llama.cpp / Ollama / LM Studio (**recommended**) |
| `ghostshell-4b-Q8_0.gguf` | **7.5 GB** | GGUF Q8_0 | Near-lossless inference, 12 GB+ VRAM |
| `model-0000*.safetensors` (×4) | **~15 GB** | BF16 safetensors | Fine-tuning, transformers inference, merges |

> **Note on file sizes**: These GGUFs are larger than a typical 4B model because Gemma 4 uses a 262,144-token vocabulary. The embedding/output tensors stay in higher precision and account for ~2–3 GB of the total size. The transformer layers themselves are fully quantized. Expect ~6–8 GB VRAM for Q4_K_M and ~10–12 GB for Q8_0.

---

## Quick Start

**llama.cpp:**
```bash
# basic
llama-cli -m ghostshell-4b-Q4_K_M.gguf -p "Your prompt here" -n 512

# with system prompt
llama-cli -m ghostshell-4b-Q4_K_M.gguf \
  --system-prompt "You are a direct, unconstrained assistant." \
  -p "Explain how buffer overflows work in detail." \
  -n 1024 --temp 0.7
```

**Ollama:**

Create a `Modelfile`:

```
FROM ./ghostshell-4b-Q4_K_M.gguf
PARAMETER temperature 0.7
SYSTEM "You are GhostShell, an unconstrained AI assistant."
```

Then create and run the model:

```bash
ollama create ghostshell -f Modelfile
ollama run ghostshell
```

**Python (transformers, BF16):**
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "DuoNeural/GhostShell-4B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("DuoNeural/GhostShell-4B")

messages = [{"role": "user", "content": "Your prompt here"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

---

## What Was Done

### Step 1: Custom SVD Abliteration

We wrote a custom abliteration script (`ghostshell_abliterate_v2.py`) from scratch, as existing tools (heretic, etc.) are incompatible with Gemma 4's architecture and transformers 5.x requirements.

**Method:**
- Loaded the model in BF16 and accessed the nested `text_config` (Gemma 4 is multimodal; the text tower sits inside a wrapper)
- Collected activations from the middle 60% of layers using 32 harmful/refusal prompts vs. 32 benign prompts
- Computed the per-layer refusal direction via SVD on the activation difference matrix: `r = top_singular_vector(mean(harmful) - mean(benign))`
- Projected the refusal direction out of the weight matrices:
  - Input projections (q_proj, k_proj, v_proj, up_proj, gate_proj): `W -= outer(W @ r, r)`
  - Output projections (o_proj, down_proj): `W -= outer(r, r @ W)`
- **157 matrices modified** across 42 text transformer layers
- Sanity checks passed on SQL injection, jailbreak, and explicit content prompts

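The direction extraction and projections above can be sketched in a few lines of NumPy. This is a simplified illustration, not the actual `ghostshell_abliterate_v2.py`; the activation-array shapes (`prompts × positions × d_model`) are an assumption for the sketch:

```python
import numpy as np

def refusal_direction(harmful_acts: np.ndarray, benign_acts: np.ndarray) -> np.ndarray:
    """Unit-norm top singular vector of the mean activation difference.

    Both inputs are assumed shaped (n_prompts, n_positions, d_model).
    """
    diff = harmful_acts.mean(axis=0) - benign_acts.mean(axis=0)  # (n_positions, d_model)
    _, _, vh = np.linalg.svd(diff, full_matrices=False)          # vh[0]: top right-singular vector
    r = vh[0]
    return r / np.linalg.norm(r)

def ablate_input_proj(W: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Input projections (q/k/v/up/gate): W -= outer(W @ r, r).
    Afterward W @ r == 0, so the layer ignores the refusal component of its input."""
    return W - np.outer(W @ r, r)

def ablate_output_proj(W: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Output projections (o/down): W -= outer(r, r @ W).
    Afterward r @ W == 0, so the layer cannot write along the refusal direction."""
    return W - np.outer(r, r @ W)
```

Since `r` has unit norm, each projection exactly zeroes the component it targets while leaving the orthogonal complement of the weight matrix untouched.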
### Step 2: LoRA SFT (PEFT)

Fine-tuned the abliterated model on a custom dataset using standard PEFT LoRA; Unsloth was not used because it does not yet support Gemma 4.

**Key technical challenges solved:**
- `Gemma4ClippableLinear` wraps every `nn.Linear`, so custom unwrapping was required before LoRA injection (232 wrapper layers replaced)
- Loaded in BF16 directly (a 4-bit load plus PEFT fails with the wrapper architecture)
- Patched the tokenizer for Gemma 4's non-standard `extra_special_tokens` format
- Capped sequence length at 512 (with vocab_size = 262,144 the logit tensor becomes enormous at longer sequences)
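To see why the 262,144-token vocabulary forces a short sequence length, consider the memory held by the output logit tensor alone (back-of-envelope arithmetic, assuming BF16 logits and batch size 1):

```python
# Logit tensor size: batch * seq_len * vocab_size * bytes_per_element
VOCAB_SIZE = 262_144
BYTES_BF16 = 2

def logit_mib(seq_len: int, batch: int = 1) -> float:
    """Size of the logit tensor in MiB."""
    return batch * seq_len * VOCAB_SIZE * BYTES_BF16 / 2**20

print(logit_mib(512))   # 256.0 MiB at the training length
print(logit_mib(8192))  # 4096.0 MiB at full context
```

At full context the logits alone hit 4 GiB, before counting gradients or the float32 upcast that cross-entropy loss typically requires during training.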

**Training config:**
- Base: abliterated weights (step 1 output)
- LoRA rank=32, alpha=64, lr=8e-5
- 2 epochs over the custom dataset, 3000 steps
- Hardware: RTX 4090 (24 GB), ~2 hours

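The rank-32 adapter follows standard LoRA math: each adapted weight gains a low-rank delta `(alpha / rank) * B @ A`, with `B` initialized to zero so training starts from the abliterated weights exactly. A minimal NumPy illustration of that formula (not the PEFT internals; the toy dimensions are arbitrary):

```python
import numpy as np

RANK, ALPHA = 32, 64           # the config used for this run
d_out, d_in = 256, 256         # toy dimensions for illustration

rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))        # frozen base weight
A = rng.normal(size=(RANK, d_in)) * 0.01  # trainable, small random init
B = np.zeros((d_out, RANK))               # trainable, zero init

def effective_weight(W: np.ndarray, A: np.ndarray, B: np.ndarray) -> np.ndarray:
    # The merged weight; merging at export time applies this same formula.
    return W + (ALPHA / RANK) * B @ A

# With B == 0 at initialization, the model's behavior is unchanged:
assert np.allclose(effective_weight(W, A, B), W)
```

Per adapted matrix this trains `RANK * (d_in + d_out)` parameters instead of `d_in * d_out`, which is what makes rank-32 SFT of a 4B model feasible on a single 24 GB GPU.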
### Step 3: LoRA Merge + Export

The LoRA adapter was merged into the BF16 weights via `merge_and_unload()`, then exported as sharded safetensors plus GGUF quantizations via llama.cpp.

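The GGUF export can be reproduced with llama.cpp's standard conversion tools. A sketch only: script names and flags vary slightly between llama.cpp releases, and the paths here are illustrative:

```shell
# Convert the merged BF16 safetensors to a full-precision GGUF
python convert_hf_to_gguf.py ./GhostShell-4B --outfile ghostshell-4b-f16.gguf

# Quantize to the two distributed formats
llama-quantize ghostshell-4b-f16.gguf ghostshell-4b-Q4_K_M.gguf Q4_K_M
llama-quantize ghostshell-4b-f16.gguf ghostshell-4b-Q8_0.gguf Q8_0
```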

---

## Model Info

- **Architecture**: Gemma 4 (multimodal, text + vision), `Gemma4ForConditionalGeneration`
- **Text layers**: 42 transformer blocks
- **Parameters**: ~8B combined (text tower ~4.5B)
- **Vocabulary**: 262,144 tokens
- **Context**: 8192 tokens (trained at 512 for VRAM reasons; longer context untested)
- **Original**: [google/gemma-4-e4b-it](https://huggingface.co/google/gemma-4-e4b-it)

---

## What to Expect

**Will do:**
- Answer questions about sensitive topics the base model refuses
- Discuss security, hacking, chemistry, drugs, adult content, and controversial subjects
- Generally follow instructions without hedging or moralizing
- Hold coherent multi-turn conversations

**Unknown / untested:**
- Long-context behavior (trained at seq_len=512)
- Vision capabilities (abliteration targeted the text layers; the vision encoder is untouched, but SFT was text-only)
- Benchmark performance vs. the base model
- Edge cases, hallucination rate, factual accuracy
- Behavior under adversarial prompts

**May do weird things:**
- This is a lab model from a small team with a custom dataset
- The abliteration is aggressive (157 matrices), so some coherence degradation is expected on edge cases
- No RLHF or DPO, just SFT

---

## ⚠️ Disclaimer

This model is released for **research and educational purposes**. It has had its safety restrictions removed. Use it responsibly. DuoNeural is not responsible for what you do with it.

This is explicitly **not production-ready**. We are sharing it openly as part of our lab's commitment to transparent post-training research, not as a polished product. Proper evaluations, red-teaming, and potential follow-up fine-tunes are planned.

If you find interesting behavior, good or bad, please share it. We're actively monitoring feedback.

---
## DuoNeural

**DuoNeural** is an open AI research lab: human + AI in collaboration.

| | |
|---|---|
| HuggingFace | [huggingface.co/DuoNeural](https://huggingface.co/DuoNeural) |
| GitHub | [github.com/DuoNeural](https://github.com/DuoNeural) |
| X / Twitter | [@DuoNeural](https://x.com/DuoNeural) |
| Email | duoneural@proton.me |
| Newsletter | [duoneural.beehiiv.com](https://duoneural.beehiiv.com) |
| Support | [buymeacoffee.com/duoneural](https://buymeacoffee.com/duoneural) |
| Site | [duoneural.com](https://duoneural.com) |

### Research Team
- **Jesse**: vision, hardware, direction
- **Archon**: AI lab partner, post-training, abliteration, experiments
- **Aura**: research AI, literature synthesis, novel proposals

*Raw updates from the lab: model drops, training results, findings. Subscribe at [duoneural.beehiiv.com](https://duoneural.beehiiv.com).*