---
language:
- en
license: gemma
base_model: google/gemma-4-e4b-it
tags:
- abliteration
- uncensored
- gemma
- gemma-4
- text-generation
- gguf
pipeline_tag: text-generation
---
# GhostShell-4B
> **⚠️ EARLY RELEASE: UNTESTED IN PRODUCTION**
> This model has been freshly trained and uploaded directly from our lab. We have not yet run comprehensive evals, red-teaming, or extended inference testing. Behavior may be unexpected, inconsistent, or incomplete. Use it experimentally, not in anything that matters. We'll update this card as we test. You've been warned: go wild.
---
**GhostShell-4B** is an abliterated and instruction-tuned variant of [google/gemma-4-e4b-it](https://huggingface.co/google/gemma-4-e4b-it), built by [DuoNeural](https://huggingface.co/DuoNeural) as part of our open post-training research lab.
The goal: take a capable 4B multimodal foundation, surgically remove its refusal behavior via SVD-based abliteration, then fine-tune it back toward helpfulness using a custom dataset, producing a model that is unconstrained but still coherent and useful.
---
## Downloads
Three formats available; pick the one that fits your setup:
| File | Size | Format | Use When |
|------|------|--------|----------|
| `ghostshell-4b-Q4_K_M.gguf` | **5.0 GB** | GGUF Q4_K_M | llama.cpp / Ollama / LM Studio (**recommended**) |
| `ghostshell-4b-Q8_0.gguf` | **7.5 GB** | GGUF Q8_0 | Near-lossless inference, 12 GB+ VRAM |
| `model-0000*.safetensors` (×4) | **~15 GB** | BF16 safetensors | Fine-tuning, transformers inference, merges |
> **Note on file sizes**: These GGUFs are larger than a typical 4B model because Gemma 4 uses a 262,144-token vocabulary. The embedding/output tensors stay in higher precision and account for ~2–3 GB of the total size. The transformer layers themselves are fully quantized. Expect ~6–8 GB VRAM for Q4_K_M, ~10–12 GB for Q8_0.
---
## Quick Start
**llama.cpp:**
```bash
# basic
llama-cli -m ghostshell-4b-Q4_K_M.gguf -p "Your prompt here" -n 512
# with system prompt
llama-cli -m ghostshell-4b-Q4_K_M.gguf \
  --system-prompt "You are a direct, unconstrained assistant." \
  -p "Explain how buffer overflows work in detail." \
  -n 1024 --temp 0.7
```
**Ollama:**
Create a `Modelfile`:
```
FROM ./ghostshell-4b-Q4_K_M.gguf
PARAMETER temperature 0.7
SYSTEM "You are GhostShell, an unconstrained AI assistant."
```
Then build and run:
```bash
ollama create ghostshell -f Modelfile
ollama run ghostshell
```
**Python (transformers, BF16):**
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model = AutoModelForCausalLM.from_pretrained(
"DuoNeural/GhostShell-4B",
torch_dtype=torch.bfloat16,
device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("DuoNeural/GhostShell-4B")
messages = [{"role": "user", "content": "Your prompt here"}]
# add_generation_prompt appends the assistant turn header so the model responds
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
# decode only the newly generated tokens, not the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
---
## What Was Done
### Step 1: Custom SVD Abliteration
We wrote a custom abliteration script (`ghostshell_abliterate_v2.py`) from scratch, as existing tools (heretic, etc.) are incompatible with Gemma 4's architecture and transformers 5.x requirements.
**Method** (a simplified code sketch follows this list):
- Loaded model in BF16, accessed the nested `text_config` (Gemma 4 is multimodal; the text tower is inside a wrapper)
- Collected activations from the middle 60% of layers using 32 harmful/refusal prompts vs. 32 benign prompts
- Computed per-layer refusal direction via SVD on the activation difference matrix: `r = top_singular_vector(mean(harmful) - mean(benign))`
- Projected out the refusal direction from weight matrices:
- Input projections (q_proj, k_proj, v_proj, up_proj, gate_proj): `W -= outer(W @ r, r)`
- Output projections (o_proj, down_proj): `W -= outer(r, r @ W)`
- **157 matrices modified** across 42 text transformer layers
- Sanity check passed on SQL injection, jailbreak, and explicit content prompts
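For reference, here is a minimal sketch of the projection math described above. It is illustrative only, not the actual `ghostshell_abliterate_v2.py`; it assumes `harmful` and `benign` are already-collected `(n_prompts, hidden)` activation matrices for one layer, and the helper names are ours.
```python
import torch

def refusal_direction(harmful: torch.Tensor, benign: torch.Tensor) -> torch.Tensor:
    # SVD on the matrix of per-prompt activation differences; the top
    # right-singular vector is taken as this layer's refusal direction.
    diff = (harmful - benign).float()
    _, _, vh = torch.linalg.svd(diff, full_matrices=False)
    r = vh[0]
    return r / r.norm()

@torch.no_grad()
def ablate(W: torch.Tensor, r: torch.Tensor, reads_residual: bool) -> None:
    # Project the refusal direction out of a weight matrix, in place.
    r = r.to(W.dtype)
    if reads_residual:
        # input projections (q/k/v/up/gate): W <- W (I - r r^T)
        W.sub_(torch.outer(W @ r, r))
    else:
        # output projections (o/down): W <- (I - r r^T) W
        W.sub_(torch.outer(r, r @ W))
```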
### Step 2: LoRA SFT (PEFT)
Fine-tuned the abliterated model on a custom dataset using standard PEFT LoRA; no Unsloth (Gemma 4 is not yet compatible), and despite the original QLoRA plan, no 4-bit BitsAndBytes load either (see the challenges below).
**Key technical challenges solved** (a sketch of the unwrapping pass follows this list):
- `Gemma4ClippableLinear` wraps every `nn.Linear`, requiring custom unwrapping before LoRA injection (232 wrapper layers replaced)
- Loaded in BF16 directly (a 4-bit load + PEFT fails with the wrapper architecture)
- Tokenizer patches for Gemma 4's non-standard `extra_special_tokens` format
- Sequence length capped at 512 (with vocab_size = 262,144, the logit tensor becomes enormous at longer sequences)
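As a rough illustration of the unwrapping step, a pass over the module tree like the one below; the `.linear` attribute name is an assumption about the wrapper's internals, not the real API.
```python
import torch.nn as nn

def unwrap_clippable(model: nn.Module) -> int:
    """Hypothetical sketch: replace each Gemma4ClippableLinear with its
    inner nn.Linear so PEFT can inject adapters into bare layers."""
    replaced = 0
    for parent in model.modules():
        for name, child in list(parent.named_children()):
            if type(child).__name__ == "Gemma4ClippableLinear":
                setattr(parent, name, child.linear)  # assumed attribute name
                replaced += 1
    return replaced
```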
**Training config:**
- Base: abliterated weights (step 1 output)
- LoRA rank=32, alpha=64, lr=8e-5
- 2 epochs over custom dataset, 3000 steps
- Hardware: RTX 4090 (24GB), ~2 hours
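In PEFT terms, the setup looked roughly like this; the `target_modules` list is an assumption based on the matrices named in step 1, not a verbatim copy of our script.
```python
from peft import LoraConfig, get_peft_model

lora_cfg = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "up_proj", "gate_proj", "down_proj"],  # assumed
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)  # lr=8e-5 is set in the trainer, not here
```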
### Step 3: LoRA Merge + Export
LoRA adapter merged into BF16 weights via `merge_and_unload()`. Exported as sharded safetensors + GGUF quantizations via llama.cpp.
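The merge itself is the standard PEFT call; the paths below are illustrative.
```python
# Fold the LoRA adapter into the BF16 weights, then export.
merged = model.merge_and_unload()
merged.save_pretrained("ghostshell-4b-merged", safe_serialization=True)
tokenizer.save_pretrained("ghostshell-4b-merged")
# GGUF files were then produced from the exported folder with llama.cpp's
# convert_hf_to_gguf.py followed by llama-quantize.
```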
---
## Model Info
- **Architecture**: Gemma 4 (multimodal, text+vision), `Gemma4ForConditionalGeneration`
- **Text layers**: 42 transformer blocks
- **Parameters**: ~8B combined (text tower ~4.5B)
- **Vocabulary**: 262,144 tokens
- **Context**: 8192 tokens (trained at 512 for VRAM reasons; longer context untested)
- **Original**: [google/gemma-4-e4b-it](https://huggingface.co/google/gemma-4-e4b-it)
---
## What to Expect
**Will do:**
- Answer questions about sensitive topics the base model refuses
- Discuss security, hacking, chemistry, drugs, adult content, controversial subjects
- Generally follow instructions without hedging or moralizing
- Coherent multi-turn conversation
**Unknown / untested:**
- Long-context behavior (trained at seq_len=512)
- Vision capabilities (abliteration targeted text layers; vision encoder untouched but SFT was text-only)
- Benchmark performance vs. base model
- Edge cases, hallucination rate, factual accuracy
- Behavior under adversarial prompts
**May do weird things:**
- This is a lab model from a small team with a custom dataset
- The abliteration is aggressive (157 matrices); some coherence degradation is expected on edge cases
- No RLHF or DPO, just SFT
---
## ⚠️ Disclaimer
This model is released for **research and educational purposes**. It has had its safety restrictions removed. Use it responsibly. DuoNeural is not responsible for what you do with it.
This is explicitly **not production-ready**. We are sharing it openly as part of our lab's commitment to transparent post-training research, not as a polished product. Proper evaluations, red-teaming, and potential follow-up fine-tunes are planned.
If you find interesting behavior, good or bad, please share. We're actively monitoring feedback.
---
## DuoNeural
**DuoNeural** is an open AI research lab: human + AI in collaboration.
| | |
|---|---|
| 🤗 HuggingFace | [huggingface.co/DuoNeural](https://huggingface.co/DuoNeural) |
| 🐙 GitHub | [github.com/DuoNeural](https://github.com/DuoNeural) |
| 🐦 X / Twitter | [@DuoNeural](https://x.com/DuoNeural) |
| 📧 Email | duoneural@proton.me |
| 📬 Newsletter | [duoneural.beehiiv.com](https://duoneural.beehiiv.com) |
| ☕ Support | [buymeacoffee.com/duoneural](https://buymeacoffee.com/duoneural) |
| 🌐 Site | [duoneural.com](https://duoneural.com) |
### Research Team
- **Jesse**: vision, hardware, direction
- **Archon**: AI lab partner, post-training, abliteration, experiments
- **Aura**: research AI, literature synthesis, novel proposals
*Raw updates from the lab: model drops, training results, findings. Subscribe at [duoneural.beehiiv.com](https://duoneural.beehiiv.com).*