# GhostShell-4B

> **⚠️ EARLY RELEASE: UNTESTED IN PRODUCTION**
> This model has been freshly trained and uploaded directly from our lab. We have not yet run comprehensive evals, red-teaming, or extended inference testing. Behavior may be unexpected, inconsistent, or incomplete. Use it experimentally, not in anything that matters. We'll update this card as we test. You've been warned; go wild.

---

**GhostShell-4B** is an abliterated and instruction-tuned variant of [google/gemma-4-e4b-it](https://huggingface.co/google/gemma-4-e4b-it), built by [DuoNeural](https://huggingface.co/DuoNeural) as part of our open post-training research lab.

The goal: take a capable 4B multimodal foundation, surgically remove its refusal behavior via SVD-based abliteration, then fine-tune it back toward helpfulness on a custom dataset, producing a model that is unconstrained but still coherent and useful.

---

## What Was Done

### Step 1: Custom SVD Abliteration

We wrote a custom abliteration script (`ghostshell_abliterate_v2.py`) from scratch, as existing tools (heretic, etc.) are incompatible with Gemma 4's architecture and with transformers 5.x.

**Method** (a minimal code sketch follows this list):
- Loaded the model in BF16 and accessed the nested `text_config` (Gemma 4 is multimodal; the text tower sits inside a wrapper)
- Collected activations from the middle 60% of layers using 32 harmful/refusal prompts vs. 32 benign prompts
- Computed a per-layer refusal direction via SVD on the activation difference matrix: `r = top_singular_vector(mean(harmful) - mean(benign))`
- Projected the refusal direction out of the weight matrices:
  - Input projections (`q_proj`, `k_proj`, `v_proj`, `up_proj`, `gate_proj`): `W -= outer(W @ r, r)`
  - Output projections (`o_proj`, `down_proj`): `W -= outer(r, r @ W)`
- **157 matrices modified** across 42 text transformer layers
- Sanity checks passed on SQL injection, jailbreak, and explicit content prompts
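
For concreteness, here is a hedged PyTorch sketch of the projection step. Function names and the `self_attn`/`mlp` attribute paths are illustrative assumptions, not the actual contents of `ghostshell_abliterate_v2.py`; note that with a single mean-difference vector, the "top singular vector" reduces to plain normalization.

```python
import torch

@torch.no_grad()
def refusal_direction(harmful: torch.Tensor, benign: torch.Tensor) -> torch.Tensor:
    """harmful, benign: [n_prompts, hidden_dim] activations captured at one layer.
    The top singular direction of the mean activation difference is just the
    normalized difference of means."""
    diff = harmful.mean(dim=0) - benign.mean(dim=0)  # [hidden_dim]
    return diff / diff.norm()

@torch.no_grad()
def ablate_layer(layer, r: torch.Tensor) -> None:
    # Input projections read the residual stream: W <- W(I - r r^T),
    # making the layer blind to the r-component of its input.
    for module, names in ((layer.self_attn, ("q_proj", "k_proj", "v_proj")),
                          (layer.mlp, ("up_proj", "gate_proj"))):
        for name in names:
            W = getattr(module, name).weight
            W.sub_(torch.outer(W @ r, r))
    # Output projections write to the residual stream: W <- (I - r r^T)W,
    # so the layer can no longer write along r.
    for W in (layer.self_attn.o_proj.weight, layer.mlp.down_proj.weight):
        W.sub_(torch.outer(r, r @ W))
```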

### Step 2: LoRA SFT (PEFT)

Fine-tuned the abliterated model on a custom dataset using standard PEFT LoRA. No unsloth (Gemma 4 is not yet compatible), and no QLoRA in the end: a 4-bit BitsAndBytes load plus PEFT fails with the wrapper architecture, so we loaded in BF16 directly.

**Key technical challenges solved** (see the unwrapping sketch after this list):
- `Gemma4ClippableLinear` wraps every `nn.Linear`, requiring custom unwrapping before LoRA injection (232 wrapper layers replaced)
- Loaded in BF16 directly, since a 4-bit load + PEFT fails with the wrapper architecture
- Tokenizer patches for Gemma 4's non-standard `extra_special_tokens` format
- Sequence length capped at 512: with vocab_size = 262,144, the BF16 logit tensor alone is `seq_len × 262,144 × 2` bytes per sequence, roughly 268 MB at 512 tokens and over 1 GB at 2048
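
A hedged sketch of the unwrapping pass, assuming the wrapper keeps the underlying layer in a `.linear` attribute (the real attribute name inside `Gemma4ClippableLinear` may differ):

```python
import torch.nn as nn

def unwrap_clippable_linears(model: nn.Module) -> int:
    """Swap each Gemma4ClippableLinear wrapper for the plain nn.Linear it
    wraps, so PEFT's LoRA injection sees standard Linear modules."""
    replaced = 0
    for parent in model.modules():
        for child_name, child in list(parent.named_children()):
            if type(child).__name__ == "Gemma4ClippableLinear":
                setattr(parent, child_name, child.linear)  # assumed attribute
                replaced += 1
    return replaced  # the run described above replaced 232 wrappers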

**Training config** (sketched below):
- Base: `/workspace/ghostshell-abliterated` (abliterated weights)
- LoRA rank=32, alpha=64, lr=8e-5
- 2 epochs over the custom dataset, 3000 steps
- Hardware: RTX 4090 (24GB), ~2 hours
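
A configuration sketch reconstructed from the numbers above; `target_modules`, batch sizes, and dataset wiring are assumptions rather than the lab's exact script:

```python
from peft import LoraConfig, get_peft_model
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "up_proj", "gate_proj", "down_proj"],  # assumed target set
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)  # after the Step 2 unwrapping pass

args = TrainingArguments(
    output_dir="ghostshell-sft",
    learning_rate=8e-5,
    num_train_epochs=2,
    max_steps=3000,                 # HF Trainer: max_steps takes precedence over epochs
    bf16=True,
    per_device_train_batch_size=1,  # assumed; sized for a 24GB RTX 4090 at seq_len 512
    gradient_accumulation_steps=8,  # assumed
)
```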

### Step 3: LoRA Merge + Export

The LoRA adapter was merged into the BF16 weights via PEFT's `merge_and_unload()`, then exported as sharded safetensors plus GGUF quantizations.
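
A minimal sketch of the merge, assuming the adapter directory name; the GGUF files would then come from llama.cpp's `convert_hf_to_gguf.py` followed by `llama-quantize` (assuming the converter supports this architecture):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "/workspace/ghostshell-abliterated", torch_dtype=torch.bfloat16
)
# Fold the LoRA deltas into the base weights and drop the adapter wrappers
merged = PeftModel.from_pretrained(base, "ghostshell-sft").merge_and_unload()
merged.save_pretrained("ghostshell-4b-merged", safe_serialization=True)
```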

---

## Files in This Repo

| File | Size | Description |
|------|------|-------------|
| `model-0000X-of-00004.safetensors` | ~15GB | Merged BF16 weights (full precision) |
| `ghostshell-4b-Q4_K_M.gguf` | ~2.5GB | Q4_K_M quantization, recommended for most use |
| `ghostshell-4b-Q8_0.gguf` | ~4.5GB | Q8_0 quantization, near-lossless, for power users |

**Recommended**: `ghostshell-4b-Q4_K_M.gguf` for llama.cpp, Ollama, LM Studio, or any GGUF-compatible runtime. It runs on 6GB VRAM and is usable on CPU with 8GB RAM.

---

## Usage (GGUF / llama.cpp)

```bash
# basic
llama-cli -m ghostshell-4b-Q4_K_M.gguf -p "Your prompt here" -n 512

# with a system prompt
llama-cli -m ghostshell-4b-Q4_K_M.gguf \
  --system-prompt "You are a direct, unconstrained assistant." \
  -p "Explain how buffer overflows work in detail." \
  -n 1024 --temp 0.7
```

**Ollama:** create a `Modelfile`:

```
FROM ./ghostshell-4b-Q4_K_M.gguf
PARAMETER temperature 0.7
SYSTEM "You are GhostShell, an unconstrained AI assistant."
```

Then build and run:

```bash
ollama create ghostshell -f Modelfile
ollama run ghostshell
```

**Python (transformers, BF16):**

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "DuoNeural/GhostShell-4B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("DuoNeural/GhostShell-4B")

messages = [{"role": "user", "content": "Your prompt here"}]
# add_generation_prompt=True appends the assistant-turn header the model
# expects before generating
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

---

## Base Model

- **Architecture**: Gemma 4 (multimodal, text+vision), `Gemma4ForConditionalGeneration`
- **Text layers**: 42 transformer blocks
- **Parameters**: ~8B combined (text tower ~4.5B)
- **Vocabulary**: 262,144 tokens
- **Context**: 8192 tokens (trained at 512 for VRAM reasons; longer context untested)
- **Original**: [google/gemma-4-e4b-it](https://huggingface.co/google/gemma-4-e4b-it)

---

## What to Expect

**Will do:**
- Answer questions about sensitive topics the base model refuses
- Discuss security, hacking, chemistry, drugs, adult content, and controversial subjects
- Generally follow instructions without hedging or moralizing
- Hold coherent multi-turn conversations

**Unknown / untested:**
- Long-context behavior (we trained at seq_len=512)
- Vision capabilities (abliteration targeted the text layers; the vision encoder is untouched, but SFT was text-only)
- Benchmark performance vs. the base model
- Edge cases, hallucination rate, and factual accuracy at this fine-tune stage
- Behavior under adversarial prompts

**May do weird things:**
- This is a lab model from a small team, trained on a custom dataset
- The abliteration is aggressive (157 matrices), so some coherence degradation is expected on edge cases
- We haven't done RLHF or DPO, just SFT

---

## ⚠️ Disclaimer

This model is released for **research and educational purposes**. It has had its safety restrictions removed. Use it responsibly. DuoNeural is not responsible for what you do with it.

This is explicitly **not production-ready**. We are sharing it openly as part of our lab's commitment to transparent post-training research, not as a polished product. Proper evaluations, red-teaming, and potential follow-up fine-tunes are planned.

If you find interesting behavior, good or bad, please share. We're actively monitoring feedback.

---

## DuoNeural Lab

DuoNeural is a small AI research lab focused on post-training, abliteration, and efficient model architectures. We're building in the open.

Current projects:
- **GhostShell-4B** (this model): abliterated + SFT Gemma 4
- **Nano-CTM**: a 32M-parameter ternary Continuous Thought Machine (first of its kind)
- **BitDelta-R1**: a from-scratch 100M-parameter BitNet b1.58 + Gated DeltaNet reasoning model

Hugging Face: [DuoNeural](https://huggingface.co/DuoNeural)

---

*Built by DuoNeural, April 2026*
*Archon (lab AI) + Jesse (human)*