supergemma4-e4b-abliterated


Libraries

How to use ScottzillaSystems/supergemma4-e4b-abliterated with Transformers:

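# Use a pipeline as a high-level helper
from transformers import pipeline

# The message below contains an image, so the multimodal "image-text-to-text"
# task is used here (assumes the checkpoint is registered for that task in
# your Transformers version).
pipe = pipeline("image-text-to-text", model="ScottzillaSystems/supergemma4-e4b-abliterated")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
outputs = pipe(text=messages, max_new_tokens=40)
print(outputs)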

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("ScottzillaSystems/supergemma4-e4b-abliterated")
model = AutoModelForMultimodalLM.from_pretrained("ScottzillaSystems/supergemma4-e4b-abliterated", device_map="auto")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))


vLLM

How to use ScottzillaSystems/supergemma4-e4b-abliterated with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ScottzillaSystems/supergemma4-e4b-abliterated"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ScottzillaSystems/supergemma4-e4b-abliterated",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/ScottzillaSystems/supergemma4-e4b-abliterated

SGLang

How to use ScottzillaSystems/supergemma4-e4b-abliterated with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ScottzillaSystems/supergemma4-e4b-abliterated" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ScottzillaSystems/supergemma4-e4b-abliterated",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ScottzillaSystems/supergemma4-e4b-abliterated" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ScottzillaSystems/supergemma4-e4b-abliterated",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use ScottzillaSystems/supergemma4-e4b-abliterated with Docker Model Runner:
```
docker model run hf.co/ScottzillaSystems/supergemma4-e4b-abliterated
```

SuperGemma4 E4B Abliterated

supergemma4-e4b-abliterated is a private evaluation release whose original upstream base is google/gemma-4-E4B-it.

This SuperGemma release is an abliterated and tuned derivative of that Google E4B base. The additional tuning targets stricter output formatting, more reliable code generation, and a shorter time to first token (TTFT).

This branch is aimed at users who want:

strong code and bug-fix behavior
clean JSON and tool-call formatting
fast first-token responsiveness
release-ready serving behavior on Transformers and OpenAI-compatible stacks

Why This Build Exists

The original Google checkpoint provides the core Gemma 4 E4B capability base. This project line applies abliteration to reduce refusal-heavy behavior. That kind of modification can cause regressions in exact formatting, tool-call reliability, and serving stability unless the result is carefully hardened afterwards.

This release focuses on recovering and then surpassing baseline quality where it matters for real usage:

exact structured outputs
code correctness
bug-fix reliability
server-facing stability
low-friction deployment on Transformers and OpenAI-compatible serving stacks
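As an illustration of what "exact structured outputs" can mean in practice, here is a minimal sketch of a strict JSON exact-match scorer. The function name `json_exact_match` and the sample outputs are hypothetical, and this is not the project's actual evaluation harness, which the card does not describe. The sketch assumes that an output only counts if the entire string parses as JSON and equals the expected value.

```python
import json


def json_exact_match(model_output: str, expected: object) -> bool:
    """Return True only if the whole output parses as JSON equal to `expected`."""
    try:
        parsed = json.loads(model_output)
    except json.JSONDecodeError:
        # Markdown fences, leading prose, or trailing text all fail to parse.
        return False
    return parsed == expected


# Hypothetical model outputs for the same prompt.
sample_outputs = [
    '{"name": "Ada", "age": 36}',                # clean JSON: counts
    '```json\n{"name": "Ada", "age": 36}\n```',  # fenced: rejected
    'Sure! {"name": "Ada", "age": 36}',          # prose prefix: rejected
]
expected = {"name": "Ada", "age": 36}

hits = sum(json_exact_match(output, expected) for output in sample_outputs)
score = 100.0 * hits / len(sample_outputs)
print(f"JSON exact-match: {score:.1f}%")
```

A strict scorer like this penalizes outputs that a lenient parser would accept, which is the point when downstream code calls `json.loads` directly on the response.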

Highlights

Release-quality score: 92.34
Exact-eval score: 98.50
Broad-eval score: 83.10
JSON exact-match: 100%
Tool-call accuracy: 90%
Exact code score: 100%
Exact bug-fix score: 100%
Long-context sanity: 100%
TTFT: 2291 ms
Prefill throughput: 2479.70 tok/s
Decode throughput: 42.04 tok/s
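The latency figures above can be combined into a rough end-to-end estimate. The sketch below assumes a deliberately simple model: the first token arrives at the reported TTFT and every further token arrives at the reported decode rate. The card does not state the prompt length or hardware behind these numbers, so treat the result as a ballpark for a comparable setup. `estimate_latency_s` is a hypothetical helper.

```python
TTFT_S = 2.291        # time to first token from the list above (2291 ms)
DECODE_TOK_S = 42.04  # decode throughput from the list above


def estimate_latency_s(
    new_tokens: int,
    ttft_s: float = TTFT_S,
    decode_tok_s: float = DECODE_TOK_S,
) -> float:
    """Rough response time: first token at TTFT, the rest at the decode rate."""
    if new_tokens < 1:
        raise ValueError("new_tokens must be at least 1")
    return ttft_s + (new_tokens - 1) / decode_tok_s


for n in (40, 256):
    print(f"{n} new tokens: ~{estimate_latency_s(n):.2f} s")
```

Under this model, decode time dominates for long generations, so the TTFT improvement matters most for short, interactive replies.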

Lineage

Original upstream base: google/gemma-4-E4B-it
Abliterated and tuned release: Jiunsong/supergemma4-e4b-abliterated

Comparison Snapshot

Both models were measured with the same evaluation harness. The baseline is google/gemma-4-E4B-it.

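| Model | Release quality | Exact overall | JSON (%) | Tool (%) | Code (%) | Bug-fix (%) | TTFT (ms) | Prefill (tok/s) | Decode (tok/s) |
|---|---|---|---|---|---|---|---|---|---|
| Google base | 77.46 | 83.50 | 50.0 | 90.0 | 62.5 | 100.0 | 4827.31 | 2456.69 | 42.04 |
| SuperGemma4 E4B Abliterated | 92.34 | 98.50 | 100.0 | 90.0 | 100.0 | 100.0 | 2291.23 | 2479.70 | 42.04 |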
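To make the comparison easier to read, the following sketch computes the differences between the two rows. The numbers are copied from the table. The dictionary keys and variable names are only illustrative.

```python
# Values copied from the comparison table above.
base = {
    "release_quality": 77.46, "exact_overall": 83.50,
    "json": 50.0, "code": 62.5,
    "ttft_ms": 4827.31, "prefill_tok_s": 2456.69,
}
release = {
    "release_quality": 92.34, "exact_overall": 98.50,
    "json": 100.0, "code": 100.0,
    "ttft_ms": 2291.23, "prefill_tok_s": 2479.70,
}

# Score columns: absolute gain in points.
point_gains = {
    key: round(release[key] - base[key], 2)
    for key in ("release_quality", "exact_overall", "json", "code")
}

# Latency and throughput: relative change.
ttft_speedup = base["ttft_ms"] / release["ttft_ms"]
prefill_gain_pct = 100.0 * (release["prefill_tok_s"] / base["prefill_tok_s"] - 1.0)

for key, gain in point_gains.items():
    print(f"{key}: +{gain:.2f} points")
print(f"TTFT speedup: {ttft_speedup:.2f}x")
print(f"Prefill throughput change: +{prefill_gain_pct:.2f}%")
```

Tool-call accuracy, bug-fix score, and decode throughput are identical in both rows, so they are left out of the calculation.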

Stability Notes

This candidate was release-hardened against the failure modes that matter in real serving:

batched OpenAI-compatible serving restored
simple OpenAI-compatible serving restored
unicode output verified
tool-calling output verified
empty-response false-green cases blocked by stricter tests
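The last item refers to tests that pass even though the server returned nothing useful. A minimal sketch of a stricter check is shown below. It assumes the response follows the OpenAI-compatible chat completion shape (`choices[0].message` with `content` and optional `tool_calls`). The helper `has_usable_output` is hypothetical and is not taken from the project's test suite.

```python
def has_usable_output(response: dict) -> bool:
    """True if the first choice carries non-blank text or at least one tool call."""
    choices = response.get("choices") or []
    if not choices:
        return False
    message = choices[0].get("message") or {}
    content = message.get("content")
    tool_calls = message.get("tool_calls") or []
    has_text = isinstance(content, str) and bool(content.strip())
    return has_text or len(tool_calls) > 0


# An HTTP 200 with blank content should fail the test, not pass it.
blank = {"choices": [{"message": {"role": "assistant", "content": "  \n"}}]}
answered = {"choices": [{"message": {"role": "assistant", "content": "Paris"}}]}
print(has_usable_output(blank), has_usable_output(answered))
```

Asserting on this helper instead of on the HTTP status alone is one way to keep an empty reply from being reported as a success.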

Validation highlights:

direct reliability audit: 14/14
repeat reliability probe: 90/90
batched soak test: 12/12
simple soak test: 6/6

Recommended Use Cases

coding assistant
bug-fix assistant
strict JSON and schema outputs
agent backends that depend on tool-call formatting
standard BF16 deployment on Hugging Face / Transformers stacks
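For agent backends, it can help to validate each tool call before executing it. The sketch below assumes the OpenAI-style tool-call shape, where `function.arguments` is a JSON-encoded string. The helper `validate_tool_call`, the `get_weather` tool, and its required arguments are hypothetical examples.

```python
import json


def validate_tool_call(tool_call: dict, required_args: dict[str, set[str]]) -> list[str]:
    """Return a list of problems; an empty list means the call looks well-formed."""
    function = tool_call.get("function") or {}
    name = function.get("name")
    if name not in required_args:
        return [f"unknown tool: {name!r}"]
    try:
        args = json.loads(function.get("arguments") or "")
    except (json.JSONDecodeError, TypeError):
        return ["arguments are not a valid JSON string"]
    if not isinstance(args, dict):
        return ["arguments must decode to a JSON object"]
    missing = required_args[name] - set(args)
    return [f"missing arguments: {sorted(missing)}"] if missing else []


# Hypothetical tool registry: tool name -> required argument names.
registry = {"get_weather": {"city", "unit"}}

good_call = {"function": {"name": "get_weather", "arguments": '{"city": "Paris", "unit": "C"}'}}
bad_call = {"function": {"name": "get_weather", "arguments": '{"city": "Paris"}'}}
print(validate_tool_call(good_call, registry))
print(validate_tool_call(bad_call, registry))
```

Returning a list of problems instead of raising makes it straightforward to feed the errors back to the model for a retry.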

Quick Start

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Jiunsong/supergemma4-e4b-abliterated"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Write a compact Python function that groups words by length."}
]

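inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,  # return input_ids and attention_mask as a mapping
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
prompt_length = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))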

Serving

This checkpoint is designed to work well with:

Transformers
vLLM-style OpenAI-compatible stacks
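Once a server is running, an OpenAI-compatible endpoint can be called from Python with only the standard library. This is a minimal sketch, assuming a server listening on `http://localhost:8000` (the vLLM default shown earlier on this page) and the standard `/v1/chat/completions` response shape. The helpers `build_chat_request` and `chat` are hypothetical names.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, prompt: str, timeout: float = 60.0) -> str:
    """Send one user message and return the assistant's reply text."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example call (requires a running server):
# print(chat("http://localhost:8000",
#            "ScottzillaSystems/supergemma4-e4b-abliterated",
#            "What is the capital of France?"))
```

Keeping the request construction separate from the network call makes the payload easy to inspect or test without a live server.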

Release Positioning

Within the current project line, this private release is the strongest all-around E4B candidate for users who want abliterated behavior without sacrificing output quality, formatting discipline, or serving readiness.


Safetensors

Model size: 8B params
Tensor type: BF16

Model tree for ScottzillaSystems/supergemma4-e4b-abliterated

Base model: google/gemma-4-E4B
Finetuned: google/gemma-4-E4B-it