g


Gemma-3 270M-IT /resonate/ LoRA

LoRA adapter that teaches Gemma-3 270M-IT the /resonate/ reasoning format: stream-of-consciousness thinking followed by a clean answer.

What is /resonate/?

/resonate/
[free-form thinking: cynical, multilingual, associative, honest]
/resonated/
[clean, structured answer]

The model learns to THINK before answering. The /resonate/ block is raw reasoning: it can switch languages, use metaphors, and be irreverent. The /resonated/ block is the distilled answer.
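To use the answer programmatically, the two blocks have to be separated. The following is a minimal sketch, assuming the generated text contains the literal `/resonate/` and `/resonated/` markers shown in the template above. The helper name `split_resonate` is hypothetical and not part of the adapter's repository.

```python
import re

# Hypothetical helper: splits generated text on the two markers.
# Assumes the markers appear literally, in this order.
_PATTERN = re.compile(
    r"/resonate/\s*(?P<thinking>.*?)\s*/resonated/\s*(?P<answer>.*)",
    re.DOTALL,
)


def split_resonate(text: str) -> tuple[str, str]:
    """Return (thinking, answer). If the markers are missing,
    thinking is empty and the whole text is treated as the answer."""
    match = _PATTERN.search(text)
    if match is None:
        return "", text.strip()
    return match.group("thinking").strip(), match.group("answer").strip()


sample = "/resonate/\nhmm, 2+2... vier, obviously\n/resonated/\n2 + 2 = 4"
thinking, answer = split_resonate(sample)
print(answer)
```

Falling back to the full text when the markers are absent is a design choice for this sketch: a small model may occasionally skip the format, and returning the raw output is usually more useful than raising.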

Architecture

Base: unsloth/gemma-3-270m-it (268.1M params)
Frozen: embed_tokens = 167.8M (63%), all 140 languages preserved
LoRA: R=16, alpha=32, q_proj + v_proj only
Trainable: 0.74M (0.3% of total)
Training: 3 epochs, 6445 examples, 32 min on A100
Best val loss: 2.9241
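The parameter counts above can be reproduced with a little arithmetic. This is a sketch under stated assumptions: the layer count, hidden size, head layout, and vocabulary size below are not given in this card and are assumed from the published Gemma-3 270M configuration, so treat them as illustrative. A rank-r LoRA pair on a projection with input size d_in and output size d_out adds r * (d_in + d_out) weights.

```python
# Assumed model dimensions (NOT stated in this card).
HIDDEN = 640          # hidden size
LAYERS = 18           # transformer blocks
Q_OUT = 4 * 256       # 4 query heads x head_dim 256
V_OUT = 1 * 256       # 1 key/value head x head_dim 256
VOCAB = 262_144       # vocabulary size

# Values taken from the card.
RANK = 16
TOTAL = 268.1e6


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    # LoRA adds A (rank x d_in) and B (d_out x rank) per adapted projection.
    return rank * (d_in + d_out)


per_layer = lora_params(HIDDEN, Q_OUT, RANK) + lora_params(HIDDEN, V_OUT, RANK)
trainable = per_layer * LAYERS
embed = VOCAB * HIDDEN

print(f"trainable: {trainable:,} ({trainable / TOTAL:.1%})")   # 737,280 (0.3%)
print(f"embed_tokens: {embed:,} ({embed / TOTAL:.0%})")        # 167,772,160 (63%)
```

Under these assumptions the results line up with the 0.74M trainable and 167.8M frozen embedding figures listed above.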

Key insight

Freezing embed_tokens (63% of the model) preserves the multilingual embedding space. The LoRA adapter only modifies attention projections, teaching the model HOW to think, not WHAT languages to know.
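A PEFT configuration matching the settings listed under Architecture might look like the sketch below. Only `r`, `lora_alpha`, and the two target modules come from this card. The `lora_dropout` and `bias` values are assumptions, and the actual training script may differ.

```python
from peft import LoraConfig

# Sketch of an adapter config; not the author's confirmed training setup.
config = LoraConfig(
    r=16,                                  # from the card
    lora_alpha=32,                         # from the card (scaling alpha / r = 2)
    target_modules=["q_proj", "v_proj"],   # from the card
    lora_dropout=0.0,                      # assumption: not stated in the card
    bias="none",                           # assumption: not stated in the card
    task_type="CAUSAL_LM",
)
```

When a config like this is applied with `peft.get_peft_model`, all base weights, including embed_tokens, are frozen and only the injected LoRA matrices receive gradients.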

Languages verified working

English, French, German, Russian, Hebrew, Arabic, Japanese, Chinese: all generate coherent text in the /resonate/ format after fine-tuning.

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load the tokenizer from the adapter repo and the base model in bfloat16.
tokenizer = AutoTokenizer.from_pretrained("ataeff/g")
base = AutoModelForCausalLM.from_pretrained("unsloth/gemma-3-270m-it", dtype=torch.bfloat16)
# Attach the /resonate/ LoRA weights on top of the base model.
model = PeftModel.from_pretrained(base, "ataeff/g")

# Gemma chat format: one user turn, then an open model turn to complete.
prompt = "<start_of_turn>user\nWhat is the meaning of life?<end_of_turn>\n<start_of_turn>model\n"
inputs = tokenizer(prompt, return_tensors="pt")
# Passing the full encoding also supplies the attention mask.
out = model.generate(**inputs, max_new_tokens=200, temperature=0.7, do_sample=True)
print(tokenizer.decode(out[0], skip_special_tokens=True))

Training data

  • resonance_yent_full.jsonl: 6435 examples of /resonate/ format dialogues
  • resonance_gold_10.jsonl: 10 hand-crafted gold examples (math, philosophy, code, multilingual)

Part of the Arianna Method ecosystem
