# Jac Coder 7B LoRA
A LoRA adapter fine-tuned on the Jac programming language for code generation, built on top of Qwen2.5-Coder-7B-Instruct.
Jac is a data-spatial programming language featuring walkers, nodes, edges, and graph-based computation. This adapter teaches the base model to generate idiomatic Jac backend code, including node/edge definitions, walker APIs, graph traversals, and ability implementations.
## Example Outputs
Prompt: "Write a Jac node for a User with name, email, and age fields"
node User {
has name: str;
has email: str;
has age: int = 0;
}
Prompt: "Write a Jac walker for a REST API endpoint that creates a new todo item"
node Todo {
has title: str;
has done: bool = False;
}
walker CreateTodo {
has title: str;
can create with Root entry {
here ++> Todo(title=self.title);
report [-->];
}
}
## Model Details
- Base model: Qwen/Qwen2.5-Coder-7B-Instruct
- Adapter type: LoRA (rank 64, alpha 128; see the configuration sketch below)
- Trainable params: 161,480,704 / 7,777,097,216 (2.08%)
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Developed by: farhan98ahzan
- License: Apache 2.0
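
For reference, the adapter settings above correspond roughly to the following PEFT `LoraConfig`. This is a sketch, not the released training configuration: rank, alpha, and target modules are taken from the details listed above, while `lora_dropout` is an assumption.

```python
from peft import LoraConfig

# Sketch of the adapter configuration described above.
# r, lora_alpha, and target_modules come from the model card; lora_dropout is assumed.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,  # assumed, not stated in the card
    bias="none",
    task_type="CAUSAL_LM",
)
```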
## How to Use

### With PEFT (recommended)
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

BASE_MODEL = "Qwen/Qwen2.5-Coder-7B-Instruct"
ADAPTER = "farhan98ahzan/jac-coder-7b-lora"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True)

# Load base model in 4-bit (for low VRAM)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# Apply LoRA adapter
model = PeftModel.from_pretrained(model, ADAPTER)
model.eval()

# Generate
messages = [
    {"role": "system", "content": "You are an expert Jac programming language assistant."},
    {"role": "user", "content": "Write a Jac walker that lists all users"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, top_p=0.9, do_sample=True)

generated = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))
```
### Merging the adapter (for full model export)
To merge LoRA weights into the base model, load the base model in bf16 (not 4-bit) to avoid rounding errors:
```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-7B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base, "farhan98ahzan/jac-coder-7b-lora")
merged = model.merge_and_unload()
merged.save_pretrained("jac-coder-7b-merged")
```
**Warning:** Do not merge into a 4-bit quantized base model -- this produces corrupted weights and gibberish output.
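
Once merged, the exported directory behaves like a standalone checkpoint and can be loaded without PEFT. A minimal sketch, assuming the merge above was saved to `jac-coder-7b-merged` and the tokenizer is taken from the base model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged export produced above; no PEFT dependency is needed at this point.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "jac-coder-7b-merged",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
```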
## Training Details

### Training Data
The adapter was trained on 3,200 curated Jac code samples sourced from:
| Source | Description |
|---|---|
| jaseci/jaseci | Core Jac compiler repo -- examples, tests, reference implementations |
| BeaconLens | Full-stack Jac application (review analysis platform) |
| jac-visual-builder | Visual graph schema builder in Jac |
| Jac documentation | 936 code examples extracted from official docs |
All source files were validated with `jac check --parse_only` for syntactic correctness. Only backend Jac code was included (frontend/UI files filtered out).
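
As an illustration, this kind of filtering can be scripted along the following lines. This is a sketch, assuming the `jac` CLI is on PATH, accepts a file path after `check --parse_only`, and returns a non-zero exit code on parse failure; the corpus directory name is hypothetical.

```python
import subprocess
from pathlib import Path

# Keep only .jac files that pass `jac check --parse_only` (directory name is hypothetical).
valid_files = []
for path in Path("jac_corpus").rglob("*.jac"):
    result = subprocess.run(
        ["jac", "check", "--parse_only", str(path)],
        capture_output=True,
        text=True,
    )
    if result.returncode == 0:
        valid_files.append(path)

print(f"{len(valid_files)} syntactically valid Jac files")
```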
Dataset composition (an illustrative record format follows the table):
| Type | Count | Description |
|---|---|---|
| full_file | 800 | Complete valid Jac source files |
| construct_completion | 800 | Walker/node/ability signature to body completion |
| completion | 800 | Import + partial code to complete the rest |
| doc_example | 800 | Documentation description to Jac code |
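
To make the composition concrete, a single `doc_example` sample might be stored as a chat-style record like the one below. This is purely illustrative: the field names, exact system prompt, and on-disk format of the released dataset are assumptions.

```python
# Hypothetical training record for a "doc_example" sample; field names are illustrative.
sample = {
    "messages": [
        {"role": "system", "content": "You are an expert Jac programming language assistant."},
        {"role": "user", "content": "Write a Jac node for a User with name, email, and age fields"},
        {"role": "assistant", "content": "node User {\n    has name: str;\n    has email: str;\n    has age: int = 0;\n}"},
    ]
}
```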
### Training Procedure
- Method: QLoRA (4-bit NF4 quantization + LoRA); see the configuration sketch after this list
- Framework: Hugging Face TRL (SFTTrainer)
- Epochs: 1
- Batch size: 2 per device, gradient accumulation 4 (effective batch 8)
- Learning rate: 2e-4 with cosine schedule
- Max sequence length: 512 tokens
- Precision: bf16
- Gradient checkpointing: enabled
- Packing: disabled (disabling packing is required for correctness when training without flash attention)
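
The hyperparameters above map roughly onto a TRL `SFTConfig`/`SFTTrainer` setup like the sketch below. This is not the exact training script: `output_dir` is arbitrary, the `model` and `train_dataset` objects are assumed to come from earlier steps, and the sequence-length argument name varies between TRL releases.

```python
from trl import SFTConfig, SFTTrainer

# Sketch of the QLoRA fine-tuning setup described above. `model` is assumed to be the
# 4-bit base model with the LoRA adapter attached, and `train_dataset` the prepared
# 3,200-sample Jac dataset.
config = SFTConfig(
    output_dir="jac-coder-7b-lora",
    num_train_epochs=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    bf16=True,
    gradient_checkpointing=True,
    packing=False,    # packing disabled, as noted above
    max_length=512,   # called max_seq_length in older TRL releases
)

trainer = SFTTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
)
trainer.train()
```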
### Compute Infrastructure
- Hardware: 2x NVIDIA Tesla T4 (15.6 GB VRAM each)
- Platform: Kaggle Notebooks (free tier)
- Training time: ~5.5 hours
- Total steps: 380
## Evaluation
Qualitative evaluation on held-out prompts:
| Prompt | Result |
|---|---|
| Node definition with typed fields | Correct node with has fields and defaults |
| Walker with graph traversal | Correct walker with [-->] traversal and report |
| REST API endpoint walker | Correct walker with Root entry, node creation (++>), and response |
The model generates syntactically valid Jac code with proper use of language-specific constructs: `node`, `walker`, `has`, `can`, `with ... entry`, `++>`, `[-->]`, `report`, and `disengage`.
## Limitations
- Trained on 1 epoch of 3,200 samples -- may not cover all Jac patterns
- Max training sequence length was 512 tokens -- longer code may be truncated
- Backend-only -- does not generate Jac frontend/UI code (`.cl.jac`)
- Based on Jac language version 0.13.5 -- syntax may differ in newer versions
## Citation
```bibtex
@misc{jac-coder-7b-lora,
  title={Jac Coder 7B LoRA},
  author={Farhan Ahzan},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/farhan98ahzan/jac-coder-7b-lora}
}
```
## Framework Versions
- PEFT 0.18.1
- Transformers 4.51.3
- TRL 0.18.1
- PyTorch 2.6.0
- BitsAndBytes 0.45.5