Jac Coder 7B LoRA

A LoRA adapter fine-tuned for Jac code generation, built on top of Qwen2.5-Coder-7B-Instruct.

Jac is a data-spatial programming language featuring walkers, nodes, edges, and graph-based computation. This adapter teaches the base model to generate idiomatic Jac backend code including node/edge definitions, walker APIs, graph traversals, and ability implementations.

Example Outputs

Prompt: "Write a Jac node for a User with name, email, and age fields"

node User {
    has name: str;
    has email: str;
    has age: int = 0;
}

Prompt: "Write a Jac walker for a REST API endpoint that creates a new todo item"

node Todo {
    has title: str;
    has done: bool = False;
}

walker CreateTodo {
    has title: str;

    can create with Root entry {
        here ++> Todo(title=self.title);
        report [-->];
    }
}

Model Details

  • Base model: Qwen/Qwen2.5-Coder-7B-Instruct
  • Adapter type: LoRA (rank 64, alpha 128)
  • Trainable params: 161,480,704 / 7,777,097,216 (2.08%)
  • Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Developed by: farhan98ahzan
  • License: Apache 2.0

How to Use

With PEFT (recommended)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

BASE_MODEL = "Qwen/Qwen2.5-Coder-7B-Instruct"
ADAPTER = "farhan98ahzan/jac-coder-7b-lora"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True)

# Load base model in 4-bit (for low VRAM)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# Apply LoRA adapter
model = PeftModel.from_pretrained(model, ADAPTER)
model.eval()

# Generate
messages = [
    {"role": "system", "content": "You are an expert Jac programming language assistant."},
    {"role": "user", "content": "Write a Jac walker that lists all users"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, top_p=0.9, do_sample=True)

generated = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))

Merging the adapter (for full model export)

To merge LoRA weights into the base model, load the base model in bf16 (not 4-bit) to avoid rounding errors:

import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-7B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base, "farhan98ahzan/jac-coder-7b-lora")
merged = model.merge_and_unload()
merged.save_pretrained("jac-coder-7b-merged")

Warning: Do not merge into a 4-bit quantized base model -- this produces corrupted weights and gibberish output.
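Once exported, the merged directory behaves like any standalone Hugging Face model and no longer needs PEFT. Note that merge_and_unload followed by save_pretrained only writes the model weights, so save the tokenizer alongside if you want a self-contained folder. A minimal sketch, assuming the export path used above:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Save the tokenizer next to the merged weights so the folder is self-contained
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct", trust_remote_code=True)
tokenizer.save_pretrained("jac-coder-7b-merged")

# Load the merged model directly, without PEFT
model = AutoModelForCausalLM.from_pretrained(
    "jac-coder-7b-merged",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)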

Training Details

Training Data

The adapter was trained on 3,200 curated Jac code samples sourced from:

Source               Description
jaseci/jaseci        Core Jac compiler repo -- examples, tests, reference implementations
BeaconLens           Full-stack Jac application (review analysis platform)
jac-visual-builder   Visual graph schema builder in Jac
Jac documentation    936 code examples extracted from official docs

All source files were validated with jac check --parse_only for syntactic correctness. Only backend Jac code was included (frontend/UI files filtered out).
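The validation script itself is not included in the card. The sketch below shows how such a filter could work, assuming the jac CLI is on PATH, exits non-zero on parse errors, and using the --parse_only flag mentioned above; the source directory name and file-selection details are illustrative only:

import subprocess
from pathlib import Path

def is_valid_backend_jac(path: Path) -> bool:
    """Return True if the file parses cleanly and is not a frontend/UI file."""
    if path.name.endswith(".cl.jac"):  # frontend/UI files are excluded
        return False
    result = subprocess.run(
        ["jac", "check", "--parse_only", str(path)],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0

# "raw_jac_sources" is a hypothetical directory holding the collected .jac files
valid_files = [p for p in Path("raw_jac_sources").rglob("*.jac") if is_valid_backend_jac(p)]
print(f"{len(valid_files)} files passed validation")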

Dataset composition:

Type                   Count   Description
full_file              800     Complete valid Jac source files
construct_completion   800     Walker/node/ability signature to body completion
completion             800     Import + partial code to complete the rest
doc_example            800     Documentation description to Jac code

Training Procedure

  • Method: QLoRA (4-bit NF4 quantization + LoRA); an approximate configuration is sketched after this list
  • Framework: Hugging Face TRL (SFTTrainer)
  • Epochs: 1
  • Batch size: 2 per device, gradient accumulation 4 (effective batch 8)
  • Learning rate: 2e-4 with cosine schedule
  • Max sequence length: 512 tokens
  • Precision: bf16
  • Gradient checkpointing: enabled
  • Packing: disabled (disabling packing is required for correctness when flash attention is unavailable)
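The full training script is not part of the card. The sketch below reconstructs the quantization and LoRA configuration from the hyperparameters listed above; the dropout value and anything beyond the listed settings are assumptions:

import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization for the frozen base model (QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA settings from the card: rank 64, alpha 128, all attention and MLP projections
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.05,  # assumption: dropout is not stated in the card
    bias="none",
    task_type="CAUSAL_LM",
)

# These configs are then passed to TRL's SFTTrainer together with the listed
# optimizer settings: lr 2e-4 with cosine schedule, 1 epoch, per-device batch 2,
# gradient accumulation 4, bf16, gradient checkpointing, max length 512, no packing.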

Compute Infrastructure

  • Hardware: 2x NVIDIA Tesla T4 (15.6 GB VRAM each)
  • Platform: Kaggle Notebooks (free tier)
  • Training time: ~5.5 hours
  • Total steps: 380

Evaluation

Qualitative evaluation on held-out prompts:

Prompt                              Result
Node definition with typed fields   Correct node with has fields and defaults
Walker with graph traversal         Correct walker with [-->] traversal and report
REST API endpoint walker            Correct walker with Root entry, node creation (++>), and response

The model generates syntactically valid Jac code with proper use of language-specific constructs: node, walker, has, can, with ... entry, ++>, [-->], report, and disengage.

Limitations

  • Trained for a single epoch on 3,200 samples -- may not cover all Jac patterns
  • Max training sequence length was 512 tokens -- longer code may be truncated
  • Backend-only -- does not generate Jac frontend/UI code (.cl.jac)
  • Based on Jac language version 0.13.5 -- syntax may differ in newer versions

Citation

@misc{jac-coder-7b-lora,
  title={Jac Coder 7B LoRA},
  author={Farhan Ahzan},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/farhan98ahzan/jac-coder-7b-lora}
}

Framework Versions

  • PEFT 0.18.1
  • Transformers 4.51.3
  • TRL 0.18.1
  • PyTorch 2.6.0
  • BitsAndBytes 0.45.5