---
language:
  - en
library_name: llama-cpp-python
pipeline_tag: text-generation
tags:
  - code-generation
  - coding-assistant
  - gguf
  - llama.cpp
  - qwen2.5
  - python
  - javascript
  - fine-tuned
base_model:
  - Qwen/Qwen2.5-1.5B-Instruct
---

# BlitzKode

BlitzKode is a locally fine-tuned AI coding assistant built by Sajad on the Qwen2.5-1.5B-Instruct base model. It is packaged in GGUF format for fast local inference with llama.cpp.

**Created by:** Abdulla Sajad
**Project:** sajadkoder/blitzkode


## Model Summary

| Property | Value |
|---|---|
| Model Name | BlitzKode |
| Version | 1.6 (CPU optimized) |
| Base Model | Qwen/Qwen2.5-1.5B-Instruct |
| Model Format | GGUF (F16, ~3GB) |
| Primary Runtime | llama.cpp / llama-cpp-python |
| Artifact | `blitzkode.gguf` |
| Context Window | 2048 tokens |
| Creator | Sajad |
| License | MIT |

## Architecture

- **Model Type:** Transformer-based LLM (1.5B parameters)
- **Architecture:** Qwen2
- **Precision:** GGUF F16 (half precision, ~3GB); no lower-bit quantization applied
- **Vocabulary:** 151,936 tokens
- **Inference:** CPU-optimized with llama.cpp
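The ~3GB artifact size follows directly from the parameter count: F16 stores two bytes per parameter. A quick back-of-envelope check:

```python
# Sanity check: an F16 model stores 2 bytes per parameter, so a
# 1.5B-parameter model should occupy roughly 3 GB on disk.
def f16_size_gb(n_params: int) -> float:
    """Approximate F16 model size in gigabytes (1 GB = 1e9 bytes)."""
    bytes_per_param = 2  # 16 bits = 2 bytes
    return n_params * bytes_per_param / 1e9

print(f16_size_gb(1_500_000_000))  # 3.0
```

The on-disk file is slightly larger than this estimate because GGUF also stores the tokenizer and metadata.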

## Training Pipeline

BlitzKode was fine-tuned through a four-stage pipeline:

### 1. SFT (Supervised Fine-Tuning)

- Script: `scripts/train_sft.py`
- Applies LoRA fine-tuning on coding-style prompts and responses
- Uses the PEFT library for parameter-efficient training

### 2. GRPO (Group Relative Policy Optimization)

- Script: `scripts/train_grpo.py`
- Uses heuristic reward functions:
  - `correctness_reward` - code correctness
  - `format_reward` - proper code formatting
  - `reasoning_reward` - logic and reasoning
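To illustrate the heuristic style of these rewards, here is a minimal sketch of what a `format_reward` might look like. The specific rules and weights are illustrative assumptions, not the actual implementation in `scripts/train_grpo.py`.

```python
# Hypothetical sketch of a heuristic format reward in the style of
# scripts/train_grpo.py's format_reward. The rules and weights below
# are illustrative assumptions, not the project's actual code.
def format_reward(completion: str) -> float:
    """Score a completion on simple formatting heuristics, in [0, 1]."""
    score = 0.0
    if "```" in completion:               # code presented in a fenced block
        score += 0.5
    if completion.count("```") % 2 == 0:  # fences are balanced (or absent)
        score += 0.25
    if completion.strip():                # non-empty answer
        score += 0.25
    return score

print(format_reward("```python\nprint('hi')\n```"))  # 1.0
```

During GRPO, several completions are sampled per prompt and scores like this one are compared within the group to compute relative advantages.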

### 3. DPO (Direct Preference Optimization)

- Script: `scripts/train_dpo.py`
- Trains on handcrafted chosen/rejected preference pairs
- Improves clarity and answer quality
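A preference pair pairs one prompt with a preferred and a dispreferred answer. The sketch below uses the `prompt`/`chosen`/`rejected` field names that TRL's `DPOTrainer` expects; the project's actual record schema is an assumption here.

```python
# Hypothetical DPO preference record, using the prompt/chosen/rejected
# field names from TRL's DPOTrainer convention; the project's actual
# dataset schema is an assumption.
pair = {
    "prompt": "Write a Python function that checks if a number is even.",
    "chosen": "def is_even(n: int) -> bool:\n    return n % 2 == 0",
    "rejected": "use modulo somehow",
}

def is_valid_pair(p: dict) -> bool:
    """Basic sanity check: all three fields present and non-empty."""
    return all(isinstance(p.get(k), str) and p[k].strip()
               for k in ("prompt", "chosen", "rejected"))

print(is_valid_pair(pair))  # True
```

Training then pushes the model to assign higher likelihood to `chosen` than to `rejected` for the same prompt.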

### 4. Merge & Export

- Script: `scripts/export_gguf.py`
- Merges the LoRA adapters into the base model
- Converts the merged model to GGUF for fast inference

### Training Frameworks

- Hugging Face Transformers
- PEFT (LoRA)
- TRL (DPO/GRPO)
- llama.cpp (inference/export)

## Training Data

### Local Datasets

- `datasets/raw/blitzkode_sft_v1.json` - seed samples
- `datasets/raw/blitzkode_sft_full.json` - extended coding samples

### Data Categories

- Arrays and hash maps
- Linked lists
- Trees and graph traversal
- Dynamic programming
- Sorting and searching
- Stack and queue implementations
- Interview-style coding problems
- Code explanations

### Optional External Sources

The project can optionally incorporate:

- CodeAlpaca-20k
- GSM8K
- MetaMathQA
- MathInstruct

## Features

- **Multi-language Code Generation** - Python, JavaScript, Java, C++, TypeScript, HTML/CSS, SQL
- **Code Explanation** - clear comments and documentation
- **Bug Fixing** - debugging and fixing code issues
- **Algorithm Help** - data structures and algorithms
- **Offline Operation** - runs locally without an internet connection
- **Fast Inference** - optimized CPU inference
- **Modern UI** - ChatGPT-style dark interface

## Intended Use

### Best For

- Local offline coding assistance
- Algorithm and data structure help
- Code generation and explanation
- Educational programming support
- Lightweight code review
- Bug detection and fixing

### Out of Scope

- Production code without expert review
- Security-critical applications
- Multi-modal tasks (images not supported)
- Long-context repository analysis
- Real-time high-assurance systems

## API & Usage

### Running the Server

```bash
# Install dependencies
pip install llama-cpp-python fastapi uvicorn pydantic

# Start the server
python server.py

# Open the web UI in a browser
# http://localhost:7860
```

### API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | Web UI |
| `/health` | GET | Health check |
| `/info` | GET | API info |
| `/generate` | POST | Generate a response |
| `/generate/stream` | POST | Stream tokens |

### API Example

```bash
# Generate code
curl -X POST http://localhost:7860/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Write hello world in python"}'

# Stream a response
curl -X POST http://localhost:7860/generate/stream \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Write a Python function"}'
```

### Python Usage

```python
from llama_cpp import Llama

llm = Llama(
    model_path="blitzkode.gguf",
    n_ctx=2048,    # context window (matches the model card)
    n_threads=8,   # adjust to your CPU core count
)

prompt = """<|im_start|>system
You are BlitzKode, a coding assistant.<|im_end|>
<|im_start|>user
Write a hello world in Python<|im_end|>
<|im_start|>assistant
"""

result = llm(prompt, max_tokens=256)
print(result["choices"][0]["text"])
```

## Prompt Format

BlitzKode uses a ChatML-style template:

```
<|im_start|>system
You are BlitzKode, an AI coding assistant created by Sajad. You are an expert in Python, JavaScript, Java, C++, and other programming languages. Write clean, efficient, and well-documented code. Keep responses concise and practical.<|im_end|>
<|im_start|>user
{your prompt}<|im_end|>
<|im_start|>assistant
```
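A small helper can assemble this template programmatically. The system string is taken from above; the helper itself is illustrative and not part of the project:

```python
# Illustrative helper (not part of the project) that assembles the
# ChatML-style prompt shown above, leaving the assistant turn open
# so the model generates the reply.
SYSTEM = ("You are BlitzKode, an AI coding assistant created by Sajad. "
          "You are an expert in Python, JavaScript, Java, C++, and other "
          "programming languages. Write clean, efficient, and well-documented "
          "code. Keep responses concise and practical.")

def build_prompt(user_message: str, system: str = SYSTEM) -> str:
    """Render one user turn in the ChatML template used by BlitzKode."""
    return (f"<|im_start|>system\n{system}<|im_end|>\n"
            f"<|im_start|>user\n{user_message}<|im_end|>\n"
            f"<|im_start|>assistant\n")

print(build_prompt("Write hello world in Python"))
```

The returned string can be passed directly as the `prompt` argument in the Python usage example above.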

## Configuration

The server supports the following environment variables:

| Variable | Default | Description |
|---|---|---|
| `BLITZKODE_MODEL_PATH` | `blitzkode.gguf` | Model file path |
| `BLITZKODE_FRONTEND_PATH` | `frontend/index.html` | UI path |
| `BLITZKODE_HOST` | `0.0.0.0` | Server host |
| `BLITZKODE_PORT` | `7860` | Server port |
| `BLITZKODE_THREADS` | CPU count | CPU threads |
| `BLITZKODE_N_CTX` | `2048` | Context window |
| `BLITZKODE_BATCH` | `128` | Batch size |
| `BLITZKODE_MAX_PROMPT_LENGTH` | `4000` | Max prompt chars |
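Resolving these variables with fallbacks is straightforward; the sketch below shows one way `server.py` might do it, using the defaults from the table. The actual implementation may differ.

```python
import os

# Sketch of resolving the configuration table above from environment
# variables; server.py's actual code may differ.
def load_config(env=os.environ) -> dict:
    return {
        "model_path": env.get("BLITZKODE_MODEL_PATH", "blitzkode.gguf"),
        "frontend_path": env.get("BLITZKODE_FRONTEND_PATH", "frontend/index.html"),
        "host": env.get("BLITZKODE_HOST", "0.0.0.0"),
        "port": int(env.get("BLITZKODE_PORT", "7860")),
        "n_threads": int(env.get("BLITZKODE_THREADS", str(os.cpu_count() or 1))),
        "n_ctx": int(env.get("BLITZKODE_N_CTX", "2048")),
        "n_batch": int(env.get("BLITZKODE_BATCH", "128")),
        "max_prompt_length": int(env.get("BLITZKODE_MAX_PROMPT_LENGTH", "4000")),
    }

cfg = load_config({})  # empty env mapping -> all defaults
print(cfg["port"], cfg["n_ctx"])  # 7860 2048
```

Passing a plain dict instead of `os.environ` makes the resolution easy to test.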

## Limitations

- **Text-only input** - no image/vision support
- **2048-token context** - CPU-friendly but limited
- **Small model** - may occasionally produce incorrect code
- **No formal benchmarks** - not evaluated on standard datasets
- **F16 conversion** - minor precision loss relative to the original weights is possible
- **Verify outputs** - always review generated code before use

## Project Structure

```
BlitzKode/
├── server.py              # FastAPI backend (v1.6)
├── blitzkode.gguf         # Quantized model (~3GB)
├── frontend/
│   └── index.html         # Web UI
├── tests/
│   └── test_server.py     # HTTP tests
├── scripts/
│   ├── train_sft.py       # SFT training
│   ├── train_grpo.py      # GRPO training
│   ├── train_dpo.py       # DPO training
│   ├── export_gguf.py     # Model export
│   └── test_inference.py  # Inference test
├── checkpoints/           # LoRA checkpoints
├── datasets/              # Training data
├── MODEL_CARD.md          # This file
└── README.md              # Project docs
```

## Version History

| Version | Date | Changes |
|---|---|---|
| 1.6 | Current | CPU optimization, faster inference |
| 1.5 | Earlier | Added streaming support |
| 1.0 | Initial | Base model release |

## License

MIT License - see README.md for details.

When redistributing, also comply with the upstream Qwen base model license.


## Contact

## Citation

```bibtex
@software{blitzkode2026,
  author = {Sajad},
  title = {BlitzKode - AI Coding Assistant},
  year = {2026},
  url = {https://github.com/sajadkoder/blitzkode}
}
```