---
language:
- en
library_name: llama-cpp-python
pipeline_tag: text-generation
tags:
- code-generation
- coding-assistant
- gguf
- llama.cpp
- qwen2.5
- python
- javascript
- fine-tuned
base_model:
- Qwen/Qwen2.5-1.5B-Instruct
---

# BlitzKode

**BlitzKode** is a locally fine-tuned AI coding assistant built by **Sajad** on the Qwen2.5-1.5B-Instruct base model. It is packaged in GGUF format for fast local inference with llama.cpp.

> Created by [Abdulla Sajad](https://github.com/sajadkoder)
> Project: [sajadkoder/blitzkode](https://github.com/sajadkoder/blitzkode)

---

## Model Summary

| Property | Value |
|----------|-------|
| **Model Name** | BlitzKode |
| **Version** | 1.6 (CPU optimized) |
| **Base Model** | Qwen/Qwen2.5-1.5B-Instruct |
| **Model Format** | GGUF (F16, ~3GB) |
| **Primary Runtime** | llama.cpp / llama-cpp-python |
| **Artifact** | `blitzkode.gguf` |
| **Context Window** | 2048 tokens |
| **Creator** | Sajad |
| **License** | MIT |

---

## Architecture

- **Model Type**: Transformer-based LLM (1.5B parameters)
- **Architecture**: Qwen2
- **Precision**: F16 (GGUF, ~3GB)
- **Vocabulary**: 151,936 tokens
- **Inference**: CPU-optimized with llama.cpp

---

## Training Pipeline

BlitzKode was fine-tuned through a four-stage pipeline:

### 1. SFT (Supervised Fine-Tuning)
- **Script**: `scripts/train_sft.py`
- Applies LoRA fine-tuning on coding-style prompt/response pairs
- Uses the PEFT library for parameter-efficient training

### 2. GRPO (Group Relative Policy Optimization)
- **Script**: `scripts/train_grpo.py`
- Uses heuristic reward functions:
  - `correctness_reward` - code correctness
  - `format_reward` - proper code formatting
  - `reasoning_reward` - logic and reasoning
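
A heuristic reward of this kind can be sketched in a few lines. The function below is a hypothetical stand-in for `format_reward`; the actual implementations live in `scripts/train_grpo.py` and may differ:

```python
# Hypothetical sketch of a heuristic formatting reward, in the spirit of
# format_reward in scripts/train_grpo.py (the real logic may differ).
def format_reward(completion: str) -> float:
    """Score a completion between 0.0 and 1.0 based on simple format cues."""
    score = 0.0
    if "```" in completion and completion.count("```") % 2 == 0:
        score += 0.5  # contains a properly closed code fence
    if any(line.startswith(("def ", "class ")) for line in completion.splitlines()):
        score += 0.3  # defines a function or class
    if len(completion.strip()) > 0:
        score += 0.2  # non-empty response
    return round(score, 2)
```

During GRPO training, such rewards are computed per completion and compared within a group of samples for the same prompt.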

### 3. DPO (Direct Preference Optimization)
- **Script**: `scripts/train_dpo.py`
- Trains on handcrafted chosen/rejected preference pairs
- Improves clarity and answer quality
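
For illustration, a preference pair might look like the record below; this is a hypothetical schema, and the actual format consumed by `scripts/train_dpo.py` may differ:

```python
# Hypothetical example of a DPO preference pair; the actual schema used by
# scripts/train_dpo.py may differ.
preference_pair = {
    "prompt": "Write a Python function that reverses a string.",
    "chosen": (
        "```python\n"
        "def reverse_string(s: str) -> str:\n"
        "    return s[::-1]\n"
        "```"
    ),
    "rejected": "s.reverse()  # wrong: str has no reverse() method",
}
```

DPO trains the model to assign higher likelihood to the `chosen` completion than the `rejected` one for the same prompt.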

### 4. Merge & Export
- **Script**: `scripts/export_gguf.py`
- Merges LoRA adapters into the base model
- Converts the merged model to GGUF format for fast inference
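
The conversion step typically hands the merged Hugging Face checkpoint to llama.cpp's `convert_hf_to_gguf.py` script. The sketch below only builds the command; paths are illustrative, and the project's actual export logic lives in `scripts/export_gguf.py`:

```python
# Sketch of the export step: after merging LoRA adapters, llama.cpp's
# convert_hf_to_gguf.py can turn the merged HF checkpoint into a GGUF file.
# Paths are illustrative; see scripts/export_gguf.py for the real logic.
def build_convert_command(merged_dir: str, out_path: str) -> list[str]:
    return [
        "python",
        "convert_hf_to_gguf.py",   # shipped with the llama.cpp repo
        merged_dir,
        "--outfile", out_path,
        "--outtype", "f16",        # matches the F16 artifact described above
    ]

cmd = build_convert_command("checkpoints/merged", "blitzkode.gguf")
```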

### Training Frameworks
- HuggingFace Transformers
- PEFT (LoRA)
- TRL (DPO/GRPO)
- llama.cpp (inference/export)

---

## Training Data

### Local Datasets
- `datasets/raw/blitzkode_sft_v1.json` - Seed samples
- `datasets/raw/blitzkode_sft_full.json` - Extended coding samples

### Data Categories
- Arrays and hash maps
- Linked lists
- Trees and graph traversal
- Dynamic programming
- Sorting and searching
- Stack and queue implementations
- Interview-style coding problems
- Code explanations
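
A single SFT record covering one of these categories might look like the example below; the field names are a hypothetical schema, and the real format in `datasets/raw/blitzkode_sft_v1.json` may differ:

```python
# Hypothetical example of a single SFT record; the real schema in
# datasets/raw/blitzkode_sft_v1.json may differ.
sft_sample = {
    "instruction": "Implement a stack using a Python list.",
    "response": (
        "```python\n"
        "class Stack:\n"
        "    def __init__(self):\n"
        "        self.items = []\n"
        "    def push(self, x):\n"
        "        self.items.append(x)\n"
        "    def pop(self):\n"
        "        return self.items.pop()\n"
        "```"
    ),
}
```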

### Optional External Sources
The project can optionally incorporate:
- CodeAlpaca-20k
- GSM8K
- MetaMathQA
- MathInstruct

---

## Features

- **Multi-language Code Generation** - Python, JavaScript, Java, C++, TypeScript, HTML/CSS, SQL
- **Code Explanation** - Clear comments and documentation
- **Bug Fixing** - Debug and fix code issues
- **Algorithm Help** - Data structures and algorithms
- **Offline Operation** - Runs locally without internet
- **Fast Inference** - Optimized CPU inference
- **Modern UI** - ChatGPT-style dark interface

---

## Intended Use

### Best For
- Local offline coding assistance
- Algorithm and data structure help
- Code generation and explanation
- Educational programming support
- Lightweight code review
- Bug detection and fixing

### Out of Scope
- Production code without expert review
- Security-critical applications
- Multi-modal tasks (images not supported)
- Long-context repository analysis
- Real-time high-assurance systems

---

## API & Usage

### Running the Server

```bash
# Install dependencies
pip install llama-cpp-python fastapi uvicorn pydantic

# Start server
python server.py

# Open browser
# http://localhost:7860
```

### API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/` | GET | Web UI |
| `/health` | GET | Health check |
| `/info` | GET | API info |
| `/generate` | POST | Generate response |
| `/generate/stream` | POST | Stream tokens |

### API Example

```bash
# Generate code
curl -X POST http://localhost:7860/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Write hello world in python"}'

# Stream response
curl -X POST http://localhost:7860/generate/stream \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Write a Python function"}'
```
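
The same endpoint can be called from Python with only the standard library. The sketch below builds the request; the exact shape of the JSON response is an assumption, so check `server.py` for the actual schema:

```python
import json
import urllib.request

# Build a POST request for the /generate endpoint. The response field names
# are not shown here because they depend on server.py's actual schema.
def build_generate_request(prompt: str, host: str = "http://localhost:7860"):
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{host}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_generate_request("Write hello world in python")

# With the server running, the request could then be sent like this:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```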

### Python Usage

```python
from llama_cpp import Llama

llm = Llama(
    model_path="blitzkode.gguf",
    n_ctx=2048,
    n_threads=8,
)

prompt = """<|im_start|>system
You are BlitzKode, a coding assistant.<|im_end|>
<|im_start|>user
Write a hello world in Python<|im_end|>
<|im_start|>assistant
"""

result = llm(prompt, max_tokens=256)
print(result["choices"][0]["text"])
```

---

## Prompt Format

Uses a ChatML-style template:

```
<|im_start|>system
You are BlitzKode, an AI coding assistant created by Sajad. You are an expert in Python, JavaScript, Java, C++, and other programming languages. Write clean, efficient, and well-documented code. Keep responses concise and practical.<|im_end|>
<|im_start|>user
{your prompt}<|im_end|>
<|im_start|>assistant
```
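
A small helper can assemble this template from a list of messages; the sketch below is illustrative and is not part of the project's code:

```python
# Build a ChatML-style prompt from (role, content) pairs, following the
# template above. Illustrative helper, not part of the project's code.
def build_chatml_prompt(messages: list[tuple[str, str]]) -> str:
    parts = [f"<|im_start|>{role}\n{content}<|im_end|>" for role, content in messages]
    # Leave the assistant turn open so the model completes it.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = build_chatml_prompt([
    ("system", "You are BlitzKode, a coding assistant."),
    ("user", "Write a hello world in Python"),
])
```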

---

## Configuration

The server supports environment variables:

| Variable | Default | Description |
|----------|---------|-------------|
| `BLITZKODE_MODEL_PATH` | `blitzkode.gguf` | Model file path |
| `BLITZKODE_FRONTEND_PATH` | `frontend/index.html` | UI path |
| `BLITZKODE_HOST` | `0.0.0.0` | Server host |
| `BLITZKODE_PORT` | `7860` | Server port |
| `BLITZKODE_THREADS` | CPU count | CPU threads |
| `BLITZKODE_N_CTX` | `2048` | Context window |
| `BLITZKODE_BATCH` | `128` | Batch size |
| `BLITZKODE_MAX_PROMPT_LENGTH` | `4000` | Max prompt chars |
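
Inside the server, variables like these would typically be read with `os.getenv` plus a fallback. The sketch below mirrors the defaults in the table above but is only an illustration of the pattern; the actual code in `server.py` may differ:

```python
import os

# Read server settings from the environment, falling back to the defaults
# listed in the table above. Sketch of the pattern; server.py may differ.
MODEL_PATH = os.getenv("BLITZKODE_MODEL_PATH", "blitzkode.gguf")
HOST = os.getenv("BLITZKODE_HOST", "0.0.0.0")
PORT = int(os.getenv("BLITZKODE_PORT", "7860"))
N_CTX = int(os.getenv("BLITZKODE_N_CTX", "2048"))
N_THREADS = int(os.getenv("BLITZKODE_THREADS", str(os.cpu_count() or 4)))
```

With this pattern, `BLITZKODE_PORT=8000 python server.py` would start the server on port 8000 while leaving all other settings at their defaults.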

---

## Limitations

- **Text-only input** - No image/vision support
- **2048-token context** - CPU-friendly but limited
- **Small model** - May occasionally produce incorrect code
- **No formal benchmarks** - Not evaluated on standard datasets
- **Precision loss** - Conversion to F16 may slightly reduce accuracy versus the full-precision weights
- **Verify outputs** - Always review generated code before use

---

## Project Structure

```
BlitzKode/
├── server.py              # FastAPI backend (v1.6)
├── blitzkode.gguf         # Quantized model (~3GB)
├── frontend/
│   └── index.html         # Web UI
├── tests/
│   └── test_server.py     # HTTP tests
├── scripts/
│   ├── train_sft.py       # SFT training
│   ├── train_grpo.py      # GRPO training
│   ├── train_dpo.py       # DPO training
│   ├── export_gguf.py     # Model export
│   └── test_inference.py  # Inference test
├── checkpoints/           # LoRA checkpoints
├── datasets/              # Training data
├── MODEL_CARD.md          # This file
└── README.md              # Project docs
```

---

## Version History

| Version | Date | Changes |
|---------|------|---------|
| 1.6 | Current | CPU optimization, faster inference |
| 1.5 | Earlier | Added streaming support |
| 1.0 | Initial | Base model release |

---

## License

MIT License - see README.md for details.

When redistributing, also comply with the license of the upstream Qwen base model.

---

## Contact

- **GitHub**: https://github.com/sajadkoder/blitzkode
- **Portfolio**: https://sajadkoder.vercel.app
- Issues and contributions welcome!

---
## Citation

```bibtex
@software{blitzkode2026,
  author = {Sajad},
  title = {BlitzKode - AI Coding Assistant},
  year = {2026},
  url = {https://github.com/sajadkoder/blitzkode}
}
```