---
language:
- en
library_name: llama-cpp-python
pipeline_tag: text-generation
tags:
- code-generation
- coding-assistant
- gguf
- llama.cpp
- qwen2.5
- python
- javascript
- fine-tuned
base_model:
- Qwen/Qwen2.5-1.5B-Instruct
---

# BlitzKode

**BlitzKode** is a locally fine-tuned AI coding assistant built by **Sajad** on the Qwen2.5-1.5B-Instruct base model. It is packaged as a GGUF model for fast local inference with llama.cpp.

> Created by [Abdulla Sajad](https://github.com/sajadkoder)
> Project: [sajadkoder/blitzkode](https://github.com/sajadkoder/blitzkode)

---

## Model Summary

| Property | Value |
|----------|-------|
| **Model Name** | BlitzKode |
| **Version** | 1.6 (CPU optimized) |
| **Base Model** | Qwen/Qwen2.5-1.5B-Instruct |
| **Model Format** | GGUF (F16, ~3GB) |
| **Primary Runtime** | llama.cpp / llama-cpp-python |
| **Artifact** | `blitzkode.gguf` |
| **Context Window** | 2048 tokens |
| **Creator** | Sajad |
| **License** | MIT |

---

## Architecture

- **Model Type**: Transformer-based LLM (1.5B parameters)
- **Architecture**: Qwen2
- **Quantization**: GGUF F16 (~3GB)
- **Vocabulary**: 151,936 tokens
- **Inference**: CPU-optimized with llama.cpp

---

## Training Pipeline

BlitzKode was fine-tuned through a 4-stage pipeline:

### 1. SFT (Supervised Fine-Tuning)

- **Script**: `scripts/train_sft.py`
- Applies LoRA fine-tuning to coding-style prompts and responses
- Uses the PEFT library for parameter-efficient training

### 2. GRPO (Group Relative Policy Optimization)

- **Script**: `scripts/train_grpo.py`
- Uses heuristic reward functions:
  - `correctness_reward` - Code correctness
  - `format_reward` - Proper code formatting
  - `reasoning_reward` - Logic and reasoning

### 3. DPO (Direct Preference Optimization)

- **Script**: `scripts/train_dpo.py`
- Trains on handcrafted chosen/rejected preference pairs
- Improves clarity and answer quality

### 4. Merge & Export

- **Script**: `scripts/export_gguf.py`
- Merges LoRA adapters into the base model
- Converts to GGUF format for fast inference

### Training Frameworks

- HuggingFace Transformers
- PEFT (LoRA)
- TRL (DPO/GRPO)
- llama.cpp (inference/export)

---

## Training Data

### Local Datasets

- `datasets/raw/blitzkode_sft_v1.json` - Seed samples
- `datasets/raw/blitzkode_sft_full.json` - Extended coding samples

### Data Categories

- Arrays and hash maps
- Linked lists
- Trees and graph traversal
- Dynamic programming
- Sorting and searching
- Stack and queue implementations
- Interview-style coding problems
- Code explanations

### Optional External Sources

The project can optionally incorporate:

- CodeAlpaca-20k
- GSM8K
- MetaMathQA
- MathInstruct

---

## Features

- **Multi-language Code Generation** - Python, JavaScript, Java, C++, TypeScript, HTML/CSS, SQL
- **Code Explanation** - Clear comments and documentation
- **Bug Fixing** - Debug and fix code issues
- **Algorithm Help** - Data structures and algorithms
- **Offline Operation** - Runs locally without internet
- **Fast Inference** - Optimized CPU inference
- **Modern UI** - ChatGPT-style dark interface

---

## Intended Use

### Best For

- Local offline coding assistance
- Algorithm and data structure help
- Code generation and explanation
- Educational programming support
- Lightweight code review
- Bug detection and fixing

### Out of Scope

- Production code without expert review
- Security-critical applications
- Multi-modal tasks (images not supported)
- Long-context repository analysis
- Real-time high-assurance systems

---

## API & Usage

### Running the Server

```bash
# Install dependencies
pip install llama-cpp-python fastapi uvicorn pydantic

# Start server
python server.py

# Open browser
# http://localhost:7860
```

### API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/` | GET | Web UI |
| `/health` | GET | Health check |
| `/info` | GET | API info |
| `/generate` | POST | Generate response |
| `/generate/stream` | POST | Stream tokens |

### API Example

```bash
# Generate code
curl -X POST http://localhost:7860/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Write hello world in python"}'

# Stream response
curl -X POST http://localhost:7860/generate/stream \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Write a Python function"}'
```

### Python Usage

```python
from llama_cpp import Llama

# Load the GGUF model with the same context window the card documents.
llm = Llama(
    model_path="blitzkode.gguf",
    n_ctx=2048,
    n_threads=8,
)

# ChatML-style prompt; the trailing assistant tag cues the model to answer.
prompt = """<|im_start|>system
You are BlitzKode, a coding assistant.<|im_end|>
<|im_start|>user
Write a hello world in Python<|im_end|>
<|im_start|>assistant
"""

result = llm(prompt, max_tokens=256)
print(result["choices"][0]["text"])
```

---

## Prompt Format

Uses a ChatML-style template:

```
<|im_start|>system
You are BlitzKode, an AI coding assistant created by Sajad. You are an expert in Python, JavaScript, Java, C++, and other programming languages. Write clean, efficient, and well-documented code. Keep responses concise and practical.<|im_end|>
<|im_start|>user
{your prompt}<|im_end|>
<|im_start|>assistant
```

---

## Configuration

The server supports the following environment variables:

| Variable | Default | Description |
|----------|---------|-------------|
| `BLITZKODE_MODEL_PATH` | `blitzkode.gguf` | Model file path |
| `BLITZKODE_FRONTEND_PATH` | `frontend/index.html` | UI path |
| `BLITZKODE_HOST` | `0.0.0.0` | Server host |
| `BLITZKODE_PORT` | `7860` | Server port |
| `BLITZKODE_THREADS` | CPU count | CPU threads |
| `BLITZKODE_N_CTX` | `2048` | Context window (tokens) |
| `BLITZKODE_BATCH` | `128` | Batch size |
| `BLITZKODE_MAX_PROMPT_LENGTH` | `4000` | Max prompt length (characters) |

---

## Limitations

- **Text-only input** - No image/vision support
- **2048 token context** - CPU-friendly but limited
- **Small model** - May produce incorrect code occasionally
- **No formal benchmarks** - Not evaluated on standard datasets
- **Precision loss** - F16 conversion may slightly reduce accuracy
- **Verify outputs** - Always review generated code before use

---

## Project Structure

```
BlitzKode/
├── server.py              # FastAPI backend (v1.6)
├── blitzkode.gguf         # Quantized model (~3GB)
├── frontend/
│   └── index.html         # Web UI
├── tests/
│   └── test_server.py     # HTTP tests
├── scripts/
│   ├── train_sft.py       # SFT training
│   ├── train_grpo.py      # GRPO training
│   ├── train_dpo.py       # DPO training
│   ├── export_gguf.py     # Model export
│   └── test_inference.py  # Inference test
├── checkpoints/           # LoRA checkpoints
├── datasets/              # Training data
├── MODEL_CARD.md          # This file
└── README.md              # Project docs
```

---

## Version History

| Version | Date | Changes |
|---------|------|---------|
| 1.6 | Current | CPU optimization, faster inference |
| 1.5 | Earlier | Added streaming support |
| 1.0 | Initial | Base model release |

---

## License

MIT License - See README.md for details. When redistributing, also comply with the upstream Qwen base model license.

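
The Configuration and Prompt Format sections can be tied together in a few lines of Python. The following is a minimal sketch, not the actual `server.py`: the `ServerConfig` dataclass and the `load_config` and `build_prompt` functions are hypothetical names introduced for illustration. Only the environment variable names, their defaults, and the ChatML tags come from this card. Rejecting over-long prompts with an error is also an assumption; the real server may handle them differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerConfig:
    """Hypothetical settings container; defaults mirror the Configuration table."""
    model_path: str
    n_ctx: int
    n_threads: int
    n_batch: int
    max_prompt_length: int


def load_config(env=None) -> ServerConfig:
    """Read BLITZKODE_* variables from a mapping, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return ServerConfig(
        model_path=env.get("BLITZKODE_MODEL_PATH", "blitzkode.gguf"),
        n_ctx=int(env.get("BLITZKODE_N_CTX", "2048")),
        n_threads=int(env.get("BLITZKODE_THREADS", os.cpu_count() or 1)),
        n_batch=int(env.get("BLITZKODE_BATCH", "128")),
        max_prompt_length=int(env.get("BLITZKODE_MAX_PROMPT_LENGTH", "4000")),
    )


SYSTEM_PROMPT = "You are BlitzKode, a coding assistant."


def build_prompt(user_text: str, config: ServerConfig, system: str = SYSTEM_PROMPT) -> str:
    """Wrap a user message in the ChatML-style template shown under Prompt Format."""
    # Assumption: prompts longer than the character limit are rejected outright.
    if len(user_text) > config.max_prompt_length:
        raise ValueError(
            f"prompt is {len(user_text)} characters; limit is {config.max_prompt_length}"
        )
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user_text}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
```

Calling `build_prompt("Write a hello world in Python", load_config())` yields a string that can be passed directly to the `Llama` object from the Python Usage section. Passing a plain dict to `load_config` instead of relying on `os.environ` keeps the function easy to test.
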

---

## Contact

- **GitHub**: https://github.com/sajadkoder/blitzkode
- **Portfolio**: https://sajadkoder.vercel.app
- Issues and contributions welcome!

---

## Citation

```bibtex
@software{blitzkode2026,
  author = {Sajad},
  title = {BlitzKode - AI Coding Assistant},
  year = {2026},
  url = {https://github.com/sajadkoder/blitzkode}
}
```
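
As a complement to the curl calls in the API Example section, the same `/generate` request can be issued from Python using only the standard library. This is a hedged sketch: the function names `build_generate_request` and `generate` are hypothetical, and the code assumes the endpoint returns a JSON body, which this card does not specify. The URL, method, header, and `{"prompt": ...}` payload are taken from the curl example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # default port documented in this card


def build_generate_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request that mirrors the curl example for /generate."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, base_url: str = BASE_URL, timeout: float = 120.0):
    """Send the request and decode the reply.

    Assumption: the server answers with JSON; the response schema is not
    documented here, so the decoded object is returned unchanged.
    """
    request = build_generate_request(prompt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running (`python server.py`), `generate("Write hello world in python")` returns whatever JSON object the endpoint produces. Keeping request construction separate from the network call makes the payload easy to inspect without a live server.
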