---
language:
- en
library_name: llama-cpp-python
pipeline_tag: text-generation
tags:
- code-generation
- coding-assistant
- gguf
- llama.cpp
- qwen2.5
- python
- javascript
- fine-tuned
base_model:
- Qwen/Qwen2.5-1.5B-Instruct
---
# BlitzKode
**BlitzKode** is a locally fine-tuned AI coding assistant built by **Sajad** on the Qwen/Qwen2.5-1.5B-Instruct base model. It is distributed in GGUF format for fast local inference with llama.cpp.
> Created by [Abdulla Sajad](https://github.com/sajadkoder)
> Project: [sajadkoder/blitzkode](https://github.com/sajadkoder/blitzkode)
---
## Model Summary
| Property | Value |
|----------|-------|
| **Model Name** | BlitzKode |
| **Version** | 1.6 (CPU optimized) |
| **Base Model** | Qwen/Qwen2.5-1.5B-Instruct |
| **Model Format** | GGUF (F16, ~3GB) |
| **Primary Runtime** | llama.cpp / llama-cpp-python |
| **Artifact** | `blitzkode.gguf` |
| **Context Window** | 2048 tokens |
| **Creator** | Sajad |
| **License** | MIT |
---
## Architecture
- **Model Type**: Transformer-based LLM (1.5B parameters)
- **Architecture**: Qwen2
- **Precision**: F16 GGUF export (~3GB; half precision, no further quantization applied)
- **Vocabulary**: 151,936 tokens
- **Inference**: CPU-optimized with llama.cpp
---
## Training Pipeline
BlitzKode was fine-tuned through a 4-stage pipeline:
### 1. SFT (Supervised Fine-Tuning)
- **Script**: `scripts/train_sft.py`
- Applies LoRA fine-tuning on coding-style prompt/response pairs
- Uses the PEFT library for parameter-efficient training
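The training script itself is not reproduced here, but the parameter savings from LoRA are easy to illustrate: each adapted `d x k` weight matrix is frozen and gains two small trainable factors of rank `r`. A minimal sketch (the rank and matrix shapes below are illustrative assumptions, not the values used by `scripts/train_sft.py`):

```python
def lora_param_count(shapes, rank):
    """Extra trainable parameters LoRA adds: each adapted d x k weight
    gains two low-rank factors, B (d x r) and A (r x k)."""
    return sum(rank * (d + k) for d, k in shapes)

# Hypothetical: four 1536x1536 attention projections in one layer.
full = sum(d * k for d, k in [(1536, 1536)] * 4)
lora = lora_param_count([(1536, 1536)] * 4, rank=8)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
# The LoRA factors are roughly 1% of the full matrices' parameters.
```

This is why LoRA checkpoints (see `checkpoints/`) stay small even for a 1.5B-parameter base model.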
### 2. GRPO (Group Relative Policy Optimization)
- **Script**: `scripts/train_grpo.py`
- Uses heuristic reward functions:
- `correctness_reward` - Code correctness
- `format_reward` - Proper code formatting
- `reasoning_reward` - Logic and reasoning
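The actual reward implementations live in `scripts/train_grpo.py` and are not shown in this card; as an illustration of what a heuristic format reward can look like, here is a hypothetical sketch (the scoring rules and weights are assumptions, not the project's real logic):

```python
import re

FENCE = "`" * 3  # triple backtick, built up to keep this snippet readable

def format_reward(completion: str) -> float:
    """Illustrative heuristic: favor completions that wrap code in a
    fenced block with a language tag and define something callable."""
    score = 0.0
    if re.search(FENCE + r"[a-z]+\n.*?" + FENCE, completion, re.DOTALL):
        score += 1.0  # fenced code block with a language tag
    if "def " in completion or "function " in completion:
        score += 0.5  # defines a function
    return score

good = f"Here you go:\n{FENCE}python\ndef add(a, b):\n    return a + b\n{FENCE}"
print(format_reward(good))   # fenced block + def -> 1.5
print(format_reward("no code here"))  # -> 0.0
```

GRPO then compares such scores across a group of sampled completions for the same prompt, reinforcing the relatively better ones.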
### 3. DPO (Direct Preference Optimization)
- **Script**: `scripts/train_dpo.py`
- Trains on handcrafted chosen/rejected preference pairs
- Improves clarity and answer quality
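The preference data is handcrafted and not published in this card; one chosen/rejected pair might look like the following sketch (the field names follow the common TRL `DPOTrainer` convention and are an assumption, not confirmed from the repo):

```python
import json

# Hypothetical preference pair: the chosen answer is correct and
# idiomatic, the rejected one contains a real Python mistake.
pair = {
    "prompt": "Write a Python function that reverses a string.",
    "chosen": "def reverse(s: str) -> str:\n    return s[::-1]",
    "rejected": "s.reverse()  # wrong: str has no reverse() method",
}
print(json.dumps(pair, indent=2))
```

DPO trains the model to assign higher likelihood to the `chosen` answer than the `rejected` one for the same prompt.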
### 4. Merge & Export
- **Script**: `scripts/export_gguf.py`
- Merges LoRA adapters into base model
- Converts to GGUF format for fast inference
### Training Frameworks
- HuggingFace Transformers
- PEFT (LoRA)
- TRL (DPO/GRPO)
- llama.cpp (inference/export)
---
## Training Data
### Local Datasets
- `datasets/raw/blitzkode_sft_v1.json` - Seed samples
- `datasets/raw/blitzkode_sft_full.json` - Extended coding samples
### Data Categories
- Arrays and hash maps
- Linked lists
- Trees and graph traversal
- Dynamic programming
- Sorting and searching
- Stack and queue implementations
- Interview-style coding problems
- Code explanations
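The exact schema of `blitzkode_sft_v1.json` is not documented in this card; a typical instruction-tuning record covering one of the categories above might look like this sketch (field names are assumptions):

```python
import json

# Hypothetical SFT record in instruction/response form.
record = {
    "instruction": "Implement a stack using a Python list.",
    "response": (
        "class Stack:\n"
        "    def __init__(self):\n"
        "        self.items = []\n"
        "    def push(self, x):\n"
        "        self.items.append(x)\n"
        "    def pop(self):\n"
        "        return self.items.pop()\n"
    ),
    "category": "stacks-and-queues",
}
print(json.dumps(record)[:60])
```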
### Optional External Sources
The project can optionally incorporate:
- CodeAlpaca-20k
- GSM8K
- MetaMathQA
- MathInstruct
---
## Features
- **Multi-language Code Generation** - Python, JavaScript, Java, C++, TypeScript, HTML/CSS, SQL
- **Code Explanation** - Clear comments and documentation
- **Bug Fixing** - Debug and fix code issues
- **Algorithm Help** - Data structures and algorithms
- **Offline Operation** - Runs locally without internet
- **Fast Inference** - Optimized CPU inference
- **Modern UI** - ChatGPT-style dark interface
---
## Intended Use
### Best For
- Local offline coding assistance
- Algorithm and data structure help
- Code generation and explanation
- Educational programming support
- Lightweight code review
- Bug detection and fixing
### Out of Scope
- Production code without expert review
- Security-critical applications
- Multi-modal tasks (images not supported)
- Long-context repository analysis
- Real-time high-assurance systems
---
## API & Usage
### Running the Server
```bash
# Install dependencies
pip install llama-cpp-python fastapi uvicorn pydantic
# Start server
python server.py
# Open browser
# http://localhost:7860
```
### API Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/` | GET | Web UI |
| `/health` | GET | Health check |
| `/info` | GET | API info |
| `/generate` | POST | Generate response |
| `/generate/stream` | POST | Stream tokens |
### API Example
```bash
# Generate code
curl -X POST http://localhost:7860/generate \
-H "Content-Type: application/json" \
-d '{"prompt": "Write hello world in python"}'
# Stream response
curl -X POST http://localhost:7860/generate/stream \
-H "Content-Type: application/json" \
-d '{"prompt": "Write a Python function"}'
```
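The same endpoints can be called from Python using only the standard library. A minimal sketch (the `prompt` field is taken from the curl examples above; the helper itself is generic `urllib`, not part of the project):

```python
import json
import urllib.request

def build_request(prompt: str, host: str = "http://localhost:7860"):
    """Build a POST request for the /generate endpoint."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{host}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request("Write hello world in python")
print(req.full_url)
# To actually send it, the server must be running:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```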
### Python Usage
```python
from llama_cpp import Llama
llm = Llama(
model_path="blitzkode.gguf",
n_ctx=2048,
n_threads=8,
)
prompt = """<|im_start|>system
You are BlitzKode, a coding assistant.<|im_end|>
<|im_start|>user
Write a hello world in Python<|im_end|>
<|im_start|>assistant
"""
result = llm(prompt, max_tokens=256)
print(result["choices"][0]["text"])
```
---
## Prompt Format
BlitzKode uses a ChatML-style chat template:
```
<|im_start|>system
You are BlitzKode, an AI coding assistant created by Sajad. You are an expert in Python, JavaScript, Java, C++, and other programming languages. Write clean, efficient, and well-documented code. Keep responses concise and practical.<|im_end|>
<|im_start|>user
{your prompt}<|im_end|>
<|im_start|>assistant
```
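Assembling this template by hand is error-prone, so a small helper can be useful (the shorter system text matches the Python usage example above; the function itself is a sketch, not part of the project):

```python
def chatml_prompt(user: str,
                  system: str = "You are BlitzKode, a coding assistant.") -> str:
    """Build the ChatML-style prompt the model expects. The prompt ends
    right after the assistant tag, so generation continues from there."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

print(chatml_prompt("Write hello world in Python"))
```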
---
## Configuration
The server supports environment variables:
| Variable | Default | Description |
|----------|---------|-------------|
| `BLITZKODE_MODEL_PATH` | `blitzkode.gguf` | Model file path |
| `BLITZKODE_FRONTEND_PATH` | `frontend/index.html` | UI path |
| `BLITZKODE_HOST` | `0.0.0.0` | Server host |
| `BLITZKODE_PORT` | `7860` | Server port |
| `BLITZKODE_THREADS` | CPU count | CPU threads |
| `BLITZKODE_N_CTX` | `2048` | Context window |
| `BLITZKODE_BATCH` | `128` | Batch size |
| `BLITZKODE_MAX_PROMPT_LENGTH` | `4000` | Max prompt chars |
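How `server.py` reads these variables is not shown here; a typical pattern looks like the sketch below (names and defaults are copied from the table, the `env_int` helper is an assumption):

```python
import os

def env_int(name: str, default: int) -> int:
    """Read an integer setting from the environment, falling back to
    the documented default when unset or empty."""
    raw = os.environ.get(name)
    return int(raw) if raw else default

MODEL_PATH = os.environ.get("BLITZKODE_MODEL_PATH", "blitzkode.gguf")
PORT = env_int("BLITZKODE_PORT", 7860)
N_CTX = env_int("BLITZKODE_N_CTX", 2048)
THREADS = env_int("BLITZKODE_THREADS", os.cpu_count() or 4)
print(MODEL_PATH, PORT, N_CTX)
```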
---
## Limitations
- **Text-only input** - No image or vision support
- **2048-token context** - CPU-friendly, but too small for long files or whole repositories
- **Small model** - At 1.5B parameters, generated code can be incorrect or incomplete
- **No formal benchmarks** - Not evaluated on standard code benchmarks
- **Precision** - Exported at F16 (near-lossless); quantizing further (e.g. Q4/Q5) would shrink the file at some cost in accuracy
- **Verify outputs** - Always review generated code before use
---
## Project Structure
```
BlitzKode/
├── server.py             # FastAPI backend (v1.6)
├── blitzkode.gguf        # Quantized model (~3GB)
├── frontend/
│   └── index.html        # Web UI
├── tests/
│   └── test_server.py    # HTTP tests
├── scripts/
│   ├── train_sft.py      # SFT training
│   ├── train_grpo.py     # GRPO training
│   ├── train_dpo.py      # DPO training
│   ├── export_gguf.py    # Model export
│   └── test_inference.py # Inference test
├── checkpoints/          # LoRA checkpoints
├── datasets/             # Training data
├── MODEL_CARD.md         # This file
└── README.md             # Project docs
```
---
## Version History
| Version | Date | Changes |
|---------|------|---------|
| 1.6 | Current | CPU optimization, faster inference |
| 1.5 | Earlier | Added streaming support |
| 1.0 | Initial | Base model release |
---
## License
MIT License; see README.md for details.
Redistribution must also comply with the license of the upstream Qwen2.5 base model.
---
## Contact
- **GitHub**: https://github.com/sajadkoder/blitzkode
- **Portfolio**: https://sajadkoder.vercel.app
- Issues and contributions welcome!
---
## Citation
```bibtex
@software{blitzkode2026,
author = {Sajad},
title = {BlitzKode - AI Coding Assistant},
year = {2026},
url = {https://github.com/sajadkoder/blitzkode}
}
```