
# Stack 2.9

> A fine-tuned code assistant built on Qwen2.5-Coder-1.5B, trained on Stack Overflow data

Stack 2.9 is a specialized code generation model fine-tuned from [Qwen/Qwen2.5-Coder-1.5B](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B) on Stack Overflow Q&A data for improved programming assistance.

## Key Features

- **Specialized for Code**: Trained on Stack Overflow patterns for better code generation
- **128K Context**: Handles larger codebases and complex documentation
- **Efficient**: Runs on consumer GPUs (RTX 3060 or better)
- **Open Source**: Apache 2.0 licensed

---

## Model Details

| Attribute | Value |
|-----------|-------|
| **Base Model** | Qwen/Qwen2.5-Coder-1.5B |
| **Parameters** | 1.5B |
| **Context Length** | 131,072 tokens (128K) |
| **Fine-tuning Method** | LoRA (rank 8) |
| **Precision** | FP16 |
| **License** | Apache 2.0 |
| **Release Date** | April 2026 |

### Architecture

| Specification | Value |
|--------------|-------|
| Architecture | Qwen2ForCausalLM |
| Hidden Size | 1,536 |
| Num Layers | 28 |
| Attention Heads | 12 (Q) / 2 (KV) |
| GQA | Yes (2 KV heads) |
| Intermediate Size | 8,960 |
| Vocab Size | 151,936 |
| Activation | SiLU (SwiGLU) |
| Normalization | RMSNorm |

---

## Quickstart

### Installation

```bash
pip install "transformers>=4.40.0" "torch>=2.0.0" accelerate
```

### Code Example

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "my-ai-stack/Stack-2-9-finetuned"

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Chat interface
messages = [
    {"role": "system", "content": "You are Stack 2.9, a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to calculate Fibonacci numbers"}
]

# Apply chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Generate
model_inputs = tokenizer([text],
    return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
    temperature=0.7,
    do_sample=True
)

# Decode only the newly generated tokens
response = tokenizer.decode(
    generated_ids[0][len(model_inputs.input_ids[0]):],
    skip_special_tokens=True
)
print(response)
```

### Interactive Chat

```bash
python chat.py
```

---

## Training Details

| Specification | Value |
|--------------|-------|
| **Method** | LoRA (Low-Rank Adaptation) |
| **LoRA Rank** | 8 |
| **LoRA Alpha** | 16 |
| **Target Modules** | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| **Epochs** | ~0.8 |
| **Final Loss** | 0.0205 |
| **Data Source** | Stack Overflow Q&A |

### Training Data

Fine-tuned on Stack Overflow code Q&A pairs, including:

- Python code solutions and snippets
- Code explanations and documentation
- Programming patterns and best practices
- Bug fixes and debugging examples
- Algorithm implementations

---

## Evaluation

| Benchmark | Score | Notes |
|-----------|-------|-------|
| **HumanEval** | ~35-40% | Based on base model benchmarks |
| **MBPP** | ~40-45% | Python-focused evaluation |

> **Note**: Full benchmark evaluation is in progress. The model inherits strong coding capabilities from Qwen2.5-Coder and is specialized for Stack Overflow patterns.
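As a sanity check on the LoRA setup above, the trainable-parameter count can be derived from the architecture table (hidden size 1,536, intermediate size 8,960, 28 layers, head dim 128 with 12 query / 2 KV heads). This is an illustrative back-of-envelope calculation, not an official figure: a rank-`r` LoRA adapter on a `d_in × d_out` projection adds `r * (d_in + d_out)` parameters (matrix A is `r × d_in`, matrix B is `d_out × r`).

```python
# Back-of-envelope count of LoRA trainable parameters, using the
# dimensions stated in the architecture table above.
r = 8                      # LoRA rank
hidden = 1536              # hidden size
inter = 8960               # intermediate (MLP) size
layers = 28
head_dim = hidden // 12    # 128, with 12 query heads
kv_dim = 2 * head_dim      # 256, with 2 KV heads (GQA)

# (d_in, d_out) for each target module in one decoder layer
modules = {
    "q_proj":    (hidden, hidden),
    "k_proj":    (hidden, kv_dim),
    "v_proj":    (hidden, kv_dim),
    "o_proj":    (hidden, hidden),
    "gate_proj": (hidden, inter),
    "up_proj":   (hidden, inter),
    "down_proj": (inter, hidden),
}

per_layer = sum(r * (d_in + d_out) for d_in, d_out in modules.values())
total = per_layer * layers
print(f"{total:,} trainable LoRA parameters")  # 9,232,384
```

At roughly 9.2M trainable parameters, the adapter touches well under 1% of the 1.5B base weights, which is why fine-tuning of this kind fits on a single consumer GPU.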
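The hardware requirements in the next section can be motivated with rough memory arithmetic (a sketch that ignores activation memory, framework overhead, and quantization metadata): FP16 weights cost 2 bytes per parameter, and thanks to GQA the KV cache stores only the 2 KV heads per layer.

```python
# Rough VRAM estimate for a ~1.5B-parameter model with GQA (illustrative).
params = 1.54e9
GiB = 1024 ** 3

weights_fp16 = params * 2 / GiB     # 2 bytes per parameter
weights_int8 = params * 1 / GiB     # 1 byte per parameter
weights_int4 = params * 0.5 / GiB   # 4 bits per parameter

# KV cache per token: K and V, per layer, over the 2 GQA KV heads (FP16)
layers, kv_heads, head_dim = 28, 2, 128
kv_per_token = 2 * layers * kv_heads * head_dim * 2  # bytes
kv_128k = kv_per_token * 131072 / GiB                # full 128K context

print(f"FP16 weights  : {weights_fp16:.1f} GiB")
print(f"8-bit weights : {weights_int8:.1f} GiB")
print(f"4-bit weights : {weights_int4:.1f} GiB")
print(f"KV cache      : {kv_per_token} B/token, {kv_128k:.1f} GiB at 128K")
```

Note that the VRAM figures in the table below correspond to weights plus a modest context; filling the full 128K window adds roughly 3.5 GiB of FP16 KV cache on top. With 12 KV heads instead of 2, the cache would be six times larger, which is the practical payoff of GQA here.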
---

## Hardware Requirements

| Configuration | GPU | VRAM |
|---------------|-----|------|
| FP16 | RTX 3060+ | ~4GB |
| 8-bit | RTX 3060+ | ~2GB |
| 4-bit | Any modern GPU | ~1GB |
| CPU | None | ~8GB RAM |

---

## Capabilities

- **Code Generation**: Python, JavaScript, TypeScript, SQL, Go, Rust, and more
- **Code Completion**: Functions, classes, and entire snippets
- **Debugging**: Identify and fix bugs with explanations
- **Code Explanation**: Document and explain code behavior
- **Programming Q&A**: Answer technical questions

---

## Limitations

- **Model Size**: At 1.5B parameters, smaller than state-of-the-art models (7B+)
- **Training Data**: Python-heavy; output quality in other languages may be lower
- **Hallucinations**: May occasionally generate incorrect code; verify output before use
- **Tool Use**: No native tool-calling support (see the enhanced version)

---

## Comparison

| Feature | Qwen2.5-Coder-1.5B | Stack 2.9 |
|---------|-------------------|-----------|
| Code Generation | General | Stack Overflow patterns |
| Python Proficiency | Baseline | Enhanced |
| Context Length | 128K | 128K |
| Specialization | General code | Stack Overflow Q&A |

---

## Citation

```bibtex
@misc{stack29finetuned,
  author = {Walid Sobhi},
  title = {Stack 2.9: Fine-tuned Qwen2.5-Coder-1.5B on Stack Overflow Data},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/my-ai-stack/Stack-2-9-finetuned}
}
```

---

## Related Links

- [GitHub Repository](https://github.com/my-ai-stack/stack-2.9)
- [HuggingFace Space Demo](https://huggingface.co/spaces/my-ai-stack/stack-2-9-demo)
- [Base Model](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B)
- [Qwen2.5-Coder-7B](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct)
- [Qwen2.5-Coder-32B](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct)

---

## License

Licensed under the Apache 2.0 license. See [LICENSE](LICENSE) for details.

---

*Model Card Version: 2.0*
*Last Updated: April 2026*