---
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
base_model: Qwen/Qwen2.5-3B
tags:
- code-generation
- code-assistant
- general-purpose
- gguf
- llama.cpp
- ollama
- sovereign-ai
model-index:
- name: Stack-X-Ultimate
  results:
  - task:
      type: text-generation
    metrics:
    - type: pass@k
      value: 0.88
---

<p align="center">
  <a href="https://github.com/my-ai-stack/stack-x">
    <img src="https://img.shields.io/github/stars/my-ai-stack/stack-x?style=flat-square" alt="GitHub stars"/>
  </a>
  <a href="https://github.com/my-ai-stack/stack-x/blob/main/LICENSE">
    <img src="https://img.shields.io/badge/License-Apache%202.0-blue?style=flat-square" alt="License"/>
  </a>
  <img src="https://img.shields.io/badge/Parameters-3B-blue?style=flat-square" alt="Parameters"/>
  <img src="https://img.shields.io/badge/Context-128K-green?style=flat-square" alt="Context"/>
  <img src="https://img.shields.io/badge/Sovereign-AI-red?style=flat-square" alt="Sovereign AI"/>
  <img src="https://img.shields.io/badge/Python-3.10+-blue?style=flat-square&logo=python" alt="Python 3.10+"/>
</p>

# Stack X Ultimate

> The ultimate 3B parameter model for sovereign AI deployment

Stack X Ultimate is a high-performance 3B-parameter language model designed for sovereign AI deployment. It is optimized for edge computing, on-premise infrastructure, and air-gapped environments, delivering strong performance in a compact footprint suited to consumer hardware and enterprise deployment.

---

## Hardware Requirements

| Quantization | GPU Required | VRAM | Total Model Size |
|--------------|--------------|------|------------------|
| FP16 (full precision) | RTX 3060+ | ~6 GB | ~6 GB |
| Q8_0 | RTX 3060 | ~3 GB | ~3 GB |
| Q4_K_M | Any modern GPU | ~1.8 GB | ~1.8 GB |
| Q3_K_M | Integrated GPU | ~1.2 GB | ~1.2 GB |
| Q2_K | CPU + 8GB RAM | ~900 MB | ~900 MB |
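As a rough rule of thumb, the table above can be turned into a quantization picker. A minimal sketch, with thresholds taken directly from the approximate sizes in the table (leave extra headroom for the KV cache, especially at long contexts):

```python
def recommend_quant(vram_gb: float) -> str:
    """Pick the highest-fidelity quantization whose weights fit in the
    given VRAM, using the approximate sizes from the table above.
    Headroom for the KV cache is not accounted for here."""
    if vram_gb >= 6.0:
        return "FP16"
    if vram_gb >= 3.0:
        return "Q8_0"
    if vram_gb >= 1.8:
        return "Q4_K_M"
    if vram_gb >= 1.2:
        return "Q3_K_M"
    return "Q2_K"  # CPU-friendly; needs ~8 GB system RAM

print(recommend_quant(12.0))  # an RTX 3060 12GB fits full precision
```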

### Minimum Requirements (Q3_K and below)

- **GPU**: None required (CPU inference supported)
- **RAM**: 8GB system RAM
- **Storage**: 2GB+ free space

### Recommended Requirements

- **GPU**: NVIDIA RTX 3060 (12GB) or better
- **RAM**: 16GB system RAM
- **Storage**: 4GB+ free space for multiple quantizations

### Edge Deployment

| Platform | Quantization | Requirements |
|----------|--------------|--------------|
| NVIDIA Jetson Orin | Q4_K_M | 8GB RAM, 15W TDP |
| Raspberry Pi 5 + GPU | Q2_K | 8GB RAM, external GPU |
| Apple Silicon (M1/M2/M3) | Q4_K_M | 16GB unified memory |
| Intel Arc GPU | Q4_K_M | Intel Arc A770 |

---

## File Sizes

| Quantization | File Size | Download |
|--------------|-----------|----------|
| FP16 | ~6.0 GB | [Download](https://huggingface.co/my-ai-stack/Stack-X-Ultimate/tree/main) |
| Q8_0 | ~3.0 GB | [Download](https://huggingface.co/my-ai-stack/Stack-X-Ultimate/tree/main) |
| Q4_K_M | ~1.8 GB | [Download](https://huggingface.co/my-ai-stack/Stack-X-Ultimate/tree/main) |
| Q3_K_M | ~1.2 GB | [Download](https://huggingface.co/my-ai-stack/Stack-X-Ultimate/tree/main) |
| Q2_K | ~900 MB | [Download](https://huggingface.co/my-ai-stack/Stack-X-Ultimate/tree/main) |
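To fetch a single quantization instead of cloning the whole repository, `huggingface_hub` can download one file by name. A sketch; the filename pattern below is an assumption, so check the repo's file listing for the exact names:

```python
def gguf_filename(quant: str) -> str:
    # Assumed <model>-<quant>.gguf naming; verify against the repo listing.
    return f"stack-x-ultimate-{quant.lower()}.gguf"

def download_quant(quant: str) -> str:
    """Download one GGUF file from the Hub and return its local path."""
    from huggingface_hub import hf_hub_download  # local import keeps the helper optional
    return hf_hub_download(
        repo_id="my-ai-stack/Stack-X-Ultimate",
        filename=gguf_filename(quant),
    )

# download_quant("Q4_K_M")  # ~1.8 GB
```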

---

## Use Cases

### Best Suited Tasks

- **Code Generation**: Multi-language code writing, refactoring, and debugging
- **Text Generation**: Creative writing, documentation, content creation
- **Question Answering**: Information retrieval, knowledge base queries
- **Summarization**: Document summarization, abstract generation
- **Classification**: Text classification, sentiment analysis
- **Translation**: Cross-language text translation
- **Embedded Systems**: On-device AI, IoT applications

### Industries & Domains

| Industry | Use Case |
|----------|----------|
| Healthcare | HIPAA-compliant AI assistants, clinical documentation |
| Finance | SOC 2-compliant automation, risk assessment |
| Legal | Contract analysis, case law research |
| Government | Classified environment AI, secure documentation |
| Manufacturing | Edge AI for quality control, predictive maintenance |
| Retail | On-premise customer service, inventory optimization |
| Education | Offline learning assistants, classroom AI |

---

## Quick Start

### Python (Transformers)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "my-ai-stack/Stack-X-Ultimate"

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

# Generate a response
prompt = "Explain the concept of sovereignty in AI systems and why it matters for enterprise deployment."

messages = [
    {"role": "system", "content": "You are Stack X Ultimate, a helpful and knowledgeable AI assistant."},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer([text], return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.95,
        do_sample=True,
    )

response = tokenizer.decode(
    outputs[0][inputs.input_ids.shape[1]:],
    skip_special_tokens=True
)

print(response)
```

### llama.cpp

```bash
# Download the GGUF model file
# Visit: https://huggingface.co/my-ai-stack/Stack-X-Ultimate/tree/main

# Run with GPU offload (older llama.cpp builds name the binary ./main)
./llama-cli -m stack-x-ultimate-q4_k_m.gguf \
  -n 512 \
  -c 131072 \
  -ngl 99 \
  --temp 0.7 \
  --top-p 0.95 \
  -p "Write a Python function to implement the quicksort algorithm."

# Run on CPU only (-ngl 0 keeps all layers off the GPU; a smaller
# context keeps the KV cache within 8 GB of system RAM)
./llama-cli -m stack-x-ultimate-q4_k_m.gguf \
  -n 512 \
  -t 8 \
  -c 8192 \
  -ngl 0 \
  -p "Explain the differences between sovereign AI and cloud-based AI solutions."

# Compare quantization levels with the same settings
./llama-cli -m stack-x-ultimate-q2_k.gguf -n 256 --temp 0.5
./llama-cli -m stack-x-ultimate-q4_k_m.gguf -n 256 --temp 0.5
./llama-cli -m stack-x-ultimate-q8_0.gguf -n 256 --temp 0.5
```

### Ollama

```bash
# Pull the model
ollama pull stack-x-ultimate

# Run with a one-off prompt
ollama run stack-x-ultimate "Write a Python function to implement binary search."

# ollama run has no sampling flags; set parameters in the interactive session
ollama run stack-x-ultimate
# >>> /set parameter temperature 0.9
# >>> /set parameter top_p 0.95
# >>> Write a short story about an AI that becomes self-aware in an air-gapped facility.

# ...or bake them into a Modelfile for repeatable low-temperature factual runs
cat > Modelfile <<'EOF'
FROM stack-x-ultimate
PARAMETER temperature 0.2
PARAMETER top_p 0.9
EOF
ollama create stack-x-factual -f Modelfile
ollama run stack-x-factual "Explain quantum computing and its applications in cryptography."

# For document processing, raise the context window in a Modelfile
cat > Modelfile.longctx <<'EOF'
FROM stack-x-ultimate
PARAMETER num_ctx 65536
PARAMETER temperature 0.5
EOF
ollama create stack-x-longctx -f Modelfile.longctx
ollama run stack-x-longctx "Summarize the following research paper: [PASTE TEXT]"
```

---

## Model Architecture

| Attribute | Value |
|-----------|-------|
| Base Model | Qwen/Qwen2.5-3B |
| Parameters | 3B |
| Fine-tuning | Full fine-tuning + LoRA |
| Context Length | 131,072 tokens (128K) |
| Vocabulary Size | 151,936 tokens |
| Hidden Size | 2,048 |
| Attention Heads | 16 |
| Num Key Value Heads | 2 |
| Transformer Layers | 36 |
| Activation Function | SiLU |
| RoPE Scaling | NTK (factor: 4.0) |
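As a sanity check, the headline parameter count can be estimated from the published Qwen2.5-3B configuration (hidden size 2048, 36 layers, 16 query / 2 KV heads with head dim 128, intermediate size 11008, vocabulary 151,936, tied embeddings; the intermediate size is taken from the upstream release, not this card):

```python
def estimate_params(hidden=2048, layers=36, n_heads=16, n_kv_heads=2,
                    head_dim=128, intermediate=11008, vocab=151_936):
    """Rough parameter count for a Qwen2-style decoder (norms and
    small biases ignored; embeddings tied with the output head)."""
    embed = vocab * hidden
    attn = (hidden * n_heads * head_dim            # Q projection
            + 2 * hidden * n_kv_heads * head_dim   # K and V projections
            + n_heads * head_dim * hidden)         # output projection
    mlp = 3 * hidden * intermediate                # gate, up, down projections
    return embed + layers * (attn + mlp)

print(round(estimate_params() / 1e9, 2))  # ≈ 3.09, i.e. the "3B" headline
```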

---

## Training Details

- **Base Model**: Qwen2.5-3B
- **Training Approach**: Combined full fine-tuning + LoRA
- **Fine-tuning Data**: Diverse high-quality corpus
- **Focus Areas**: General understanding, code generation, instruction following
- **Special Training**: Sovereign deployment optimization, edge computing efficiency
- **Context Length**: 128K tokens
- **License**: Apache 2.0
- **Release Date**: April 2026

---

## Performance Notes

### Inference Speed (Q4_K_M)

| Device | Tokens/sec | Latency (512 tokens) |
|--------|------------|----------------------|
| RTX 4090 | ~55 | ~9.3s |
| RTX 3090 | ~42 | ~12.2s |
| RTX 3060 | ~25 | ~20.5s |
| Apple M2 Pro | ~35 | ~14.6s |
| CPU (i9-13900K) | ~10 | ~51.2s |
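The latency column follows directly from throughput: 512 tokens divided by the steady decode rate, for example:

```python
def latency_seconds(n_tokens: int, tokens_per_sec: float) -> float:
    """Time to generate n_tokens at a steady decode rate."""
    return round(n_tokens / tokens_per_sec, 1)

print(latency_seconds(512, 42))  # 12.2, matching the RTX 3090 row
```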

### Deployment Scenarios

#### Single User (Interactive)

```python
config = {
    "max_new_tokens": 512,
    "temperature": 0.7,
    "top_p": 0.95,
    "batch_size": 1,
}
```

#### Multi-User (Server)

```python
config = {
    "max_new_tokens": 256,
    "temperature": 0.5,
    "top_p": 0.9,
    "batch_size": 4,
    "use_cache": True,
}
```

#### Offline/Edge

```python
config = {
    "max_new_tokens": 128,
    "temperature": 0.3,
    "top_p": 0.85,
    "quantization": "q4_k_m",
}
```
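One caveat on these dicts: only the sampling keys are valid `transformers` `generate()` kwargs, while `batch_size` and `quantization` are serving-level choices that `generate()` would reject. A small helper can split a scenario config accordingly (the key sets here are illustrative assumptions):

```python
GENERATE_KEYS = {"max_new_tokens", "temperature", "top_p", "use_cache"}

def split_config(config: dict) -> tuple[dict, dict]:
    """Separate generate() kwargs from serving-level options such as
    batch_size or quantization."""
    gen = {k: v for k, v in config.items() if k in GENERATE_KEYS}
    serve = {k: v for k, v in config.items() if k not in GENERATE_KEYS}
    return gen, serve

edge = {"max_new_tokens": 128, "temperature": 0.3, "top_p": 0.85,
        "quantization": "q4_k_m"}
gen_kwargs, serve_opts = split_config(edge)
# model.generate(**inputs, **gen_kwargs, do_sample=True)
print(serve_opts)  # {'quantization': 'q4_k_m'}
```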

---

## Security & Sovereignty

Stack X Ultimate is designed for secure, sovereign deployment:

- **Air-Gapped Operation**: No internet connection required
- **Data Privacy**: All data stays within your infrastructure
- **Compliance Ready**: SOC 2, HIPAA, GDPR compatible
- **Audit Trail**: Full inference logging capabilities
- **On-Premise Only**: No cloud dependencies

### Enterprise Security Features

| Feature | Description |
|---------|-------------|
| VPC Deployment | Deploy within your private network |
| TLS/SSL | Encrypted communication |
| Authentication | OAuth2, LDAP, SSO support |
| Rate Limiting | Prevent abuse and overuse |
| Audit Logging | Complete inference history |
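The audit-logging row can be as simple as an append-only JSONL trail that records a prompt hash rather than raw text. A minimal sketch; the file layout and field names are illustrative, not a shipped API:

```python
import hashlib
import json
import time

def log_inference(log_path: str, prompt: str, response: str) -> dict:
    """Append one audit record; the prompt is stored only as a hash
    so the trail itself does not leak sensitive input."""
    entry = {
        "timestamp": time.time(),
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "response_chars": len(response),
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry
```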

---

## Limitations

- **Model Size**: At 3B parameters, less capable than larger models for complex reasoning
- **Specialized Tasks**: May require fine-tuning for domain-specific tasks
- **Multi-modal**: Text-only; does not support images or audio
- **Hallucinations**: May occasionally generate incorrect information; verification recommended

---

## Quick Links

- [GitHub Repository](https://github.com/my-ai-stack/stack-x)
- [HuggingFace Organization](https://huggingface.co/my-ai-stack)
- [Model Hub](https://huggingface.co/my-ai-stack/Stack-X-Ultimate)
- [Documentation](https://docs.stackai.dev)
- [Discord Community](https://discord.gg/clawd)
- [Enterprise Contact](https://stackai.dev/contact)

---

## Citation

```bibtex
@misc{stackxultimate2026,
  author = {Walid Sobhi},
  title = {Stack X Ultimate: 3B Parameter Model for Sovereign AI Deployment},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/my-ai-stack/Stack-X-Ultimate}
}
```

---

<p align="center">
  Built with love for developers<br/>
  <a href="https://discord.gg/clawd">Discord</a> · <a href="https://github.com/my-ai-stack/stack-x">GitHub</a> · <a href="https://huggingface.co/my-ai-stack">HuggingFace</a>
</p>