---
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
base_model: Qwen/Qwen2.5-3B
tags:
- code-generation
- code-assistant
- general-purpose
- gguf
- llama.cpp
- ollama
- sovereign-ai
model-index:
- name: Stack-X-Ultimate
  results:
  - task:
      type: text-generation
    metrics:
    - type: pass@k
      value: 0.88
---
<p align="center">
<a href="https://github.com/my-ai-stack/stack-x">
<img src="https://img.shields.io/github/stars/my-ai-stack/stack-x?style=flat-square" alt="GitHub stars"/>
</a>
<a href="https://github.com/my-ai-stack/stack-x/blob/main/LICENSE">
<img src="https://img.shields.io/badge/License-Apache%202.0-blue?style=flat-square" alt="License"/>
</a>
<img src="https://img.shields.io/badge/Parameters-3B-blue?style=flat-square" alt="Parameters"/>
<img src="https://img.shields.io/badge/Context-128K-green?style=flat-square" alt="Context"/>
<img src="https://img.shields.io/badge/Sovereign-AI-red?style=flat-square" alt="Sovereign AI"/>
<img src="https://img.shields.io/badge/Python-3.10+-blue?style=flat-square&logo=python" alt="Python 3.10+"/>
</p>
# Stack X Ultimate
> The ultimate 3B parameter model for sovereign AI deployment
Stack X Ultimate is a 3B-parameter language model fine-tuned from Qwen2.5-3B for sovereign AI deployment. It targets edge computing, on-premise infrastructure, and air-gapped environments, and its quantized builds are small enough to run on consumer hardware as well as enterprise servers.
---
## Hardware Requirements
| Quantization | GPU Required | VRAM | Total Model Size |
|-------------|--------------|------|------------------|
| FP16 (unquantized) | RTX 3060+ | ~6 GB | ~6 GB |
| Q8_0 | RTX 3060 | ~3 GB | ~3 GB |
| Q4_K_M | Any modern GPU | ~1.8 GB | ~1.8 GB |
| Q3_K_M | Integrated GPU | ~1.2 GB | ~1.2 GB |
| Q2_K | CPU + 8GB RAM | ~900 MB | ~900 MB |
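The sizes in the table follow a simple rule of thumb: parameter count times bits per weight, divided by 8. The sketch below assumes a nominal 3 billion parameters and approximate bits-per-weight figures for each format; the figures are illustrative assumptions, not measurements of the files in this repository.

```python
# Rough weight-size estimate: parameters * bits per weight / 8 bytes.
# The bits-per-weight values are approximate assumptions for these formats,
# not measurements of this model's files.
APPROX_BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8_0": 8.5,
    "Q4_K_M": 4.85,
}


def estimate_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Return the approximate weight size in gigabytes (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


for name, bpw in APPROX_BITS_PER_WEIGHT.items():
    print(f"{name:>7}: ~{estimate_size_gb(3e9, bpw):.1f} GB")
```

Actual memory use at inference time is higher than the weight size alone, because the KV cache and activations also need room.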
### Minimum Requirements (Q3_K and below)
- **GPU**: None required (CPU inference supported)
- **RAM**: 8GB system RAM
- **Storage**: 2GB+ free space
### Recommended Requirements
- **GPU**: NVIDIA RTX 3060 (12GB) or better
- **RAM**: 16GB system RAM
- **Storage**: 4GB+ free space for multiple quantizations
### Edge Deployment
| Platform | Quantization | Requirements |
|----------|--------------|---------------|
| NVIDIA Jetson Orin | Q4_K_M | 8GB RAM, 15W TDP |
| Raspberry Pi 5 + GPU | Q2_K | 8GB RAM, external GPU |
| Apple Silicon (M1/M2/M3) | Q4_K_M | 16GB unified memory |
| Intel Arc GPU | Q4_K_M | Intel Arc A770 |
---
## File Sizes
| Quantization | File Size | Download |
|-------------|-----------|----------|
| FP16 | ~6.0 GB | [Download](https://huggingface.co/my-ai-stack/Stack-X-Ultimate/tree/main) |
| Q8_0 | ~3.0 GB | [Download](https://huggingface.co/my-ai-stack/Stack-X-Ultimate/tree/main) |
| Q4_K_M | ~1.8 GB | [Download](https://huggingface.co/my-ai-stack/Stack-X-Ultimate/tree/main) |
| Q3_K_M | ~1.2 GB | [Download](https://huggingface.co/my-ai-stack/Stack-X-Ultimate/tree/main) |
| Q2_K | ~900 MB | [Download](https://huggingface.co/my-ai-stack/Stack-X-Ultimate/tree/main) |
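To fetch a single quantization rather than browsing the file listing, something like the following may work. It is a sketch that assumes `huggingface_hub` is installed and that the GGUF files follow the `stack-x-ultimate-<quant>.gguf` naming used in the llama.cpp commands in this README; confirm the real file names in the repository before relying on it.

```python
# Sketch: download one GGUF file from the Hub instead of the whole repo.
# The filename pattern is an assumption based on the names used in the
# llama.cpp examples; check the repository file listing to confirm it.
REPO_ID = "my-ai-stack/Stack-X-Ultimate"


def gguf_filename(quantization: str) -> str:
    """Build the assumed GGUF filename for a quantization such as 'Q4_K_M'."""
    return f"stack-x-ultimate-{quantization.lower()}.gguf"


def download_gguf(quantization: str, local_dir: str = "models") -> str:
    """Download one quantization and return the local file path."""
    from huggingface_hub import hf_hub_download  # imported lazily

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=gguf_filename(quantization),
        local_dir=local_dir,
    )


# Usage (requires network access):
# path = download_gguf("Q4_K_M")
```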
---
## Use Cases
### Best Suited Tasks
- **Code Generation**: Multi-language code writing, refactoring, and debugging
- **Text Generation**: Creative writing, documentation, content creation
- **Question Answering**: Information retrieval, knowledge base queries
- **Summarization**: Document summarization, abstract generation
- **Classification**: Text classification, sentiment analysis
- **Translation**: Cross-language text translation
- **Embedded Systems**: On-device AI, IoT applications
### Industries & Domains
| Industry | Use Case |
|----------|----------|
| Healthcare | HIPAA-compliant AI assistants, clinical documentation |
| Finance | SOC2-compliant automation, risk assessment |
| Legal | Contract analysis, case law research |
| Government | Classified environment AI, secure documentation |
| Manufacturing | Edge AI for quality control, predictive maintenance |
| Retail | On-premise customer service, inventory optimization |
| Education | Offline learning assistants, classroom AI |
---
## Quick Start
### Python (Transformers)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "my-ai-stack/Stack-X-Ultimate"
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

# Generate response
prompt = "Explain the concept of sovereignty in AI systems and why it matters for enterprise deployment."
messages = [
    {"role": "system", "content": "You are Stack X Ultimate, a helpful and knowledgeable AI assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.95,
        do_sample=True,
    )

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(
    outputs[0][inputs.input_ids.shape[1]:],
    skip_special_tokens=True
)
print(response)
```
### llama.cpp
```bash
# Download the GGUF model file
# Visit: https://huggingface.co/my-ai-stack/Stack-X-Ultimate/tree/main
# Run with llama.cpp on GPU (-ngl offloads layers to the GPU;
# the binary is llama-cli in current builds and was ./main in older ones)
./llama-cli -m stack-x-ultimate-q4_k_m.gguf \
  -n 512 \
  -t 8 \
  -c 131072 \
  -ngl 99 \
  --temp 0.7 \
  --top-p 0.95 \
  -p "Write a Python function to implement the quicksort algorithm."
# Run on CPU only (-ngl 0 keeps every layer on the CPU)
./llama-cli -m stack-x-ultimate-q4_k_m.gguf \
  -n 512 \
  -t 8 \
  -c 131072 \
  -ngl 0 \
  -p "Explain the differences between sovereign AI and cloud-based AI solutions."
# Compare quantizations on the same prompt
PROMPT="Write a Python function to implement the quicksort algorithm."
./llama-cli -m stack-x-ultimate-q2_k.gguf -n 256 --temp 0.5 -p "$PROMPT"
./llama-cli -m stack-x-ultimate-q4_k_m.gguf -n 256 --temp 0.5 -p "$PROMPT"
./llama-cli -m stack-x-ultimate-q8_0.gguf -n 256 --temp 0.5 -p "$PROMPT"
```
### Ollama
```bash
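# Pull the model
ollama pull stack-x-ultimate
# Run a single prompt
ollama run stack-x-ultimate "Write a Python function to implement binary search."
# `ollama run` has no sampling flags; set parameters in a Modelfile instead.
# The derived model names below (stack-x-creative, etc.) are illustrative.
# Creative temperature
cat > Modelfile.creative <<'EOF'
FROM stack-x-ultimate
PARAMETER temperature 0.9
PARAMETER top_p 0.95
EOF
ollama create stack-x-creative -f Modelfile.creative
ollama run stack-x-creative "Write a short story about an AI that becomes self-aware in an air-gapped facility."
# Low temperature for factual responses
cat > Modelfile.factual <<'EOF'
FROM stack-x-ultimate
PARAMETER temperature 0.2
PARAMETER top_p 0.9
EOF
ollama create stack-x-factual -f Modelfile.factual
ollama run stack-x-factual "Explain quantum computing and its applications in cryptography."
# Longer context for document processing
cat > Modelfile.longctx <<'EOF'
FROM stack-x-ultimate
PARAMETER num_ctx 65536
PARAMETER temperature 0.5
EOF
ollama create stack-x-longctx -f Modelfile.longctx
ollama run stack-x-longctx "Summarize the following research paper: [PASTE TEXT]"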
```
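Sampling options can also be set per request through Ollama's local REST API. The following is a minimal sketch, assuming an Ollama server on its default port (11434) and a locally available model named `stack-x-ultimate`; the helper names `build_payload` and `generate` are illustrative, not part of any library.

```python
# Sketch: send one prompt to a local Ollama server with per-request options.
# Assumes the server listens on the default port and the model is available.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt, temperature=0.7, top_p=0.95, num_ctx=None):
    """Build the JSON body for a non-streaming /api/generate request."""
    options = {"temperature": temperature, "top_p": top_p}
    if num_ctx is not None:
        options["num_ctx"] = num_ctx
    return {
        "model": "stack-x-ultimate",
        "prompt": prompt,
        "stream": False,
        "options": options,
    }


def generate(prompt, **kwargs):
    """POST the request and return the generated text."""
    data = json.dumps(build_payload(prompt, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


# Usage (requires a running Ollama server):
# print(generate("Explain quantum computing.", temperature=0.2, top_p=0.9))
```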
---
## Model Architecture
| Attribute | Value |
|-----------|-------|
| Base Model | Qwen/Qwen2.5-3B |
| Parameters | 3B |
| Fine-tuning | Full fine-tuning + LoRA |
| Context Length | 131,072 tokens (128K) |
| Vocabulary Size | 151,936 tokens |
| Hidden Size | 2,048 |
| Attention Heads | 16 |
| Num Key Value Heads | 2 |
| Transformer Layers | 36 |
| Activation Function | SiLU |
| RoPE Scaling | NTK (factor: 4.0) |
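With only 2 key/value heads, the KV cache stays small even at long context. The sketch below estimates the cache size per transformer layer, assuming FP16 cache values and a head dimension of 128 (hidden size divided by attention heads); multiply by the layer count for the total.

```python
def kv_cache_bytes_per_layer(context_tokens, num_kv_heads, head_dim, bytes_per_value=2):
    """Bytes needed to cache keys and values for one layer."""
    # Keys and values are both cached, hence the leading factor of 2.
    return 2 * context_tokens * num_kv_heads * head_dim * bytes_per_value


per_layer = kv_cache_bytes_per_layer(131_072, num_kv_heads=2, head_dim=128)
print(per_layer / 2**20)  # MiB per layer at the full 131,072-token context -> 128.0
```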
---
## Training Details
- **Base Model**: Qwen2.5-3B
- **Training Approach**: Combined full fine-tuning + LoRA
- **Fine-tuning Data**: Diverse high-quality corpus
- **Focus Areas**: General understanding, code generation, instruction following
- **Special Training**: Sovereign deployment optimization, edge computing efficiency
- **Context Length**: 128K tokens
- **License**: Apache 2.0
- **Release Date**: April 2026
---
## Performance Notes
### Inference Speed (Q4_K_M)
| Device | Tokens/sec | Latency (512 tokens) |
|--------|------------|---------------------|
| RTX 4090 | ~55 | ~9.3s |
| RTX 3090 | ~42 | ~12.2s |
| RTX 3060 | ~25 | ~20.5s |
| Apple M2 Pro | ~35 | ~14.6s |
| CPU (i9-13900K) | ~10 | ~51.2s |
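The latency column is the token count divided by throughput. A short sketch that reproduces it from the figures in the table, assuming prompt-processing time is ignored:

```python
def latency_seconds(num_tokens: int, tokens_per_sec: float) -> float:
    """Time to generate num_tokens at a steady decoding rate."""
    return num_tokens / tokens_per_sec


# Throughput figures copied from the table above (Q4_K_M).
throughput = {
    "RTX 4090": 55,
    "RTX 3090": 42,
    "RTX 3060": 25,
    "Apple M2 Pro": 35,
    "CPU (i9-13900K)": 10,
}

for device, tps in throughput.items():
    print(f"{device}: ~{latency_seconds(512, tps):.1f}s")
```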
### Deployment Scenarios
#### Single User (Interactive)
```python
config = {
    "max_new_tokens": 512,
    "temperature": 0.7,
    "top_p": 0.95,
    "batch_size": 1,
}
```
#### Multi-User (Server)
```python
config = {
    "max_new_tokens": 256,
    "temperature": 0.5,
    "top_p": 0.9,
    "batch_size": 4,
    "use_kv_cache": True,
}
```
#### Offline/Edge
```python
config = {
    "max_new_tokens": 128,
    "temperature": 0.3,
    "top_p": 0.85,
    "quantization": "q4_k_m",
}
```
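These dictionaries read as application-level settings rather than direct arguments: keys such as `batch_size`, `use_kv_cache`, and `quantization` are not parameters of Transformers' `model.generate()`. One way to use them, sketched here under that assumption, is to pass along only the sampling keys; the helper name `to_generate_kwargs` is hypothetical.

```python
# Keys from the scenario configs that model.generate() accepts directly.
GENERATE_KEYS = {"max_new_tokens", "temperature", "top_p"}


def to_generate_kwargs(config: dict) -> dict:
    """Keep only generate() arguments and enable sampling."""
    kwargs = {key: value for key, value in config.items() if key in GENERATE_KEYS}
    # temperature and top_p only take effect when sampling is enabled.
    kwargs["do_sample"] = True
    return kwargs


edge_config = {
    "max_new_tokens": 128,
    "temperature": 0.3,
    "top_p": 0.85,
    "quantization": "q4_k_m",
}
print(to_generate_kwargs(edge_config))
```

The result can then be unpacked into a call such as `model.generate(**inputs, **to_generate_kwargs(edge_config))`.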
---
## Security & Sovereignty
Stack X Ultimate is designed for secure, sovereign deployment:
- **Air-Gapped Operation**: No internet connection required
- **Data Privacy**: All data stays within your infrastructure
- **Compliance Ready**: Runs entirely on your own infrastructure, which supports SOC 2, HIPAA, and GDPR-compliant deployments
- **Audit Trail**: Full inference logging capabilities
- **On-Premise Only**: No cloud dependencies
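For air-gapped use with Transformers, the weights need to be copied to local storage first and the Hub client told not to reach the network. A minimal sketch, assuming the model files already sit in a local directory; the path below is hypothetical.

```python
import os

# Set these before importing transformers so no network access is attempted.
os.environ["HF_HUB_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"

MODEL_DIR = "/opt/models/Stack-X-Ultimate"  # hypothetical local path


def load_offline(model_dir: str = MODEL_DIR):
    """Load the tokenizer and model strictly from local files."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir, local_files_only=True)
    model = AutoModelForCausalLM.from_pretrained(model_dir, local_files_only=True)
    return tokenizer, model


# Usage (requires the model files to be present at MODEL_DIR):
# tokenizer, model = load_offline()
```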
### Enterprise Security Features
| Feature | Description |
|---------|-------------|
| VPC Deployment | Deploy within your private network |
| TLS/SSL | Encrypted communication |
| Authentication | OAuth2, LDAP, SSO support |
| Rate Limiting | Prevent abuse and overuse |
| Audit Logging | Complete inference history |
---
## Limitations
- **Model Size**: At 3B parameters, less capable than larger models for complex reasoning
- **Specialized Tasks**: May require fine-tuning for domain-specific tasks
- **Multi-modal**: Text-only; does not support images or audio
- **Hallucinations**: May occasionally generate incorrect information; verification recommended
---
## Quick Links
- [GitHub Repository](https://github.com/my-ai-stack/stack-x)
- [HuggingFace Organization](https://huggingface.co/my-ai-stack)
- [Model Hub](https://huggingface.co/my-ai-stack/Stack-X-Ultimate)
- [Documentation](https://docs.stackai.dev)
- [Discord Community](https://discord.gg/clawd)
- [Enterprise Contact](https://stackai.dev/contact)
---
## Citation
```bibtex
@misc{my-ai-stack/stack-x-ultimate,
author = {Walid Sobhi},
title = {Stack X Ultimate: 3B Parameter Model for Sovereign AI Deployment},
year = {2026},
publisher = {HuggingFace},
url = {https://huggingface.co/my-ai-stack/Stack-X-Ultimate}
}
```
---
<p align="center">
Built with love for developers<br/>
<a href="https://discord.gg/clawd">Discord</a> · <a href="https://github.com/my-ai-stack/stack-x">GitHub</a> · <a href="https://huggingface.co/my-ai-stack">HuggingFace</a>
</p>