The ultimate 3B parameter model for sovereign AI deployment
Stack X Ultimate is a high-performance 3B parameter language model designed for sovereign AI deployment. It is optimized for edge computing, on-premise infrastructure, and air-gapped environments, and it delivers strong performance while maintaining a compact footprint suitable for consumer hardware and enterprise deployment.
## Hardware Requirements

| Quantization | GPU Required | VRAM | Total Model Size |
|---|---|---|---|
| FP16 (full precision) | RTX 3060+ | ~6 GB | ~6 GB |
| Q8_0 | RTX 3060 | ~3 GB | ~3 GB |
| Q4_K_M | Any modern GPU | ~1.8 GB | ~1.8 GB |
| Q3_K_M | Integrated GPU | ~1.2 GB | ~1.2 GB |
| Q2_K | CPU + 8 GB RAM | ~900 MB | ~900 MB |
### Minimum Requirements (Q3_K and below)

- **GPU:** None required (CPU inference supported)
- **RAM:** 8 GB system RAM
- **Storage:** 2 GB+ free space

### Recommended Requirements

- **GPU:** NVIDIA RTX 3060 (12 GB) or better
- **RAM:** 16 GB system RAM
- **Storage:** 4 GB+ free space for multiple quantizations
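To grab a single quantization without cloning the whole repository, the `huggingface_hub` client can download one GGUF file at a time. A minimal sketch, assuming the repo ships GGUF files named as in the llama.cpp examples below (check the repo's file list for the exact names):

```python
# Download one quantization from the Hub rather than the full repo.
# The filename below is an assumption based on the llama.cpp examples;
# verify it against https://huggingface.co/my-ai-stack/Stack-X-Ultimate/tree/main
from huggingface_hub import hf_hub_download

gguf_path = hf_hub_download(
    repo_id="my-ai-stack/Stack-X-Ultimate",
    filename="stack-x-ultimate-q4_k_m.gguf",  # assumed filename
)
print(f"Model downloaded to: {gguf_path}")
```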
### Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "my-ai-stack/Stack-X-Ultimate"
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

# Build a chat prompt
prompt = "Explain the concept of sovereignty in AI systems and why it matters for enterprise deployment."
messages = [
    {"role": "system", "content": "You are Stack X Ultimate, a helpful and knowledgeable AI assistant."},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate a response
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.95,
        do_sample=True,
    )

# Decode only the newly generated tokens
response = tokenizer.decode(
    outputs[0][inputs.input_ids.shape[1]:],
    skip_special_tokens=True,
)
print(response)
```
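For GPUs with less VRAM, the same Transformers path can load the weights in 4-bit on the fly via `bitsandbytes`. A minimal sketch, assuming a CUDA GPU and the `bitsandbytes` package are available (this is runtime quantization, distinct from the pre-quantized GGUF files above):

```python
# Load in 4-bit with bitsandbytes to reduce VRAM use.
# Assumes a CUDA GPU and `pip install bitsandbytes`; this quantizes at load
# time and is separate from the GGUF quantizations in the table above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "my-ai-stack/Stack-X-Ultimate"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16, store weights in 4-bit
    bnb_4bit_quant_type="nf4",
)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```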
### llama.cpp

```bash
# Download the GGUF model file
# Visit: https://huggingface.co/my-ai-stack/Stack-X-Ultimate/tree/main

# Run with llama.cpp on GPU (offload all layers with -ngl;
# reduce -c if the full 128K context exhausts memory)
./main -m stack-x-ultimate-q4_k_m.gguf \
  -n 512 \
  -t 8 \
  -c 131072 \
  -ngl 99 \
  --temp 0.7 \
  --top-p 0.95 \
  -p "Write a Python function to implement the quicksort algorithm."

# Run on CPU only
./main -m stack-x-ultimate-q4_k_m.gguf \
  -n 512 \
  -c 131072 \
  --threads 8 \
  -p "Explain the differences between sovereign AI and cloud-based AI solutions."

# Compare quantizations on the same prompt
./main -m stack-x-ultimate-q2_k.gguf -n 256 --temp 0.5 -p "Write a Python function to implement binary search."
./main -m stack-x-ultimate-q4_k_m.gguf -n 256 --temp 0.5 -p "Write a Python function to implement binary search."
./main -m stack-x-ultimate-q8_0.gguf -n 256 --temp 0.5 -p "Write a Python function to implement binary search."
```
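The same GGUF files can also be driven from Python through the `llama-cpp-python` bindings. A minimal sketch, assuming `llama-cpp-python` is installed and the Q4_K_M file has been downloaded locally under the name used above:

```python
# Chat with a local GGUF file via llama-cpp-python
# (assumes: pip install llama-cpp-python, and the file downloaded locally).
from llama_cpp import Llama

llm = Llama(
    model_path="stack-x-ultimate-q4_k_m.gguf",  # assumed local filename
    n_ctx=8192,        # a smaller window than the 128K maximum, to save RAM
    n_gpu_layers=-1,   # offload all layers to GPU; set 0 for CPU-only
)
result = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Explain the differences between sovereign AI and cloud-based AI solutions."},
    ],
    max_tokens=512,
    temperature=0.7,
)
print(result["choices"][0]["message"]["content"])
```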
### Ollama

```bash
# Pull the model
ollama pull stack-x-ultimate

# Run a one-shot prompt
ollama run stack-x-ultimate "Write a Python function to implement binary search."
```

`ollama run` does not accept sampling flags on the command line; set parameters inside an interactive session with `/set parameter` (or in a Modelfile):

```
$ ollama run stack-x-ultimate

# Creative writing: raise the temperature
>>> /set parameter temperature 0.9
>>> /set parameter top_p 0.95
>>> Write a short story about an AI that becomes self-aware in an air-gapped facility.

# Factual responses: lower the temperature
>>> /set parameter temperature 0.2
>>> /set parameter top_p 0.9
>>> Explain quantum computing and its applications in cryptography.

# Longer context for document processing
>>> /set parameter num_ctx 65536
>>> /set parameter temperature 0.5
>>> Summarize the following research paper: [PASTE TEXT]
```
## Model Architecture

| Attribute | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-3B |
| Parameters | 3B |
| Fine-tuning | Full fine-tuning + LoRA |
| Context Length | 131,072 tokens (128K) |
| Vocabulary Size | 151,936 tokens |
| Hidden Size | 1,536 |
| Attention Heads | 12 |
| Key-Value Heads | 2 |
| Transformer Layers | 28 |
| Activation Function | SiLU |
| RoPE Scaling | NTK (factor: 4.0) |
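These values can be checked against the published config. A minimal sketch using `AutoConfig`; the attribute names follow the standard Qwen2-style config and are assumptions until verified against the repo's `config.json`:

```python
# Inspect the published architecture config to confirm the table above.
# Attribute names follow the standard Qwen2-style config (an assumption
# until checked against the repo's config.json).
from transformers import AutoConfig

config = AutoConfig.from_pretrained("my-ai-stack/Stack-X-Ultimate", trust_remote_code=True)
print("hidden size:    ", config.hidden_size)
print("attention heads:", config.num_attention_heads)
print("key-value heads:", config.num_key_value_heads)
print("layers:         ", config.num_hidden_layers)
print("vocab size:     ", config.vocab_size)
print("max positions:  ", config.max_position_embeddings)
print("rope scaling:   ", getattr(config, "rope_scaling", None))
```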
## Training Details

- **Base Model:** Qwen2.5-3B
- **Training Approach:** Combined full fine-tuning + LoRA
- **Fine-tuning Data:** Diverse high-quality corpus
- **Focus Areas:** General understanding, code generation, instruction following
- **Special Training:** Sovereign deployment optimization, edge computing efficiency
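The exact training recipe is not published here, so purely as an illustration of the LoRA half of the approach, a pass over a Qwen2.5-style base with the `peft` library typically looks like the sketch below. All hyperparameters are placeholders, not the values used for Stack X Ultimate:

```python
# Illustrative LoRA setup with peft; hyperparameters are placeholders,
# NOT the recipe actually used to train Stack X Ultimate.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-3B")
lora_config = LoraConfig(
    r=16,            # adapter rank (placeholder)
    lora_alpha=32,   # scaling factor (placeholder)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```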
## Citation

```bibtex
@misc{stack-x-ultimate,
  author = {Walid Sobhi},
  title = {Stack X Ultimate: 3B Parameter Model for Sovereign AI Deployment},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/my-ai-stack/Stack-X-Ultimate}
}
```