
Stack 4.0 Qwen 3B Agentic

Fine-tuned 3B parameter model optimized for tool-calling, RAG, and multi-step agentic workflows

Stack 4.0 Qwen 3B Agentic is a specialized fine-tuned version of Qwen2.5-Coder-3B, optimized specifically for agentic AI workflows. It excels at function calling, tool use, multi-turn conversations, and autonomous task execution. Designed for regulated environments requiring sovereign AI deployment.


Hardware Requirements

| Quantization | GPU Required | VRAM | Total Model Size |
|---|---|---|---|
| FP16 (full precision) | RTX 3060+ | ~6 GB | ~6 GB |
| Q8_0 | RTX 3060 | ~3 GB | ~3 GB |
| Q4_K_M | Any modern GPU | ~1.8 GB | ~1.8 GB |
| Q3_K_M | Integrated GPU | ~1.2 GB | ~1.2 GB |
| Q2_K | CPU + 8 GB RAM | ~900 MB | ~900 MB |
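
The sizes in the table follow directly from the parameter count. As a rough sanity check (using nominal bits-per-weight figures and ignoring quantization block-scale overhead, so these are approximations, not exact file sizes):

```python
# Back-of-envelope size check: size ≈ parameters × bits-per-weight / 8
params = 3e9
bits_per_weight = {"FP16": 16, "Q8_0": 8, "Q4_K_M": 4.8, "Q2_K": 2.4}
sizes_gb = {name: params * bits / 8 / 1e9 for name, bits in bits_per_weight.items()}
print(sizes_gb)  # FP16 ≈ 6.0 GB, Q8_0 ≈ 3.0 GB, Q4_K_M ≈ 1.8 GB, Q2_K ≈ 0.9 GB
```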

Minimum Requirements (Q3_K and below)

  • GPU: None required (CPU inference supported)
  • RAM: 8GB system RAM
  • Storage: 2GB+ free space

Recommended Requirements

  • GPU: NVIDIA RTX 3060 (12GB) or better
  • RAM: 16GB system RAM
  • Storage: 4GB+ free space for multiple quantizations

File Sizes

| Quantization | File Size |
|---|---|
| FP16 | ~6.0 GB |
| Q8_0 | ~3.0 GB |
| Q4_K_M | ~1.8 GB |
| Q3_K_M | ~1.2 GB |
| Q2_K | ~900 MB |

Use Cases

Best Suited Tasks

  • Tool-Calling Agents: Autonomous agents that call external functions and APIs
  • RAG Systems: Retrieval-augmented generation with context-aware tool selection
  • Multi-Step Reasoning: Complex tasks requiring planning and sequential execution
  • Code Assistance: Code generation, debugging, and refactoring
  • Conversation Agents: Multi-turn dialog with state management
  • Workflow Automation: Task orchestration and process automation

Industries & Domains

| Industry | Use Case |
|---|---|
| Software Development | AI coding assistants, automated code review |
| Customer Support | Autonomous support agents, ticket routing |
| Data Analysis | Data pipeline automation, report generation |
| DevOps | Infrastructure automation, CI/CD optimization |
| Legal | Document automation, case research |
| Healthcare | Clinical decision support, appointment scheduling |
| Finance | Portfolio management, fraud detection |

Quick Start

Python (Transformers)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "my-ai-stack/Stack-4.0-Qwen-3B-Agentic"

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

# Example tool call format
tool_schema = [
    {
        "type": "function",
        "function": {
            "name": "search_code",
            "description": "Search for code patterns in the repository",
            "parameters": {
                "type": "object",
                "properties": {
                    "pattern": {"type": "string", "description": "Regex pattern to search"},
                    "path": {"type": "string", "description": "Directory path to search"}
                },
                "required": ["pattern"]
            }
        }
    }
]

# Generate with tool calling
prompt = """Search for all functions containing 'async' in the src directory."""

messages = [
    {"role": "system", "content": "You are Stack 4.0, an agentic AI assistant with tool-calling capabilities."},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer([text], return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.2,
        top_p=0.95,
        do_sample=True,
    )

response = tokenizer.decode(
    outputs[0][inputs.input_ids.shape[1]:],
    skip_special_tokens=True
)

print(response)

llama.cpp

# Download the GGUF model file
# Visit: https://huggingface.co/my-ai-stack/Stack-4.0-Qwen-3B-Agentic/tree/main

# Run with llama.cpp (recent builds name the binary llama-cli; older builds use ./main)
./main -m stack-4.0-qwen-3b-agentic-q4_k_m.gguf \
  -n 512 \
  -t 8 \
  -c 131072 \
  --temp 0.2 \
  --top-p 0.95 \
  -p "Write a Python function that searches for code patterns using regex."

# Or use with tool schema (JSON mode)
./main -m stack-4.0-qwen-3b-agentic-q4_k_m.gguf \
  --json-schema '{
    "type": "object",
    "properties": {
      "search": {
        "type": "object",
        "properties": {
          "pattern": {"type": "string"},
          "path": {"type": "string"}
        }
      }
    }
  }'

Ollama

# Pull the model
ollama pull stack-4.0-qwen-3b-agentic

# Run interactively with agentic mode
ollama run stack-4.0-qwen-3b-agentic "Search for all async functions in the src directory."

# Ollama's CLI does not accept sampling flags; set parameters for agentic
# workflows in a Modelfile instead:
#   FROM stack-4.0-qwen-3b-agentic
#   PARAMETER temperature 0.1
#   PARAMETER top_p 0.9
#   PARAMETER num_ctx 131072
ollama create stack-4.0-qwen-3b-agentic-tuned -f Modelfile
ollama run stack-4.0-qwen-3b-agentic-tuned \
  "Create a Python script that implements a multi-step data pipeline with error handling."

# For function calling, send a "tools" array to Ollama's /api/chat endpoint;
# the CLI itself has no function-calling subcommand.

Agentic Capabilities

Stack 4.0 Qwen 3B Agentic is specifically trained for autonomous agent workflows:

Tool Calling

  • Native function calling with structured JSON output
  • Support for tool schemas in OpenAI format
  • Multi-tool selection and chaining
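
Qwen2.5-style chat templates typically wrap emitted function calls in `<tool_call>` tags containing a JSON object. A minimal sketch of extracting those calls from raw model output (the tag format is an assumption based on the base model family; verify against this model's chat template):

```python
import json
import re

def parse_tool_calls(text: str) -> list[dict]:
    """Extract JSON tool calls wrapped in <tool_call>...</tool_call> tags."""
    calls = []
    for block in re.findall(r"<tool_call>\s*(.*?)\s*</tool_call>", text, re.DOTALL):
        try:
            calls.append(json.loads(block))
        except json.JSONDecodeError:
            pass  # skip malformed blocks; the agent loop can ask the model to retry
    return calls

output = (
    "I will search the repository.\n"
    '<tool_call>\n{"name": "search_code", "arguments": {"pattern": "async"}}\n</tool_call>'
)
print(parse_tool_calls(output))
```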

Multi-Step Reasoning

  • Plan-and-execute workflows
  • Intermediate step tracking
  • Self-correction on failure
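
A plan-and-execute loop with retry-on-failure can be sketched as follows (the stub tools and step list are hypothetical; in practice each step would come from a model-generated plan):

```python
def plan_and_execute(steps, tools, max_retries=1):
    """Run (tool_name, args) steps in order, retrying a failed step before giving up."""
    results = []
    for name, args in steps:
        for attempt in range(max_retries + 1):
            try:
                results.append(tools[name](**args))
                break  # step succeeded; move on
            except Exception as exc:
                if attempt == max_retries:
                    results.append(f"failed: {exc}")  # record failure, continue the plan
    return results

# Hypothetical stub tools standing in for real model-driven tool calls
tools = {
    "fetch": lambda url: f"<html from {url}>",
    "summarize": lambda text: text[:12] + "...",
}
plan = [
    ("fetch", {"url": "https://example.com"}),
    ("summarize", {"text": "a long document body"}),
]
print(plan_and_execute(plan, tools))
```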

Available Tools (72+ Built-in)

| Category | Tools |
|---|---|
| File Operations | file_read, file_write, file_edit, file_delete |
| Code Search | grep, glob, grep_count |
| Task Management | task_create, task_list, task_update, task_delete |
| Agent Orchestration | agent_spawn, team_create, team_assign |
| Web Operations | web_search, web_fetch |
| Scheduling | cron_create, cron_list |
| Skills | skill_execute, skill_chain |
| Messaging | message_send, message_channel |
| MCP Integration | mcp_call, mcp_list_servers |
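
On the host side, tools like these are typically exposed through a registry keyed by name, so a parsed tool call can be routed to its implementation. A minimal sketch (the `grep_count` implementation here is a hypothetical local stand-in, not the built-in tool itself):

```python
import re

def grep_count(pattern: str, lines: list[str]) -> int:
    """Local stand-in for the grep_count tool: count lines matching a regex."""
    return sum(1 for line in lines if re.search(pattern, line))

# Registry keyed by tool name, as referenced in model tool calls
TOOLS = {"grep_count": grep_count}

def dispatch(call: dict):
    """Route a parsed tool call (name + arguments) to its registered implementation."""
    return TOOLS[call["name"]](**call["arguments"])

call = {
    "name": "grep_count",
    "arguments": {"pattern": r"async def", "lines": ["async def run():", "def stop():"]},
}
print(dispatch(call))
```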

Model Architecture

| Attribute | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-Coder-3B |
| Parameters | 3B |
| Fine-tuning | LoRA (Rank 8) |
| Context Length | 131,072 tokens (128K) |
| Vocabulary Size | 151,936 tokens |
| Hidden Size | 1,536 |
| Attention Heads | 12 |
| Num Key Value Heads | 2 |
| Transformer Layers | 28 |
| Activation Function | SiLU |
| RoPE Scaling | NTK (factor: 4.0) |
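
The 2 key-value heads (grouped-query attention) keep the KV cache small at long contexts. Working through the arithmetic from the figures above (fp16 cache, K and V each stored per layer per KV head):

```python
# KV-cache size implied by the architecture table (fp16, grouped-query attention)
hidden_size, n_heads, n_kv_heads, n_layers = 1536, 12, 2, 28
head_dim = hidden_size // n_heads                            # 128
bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * 2   # K+V, 2 bytes each (fp16)
print(bytes_per_token)                                       # 28672 bytes ≈ 28 KiB per token
print(bytes_per_token * 131072 / 2**30)                      # 3.5 GiB at the full 128K context
```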

Training Details

  • Base Model: Qwen2.5-Coder-3B
  • Training Method: LoRA (Low-Rank Adaptation)
  • LoRA Rank: 8
  • LoRA Alpha: 16
  • Target Modules: All linear layers (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj)
  • Training Data: Multi-turn tool conversations, function-calling examples, enterprise workflow patterns
  • Focus Areas: Tool selection, function arguments, multi-step planning
  • Context Length: 128K tokens
  • License: Apache 2.0
  • Release Date: April 2026
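
The hyperparameters above map onto a PEFT `LoraConfig` roughly as follows. This is a sketch, not the published training script; the dropout value is an assumption, as the card does not state it:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,                 # LoRA rank, as stated above
    lora_alpha=16,       # LoRA alpha, as stated above
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,   # assumed value; not stated in the card
    task_type="CAUSAL_LM",
)
```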

Performance Notes

Inference Speed (Q4_K_M)

| GPU | Tokens/sec |
|---|---|
| RTX 4090 | ~45 |
| RTX 3090 | ~35 |
| RTX 3060 | ~20 |
| CPU (i9-13900K) | ~8 |

Memory Usage During Inference

import torch

# Recommended settings for inference
config = {
    "batch_size": 1,
    "use_cache": True,             # enable the KV cache in model.generate
    "max_new_tokens": 512,
    "torch_dtype": torch.float16,  # use float16 on GPU
    # For CPU inference:
    # "torch_dtype": torch.float32,
    # "device_map": "cpu",
}

Limitations

  • Model Size: At 3B parameters, less capable than larger models for complex reasoning
  • Training Data: Optimized for English; other languages may have reduced quality
  • Tool Accuracy: May occasionally call incorrect tools; verification recommended
  • Long Context: Performance may degrade beyond 64K tokens in some scenarios

Citation

@misc{stack-4-0-qwen-3b-agentic,
  author = {Walid Sobhi},
  title = {Stack 4.0 Qwen 3B Agentic: Fine-tuned for Tool-Calling and Agentic Workflows},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/my-ai-stack/Stack-4.0-Qwen-3B-Agentic}
}

Built with love for developers
Discord · GitHub · HuggingFace
