krishna-toolcall-7b

A fine-tuned Qwen2.5-7B-Instruct model specialized for reliable JSON tool/function calling in AI agent workflows. It is trained to emit well-formed function calls (a function name plus JSON arguments) consistently. That makes it suitable for local agentic pipelines where tool-invocation accuracy matters.

Key Details

Base model           Qwen/Qwen2.5-7B-Instruct
Method               QLoRA (4-bit NF4, rank 16, alpha 16)
Library              Unsloth + TRL SFTTrainer
Dataset              glaiveai/glaive-function-calling-v2 (10K examples)
Hardware             NVIDIA RTX A5000 (24GB VRAM) on RunPod
Training time        ~2.75 hours
Final loss           0.375 (last logged step loss: 0.295 at step 500, see Training Metrics)
Parameters trained   40.4M of 7.66B (0.53%)
Format               ChatML (<|im_start|> / <|im_end|>)
Output               Merged 16-bit safetensors
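The 40.4M trainable-parameter figure can be reproduced with a quick count. This sketch makes one assumption that the card does not state: LoRA adapters of rank 16 sit on all seven linear projections of every decoder layer (q/k/v/o and gate/up/down), which is a common Unsloth setup. It uses the published Qwen2.5-7B configuration: hidden size 3584, MLP size 18944, 28 layers, and 4 key/value heads of dimension 128. Alpha only scales the adapter output, so it does not affect the count.

```python
# Count LoRA parameters: each adapted Linear(in, out) gains A (r x in) and B (out x r).
RANK = 16
HIDDEN = 3584          # Qwen2.5-7B hidden size
INTERMEDIATE = 18944   # Qwen2.5-7B MLP (intermediate) size
LAYERS = 28
KV_DIM = 4 * 128       # 4 key/value heads x head dim 128 (grouped-query attention)

# Assumption: adapters on all seven projections of every decoder layer.
projections = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Parameters added by one rank-r LoRA adapter on a Linear(in, out) layer."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in projections.values())
total = per_layer * LAYERS
print(f"{total:,} trainable parameters ({total / 1e6:.1f}M)")
# 40,370,176 trainable parameters (40.4M)
```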

Training Metrics

Training ran for 500 optimizer steps (~3.2 epochs). The logged training loss fell from 1.172 at step 10 to 0.295 at step 500. Gradient norms stayed stable throughout.

Step   Loss    Epoch
10     1.172   0.06
100    0.428   0.64
250    0.348   1.60
400    0.331   2.57
500    0.295   3.21
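The step and epoch columns are consistent with the batch settings listed under Training Infrastructure. The back-of-the-envelope check below makes two assumptions. Epochs are counted over packed sequences, since packing is enabled. Each optimizer step consumes the effective batch of 16 sequences. The implied figure of roughly four glaive examples per packed sequence is an estimate, not a reported number.

```python
STEPS = 500
EPOCHS = 3.21
EFFECTIVE_BATCH = 4 * 4   # 4 per device x 4 gradient-accumulation steps
EXAMPLES = 10_000

steps_per_epoch = STEPS / EPOCHS                            # ~156 optimizer steps per epoch
packed_seqs_per_epoch = steps_per_epoch * EFFECTIVE_BATCH   # ~2,490 packed sequences
examples_per_seq = EXAMPLES / packed_seqs_per_epoch         # ~4 examples per packed sequence

print(f"{steps_per_epoch:.1f} steps/epoch, {packed_seqs_per_epoch:.0f} packed sequences/epoch")
print(f"~{examples_per_seq:.1f} training examples per packed sequence")

# Cross-check the other logged rows against the same steps-per-epoch rate.
logged = [(10, 0.06), (100, 0.64), (250, 1.60), (400, 2.57)]
for step, epoch in logged:
    print(f"step {step}: predicted epoch {step / steps_per_epoch:.3f}, logged {epoch}")
```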

Usage

Transformers

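```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sriksven/krishna-toolcall-7b"
# torch_dtype="auto" keeps the stored BF16 weights instead of upcasting to FP32;
# device_map="auto" (requires `accelerate`) places the model on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {
        "role": "system",
        "content": (
            "You are a helpful assistant with access to the following functions. "
            "Use them if required -\n"
            '{"name": "get_weather", "description": "Get current weather", '
            '"parameters": {"type": "object", "properties": {"location": '
            '{"type": "string"}}, "required": ["location"]}}'
        ),
    },
    {"role": "user", "content": "What's the weather in Boston?"},
]

# Render the ChatML prompt and tokenize it (input_ids + attention_mask).
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens (the tool call), not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```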
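The decoded text still has to be turned into an actual call. The exact output format is not documented here. glaive-function-calling-v2 writes calls as `<functioncall> {"name": ..., "arguments": '{...}'}`, with the arguments object wrapped in single quotes, and this model may or may not reproduce that form. The sketch below uses two hypothetical helpers, `extract_tool_call` and `validate_call`, and only the standard library. It accepts either the glaive style or plain JSON. It then does a shallow check of the arguments against the function's JSON schema before anything is executed.

```python
import json
from typing import Any, Dict, List, Optional

# JSON Schema type names mapped to Python types for a shallow type check.
JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}


def extract_tool_call(text: str) -> Optional[Dict[str, Any]]:
    """Parse the first JSON object in `text` into {"name": ..., "arguments": {...}}."""
    # Glaive-v2 style wraps the arguments object in single quotes: '{"x": 1}'.
    # Crude normalization: a string value containing '{ or }' would also be altered.
    cleaned = text.replace("'{", "{").replace("}'", "}")
    start = cleaned.find("{")
    if start == -1:
        return None  # plain-text answer, no tool call
    try:
        call, _ = json.JSONDecoder().raw_decode(cleaned[start:])
        args = call.get("arguments", {})
        if isinstance(args, str):  # arguments serialized as a JSON string
            args = json.loads(args)
    except json.JSONDecodeError:
        return None
    if "name" not in call or not isinstance(args, dict):
        return None
    return {"name": call["name"], "arguments": args}


def validate_call(call: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the call matches `schema`."""
    if call["name"] != schema["name"]:
        return [f"unknown function {call['name']!r}"]
    errors = []
    params = schema.get("parameters", {})
    props = params.get("properties", {})
    for key in params.get("required", []):
        if key not in call["arguments"]:
            errors.append(f"missing required argument {key!r}")
    for key, value in call["arguments"].items():
        if key not in props:
            errors.append(f"unexpected argument {key!r}")
            continue
        expected = JSON_TYPES.get(props[key].get("type"))
        # bool is a subclass of int in Python, so reject it for numeric types explicitly.
        wrong_bool = isinstance(value, bool) and expected is not bool
        if expected is not None and (not isinstance(value, expected) or wrong_bool):
            errors.append(f"argument {key!r} should be {props[key]['type']}")
    return errors


schema = {
    "name": "get_weather",
    "description": "Get current weather",
    "parameters": {"type": "object", "properties": {"location": {"type": "string"}},
                   "required": ["location"]},
}
raw = """<functioncall> {"name": "get_weather", "arguments": '{"location": "Boston"}'}"""
call = extract_tool_call(raw)
print(call)                         # {'name': 'get_weather', 'arguments': {'location': 'Boston'}}
print(validate_call(call, schema))  # []
```

Anything that fails to parse or validate can be rejected or sent back to the model for another attempt instead of being executed.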

Unsloth (faster inference)

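```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="sriksven/krishna-toolcall-7b",
    max_seq_length=2048,
    load_in_4bit=True,  # quantize the merged weights to 4-bit at load time
)
FastLanguageModel.for_inference(model)  # switch on Unsloth's inference mode

# Generate as in the Transformers example, reusing its `messages` list.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```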

Intended Use

  • Building AI agents that invoke tools via structured JSON function calls
  • Local/private agentic pipelines where API-based models are not an option
  • Prototyping multi-agent systems with reliable tool-use behavior
  • Research on function-calling capabilities in open-weight 7B models
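In an agent loop, a parsed call is usually routed to local code through a registry keyed by function name. This is a minimal sketch that assumes calls have already been parsed into `{"name": ..., "arguments": {...}}` dicts. `get_weather`, `TOOLS` and `dispatch` are hypothetical stand-ins. The result is serialized to JSON so it can be passed back to the model on the next turn. How that turn should be formatted for this model is not documented above.

```python
import json
from typing import Any, Callable, Dict


def get_weather(location: str) -> Dict[str, Any]:
    """Hypothetical tool: a real agent would call a weather API here."""
    return {"location": location, "forecast": "sunny", "temp_c": 21}


# Registry mapping the function names advertised in the system prompt to callables.
TOOLS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def dispatch(call: Dict[str, Any]) -> str:
    """Run a parsed tool call and return its result (or an error) as a JSON string."""
    func = TOOLS.get(call.get("name", ""))
    if func is None:
        return json.dumps({"error": f"unknown function {call.get('name')!r}"})
    try:
        result = func(**call.get("arguments", {}))
    except TypeError as exc:  # wrong or missing keyword arguments
        return json.dumps({"error": str(exc)})
    return json.dumps(result)


print(dispatch({"name": "get_weather", "arguments": {"location": "Boston"}}))
# {"location": "Boston", "forecast": "sunny", "temp_c": 21}
```

Returning errors as JSON instead of raising keeps the loop running. It also gives the model a chance to correct its next call.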

Limitations

  • Trained on synthetic function-calling data (glaive-v2), not real API traces
  • 10K training examples, which may not cover all tool-calling edge cases
  • No RLHF or DPO alignment applied; outputs may occasionally be off-format
  • Best used with the ChatML prompt template matching the training format
  • Not suitable for safety-critical applications without additional validation
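Because the model works best with the ChatML template it was trained on, it helps to see what that template produces. The hypothetical `render_chatml` below builds the standard ChatML layout by hand. That is useful for inspecting prompts or for servers that only accept raw text. In practice `tokenizer.apply_chat_template` should be preferred. Qwen2.5's own template also inserts a default system prompt when none is given and handles tool-related roles, which this sketch does not.

```python
from typing import Dict, List


def render_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render each turn as <|im_start|>{role}\\n{content}<|im_end|>\\n."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")  # the model continues from here
    return "".join(parts)


prompt = render_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather in Boston?"},
])
print(prompt)
```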

Training Infrastructure

GPU             NVIDIA RTX A5000 24GB
Cloud           RunPod ($0.27/hr)
Framework       Unsloth 2026.5.2 + TRL + Transformers 5.5.0
Precision       BF16 training, 4-bit NF4 base quantization
Optimizer       AdamW 8-bit
Learning rate   2e-4, linear decay
Batch size      16 effective (4 per device × 4 gradient-accumulation steps)
Packing         Enabled
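The linear decay schedule can be written out explicitly. This sketch mirrors the formula Transformers uses for its `linear` scheduler (`get_linear_schedule_with_warmup`): a linear warmup followed by a linear decay to zero at the final step. The warmup length for this run is not listed above, so it defaults to 0 here as an assumption. With that default the rate falls from 2e-4 at step 0 to 1e-4 at step 250 and 0 at step 500.

```python
def linear_lr(step: int, base_lr: float = 2e-4, total_steps: int = 500,
              warmup_steps: int = 0) -> float:
    """Learning rate at `step`: linear warmup to base_lr, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


# Assumption: warmup_steps=0, since the card does not state the warmup used.
for step in (0, 100, 250, 400, 500):
    print(step, f"{linear_lr(step):.2e}")
```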

Source Code

Training scripts and configs: github.com/sriksven/LLM-FineTune-Suite

License

Apache 2.0
