---
language:
- en
license: gemma
library_name: transformers
tags:
- function-calling
- multi-agent
- router
- gemma
- fine-tuned
- customer-support
base_model: google/functiongemma-270m-it
datasets:
- bhaiyahnsingh45/multiagent-router-finetuning
metrics:
- accuracy
pipeline_tag: text-generation
widget:
- text: "My app keeps crashing when I upload large files"
  example_title: "Technical Issue"
- text: "I need a refund for my subscription"
  example_title: "Billing Request"
- text: "What integrations do you support?"
  example_title: "Product Info"
---

# Multi-Agent Router (Fine-tuned FunctionGemma 270M)

<div align="center">

<img src="https://huggingface.co/datasets/huggingface/brand-assets/resolve/main/hf-logo.png" alt="Hugging Face" width="100"/>

**Intelligent routing model for multi-agent customer support systems**

[Gemma Terms of Use](https://ai.google.dev/gemma/terms)
[Base model: google/functiongemma-270m-it](https://huggingface.co/google/functiongemma-270m-it)
[Dataset: bhaiyahnsingh45/multiagent-router-finetuning](https://huggingface.co/datasets/bhaiyahnsingh45/multiagent-router-finetuning)

</div>

## Model Description

This model is a **fine-tuned version of Google's FunctionGemma 270M**, trained specifically for routing in multi-agent customer support systems. It learns to:

1. **Classify user intent** from natural language queries
2. **Route to the appropriate specialist agent**
3. **Extract relevant parameters** (priority, urgency, category)

### Supported Agents

The model routes queries to three specialized agents:

| Agent | Handles | Parameters |
|-------|---------|------------|
| **Technical Support** | Crashes, bugs, API errors, authentication issues | `issue_type`, `priority` |
| **Billing** | Payments, refunds, subscriptions, invoices | `request_type`, `urgency` |
| **Product Info** | Features, integrations, plans, compliance | `query_type`, `category` |
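
The agent table can double as a lightweight schema for checking a routing decision before acting on it. The sketch below is one possible way to do that; `REQUIRED_PARAMS` and `is_valid_route` are hypothetical names introduced for illustration, while the agent and parameter names match the tool functions defined in Basic Usage.

```python
# Required parameter names per agent, copied from the table above.
# REQUIRED_PARAMS and is_valid_route are illustrative names, not part of the model.
REQUIRED_PARAMS = {
    "technical_support_agent": {"issue_type", "priority"},
    "billing_agent": {"request_type", "urgency"},
    "product_info_agent": {"query_type", "category"},
}


def is_valid_route(agent: str, parameters: dict) -> bool:
    """Return True if the agent is known and all of its parameters are present."""
    required = REQUIRED_PARAMS.get(agent)
    if required is None:
        return False
    # issubset iterates over the dict's keys
    return required.issubset(parameters)


print(is_valid_route("billing_agent", {"request_type": "refund", "urgency": "medium"}))
print(is_valid_route("billing_agent", {"request_type": "refund"}))
print(is_valid_route("sales_agent", {}))
```

The three calls print `True`, `False`, and `False`. The check only covers parameter names; it does not validate parameter values.
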
## Training Details

### Base Model

- **Model**: `google/functiongemma-270m-it`
- **Parameters**: 270 million
- **Architecture**: Gemma with function-calling capabilities

### Fine-tuning Configuration

- **Training Samples**: 92
- **Test Samples**: 23
- **Epochs**: 15
- **Batch Size**: 4
- **Learning Rate**: 5e-05
- **GPU**: NVIDIA T4 (Google Colab Free Tier)
- **Training Time**: ~5-8 minutes
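
For a sense of scale, the configuration above implies a small number of optimizer steps. The arithmetic below is a rough sketch that assumes one optimizer step per batch (no gradient accumulation) on a single GPU; the card does not state either of these, so treat the result as an estimate.

```python
import math

train_samples = 92
batch_size = 4
epochs = 15

# Assumption: one optimizer step per batch, no gradient accumulation, single GPU.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch, total_steps)  # 23 345
```
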

### Dataset

Fine-tuned on [bhaiyahnsingh45/multiagent-router-finetuning](https://huggingface.co/datasets/bhaiyahnsingh45/multiagent-router-finetuning), containing 85 realistic customer support queries across three categories.

## Performance

| Metric | Before Training | After Training | Improvement |
|--------|----------------|----------------|-------------|
| **Accuracy** | 4.3% | 82.6% | **+78.3 pp** |
| **Correct Predictions** | 1/23 | 19/23 | +18 |
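
The accuracy figures follow directly from the prediction counts on the 23-query test set, and the improvement is a difference in percentage points rather than a relative change. A quick check:

```python
total = 23
correct_before, correct_after = 1, 19

acc_before = 100 * correct_before / total
acc_after = 100 * correct_after / total

# The improvement is the absolute difference, in percentage points (pp).
print(f"{acc_before:.1f}% -> {acc_after:.1f}% (+{acc_after - acc_before:.1f} pp)")
# 4.3% -> 82.6% (+78.3 pp)
```
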

### Per-Agent Performance

Per-agent accuracy figures are not reported here. Qualitatively, routing worked well for:

- **Technical Support**: crash reports, API errors, authentication issues
- **Billing**: refunds, payments, subscription management
- **Product Info**: feature queries, integrations, compliance questions

## Quick Start

### Installation

```bash
pip install transformers torch
```

### Basic Usage

```python
import re  # used later by parse_function_call

from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.utils import get_json_schema

# Load model and tokenizer
model_name = "bhaiyahnsingh45/functiongemma-multiagent-router"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto",
)


# Define your agent tools. The docstrings are parsed by get_json_schema,
# so keep the "Args:" section and its indentation intact.
def technical_support_agent(issue_type: str, priority: str) -> str:
    """
    Routes technical issues to specialized support team.

    Args:
        issue_type: Type of technical issue (crash, authentication, performance, api_error, etc.)
        priority: Priority level (low, medium, high)
    """
    return f"Routing to Technical Support: {issue_type} with {priority} priority"


def billing_agent(request_type: str, urgency: str) -> str:
    """
    Routes billing and payment queries.

    Args:
        request_type: Type of request (refund, invoice, upgrade, cancellation, etc.)
        urgency: How urgent (low, medium, high)
    """
    return f"Routing to Billing: {request_type} with {urgency} urgency"


def product_info_agent(query_type: str, category: str) -> str:
    """
    Routes product information queries.

    Args:
        query_type: Type of query (features, comparison, integrations, limits, etc.)
        category: Category (plans, storage, mobile, security, etc.)
    """
    return f"Routing to Product Info: {query_type} about {category}"


# Get tool schemas
AGENT_TOOLS = [
    get_json_schema(technical_support_agent),
    get_json_schema(billing_agent),
    get_json_schema(product_info_agent),
]

# System message
SYSTEM_MSG = "You are an intelligent routing agent that directs customer queries to the appropriate specialized agent."


# Function to route queries
def route_query(user_query: str) -> str:
    """Route a user query and return the raw function-call text."""
    messages = [
        {"role": "developer", "content": SYSTEM_MSG},
        {"role": "user", "content": user_query},
    ]

    # Format prompt (tool schemas are rendered into the chat template)
    inputs = tokenizer.apply_chat_template(
        messages,
        tools=AGENT_TOOLS,
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    )

    # Generate
    outputs = model.generate(
        **inputs.to(model.device),
        max_new_tokens=128,
        pad_token_id=tokenizer.eos_token_id,
    )

    # Decode only the newly generated tokens; keep special tokens so the
    # <start_function_call> / <end_function_call> markers survive.
    result = tokenizer.decode(
        outputs[0][len(inputs["input_ids"][0]):],
        skip_special_tokens=False,
    )
    return result


# Example usage
query = "My app crashes when I try to upload large files"
result = route_query(query)
print(f"Query: {query}")
print(f"Routing: {result}")
```

### Expected Output Format

```
<start_function_call>call:technical_support_agent{issue_type:crash,priority:high}<end_function_call>
```

## Usage Examples

The comments below show the expected route in call notation; the raw string returned by `route_query` uses the `<start_function_call>` format shown above.

### Example 1: Technical Issue

```python
query = "I'm getting a 500 error when calling the API"
result = route_query(query)
# Expected route: technical_support_agent(issue_type="api_error", priority="high")
```

### Example 2: Billing Request

```python
query = "I need a refund for my annual subscription"
result = route_query(query)
# Expected route: billing_agent(request_type="refund", urgency="medium")
```

### Example 3: Product Question

```python
query = "What integrations do you support for project management?"
result = route_query(query)
# Expected route: product_info_agent(query_type="integrations", category="project_management")
```

## Advanced Usage: Parse Function Calls

```python
import re


def parse_function_call(output: str) -> dict:
    """Extract function name and arguments from model output."""
    pattern = r'<start_function_call>call:(\w+)\{([^}]+)\}<end_function_call>'
    match = re.search(pattern, output)
    if match:
        func_name = match.group(1)
        params_str = match.group(2)

        # Parse parameters: values are either wrapped in <escape>...<escape>
        # or given as bare text up to the next comma.
        params = {}
        param_pattern = r'(\w+):(?:<escape>(.*?)<escape>|([^,{}]+))'
        for p_match in re.finditer(param_pattern, params_str):
            key = p_match.group(1)
            escaped = p_match.group(2)
            # Check for None explicitly so an empty escaped value ("") is kept
            val = escaped if escaped is not None else p_match.group(3).strip()
            params[key] = val

        return {
            "agent": func_name,
            "parameters": params,
        }

    # No well-formed function call found
    return {"agent": "unknown", "parameters": {}}


# Use it
query = "I was charged twice this month"
result = route_query(query)
parsed = parse_function_call(result)
print(parsed)
# Expected: {'agent': 'billing_agent', 'parameters': {'request_type': 'dispute', 'urgency': 'high'}}
```
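
Once a call has been parsed, it still has to reach a handler. One minimal way to do that, sketched here under the assumption that the three tool functions from Basic Usage double as handlers, is a name-to-function registry. `AGENT_REGISTRY` and `dispatch` are hypothetical names introduced for this example.

```python
# Handlers with the same signatures as the tool functions in Basic Usage.
def technical_support_agent(issue_type: str, priority: str) -> str:
    return f"Routing to Technical Support: {issue_type} with {priority} priority"


def billing_agent(request_type: str, urgency: str) -> str:
    return f"Routing to Billing: {request_type} with {urgency} urgency"


def product_info_agent(query_type: str, category: str) -> str:
    return f"Routing to Product Info: {query_type} about {category}"


# Hypothetical registry keyed by function name, matching the "agent" field
# returned by parse_function_call.
AGENT_REGISTRY = {
    fn.__name__: fn
    for fn in (technical_support_agent, billing_agent, product_info_agent)
}


def dispatch(parsed: dict) -> str:
    """Call the handler named in a parsed routing result."""
    handler = AGENT_REGISTRY.get(parsed["agent"])
    if handler is None:
        raise ValueError(f"Unknown agent: {parsed['agent']!r}")
    # Raises TypeError if the model produced missing or unexpected parameters
    return handler(**parsed["parameters"])


parsed = {"agent": "billing_agent", "parameters": {"request_type": "dispute", "urgency": "high"}}
print(dispatch(parsed))
# Routing to Billing: dispute with high urgency
```

Raising on unknown agents keeps the failure visible; a production system would more likely catch it and hand the query to a default queue.
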

## Integration Example

This class reuses `AGENT_TOOLS` from Basic Usage and `parse_function_call` from the previous section.

```python
class MultiAgentRouter:
    def __init__(self, model_name: str):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForCausalLM.from_pretrained(
            model_name,
            device_map="auto",
            torch_dtype="auto",
        )
        # Abbreviated here; use the full SYSTEM_MSG string from Basic Usage
        self.system_msg = "You are an intelligent routing agent..."

    def route(self, query: str) -> dict:
        """Route query and return agent + parameters."""
        messages = [
            {"role": "developer", "content": self.system_msg},
            {"role": "user", "content": query},
        ]
        inputs = self.tokenizer.apply_chat_template(
            messages,
            tools=AGENT_TOOLS,
            add_generation_prompt=True,
            return_dict=True,
            return_tensors="pt",
        )
        outputs = self.model.generate(
            **inputs.to(self.model.device),
            max_new_tokens=128,
            pad_token_id=self.tokenizer.eos_token_id,
        )
        # Decode only the generated continuation, keeping special tokens
        result = self.tokenizer.decode(
            outputs[0][len(inputs["input_ids"][0]):],
            skip_special_tokens=False,
        )
        return parse_function_call(result)


# Usage
router = MultiAgentRouter("bhaiyahnsingh45/functiongemma-multiagent-router")
routing = router.route("My payment failed but I don't know why")
print(f"Route to: {routing['agent']}")
print(f"Parameters: {routing['parameters']}")
```

## Evaluation

The model was evaluated on a held-out test set of 23 queries:

- **Routing Accuracy**: 82.6% (19/23)
- **Error Rate** (misrouted queries): 17.4% (4/23)
- **Average Inference Time**: ~50 ms on a T4 GPU
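
Numbers like these can be reproduced with a small harness that compares predicted agents against labels and times each call. The sketch below is an assumption about how such a measurement could be set up, not the script used for the figures above; `evaluate_router` is a hypothetical helper, and `stub_router` is a keyword-based stand-in so the example runs without loading the model.

```python
import time


def evaluate_router(route_fn, examples):
    """Return (accuracy, mean latency in ms) for route_fn over (query, agent) pairs."""
    correct = 0
    elapsed = 0.0
    for query, expected_agent in examples:
        start = time.perf_counter()
        predicted = route_fn(query)["agent"]
        elapsed += time.perf_counter() - start
        correct += predicted == expected_agent
    n = len(examples)
    return correct / n, 1000 * elapsed / n


# Stand-in for the real model: swap in MultiAgentRouter(...).route to measure it.
def stub_router(query: str) -> dict:
    agent = "billing_agent" if "refund" in query.lower() else "technical_support_agent"
    return {"agent": agent, "parameters": {}}


examples = [
    ("I need a refund", "billing_agent"),
    ("My app crashes on upload", "technical_support_agent"),
    ("What integrations do you support?", "product_info_agent"),
]

accuracy, latency_ms = evaluate_router(stub_router, examples)
print(f"accuracy={accuracy:.3f}")  # accuracy=0.667
```

With the real model, the first call typically includes warm-up overhead, so it may be worth discarding it before averaging latency.
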

## Limitations

1. **Language**: Currently supports English only
2. **Domain**: Optimized for customer support; may need fine-tuning for other domains
3. **Agents**: Limited to 3 agent types (can be extended with additional training)
4. **Context**: Works best with single-turn queries; multi-turn conversations may need context handling
5. **Edge Cases**: Ambiguous queries may require fallback logic
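
The fallback logic mentioned in the last item could be as simple as redirecting anything unparseable or unrecognized to a default queue. The sketch below illustrates that idea; `FALLBACK_AGENT`, its value `"human_triage"`, and `with_fallback` are hypothetical and not part of the model or its training data. It expects the dictionary shape returned by `parse_function_call`.

```python
KNOWN_AGENTS = {"technical_support_agent", "billing_agent", "product_info_agent"}
FALLBACK_AGENT = "human_triage"  # hypothetical default queue, not a model output


def with_fallback(parsed: dict) -> dict:
    """Keep a well-formed route; otherwise send the query to the fallback queue."""
    if parsed.get("agent") in KNOWN_AGENTS and parsed.get("parameters"):
        return parsed
    return {
        "agent": FALLBACK_AGENT,
        "parameters": {},
        "reason": "unknown_agent_or_missing_parameters",
    }


# parse_function_call returns agent "unknown" when no function call is found
print(with_fallback({"agent": "unknown", "parameters": {}})["agent"])
# human_triage
```
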

## Future Improvements

- [ ] Add support for more languages
- [ ] Expand to 5+ agent types (sales, feedback, onboarding)
- [ ] Handle multi-turn conversations
- [ ] Add confidence scores for routing decisions
- [ ] Support compound queries requiring multiple agents

## Citation

```bibtex
@misc{functiongemma_multiagent_router,
  author = {Bhaiya Singh},
  title = {Multi-Agent Router: Fine-tuned FunctionGemma for Customer Support},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/bhaiyahnsingh45/functiongemma-multiagent-router}}
}
```

## License

This model inherits the [Gemma License](https://ai.google.dev/gemma/terms) from the base model.

## Acknowledgments

- Base model: [google/functiongemma-270m-it](https://huggingface.co/google/functiongemma-270m-it)
- Training framework: [Hugging Face TRL](https://github.com/huggingface/trl)
- Dataset: [bhaiyahnsingh45/multiagent-router-finetuning](https://huggingface.co/datasets/bhaiyahnsingh45/multiagent-router-finetuning)

## Contact

For questions, issues, or collaboration opportunities:

- Open an issue on the [model repository](https://huggingface.co/bhaiyahnsingh45/functiongemma-multiagent-router)
- Dataset issues: [dataset repository](https://huggingface.co/datasets/bhaiyahnsingh45/multiagent-router-finetuning)

---

**Built with ❤️ using FunctionGemma and Hugging Face Transformers**