LLM Complexity Router

A fine-tuned DeBERTa-v3-small classifier that routes queries between gpt-4o-mini (cheap) and gpt-4o (expensive), saving ~41% in cost while slightly improving response quality versus always using the expensive model.

Performance (WildBench, 200 real user queries)

| Strategy         | Quality (1-10) | Cost / 1K queries | % Cheap | Quality Δ | Cost Saved |
|------------------|----------------|-------------------|---------|-----------|------------|
| always_expensive | 8.11           | $6.00             | 0%      | baseline  | baseline   |
| length_based     | 8.02           | $3.35             | 47%     | -0.09     | +44.2%     |
| deberta_router   | 8.24           | $3.55             | 43.5%   | +0.13     | +40.9%     |
| routellm_mf      | 7.96           | $3.60             | 42.5%   | -0.15     | +39.9%     |

deberta_router is the only strategy that beats the always-expensive baseline on quality while also cutting cost.
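
The blended cost of any routing split follows from simple arithmetic: if a fraction p of queries goes to the cheap model, the expected cost per 1K queries is p · c_cheap + (1 − p) · c_exp. A minimal sketch of that check (the per-1K costs below are placeholder values for illustration, not the benchmark's actual prices):

```python
def blended_cost(p_cheap, cost_cheap, cost_exp):
    """Expected cost per 1K queries when a fraction p_cheap is routed cheap."""
    return p_cheap * cost_cheap + (1 - p_cheap) * cost_exp

def savings_vs_baseline(blended, baseline):
    """Fraction of cost saved relative to always using the expensive model."""
    return 1 - blended / baseline

# Hypothetical per-1K costs, chosen for illustration only.
cost = blended_cost(0.435, 1.00, 6.00)  # 43.5% of queries routed cheap
print(f"${cost:.2f} per 1K queries, {savings_vs_baseline(cost, 6.00):.1%} saved")
```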

Category Breakdown (vs always_expensive)

| Category            | Router | Baseline | Δ     |
|---------------------|--------|----------|-------|
| Advice seeking      | 9.50   | 9.00     | +0.50 |
| Brainstorming       | 8.40   | 8.20     | +0.20 |
| Coding & Debugging  | 7.73   | 7.89     | -0.16 |
| Creative Writing    | 8.00   | 7.74     | +0.26 |
| Data Analysis       | 9.20   | 9.00     | +0.20 |
| Editing             | 8.40   | 8.40     | +0.00 |
| Information seeking | 8.11   | 7.83     | +0.28 |
| Math                | 8.33   | 7.83     | +0.50 |
| Planning            | 8.59   | 8.77     | -0.18 |
| Reasoning           | 8.82   | 8.61     | +0.21 |
| Role playing        | 7.00   | 6.71     | +0.29 |

Usage

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="your-username/complexity-router"
)

result = classifier("What is the capital of France?")
# → [{'label': 'SIMPLE', 'score': 0.98}]  → route to gpt-4o-mini

result = classifier("Prove the Riemann hypothesis step by step")
# → [{'label': 'COMPLEX', 'score': 0.95}]  → route to gpt-4o
```
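
Turning the classifier output into a routing decision is a one-line mapping; the sketch below adds a confidence threshold so that low-confidence SIMPLE predictions fall back to the expensive model. The threshold value is an illustrative assumption, not a tuned parameter of this model:

```python
CHEAP_MODEL = "gpt-4o-mini"
EXPENSIVE_MODEL = "gpt-4o"

def pick_model(prediction, min_confidence=0.8):
    """Map a classifier output like {'label': 'SIMPLE', 'score': 0.98} to a model.

    Route cheap only when the prediction is SIMPLE and confident;
    everything else goes to the expensive model.
    """
    if prediction["label"] == "SIMPLE" and prediction["score"] >= min_confidence:
        return CHEAP_MODEL
    return EXPENSIVE_MODEL

print(pick_model({"label": "SIMPLE", "score": 0.98}))   # gpt-4o-mini
print(pick_model({"label": "SIMPLE", "score": 0.55}))   # gpt-4o (low confidence)
print(pick_model({"label": "COMPLEX", "score": 0.95}))  # gpt-4o
```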

Training

  • Base model: microsoft/deberta-v3-small
  • Training data: proprietary (not released)
  • Labels: SIMPLE / COMPLEX
  • Benchmarked against: RouteLLM mf router, length-based baseline

Limitations

  • Weaker on Coding & Debugging (-0.16) and Planning (-0.18)
  • Optimized for gpt-4o vs gpt-4o-mini routing specifically
  • Training data distribution may not match all use cases
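
If an upstream tagger already labels queries by category, one mitigation for the weak spots above is to pin those categories to the expensive model regardless of the classifier's verdict. A minimal sketch; the `category` argument and the override set are assumptions for illustration, not part of this model:

```python
# Categories where the router scored below the always-expensive baseline.
FORCE_EXPENSIVE = {"Coding & Debugging", "Planning"}

def route_with_override(prediction, category):
    """Fall back to gpt-4o for categories the router handles worse than baseline."""
    if category in FORCE_EXPENSIVE:
        return "gpt-4o"
    if prediction["label"] == "SIMPLE":
        return "gpt-4o-mini"
    return "gpt-4o"
```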