LLM Complexity Router

A fine-tuned DeBERTa-v3-small classifier that routes queries between gpt-4o-mini (cheap) and gpt-4o (expensive), saving ~41% in cost while slightly improving response quality versus always using the expensive model.

Performance (WildBench, 200 real user queries)

| Strategy         | Quality (1-10) | Cost / 1K queries | % Cheap | Quality Δ | Cost Saved |
|------------------|----------------|-------------------|---------|-----------|------------|
| always_expensive | 8.11           | $6.00             | 0%      | baseline  | baseline   |
| length_based     | 8.02           | $3.35             | 47%     | -0.09     | +44.2%     |
| deberta_router   | 8.24           | $3.55             | 43.5%   | +0.13     | +40.9%     |
| routellm_mf      | 7.96           | $3.60             | 42.5%   | -0.15     | +39.9%     |

deberta_router is the only strategy that beats the always-expensive baseline on quality while also cutting cost.
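
The blended cost of any routing split follows from simple arithmetic: if a fraction p of queries goes to the cheap model, the expected cost per 1K queries is p · c_cheap + (1 − p) · c_exp. A minimal sketch of that check (the per-1K costs below are placeholder values for illustration, not the benchmark's actual prices):

```python
def blended_cost(p_cheap, cost_cheap, cost_exp):
    """Expected cost per 1K queries when a fraction p_cheap is routed cheap."""
    return p_cheap * cost_cheap + (1 - p_cheap) * cost_exp

def savings_vs_baseline(blended, baseline):
    """Fraction of cost saved relative to always using the expensive model."""
    return 1 - blended / baseline

# Hypothetical per-1K costs, chosen for illustration only.
cost = blended_cost(0.435, 1.00, 6.00)  # 43.5% of queries routed cheap
print(f"${cost:.2f} per 1K queries, {savings_vs_baseline(cost, 6.00):.1%} saved")
```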

Category Breakdown (vs always_expensive)

| Category            | Router | Baseline | Δ     |
|---------------------|--------|----------|-------|
| Advice seeking      | 9.50   | 9.00     | +0.50 |
| Brainstorming       | 8.40   | 8.20     | +0.20 |
| Coding & Debugging  | 7.73   | 7.89     | -0.16 |
| Creative Writing    | 8.00   | 7.74     | +0.26 |
| Data Analysis       | 9.20   | 9.00     | +0.20 |
| Editing             | 8.40   | 8.40     | +0.00 |
| Information seeking | 8.11   | 7.83     | +0.28 |
| Math                | 8.33   | 7.83     | +0.50 |
| Planning            | 8.59   | 8.77     | -0.18 |
| Reasoning           | 8.82   | 8.61     | +0.21 |
| Role playing        | 7.00   | 6.71     | +0.29 |

Usage

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="your-username/complexity-router"
)

result = classifier("What is the capital of France?")
# → [{'label': 'SIMPLE', 'score': 0.98}]  → route to gpt-4o-mini

result = classifier("Prove the Riemann hypothesis step by step")
# → [{'label': 'COMPLEX', 'score': 0.95}]  → route to gpt-4o
```
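
Turning the classifier output into a routing decision is a one-line mapping; the sketch below adds a confidence threshold so that low-confidence SIMPLE predictions fall back to the expensive model. The threshold value is an illustrative assumption, not a tuned parameter of this model:

```python
CHEAP_MODEL = "gpt-4o-mini"
EXPENSIVE_MODEL = "gpt-4o"

def pick_model(prediction, min_confidence=0.8):
    """Map a classifier output like {'label': 'SIMPLE', 'score': 0.98} to a model.

    Route cheap only when the prediction is SIMPLE and confident;
    everything else goes to the expensive model.
    """
    if prediction["label"] == "SIMPLE" and prediction["score"] >= min_confidence:
        return CHEAP_MODEL
    return EXPENSIVE_MODEL

print(pick_model({"label": "SIMPLE", "score": 0.98}))   # gpt-4o-mini
print(pick_model({"label": "SIMPLE", "score": 0.55}))   # gpt-4o (low confidence)
print(pick_model({"label": "COMPLEX", "score": 0.95}))  # gpt-4o
```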

Training

  • Base model: microsoft/deberta-v3-small
  • Training data: proprietary (not released)
  • Labels: SIMPLE / COMPLEX
  • Benchmarked against: RouteLLM mf router, length-based baseline

Limitations

  • Weaker on Coding & Debugging (-0.16) and Planning (-0.18)
  • Optimized for gpt-4o vs gpt-4o-mini routing specifically
  • Training data distribution may not match all use cases
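
If an upstream tagger already labels queries by category, one mitigation for the weak spots above is to pin those categories to the expensive model regardless of the classifier's verdict. A minimal sketch; the `category` argument and the override set are assumptions for illustration, not part of this model:

```python
# Categories where the router scored below the always-expensive baseline.
FORCE_EXPENSIVE = {"Coding & Debugging", "Planning"}

def route_with_override(prediction, category):
    """Fall back to gpt-4o for categories the router handles worse than baseline."""
    if category in FORCE_EXPENSIVE:
        return "gpt-4o"
    if prediction["label"] == "SIMPLE":
        return "gpt-4o-mini"
    return "gpt-4o"
```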