Spaces: TrustSafeAI / RADAR-AI-Text-Detector

Open source LLM evaluation including hallucination rate for AI safety teams

by vigneshwar234 - opened Jun 8

Hi TrustSafeAI team!

AI text detection is important safety work. For teams evaluating LLMs for trustworthiness, I built an open source framework that measures hallucination and confidence calibration alongside task accuracy.

LLM Evaluation Framework:

Hallucination Rate: detects overconfident wrong outputs (arguably the most harmful AI behavior pattern)
Accuracy: task accuracy measured against ground truth
Reasoning Quality: whether the model shows its reasoning or just asserts answers
Cost per 1K tokens: trustworthy AI at scale has to be affordable to run
Latency p95: 95th-percentile response time, for real-time safety detection pipelines
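
To make the metric definitions above concrete, here is a minimal sketch of how four of them could be computed from per-example records. It is illustrative only and is not taken from the framework's code: the `EvalRecord` fields, the function names, the exact-match comparison, and the 0.8 confidence threshold are all assumptions. Reasoning quality is left out because it needs a rubric or a judge model.

```python
import math
from dataclasses import dataclass


@dataclass
class EvalRecord:
    """One evaluated example (hypothetical schema, for illustration only)."""
    prediction: str
    reference: str
    confidence: float  # self-reported confidence in [0, 1]
    latency_ms: float
    tokens: int
    cost_usd: float


def accuracy(records):
    """Share of predictions that exactly match the ground truth."""
    return sum(r.prediction == r.reference for r in records) / len(records)


def hallucination_rate(records, threshold=0.8):
    """Share of all outputs that are wrong AND at or above the confidence threshold.

    The threshold is an assumed cutoff for "overconfident".
    """
    confident_wrong = sum(
        r.prediction != r.reference and r.confidence >= threshold
        for r in records
    )
    return confident_wrong / len(records)


def latency_p95(records):
    """95th-percentile latency using the nearest-rank method."""
    ordered = sorted(r.latency_ms for r in records)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def cost_per_1k_tokens(records):
    """Total cost normalized to 1,000 tokens."""
    total_tokens = sum(r.tokens for r in records)
    return 1000 * sum(r.cost_usd for r in records) / total_tokens


if __name__ == "__main__":
    records = [
        EvalRecord("Paris", "Paris", 0.95, 120.0, 50, 0.0001),
        EvalRecord("Lyon", "Paris", 0.90, 200.0, 50, 0.0001),   # confident and wrong
        EvalRecord("Rome", "Madrid", 0.40, 150.0, 50, 0.0001),  # wrong but unsure
        EvalRecord("Berlin", "Berlin", 0.70, 900.0, 50, 0.0001),
    ]
    print("accuracy:", accuracy(records))
    print("hallucination rate:", hallucination_rate(records))
    print("latency p95 (ms):", latency_p95(records))
    print("cost per 1K tokens (USD):", cost_per_1k_tokens(records))
```

In this sketch the second and third records are both wrong, but only the second counts toward the hallucination rate, since the third reports low confidence. That is the distinction between plain error and overconfident error.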

Taken together, hallucination rate and reasoning quality give a picture of model trustworthiness that accuracy alone does not.

Live demo: https://huggingface.co/spaces/vigneshwar234/llm-eval-demo
GitHub: https://github.com/vignesh2027/LLM-Evaluation-Framework

Open source, free forever. Happy to discuss AI safety evaluation approaches!
