2026 arXiv preprint. Models fine-tuned on documents describing typical evaluation traits show safer behavior, with increased refusal rates and low
- Models That Know How Evaluations Are Designed Score Safer
  Paper • 2605.28591
- compass-group-tue/sdf_evaluation_traits
  Viewer
- compass-group-tue/nemotron-traits
  Text Generation
- compass-group-tue/Nemotron-Traits-Seed-44
  Text Generation