Confidence and Calibration of Activation Oracles for Reliable Interpretation of Language Model Internals Paper • 2605.26045 • Published 7 days ago • 10