Confidence and Calibration of Activation Oracles for Reliable Interpretation of Language Model Internals Paper • 2605.26045 • Published 3 days ago • 9