Confidence and Calibration of Activation Oracles for Reliable Interpretation of Language Model Internals Paper • 2605.26045 • Published 4 days ago