CorrSteer: Correlation-Based Steering of Language Models via Sparse Autoencoders. Steer language model output by clicking visual layers.
Control Reinforcement Learning. Explore token-level LLM steering with feature visualizations.