Safety Alignment as Continual Learning: Mitigating the Alignment Tax via Orthogonal Gradient Projection
This model is the official implementation of the paper: https://arxiv.org/abs/2602.07892.
If you find this model or dataset useful in your research, please cite our paper:
@article{sun2026safety,
  title={Safety alignment as continual learning: Mitigating the alignment tax via orthogonal gradient projection},
  author={Sun, Guanglong and Zhang, Siyuan and Wang, Liyuan and Zhu, Jun and Su, Hang and Zhong, Yi},
  journal={arXiv preprint arXiv:2602.07892},
  year={2026}
}