Zero Forgetting in LLM Fine-Tuning: 4 Benchmarks, All Domains Retained
We tested sequential fine-tuning on Mistral-7B across 4 independent benchmarks (5, 4, 5, and 8 domains). Standard LoRA forgets 38-49% of prior knowledge per domain. Our continual learning adapter: -0.17% drift.
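For readers who want to reproduce the bookkeeping, here is a minimal sketch of how per-domain drift can be tracked during sequential fine-tuning. `train_on` and `evaluate_domain` are hypothetical stand-ins for each benchmark's own training and scoring code; the exact drift metric used here may differ.

```python
# Hypothetical sketch: track how much each previously learned domain drifts
# as new domains are trained in sequence.
def measure_drift(model, domains, train_on, evaluate_domain):
    learned_score = {}
    for i, domain in enumerate(domains):
        train_on(model, domain)                                  # fine-tune on the new domain
        learned_score[domain] = evaluate_domain(model, domain)   # score it right after learning
        for prior in domains[:i]:                                # re-score every earlier domain
            drift = evaluate_domain(model, prior) - learned_score[prior]
            print(f"after {domain}: drift on {prior} = {drift:+.2%}")
```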
The Salesforce 5-domain test showed positive backward transfer: the model got better at old domains as it learned new ones (retention BERTScore: 0.889 → 0.907).
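The retention number can be computed with the off-the-shelf bert-score package. The sketch below assumes per-domain reference outputs and is not necessarily the exact evaluation pipeline used here.

```python
# Hypothetical sketch of the retention metric using bert-score (pip install bert-score).
from bert_score import score

def retention_bertscore(current_outputs, reference_outputs):
    """Mean BERTScore F1 of the model's current outputs on an old domain
    against reference outputs; rising values across training stages indicate
    positive backward transfer (e.g. 0.889 -> 0.907)."""
    _, _, f1 = score(current_outputs, reference_outputs, lang="en")
    return f1.mean().item()
```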
No replay buffers. No EWC. No knowledge distillation. Spectral norm locked at 1.0. Naive LoRA crashed at gradient norm 263. Ours: under 6.
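One plausible way to keep the adapter's spectral norm locked at 1.0 is to rescale the LoRA B matrix whenever the update's largest singular value exceeds the cap. The PyTorch sketch below is an assumption for illustration, not the actual mechanism behind these numbers.

```python
# Hypothetical sketch: cap the spectral norm of the LoRA update dW = B @ A at 1.0
# by rescaling B after each optimizer step.
import torch

@torch.no_grad()
def clamp_spectral_norm(A: torch.Tensor, B: torch.Tensor, max_norm: float = 1.0) -> torch.Tensor:
    delta = B @ A                                    # effective low-rank weight update
    sigma = torch.linalg.matrix_norm(delta, ord=2)   # largest singular value of the update
    if sigma > max_norm:
        B.mul_(max_norm / sigma)                     # shrink B so ||B @ A||_2 <= max_norm
    return sigma
```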