GFT: From Imitation to Reward Fine-Tuning with Unbiased Group Advantages and Dynamic Coefficient Rectification Paper • 2604.14258 • Published 8 days ago • 22