PIKA: Expert-Level Synthetic Datasets for Post-Training Alignment from Scratch Paper • 2510.06670 • Published Oct 8, 2025
Aligning Large Language Models via Fully Self-Synthetic Data Paper • 2510.06652 • Published Oct 8, 2025
GRLO: Towards Generalizable Reinforcement Learning in Open-Ended Environments from Zero Paper • 2605.15464 • Published 13 days ago