rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking • Paper • 2501.04519 • Published Jan 8, 2025 • 290
Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning • Paper • 2502.06533 • Published Feb 10, 2025 • 17
Nbeau/evaluation_addition_18digits_1000examples_n_plus_n.jsonl • Viewer • Updated Oct 11, 2024 • 1k • 3
Nbeau/evaluation_addition_18digits_1000examples_n_plus_less_n.jsonl • Viewer • Updated Oct 11, 2024 • 1k • 4
Nbeau/evaluation_addition_17digits_1000examples_n_plus_less_n.jsonl • Viewer • Updated Oct 9, 2024 • 1k • 4
Nbeau/evaluation_addition_16digits_1000examples_n_plus_less_n.jsonl • Viewer • Updated Oct 8, 2024 • 1k • 10
Nbeau/evaluation_addition_14digits_1000examples_n_plus_less_n.jsonl • Viewer • Updated Oct 8, 2024 • 1k • 6
Nbeau/evaluation_addition_15digits_1000examples_n_plus_less_n.jsonl • Viewer • Updated Oct 7, 2024 • 1k • 4
Nbeau/evaluation_addition_13digits_1000examples_n_plus_less_n.jsonl • Viewer • Updated Oct 7, 2024 • 1k • 4