RyanYr/pg-dapo_shuffled-01_offline-grpo_qwen2.5-math-1.5B_piref_nokl_matheval Updated 7 days ago • 102
RyanYr/pg-dapo_shuffled-0_offline-grpo_qwen2.5-math-1.5B_piref_nokl_matheval Updated 8 days ago • 102
RyanYr/pg-dapo_shuffled-01_offline-grpo_qwen2.5-math-1.5B_piref_kl_behavior_matheval Updated 8 days ago • 125
RyanYr/pg_trajis-dapo_shuffled-offline-grpo_qwen2.5-math-1.5B_piref_matheval Updated 8 days ago • 102
RyanYr/pg-dapo_shuffled-0_offline-grpo_qwen2.5-math-1.5B_piref_kl_behavior_matheval Updated 8 days ago • 315
RyanYr/pg-dapo_shuffled-10_offline-grpo_qwen2.5-math-1.5B_piref_nokl_matheval Viewer • Updated 8 days ago • 1.55k • 12
RyanYr/pg-dapo_shuffled-10_offline-grpo_qwen2.5-math-1.5B_piref_kl_matheval Viewer • Updated 8 days ago • 1.55k • 12
RyanYr/pg-dapo_shuffled-10_offline-grpo_qwen2.5-math-1.5B_piref_kl_behavior_matheval Viewer • Updated 8 days ago • 1.55k • 11
RyanYr/pg-dapo_shuffled-10_offline-grpo_qwen2.5-math-1.5B_kl_behavior_matheval Updated 9 days ago • 415
RyanYr/pg-dapo_shuffled-01_offline-grpo_qwen2.5-math-1.5B_kl_behavior_matheval Updated 9 days ago • 537
RyanYr/pg-dapo_shuffled-0_offline-grpo_qwen2.5-math-1.5B_kl_behavior_matheval Updated 9 days ago • 328
RyanYr/pg-dapo_shuffled-01_offline-pg-dapo-qwen3-4B-Base-mbs128-n4_kl_matheval Viewer • Updated 10 days ago • 18.6k • 418