GRPO Collection Checkpoints from run 'gxpo_qwen-1.5B_0.5_k_3_shutoff_trajectory_aware_' for iso-BP comparison. • 28 items • Updated Apr 23
swapnil7777/grpo-gxpo-qwen-1-5b-1-k-10-shutoff-trajectory-aware-hendrycks-math-seed42-20260421-0628-913b5ee3 Updated Apr 23
swapnil7777/grpo-gxpo-qwen-1-5b-1-k-10-shutoff-trajectory-aware-hendrycks-math-seed42-20260421-0628-9f5e9756 Updated Apr 23
swapnil7777/grpo-gxpo-qwen-1-5b-1-k-10-shutoff-trajectory-aware-hendrycks-math-seed42-20260421-0628-a5eff752 Updated Apr 23
swapnil7777/grpo-gxpo-qwen-1-5b-1-k-10-shutoff-trajectory-aware-hendrycks-math-seed42-20260421-0628-a5e79e8f Updated Apr 23
swapnil7777/grpo-gxpo-qwen-1-5b-1-k-10-shutoff-trajectory-aware-hendrycks-math-seed42-20260421-0628-2481a322 Updated Apr 23
swapnil7777/grpo-gxpo-qwen-1-5b-1-k-10-shutoff-trajectory-aware-hendrycks-math-seed42-20260421-0628-e36332a8 Updated Apr 23
swapnil7777/grpo-gxpo-qwen-1-5b-1-k-10-shutoff-trajectory-aware-hendrycks-math-seed42-20260421-0628-acf89781 Updated Apr 23 • 1