Yale-ROSE/Qwen3-4B-SAT-VarSelector-Sym-Aug-GRPO-5x Reinforcement Learning • Updated about 23 hours ago
Yale-ROSE/Qwen3-4B-dimacs_cube-sft_gpt-oss-120b-dpo_gpt-oss-120b_reasoning-v2 4B • Updated 7 days ago