Prepare Reasoning Language Models for Multi-Agent Debate with Self-Debate Reinforcement Learning Paper • 2601.22297 • Published 14 days ago • 1
PhyCritic: Multimodal Critic Models for Physical AI Paper • 2602.11124 • Published about 24 hours ago • 36
LegendaryDawn/SDRL-baseline-Qwen3-8B-Base-DAPO-n8-bs256-long12-yarn2-step200 8B • Updated 1 day ago • 4
LegendaryDawn/SDRL-freq-Qwen3-8B-Base-majority_n8_l4096-DAPO_n8_bs256_long8-step200 8B • Updated 8 days ago • 211
LegendaryDawn/SDRL-rand-Qwen3-8B-Base-random_n8_l4096-DAPO_n8_bs256_long8-step200 8B • Updated 8 days ago • 216
LegendaryDawn/SDRL-rand-Qwen3-4B-Base-icml-self-debate-random_n8_l2048-DAPO_n8_bs256_long8-step200 4B • Updated 16 days ago • 79
LegendaryDawn/SDRL-freq-Qwen3-4B-Base-icml-self-debate-exp-majority_n8_l2048-DAPO_n8_bs256_long8-run2-step200 4B • Updated 18 days ago • 1.01k
LegendaryDawn/SDRL-rand-Qwen2.5-3B-icml-self-debate-ablation-random_n4_l2048-DAPO_n8_bs256_long8-step200 3B • Updated 18 days ago • 334
LegendaryDawn/SDRL-freq-ablation-step125-Qwen3-4B-Base-icml-self-debate-majority_n8_l2048-DAPO_n8_bs256_long8 4B • Updated 20 days ago • 87
LegendaryDawn/self-debate-exp-Qwen2.5-3B-majority_fix_n4_l2048-DAPO_n8_bs256_long8-step200 3B • Updated 25 days ago • 328