daman1209arora/MaxRL-Qwen3-1.7B-Base-IDK-math12k-32-brier-rloo-step2000 2B • Updated 19 days ago • 115
daman1209arora/alpha_0.2_DeepSeek-R1-Distill-Qwen-7B Text Generation • 8B • Updated Apr 13, 2025 • 161
daman1209arora/alpha_0.05_DeepSeek-R1-Distill-Qwen-7B Text Generation • 8B • Updated Apr 13, 2025 • 3
daman1209arora/alpha_0.4_DeepSeek-R1-Distill-Qwen-1.5B Text Generation • 2B • Updated Apr 13, 2025 • 5