lsteno/Qwen3-4B-Instruct-2507-RLM-RLVR-depth2-recursive-r64-a128-lr1e-5-adapter Reinforcement Learning • Updated 11 days ago • 22
lsteno/Qwen3-4B-Instruct-2507-RLM-RLVR-FullFT-lr5e-6-depth1-v1 Text Generation • 4B • Updated May 23 • 75