arxiv:2510.02263
Anikait Singh
Asap7772
AI & ML interests
Deep Learning, Reinforcement Learning, Robotics
Models 176
Asap7772/Qwen3-4B-second-stage-DPO-lr-1e-7-beta-0.05-loss-sigmoid-rpo-1.0-ckpt-135
196k • Updated • 1
Asap7772/Qwen3-4B-second-stage-DPO-lr-1e-7-beta-0.01-loss-sigmoid-rpo-1.0-ckpt-135
196k • Updated • 2
Asap7772/Qwen3-4B-second-stage-DPO-lr-1e-7-beta-0.1-loss-sigmoid-rpo-1.0-ckpt-135
196k • Updated • 4
Asap7772/Qwen3-4B-second-stage-DPO-lr-1e-6-beta-0.05-loss-sigmoid-rpo-1.0-ckpt-135
196k • Updated • 1
Asap7772/Qwen3-4B-second-stage-DPO-lr-1e-6-beta-0.01-loss-sigmoid-rpo-1.0-ckpt-135
196k • Updated • 1
Asap7772/Qwen3-4B-second-stage-DPO-lr-1e-6-beta-0.1-loss-sigmoid-rpo-1.0-ckpt-135
196k • Updated • 1
Asap7772/qwen3-4b-arc-second-stage-awr-sftinit-lr1e-6-0908
Text Generation • 4B • Updated • 3
Asap7772/qwen3-4b-arc-second-stage-star-sftinit-lr1e-6-0908
Text Generation • 4B • Updated • 5
Asap7772/qwen3-4b-arc-second-stage-star-sftinit-lr1e-5-0908
Text Generation • 4B • Updated • 2
Asap7772/qwen3-4b-arc-second-stage-awr-sftinit-lr1e-5-0908
Text Generation • 4B • Updated • 3
Datasets 2,774
Asap7772/arc-barc-processed-direct-80k-solutions-all
Updated • 4
Asap7772/arc-agi-impabs-dpolr1e-7-beta0.01-valclassifiersft1e-5-step80
Updated • 5
Asap7772/arc-agi-impabs-dpolr1e-7-beta0.01-valclassifiersft1e-5-step120
Updated • 4
Asap7772/arc-agi-impabs-dpolr1e-7-beta0.01-valclassifiersft1e-5-step160
Updated • 3
Asap7772/arc-agi-impabs-dpolr1e-7-beta0.01-valclassifiersft1e-5-step100
Updated • 5
Asap7772/arc-agi-impabs-dpolr1e-7-beta0.01-valclassifiersft1e-5-step140
Updated • 4
Asap7772/arc-agi-impabs-dpolr1e-7-beta0.01-valclassifiersft1e-5-step40
Updated • 3
Asap7772/arc-agi-impabs-dpolr1e-7-beta0.01-valclassifiersft1e-5-step60
Updated • 3
Asap7772/arc-agi-impabs-dpolr1e-7-beta0.01-valclassifiersft1e-5-step20
Updated • 3
Asap7772/arc-agi-impabs-dpolr1e-7-beta0.01-classifiersft5e-6
Preview • Updated • 105