Ivan Medvedev
med1v
AI & ML interests
None yet
Recent Activity
upvoted a paper about 19 hours ago
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training
upvoted a paper about 19 hours ago
Does Your Reasoning Model Implicitly Know When to Stop Thinking?
liked a Space about 23 hours ago
lm-provers/qed-nano-blogpost
Organizations
None yet