Leo PRO
leideng
AI & ML interests
Efficient AI, Sparse Attention
Recent Activity
liked a model about 7 hours ago: google/umt5-xxl
liked a model 1 day ago: facebook/cwm
updated a collection 2 days ago: SFT
Organizations
None yet
RL
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale
  Paper • 2503.14476 • Published • 146
- Proximal Policy Optimization Algorithms
  Paper • 1707.06347 • Published • 11
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 66
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
  Paper • 2402.03300 • Published • 145
Tokenization
SFT
- On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
  Paper • 2508.05629 • Published • 190
- Training language models to follow instructions with human feedback
  Paper • 2203.02155 • Published • 24
- LIMA: Less Is More for Alignment
  Paper • 2305.11206 • Published • 27
- Preserving Diversity in Supervised Fine-Tuning of Large Language Models
  Paper • 2408.16673 • Published
DiT
Efficient AI
Pretrain
reasoning