Yifeng Liu

lyf07

AI & ML interests

None yet

Recent Activity

authored a paper 2 days ago

R-PRM: Reasoning-Driven Process Reward Modeling

authored a paper 2 days ago

Mending the Holes: Mitigating Reward Hacking in Reinforcement Learning for Multilingual Translation

updated a model 3 days ago

lyf07/Translategemma-4B-it-WALAR

Organizations

None yet

authored 2 papers 2 days ago

R-PRM: Reasoning-Driven Process Reward Modeling

Paper • 2503.21295 • Published Mar 27, 2025

Mending the Holes: Mitigating Reward Hacking in Reinforcement Learning for Multilingual Translation

Paper • 2603.13045 • Published 7 days ago • 1

updated 3 models 3 days ago

lyf07/Translategemma-4B-it-WALAR

769k • Updated 3 days ago • 43

lyf07/Qwen3-8B-WALAR

8B • Updated 3 days ago • 53

lyf07/LLaMAX3-8B-Alpaca-WALAR

8B • Updated 3 days ago • 42

upvoted a paper 3 days ago

Mending the Holes: Mitigating Reward Hacking in Reinforcement Learning for Multilingual Translation

Paper • 2603.13045 • Published 7 days ago • 1

updated a collection 3 days ago

WALAR

Mending the Holes: Mitigating Reward Hacking in Reinforcement Learning for Multilingual Translation • 4 items • Updated 3 days ago

updated a collection 7 days ago

WALAR

Mending the Holes: Mitigating Reward Hacking in Reinforcement Learning for Multilingual Translation • 4 items • Updated 3 days ago

published 2 models 7 days ago

lyf07/Qwen3-8B-WALAR

8B • Updated 3 days ago • 53

lyf07/Translategemma-4B-it-WALAR

769k • Updated 3 days ago • 43

published a model 7 days ago

lyf07/LLaMAX3-8B-Alpaca-WALAR

8B • Updated 3 days ago • 42

upvoted a paper 7 months ago

DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization

Paper • 2508.14460 • Published Aug 20, 2025 • 85