Xiaoyang Cao

Sean13

https://xiaoyangcao1113.github.io/

AI & ML interests

RLHF, Deep Reinforcement Learning

Recent Activity

updated a model 17 days ago

Sean13/repo-best-llama-re-dpo

published a model 17 days ago

Sean13/repo-best-llama-re-dpo

updated a model 17 days ago

Sean13/repo-best-llama-dpo

Organizations

None yet

models 66

Sean13/repo-best-llama-re-dpo

Updated 17 days ago

Sean13/repo-best-llama-dpo

Updated 17 days ago

Sean13/repo-best-mistral-dpo

Updated 17 days ago

Sean13/repo-best-mistral-re-dpo

Updated 17 days ago

Sean13/repo-best-model

Updated 17 days ago

Sean13/llama-8b-instruct-v0.2-cpo-full-label_smoothing-0.1

Text Generation • 266k • Updated Nov 21, 2025

Sean13/mistral-7b-instruct-v0.2-cpo-full-label_smoothing-0.1

Text Generation • 266k • Updated Nov 21, 2025 • 4

Sean13/llama-8b-instruct-simpo-full-label_smoothing-0.1

Text Generation • 266k • Updated Nov 21, 2025 • 3

Sean13/mistral-7b-instruct-v0.2-simpo-full-label_smoothing-0.1

Text Generation • 266k • Updated Nov 21, 2025

Sean13/llama-8b-instruct-rdpo-full-multipref-0.80

Text Generation • 266k • Updated Nov 20, 2025 • 1

datasets 0

None public yet