MoeReward/combined_rlhf_dataset_grpo_imdb_main_2K
• Updated • 2k • 9
MoeReward/combined_rlhf_dataset_grpo_metamath_main_2K
• Updated • 2k • 11
MoeReward/combined_rlhf_dataset_grpo_arc_main_2K
• Updated • 2k • 9
MoeReward/combined_rlhf_dataset_grpo_nq_main_2K
• Updated • 2k • 9
MoeReward/combined_rlhf_dataset_grpo_equal_dist_2K
• Updated • 2k • 9
MoeReward/combined_rlhf_dataset_grpo_imdb_main
• Updated • 4k • 11
MoeReward/combined_rlhf_dataset_grpo_metamath_main
• Updated • 4k • 11
MoeReward/combined_rlhf_dataset_grpo_arc_main
• Updated • 4k • 7
MoeReward/combined_rlhf_dataset_grpo_nq_main
• Updated • 4k • 9
MoeReward/combined_rlhf_dataset_grpo_equal_dist
• Updated • 4k • 12
MoeReward/preference_dataset_stepmath_ood
• Updated • 10.8k • 6
MoeReward/combined_preference_dataset_ood
MoeReward/combined_rlhf_dataset_alpaca
• Updated • 52k • 15
MoeReward/combined_rlhf_dataset_math
• Updated • 40k • 11
MoeReward/combined_rlhf_dataset_code
• Updated • 20k • 8
MoeReward/combined_preference_dataset_ood_alpaca_heavy
• Updated • 3k • 7
MoeReward/combined_preference_dataset_ood_coding_heavy
• Updated • 3k • 9
MoeReward/combined_preference_dataset_ood_math_heavy
• Updated • 3k • 7
MoeReward/combined_preference_dataset_ood_equal_dist
• Updated • 3k • 6
MoeReward/combined_preference_dataset_qwen2.5_1.5b_base
• Updated • 47.4k • 6
MoeReward/combined_preference_dataset_qwen2.5_1.5b_base_alpaca_heavy
• Updated • 10k • 6
MoeReward/combined_preference_dataset_qwen2.5_1.5b_base_coding_heavy
• Updated • 10k • 11 • 1
MoeReward/combined_preference_dataset_qwen2.5_1.5b_base_math_heavy
• Updated • 10k • 8
MoeReward/combined_preference_dataset_qwen2.5_1.5b_base_equal_dist
• Updated • 10k • 5
MoeReward/combined_preference_dataset_qwen2.5_base
• Updated • 57.6k • 5
MoeReward/combined_preference_dataset_qwen2.5_base_alpaca_heavy
• Updated • 10k • 7
MoeReward/combined_preference_dataset_qwen2.5_base_coding_heavy
• Updated • 10k • 5 • 1
MoeReward/combined_preference_dataset_qwen2.5_base_math_heavy
• Updated • 10k • 6
MoeReward/combined_preference_dataset_qwen2.5_base_equal_dist
• Updated • 10k • 5
MoeReward/combined_rlhf_dataset_balanced
• Updated • 10k • 8