MansiJerry/Qwen3-8B-GRPO-lbs-ng-dfq_no_claim_bs_gpt_args_v2_th_4_6 • Text Generation • Updated 5 days ago • 17
MansiJerry/Qwen3-8B-GRPO-lbs-ng-dfq_no_claim_bs_gpt_args_v2_all_target_modules_th_4_6 • Text Generation • Updated 7 days ago • 9
MansiJerry/Qwen3-8B-GRPO-lbs_arg_rank_con_dfq_no_claim_bs_qwen_arg_all_target_modules_th_4_6 • Text Generation • Updated 7 days ago • 9
MansiJerry/Qwen3-8B-GRPO-lbs_arg_rank_con_dfq_no_claim_bs_qwen_arg_th_4_6 • Text Generation • Updated 7 days ago • 15
MansiJerry/Gemma4-4B-GRPO-learned-base-score_arg_rank_con_dfq_no_claim_bs_qwen_arg • Text Generation • Updated 15 days ago • 18
MansiJerry/Qwen3-8B-GRPO-learned-base-score_arg_rank_con_dfq_no_claim_bs_qwen_arg_all_target_modules • Text Generation • Updated 17 days ago • 17
MansiJerry/Qwen3-8B-GRPO-learned-base-score-ng-dfq_no_claim_bs_gpt_args_v2_all_target_modules • Text Generation • Updated 17 days ago • 17
MansiJerry/Gemma4-4B-GRPO-learned-base-score-ng-dfq_no_claim_bs_gpt_args_v2 • Text Generation • Updated 17 days ago • 19
MansiJerry/Qwen3-8B-GRPO-learned-base-score-ng-dfq_no_claim_bs_gpt_args_v2 • Text Generation • Updated May 4 • 6
MansiJerry/Qwen3-8B-GRPO-learned-base-score_arg_rank_con_dfq_no_claim_bs_qwen_arg • Text Generation • Updated May 4 • 6