albertfares/MNLP_M3_dpo_dataset
This model merges two fine-tuned checkpoints:
mgatti/MNLP_M3_mcqa_model - Multiple-choice QA capabilities
albertfares/MNLP_M3_dpo_model - Preference-aligned responses
✅ Multiple-Choice Question Answering (from SFT component)
✅ Preference-Aligned Generation (from DPO component)
✅ Math and Code Generation (from MNLP M3 training)
✅ Reasoning Tasks (combined strengths)
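The card does not state how the two checkpoints were merged. One common approach is linear interpolation of matching parameters, and the sketch below illustrates that idea on toy data only. The function name `linear_merge`, the `alpha` weight, and the use of plain Python lists in place of tensors are all assumptions made for illustration, not the authors' actual procedure.

```python
from typing import Dict, List

# Toy stand-in for a model state dict: parameter name -> flat list of weights.
StateDict = Dict[str, List[float]]


def linear_merge(sft: StateDict, dpo: StateDict, alpha: float = 0.5) -> StateDict:
    """Return alpha * sft + (1 - alpha) * dpo for every parameter.

    Hypothetical helper: assumes both checkpoints share the same
    architecture, so parameter names and sizes line up exactly.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if sft.keys() != dpo.keys():
        raise ValueError("checkpoints must have identical parameter names")

    merged: StateDict = {}
    for name, sft_values in sft.items():
        dpo_values = dpo[name]
        if len(sft_values) != len(dpo_values):
            raise ValueError(f"size mismatch for parameter {name!r}")
        merged[name] = [
            alpha * s + (1.0 - alpha) * d for s, d in zip(sft_values, dpo_values)
        ]
    return merged


# Made-up weights, only to show the arithmetic.
sft_weights = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
dpo_weights = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
merged = linear_merge(sft_weights, dpo_weights, alpha=0.5)
print(merged)
```

With `alpha=0.5` each merged weight is the plain average of the two inputs. Moving `alpha` toward 1 would favor the SFT (MCQA) checkpoint, and moving it toward 0 would favor the DPO checkpoint.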
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("merged_mnlp_m3_sft_dpo")
tokenizer = AutoTokenizer.from_pretrained("merged_mnlp_m3_sft_dpo")
# For MCQA
prompt = "Which of the following is correct? A) 2+2=5 B) 2+2=4 C) 2+2=3"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)  # cap on generated tokens, excluding the prompt
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# For general generation
prompt = "Explain the concept of recursion in programming"
inputs = tokenizer(prompt, return_tensors="pt")
# temperature only takes effect when sampling is enabled
outputs = model.generate(**inputs, max_new_tokens=300, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
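For MCQA, the decoded output usually contains the prompt followed by free-form text, so some post-processing is needed to recover the chosen option. The sketch below shows one way to build a lettered prompt and pull out the first option letter from the completion. The helpers `format_mcqa_prompt` and `extract_answer_letter`, and the trailing "Answer:" cue, are hypothetical choices for illustration; the card does not specify the prompt format the model was trained on.

```python
import re
from typing import List, Optional

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def format_mcqa_prompt(question: str, options: List[str]) -> str:
    """Build a lettered multiple-choice prompt ending with an 'Answer:' cue."""
    if not 1 <= len(options) <= len(LETTERS):
        raise ValueError("need between 1 and 26 options")
    lines = [question]
    lines += [f"{LETTERS[i]}) {option}" for i, option in enumerate(options)]
    lines.append("Answer:")
    return "\n".join(lines)


def extract_answer_letter(generated: str, prompt: str, num_options: int) -> Optional[str]:
    """Return the first standalone option letter in the completion, or None."""
    # Drop the echoed prompt if the decoded text starts with it.
    completion = generated[len(prompt):] if generated.startswith(prompt) else generated
    valid = LETTERS[:num_options]
    match = re.search(rf"\b([{valid}])\b", completion)
    return match.group(1) if match else None


mcqa_prompt = format_mcqa_prompt(
    "Which of the following is correct?",
    ["2+2=5", "2+2=4", "2+2=3"],
)
# Simulated model output; in practice this would come from tokenizer.decode(...).
simulated_output = mcqa_prompt + " B) 2+2=4"
print(extract_answer_letter(simulated_output, mcqa_prompt, num_options=3))
```

This heuristic is deliberately simple: a completion that opens with the English article "A" would be misread as option A, so a stricter pattern or a comparison of per-option likelihoods may be preferable for evaluation.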
This merged model is intended to handle both structured QA tasks and open-ended generation with preference alignment.