---
license: apache-2.0
base_model: Qwen/Qwen3-0.6B-Base
tags:
- merge
- sft
- dpo
- qwen3
- math
- code
- mcqa
- mnlp-m3
datasets:
- albertfares/MNLP_M3_dpo_dataset
language:
- en
pipeline_tag: text-generation
---

# MNLP M3 Merged Model (SFT + DPO)

This model merges two fine-tuned variants of `Qwen/Qwen3-0.6B-Base`, combining their strengths:

- **SFT Component**: `mgatti/MNLP_M3_mcqa_model` - multiple-choice QA capabilities
- **DPO Component**: `albertfares/MNLP_M3_dpo_model` - preference-aligned responses

## Model Details

- **Base Model**: Qwen/Qwen3-0.6B-Base
- **SFT Model**: multiple-choice QA fine-tuned model
- **DPO Model**: direct-preference-optimized model
- **Merge Strategy**: weight-level merge of the SFT and DPO models (an illustrative sketch appears at the end of this card)
- **Combined Capabilities**: MCQA + preference alignment

## Capabilities

✅ **Multiple-Choice Question Answering** (from the SFT component)
✅ **Preference-Aligned Generation** (from the DPO component)
✅ **Math and Code Generation** (from MNLP M3 training)
✅ **Reasoning Tasks** (combined strengths)

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("merged_mnlp_m3_sft_dpo")
tokenizer = AutoTokenizer.from_pretrained("merged_mnlp_m3_sft_dpo")

# Multiple-choice QA
prompt = "Which of the following is correct? A) 2+2=5 B) 2+2=4 C) 2+2=3"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Open-ended generation (sampling must be enabled for temperature to take effect)
prompt = "Explain the concept of recursion in programming"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=300, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Training Data

- **SFT**: multiple-choice QA dataset
- **DPO**: MNLP M3 preference dataset covering math, code, and reasoning

The merged model is intended to handle both structured QA tasks and open-ended generation with preference alignment.
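
## Merge Sketch (illustrative)

This card describes the merge only as a weight-level combination of the SFT and DPO checkpoints; the exact recipe is not documented here. The snippet below is a minimal sketch of one such strategy, linear interpolation of matching parameters. The mixing weight `alpha` and the fallback to SFT weights for mismatched tensors are assumptions for illustration, not the authors' actual procedure.

```python
# Illustrative sketch only: not the documented merge recipe for this model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

sft_id = "mgatti/MNLP_M3_mcqa_model"
dpo_id = "albertfares/MNLP_M3_dpo_model"
alpha = 0.5  # hypothetical mixing weight; tune on a validation set

sft_model = AutoModelForCausalLM.from_pretrained(sft_id, torch_dtype=torch.float32)
dpo_model = AutoModelForCausalLM.from_pretrained(dpo_id, torch_dtype=torch.float32)

dpo_state = dpo_model.state_dict()
merged_state = {}
for name, sft_param in sft_model.state_dict().items():
    # Interpolate parameters present in both checkpoints; otherwise keep the SFT weights.
    if name in dpo_state and dpo_state[name].shape == sft_param.shape:
        merged_state[name] = alpha * sft_param + (1.0 - alpha) * dpo_state[name]
    else:
        merged_state[name] = sft_param

sft_model.load_state_dict(merged_state)
sft_model.save_pretrained("merged_mnlp_m3_sft_dpo")
AutoTokenizer.from_pretrained(sft_id).save_pretrained("merged_mnlp_m3_sft_dpo")
```

A plain average (`alpha = 0.5`) is a common starting point; in practice the interpolation weight is usually chosen by evaluating merged candidates on held-out MCQA and preference benchmarks.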