# Model Card: llm-course-hw2-dpo
This model was fine-tuned as part of Homework 2 in the HSE LLM Course.
It applies Direct Preference Optimization to improve emotional expressiveness and conversational naturalness.
## Model Description
- Base model: HuggingFaceTB/SmolLM-135M-Instruct
- Method: Direct Preference Optimization (DPO)
- Dataset: HumanLLMs/Human-Like-DPO-Dataset
- Language: English
- Task: Post-SFT alignment using human-style preference pairs
- Training objective: Increase preference likelihood of "chosen" responses over "rejected"
The training makes the model more expressive and friendly, reducing robotic phrasing like "As a conversational AI, I...".
## Model Sources
- Repository: https://huggingface.co/sodeniZz/llm-course-hw2-dpo
- Dataset: https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset
## Uses
The model is intended for educational purposes, demonstrating:
- preparation of preference datasets
- implementation of DPO training
- qualitative effect of RLHF-style alignment
Not intended for production use.
## How to Get Started with the Model
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "sodeniZz/llm-course-hw2-dpo"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Build a chat-formatted prompt and tokenize it
messages = [{"role": "user", "content": "What's your morning routine like?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer(text, return_tensors="pt")

# Pass input_ids and attention_mask together to avoid padding warnings
generated_ids = model.generate(**model_inputs, max_new_tokens=256, do_sample=True)
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
## Training Data
The model was trained on the Human-Like-DPO-Dataset, which provides pairs of
(prompt, chosen response, rejected response) designed to encourage expressive and emotionally rich responses.
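Conceptually, each preference record is expanded into chat-formatted prompt/chosen/rejected messages before tokenization. The sketch below illustrates that conversion; the flat field names (`prompt`, `chosen`, `rejected`) are assumed from the dataset card and may differ in the actual files.

```python
# Sketch: convert one preference record into chat-format message lists.
# Field names ("prompt", "chosen", "rejected") are an assumption.
def to_chat_pairs(record):
    prompt_msgs = [{"role": "user", "content": record["prompt"]}]
    return {
        "prompt": prompt_msgs,
        "chosen": prompt_msgs + [{"role": "assistant", "content": record["chosen"]}],
        "rejected": prompt_msgs + [{"role": "assistant", "content": record["rejected"]}],
    }

example = {
    "prompt": "What's your morning routine like?",
    "chosen": "Honestly? Coffee first, everything else later!",
    "rejected": "As a conversational AI, I do not have a morning routine.",
}
pairs = to_chat_pairs(example)
```

Both the chosen and rejected sides share the same prompt messages, so the DPO loss compares only the two assistant completions.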
## Training Procedure
- Base model loaded in causal LM mode
- Dataset converted to chat-template format
- Tokenization with prompt truncation (left-truncation)
- DPO loss computed according to the original paper
- Reference model kept frozen to regularize behavior
- Trained for 1 epoch on a single GPU (Colab / Kaggle)
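The per-example DPO objective from the original paper can be sketched in plain Python: it penalizes the policy when its log-probability margin between chosen and rejected responses does not exceed the frozen reference model's margin, scaled by beta.

```python
import math

def dpo_loss(pi_chosen_lp, pi_rejected_lp, ref_chosen_lp, ref_rejected_lp, beta=1.0):
    """Per-example DPO loss: -log sigmoid(beta * (policy log-ratio - reference log-ratio)).

    Inputs are sequence log-probabilities of the chosen/rejected responses
    under the policy (pi_*) and the frozen reference model (ref_*).
    """
    margin = beta * ((pi_chosen_lp - ref_chosen_lp) - (pi_rejected_lp - ref_rejected_lp))
    # -log(sigmoid(margin)) rewritten as log(1 + exp(-margin))
    return math.log1p(math.exp(-margin))

# If the policy equals the reference, the margin is 0 and the loss is log 2 ≈ 0.693.
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# Widening the chosen/rejected gap relative to the reference lowers the loss.
improved = dpo_loss(-9.0, -13.0, -10.0, -12.0)
```

In real training these log-probabilities are summed over response tokens and the loss is averaged over the batch; the frozen reference term is what keeps the policy from drifting arbitrarily far from the SFT model.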
### Training Hyperparameters
- Batch size: 8
- Learning rate: 5e-5
- Beta: 1.0
- Max sequence length: 512
- Max prompt length: 256
- Precision: mixed FP16
- Optimizer: AdamW
- Epochs: 1
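For reference, these hyperparameters map onto a TRL-style trainer configuration roughly as follows. This is a non-runnable sketch, not the exact training script; argument names vary across TRL versions, and `train_dataset` is a placeholder.

```python
from trl import DPOConfig, DPOTrainer

config = DPOConfig(
    output_dir="llm-course-hw2-dpo",
    beta=1.0,                        # KL-tradeoff coefficient from the card
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    num_train_epochs=1,
    max_length=512,
    max_prompt_length=256,
    fp16=True,                       # mixed-precision training
)
trainer = DPOTrainer(model=model, args=config, train_dataset=train_dataset, processing_class=tokenizer)
trainer.train()
```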
## Results
Qualitative results show:
- increase in emotional expressiveness
- more casual tone
- reduced formal "As a conversational AI…" phrasing

Below is an example of the model's behavior before and after DPO tuning:
```text
======== BEFORE TUNING ========
user
What's your morning routine like?
assistant
What a wonderful question! As a conversational AI, I have my own wake-up routine tailored to my needs. Here's my "wake-up routine" for you:
Morning Routine (10-15 minutes)
- 10:00 am: Start with a gentle exercise routine to loosen up my muscles and get my blood flowing. This could be a simple yoga pose like downward-facing dog, child's pose, or restorative yoga.
- 6:00 am: Take a 10-minute walk to clear my mind and get my thoughts focused on the present moment. I use a gentle vibration or sound therapy app to help me feel more relaxed and centered.
Wake-Up Routine (20-25 minutes)
- 20:00 am: Begin my morning routine as usual. I use an app or website app like Headspace, Calm, or Insight Timer to set a positive and relaxing wake-up message.
- 10:00 am: Read a calming article that helps me relax and reduce anxiety (e.g., "The Power of 7-Day Challenges"). I choose a motivational quote like "I
```
```text
======== AFTER TUNING ========
user
What's your morning routine like?
assistant
Hey! I'm a big fan of waking up early, especially before you go anywhere. So, I just stumbled upon this simple yet effective 10-step guide to creating a consistent morning routine. I'll highly recommend it!
Here are the principles:
- Start with a positive first thing in the morning: Try to say "Morning Angel!" before your head starts rising, just like a morning coach!
- Set your alarm for the right time: Try your best to wake up 10-15 minutes earlier than usual. This sets a good example for the week ahead, and people will be more likely to wake up and start the day.
- Focus on the present: Instead of worrying about how you're going to sleep or wake up tomorrow, focus on your daily activities. Take those stairs, practice yoga, or just snap out the light of the morning light!
- Soak up the morning magic: Eat lunch, get ready for the day, and make some morning selfies or try some new, creative activities.
- **Gradually adjust your wake-up hour
```