Qwen2.5-7B English-Kannada Translation Model

A fine-tuned translation model based on Qwen2.5-7B-Instruct, specialized for translating between English and Kannada (ಕನ್ನಡ).

Model Description

This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct trained on English-Kannada translation pairs. Kannada is a Dravidian language spoken primarily in the Indian state of Karnataka.

Training ran on 4x NVIDIA A100-SXM4-40GB GPUs for 6h 48m 10s, processing 64,603,656 tokens. The training stack was transformers, peft, and trl, with accelerate handling the distributed setup.

How to Use

Basic Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("RakshithFury/Qwen2.5-7b-en-kn-translate")
model = AutoModelForCausalLM.from_pretrained("RakshithFury/Qwen2.5-7b-en-kn-translate")
model = model.to("cuda:0")

# Replace with the English sentences you want to translate.
sentences = ["What is the meaning of life?", "Captain America is my favorite Avenger"]

for sentence in sentences:
    # The model was fine-tuned on this instruction format.
    messages = [
        {"role": "user", "content": "Translate the following English sentence to Kannada:" + sentence},
    ]

    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=200, temperature=0.5, min_p=0.1)

    # Decode only the newly generated tokens, skipping the echoed prompt.
    res = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
    print(res)
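
Batched Usage

Looping one sentence at a time underuses the GPU. Below is a minimal sketch of batching several prompts into a single generate() call, reusing the tokenizer, model, and sentences from the snippet above; the one extra requirement is left padding, which decoder-only models need for batched generation.

# Batched variant: translate several sentences in one generate() call.
tokenizer.padding_side = "left"  # decoder-only models must be left-padded for batching

prompts = [
    tokenizer.apply_chat_template(
        [{"role": "user", "content": "Translate the following English sentence to Kannada:" + s}],
        add_generation_prompt=True,
        tokenize=False,
    )
    for s in sentences
]

# The chat template already inserts special tokens, so don't add them again.
inputs = tokenizer(prompts, return_tensors="pt", padding=True, add_special_tokens=False).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, temperature=0.5, min_p=0.1)

for row in outputs:
    # Strip the (left-padded) prompt tokens before decoding.
    print(tokenizer.decode(row[inputs["input_ids"].shape[-1]:], skip_special_tokens=True))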

Training Details

Training Data

Trained on 500,000 English-Kannada translation pairs.

Training Procedure

  • Frameworks - transformers, trl, peft
  • Distributed training - Yes, DDP via accelerate
  • LoRA - Yes

Training Hyperparameters

  • Batch size: per_device_batch_size=4, gradient_accumulation=1, num_gpus=4, i.e. an effective batch size of 16
  • Epochs: 1
  • Optimizer: AdamW
  • Learning rate: 2e-4
  • LoRA rank: 8
  • LoRA alpha: 16
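
The training script itself isn't published, so the following is only a sketch of how these hyperparameters map onto a trl SFTTrainer with a peft LoraConfig. The dataset id and output directory are hypothetical placeholders, not the actual setup.

from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Launched for 4-GPU DDP with: accelerate launch --num_processes 4 train.py

# Hypothetical dataset id; assumed to hold the 500,000 chat-formatted
# English-Kannada translation pairs.
dataset = load_dataset("path/to/en-kn-pairs", split="train")

peft_config = LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM")

training_args = SFTConfig(
    output_dir="qwen2.5-7b-en-kn-translate",  # hypothetical output path
    per_device_train_batch_size=4,            # x 4 GPUs x grad accum 1 = effective 16
    gradient_accumulation_steps=1,
    num_train_epochs=1,
    learning_rate=2e-4,
    optim="adamw_torch",                      # AdamW
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()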

Train curves

The final training loss is 0.5036, with a token accuracy of 87%.

Eval Data

Eval curves

The eval loss clearly hasn't saturated; there is still room for it to decrease with further training.

Hardware

  • GPU: 4x NVIDIA A100-SXM4
  • CPU count: 128
  • Logical CPU count: 256

Example Translations

Example 1

English sentence:
What is the meaning of life?

Default model:
生命周期 ನೀಡಲು ಎಂದು ತೆರೆಯಿರಿ?

Finetuned model:
ಜೀವನದ ಅರ್ಥ ಏನು?


Example 2

English sentence:
My biggest problem is deciding what I should wear.

Default model:
ನ ಹೊಸ ಸಮಸ್ಯೆಯು ನನ್ನ ವೈರಾಗ್ಯವನ್ನು ತಿಳಿದೇಕ್ಕಾಗಿ ಎಂಬ ವೈಸೀನಿಯನ್ನು ಒಡ್ಡುವುದು.

Finetuned model:
ನಾನು ಏನು ಧರಿಸಬೇಕೆಂದು ನಿರ್ಧರಿಸುವುದು ನನಗೆ ಅತಿ ದೊಡ್ಡ ಸಮಸ್ಯೆ.


Example 3

English sentence:
It was probably the first thing I remembered from my early childhood.

Default model:
ಯಾವುದೇ ಈಗ ಹಲವಾರು ವರ್ಷಗಳ ಕ್ಕೆ ಪ್ರೊಜೆಕ್ಟ್‌ನಲ್ಲಿ ಮನೆಯಲ್ಲಿ ಬರುತ್ತಿರುವ ಮುಖ್ಯ ಚಿತ್ರಗಳು ನಂತರ ಮುಂದೆ ಹೆಚ್ಚು ವರ್ಷಗಳ ಕ್ಕೆ ಸೆಟ್‌ಪಡಿಸಲಾಗಿದೆ.

Finetuned model:
ಬೆಳೆದ ಮೊದಲ ವರ್ಷದಲ್ಲಿ ನನಗೆ ಸಂಭವನೀಯವಾಗಿ ಮರೆಯಲಾಗದ ಒಂದು ಘಟನೆ.


Example 4

English sentence:
Captain America is my favorite Avenger

Default model:
ಕप्टन ಅಮೆರಿಕಾ ಎಂದರೆ ನನ್ನ ಪ್ರಯತ್ನಿತ ಏವ್‌ನೇಂಟಿನ ಸೊನ್ನೋತ್ತಮ ವಿಷಯವಾಗಿದೆ.

Finetuned model:
ನಾನು ನಿರ್ದೇಶಕ ಅವರ ಪ್ರಿಯ ಸ್ಟಾರ್ ಆಗಿದ್ದೇನೆ ಕ್ಯಾಪ್ಟನ್ ಅಮೆರಿಕಾ.


CO2 Emission Related to Experiments

Experiments were conducted on private infrastructure with a carbon efficiency of 0.432 kgCO2/kWh.

Total emissions are estimated to be 0.76 kgCO2, which is equivalent to:

  • 3.07 km driven by an average ICE car
  • 0.38 kg of coal burned
  • 0.01 tree seedlings sequestering carbon for 10 years
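
These figures follow the standard estimate of energy consumed times grid carbon intensity. A rough back-of-the-envelope check; the average power draw below is inferred from the reported totals, not a measured figure.

# emissions (kgCO2) = energy (kWh) x carbon efficiency (kgCO2/kWh)
hours = 6 + 48 / 60 + 10 / 3600       # 6h 48m 10s of training
carbon_efficiency = 0.432             # kgCO2/kWh for the private infrastructure
avg_power_kw = 0.26                   # assumed: average total draw implied by the report
energy_kwh = avg_power_kw * hours     # ~1.77 kWh
print(round(energy_kwh * carbon_efficiency, 2))  # ~0.76 kgCO2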

Limitations and Bias

Known Limitations

  • The model may struggle with:
    • Complex sentences containing rare or out-of-domain words (e.g., "Avenger")
    • Very long sentences

Citation

If you use this model in your research, please cite:

@misc{qwen2.5-7b-en-kn-translate,
  author = {Rakshith Rao},
  title = {Qwen2.5-7B English-Kannada Translation Model},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/RakshithFury/Qwen2.5-7b-en-kn-translate}
}

Base Model Citation

@article{qwen2.5,
  title={Qwen2.5: A Party of Foundation Models},
  author={Qwen Team},
  year={2024},
  journal={arXiv preprint arXiv:2412.xxxxx}
}

Contact

For questions or feedback:
