Qwen2.5-7B English-Kannada Translation Model

A fine-tuned translation model based on Qwen2.5-7B-Instruct, specialized for translating between English and Kannada (ಕನ್ನಡ).

Model Description

This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct trained on English-Kannada translation pairs. Kannada is a Dravidian language spoken primarily in the Indian state of Karnataka.

Training ran on 4x NVIDIA A100-SXM4-40GB GPUs for 6h 48m 10s, processing 64,603,656 tokens. The training stack was transformers, peft, and trl, with accelerate handling the distributed setup.

How to Use

Basic Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("RakshithFury/Qwen2.5-7b-en-kn-translate")
model = AutoModelForCausalLM.from_pretrained("RakshithFury/Qwen2.5-7b-en-kn-translate")
model = model.to("cuda:0")

# Replace with the English sentences you want to translate.
sentences = ["What is the meaning of life?", "Captain America is my favorite Avenger"]

for sentence in sentences:
    # The model was fine-tuned on this instruction format.
    messages = [
        {"role": "user", "content": "Translate the following English sentence to Kannada:" + sentence},
    ]

    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=200, temperature=0.5, min_p=0.1)

    # Decode only the newly generated tokens, skipping the echoed prompt.
    res = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
    print(res)
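
Batched Usage

Looping one sentence at a time underuses the GPU. Below is a minimal sketch of batching several prompts into a single generate() call, reusing the tokenizer, model, and sentences from the snippet above; the one extra requirement is left padding, which decoder-only models need for batched generation.

# Batched variant: translate several sentences in one generate() call.
tokenizer.padding_side = "left"  # decoder-only models must be left-padded for batching

prompts = [
    tokenizer.apply_chat_template(
        [{"role": "user", "content": "Translate the following English sentence to Kannada:" + s}],
        add_generation_prompt=True,
        tokenize=False,
    )
    for s in sentences
]

# The chat template already inserts special tokens, so don't add them again.
inputs = tokenizer(prompts, return_tensors="pt", padding=True, add_special_tokens=False).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, temperature=0.5, min_p=0.1)

for row in outputs:
    # Strip the (left-padded) prompt tokens before decoding.
    print(tokenizer.decode(row[inputs["input_ids"].shape[-1]:], skip_special_tokens=True))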

Training Details

Training Data

Trained on 500,000 English-Kannada translation pairs.

Training Procedure

  • Frameworks - transformers, trl, peft
  • Distributed training - Yes, DDP via accelerate
  • LoRA - Yes

Training Hyperparameters

  • Batch size: per_device_batch_size=4, gradient_accumulation=1, num_gpus=4, i.e. an effective batch size of 16
  • Epochs: 1
  • Optimizer: AdamW
  • Learning rate: 2e-4
  • LoRA rank: 8
  • LoRA alpha: 16
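
The training script itself isn't published, so the following is only a sketch of how these hyperparameters map onto a trl SFTTrainer with a peft LoraConfig. The dataset id and output directory are hypothetical placeholders, not the actual setup.

from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Launched for 4-GPU DDP with: accelerate launch --num_processes 4 train.py

# Hypothetical dataset id; assumed to hold the 500,000 chat-formatted
# English-Kannada translation pairs.
dataset = load_dataset("path/to/en-kn-pairs", split="train")

peft_config = LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM")

training_args = SFTConfig(
    output_dir="qwen2.5-7b-en-kn-translate",  # hypothetical output path
    per_device_train_batch_size=4,            # x 4 GPUs x grad accum 1 = effective 16
    gradient_accumulation_steps=1,
    num_train_epochs=1,
    learning_rate=2e-4,
    optim="adamw_torch",                      # AdamW
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()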

Train curves

The final training loss is 0.5036, with a token accuracy of 87%.

Eval Data

Eval curves

The eval loss clearly hasn't saturated; there is still room for it to decrease with further training.

Hardware

  • GPU: 4x NVIDIA A100-SXM4
  • CPU count: 128
  • Logical CPU count: 256

Example Translations

Example 1

English sentence:
What is the meaning of life?

Default model:
生命周期 ನೀಡಲು ಎಂದು ತೆರೆಯಿರಿ?

Finetuned model:
ಜೀವನದ ಅರ್ಥ ಏನು?


Example 2

English sentence:
My biggest problem is deciding what I should wear.

Default model:
ನ ಹೊಸ ಸಮಸ್ಯೆಯು ನನ್ನ ವೈರಾಗ್ಯವನ್ನು ತಿಳಿದೇಕ್ಕಾಗಿ ಎಂಬ ವೈಸೀನಿಯನ್ನು ಒಡ್ಡುವುದು.

Finetuned model:
ನಾನು ಏನು ಧರಿಸಬೇಕೆಂದು ನಿರ್ಧರಿಸುವುದು ನನಗೆ ಅತಿ ದೊಡ್ಡ ಸಮಸ್ಯೆ.


Example 3

English sentence:
It was probably the first thing I remembered from my early childhood.

Default model:
ಯಾವುದೇ ಈಗ ಹಲವಾರು ವರ್ಷಗಳ ಕ್ಕೆ ಪ್ರೊಜೆಕ್ಟ್‌ನಲ್ಲಿ ಮನೆಯಲ್ಲಿ ಬರುತ್ತಿರುವ ಮುಖ್ಯ ಚಿತ್ರಗಳು ನಂತರ ಮುಂದೆ ಹೆಚ್ಚು ವರ್ಷಗಳ ಕ್ಕೆ ಸೆಟ್‌ಪಡಿಸಲಾಗಿದೆ.

Finetuned model:
ಬೆಳೆದ ಮೊದಲ ವರ್ಷದಲ್ಲಿ ನನಗೆ ಸಂಭವನೀಯವಾಗಿ ಮರೆಯಲಾಗದ ಒಂದು ಘಟನೆ.


Example 4

English sentence:
Captain America is my favorite Avenger

Default model:
ಕप्टन ಅಮೆರಿಕಾ ಎಂದರೆ ನನ್ನ ಪ್ರಯತ್ನಿತ ಏವ್‌ನೇಂಟಿನ ಸೊನ್ನೋತ್ತಮ ವಿಷಯವಾಗಿದೆ.

Finetuned model:
ನಾನು ನಿರ್ದೇಶಕ ಅವರ ಪ್ರಿಯ ಸ್ಟಾರ್ ಆಗಿದ್ದೇನೆ ಕ್ಯಾಪ್ಟನ್ ಅಮೆರಿಕಾ.


CO2 Emission Related to Experiments

Experiments were conducted on private infrastructure with a carbon efficiency of 0.432 kgCO2/kWh.

Total emissions are estimated to be 0.76 kgCO2, which is equivalent to:

  • 3.07 km driven by an average ICE car
  • 0.38 kg of coal burned
  • 0.01 tree seedlings sequestering carbon for 10 years
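
These figures follow the standard estimate of energy consumed times grid carbon intensity. A rough back-of-the-envelope check; the average power draw below is inferred from the reported totals, not a measured figure.

# emissions (kgCO2) = energy (kWh) x carbon efficiency (kgCO2/kWh)
hours = 6 + 48 / 60 + 10 / 3600       # 6h 48m 10s of training
carbon_efficiency = 0.432             # kgCO2/kWh for the private infrastructure
avg_power_kw = 0.26                   # assumed: average total draw implied by the report
energy_kwh = avg_power_kw * hours     # ~1.77 kWh
print(round(energy_kwh * carbon_efficiency, 2))  # ~0.76 kgCO2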

Limitations and Bias

Known Limitations

  • The model may struggle with:
    • Complex sentences containing rare or out-of-domain words (e.g., "Avenger")
    • Very long sentences

Citation

If you use this model in your research, please cite:

@misc{qwen2.5-7b-en-kn-translate,
  author = {Rakshith Rao},
  title = {Qwen2.5-7B English-Kannada Translation Model},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/RakshithFury/Qwen2.5-7b-en-kn-translate}
}

Base Model Citation

@article{qwen2.5,
  title={Qwen2.5: A Party of Foundation Models},
  author={Qwen Team},
  year={2024},
  journal={arXiv preprint arXiv:2412.xxxxx}
}

Contact

For questions or feedback:
