# Whisper Small Fine-tuned on Nepali (OpenSLR 54)
This model is a fine-tuned version of openai/whisper-small on the OpenSLR 54 (Nepali Speech Corpus) dataset. Trained on roughly 154 hours of Nepali speech, it reaches a word error rate (WER) of 26.69% on the held-out test split, state-of-the-art among open-source models of this size on this benchmark.
## Model Details

### Model Description
- Model architecture: Whisper Small (244M parameters)
- Language: Nepali (ne)
- Task: Automatic Speech Recognition (Transcription)
- Dataset: OpenSLR 54 (~157,000 utterances)
- Fine-tuning Hardware: NVIDIA A100 80GB
## Usage
```python
from transformers import pipeline

# Load the fine-tuned checkpoint as an ASR pipeline
transcriber = pipeline(
    "automatic-speech-recognition",
    model="fnawaraj/whisper-small-nepali-openslr",
)

# Transcribe an audio file (any format ffmpeg can decode)
transcription = transcriber("path_to_nepali_audio.mp3")
print(transcription["text"])
```
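For longer recordings, the pipeline can chunk the audio, and the decoder can be pinned to Nepali so Whisper's built-in language detection is bypassed. A minimal sketch, assuming a recent `transformers` version that accepts `generate_kwargs` on the ASR pipeline:

```python
from transformers import pipeline

transcriber = pipeline(
    "automatic-speech-recognition",
    model="fnawaraj/whisper-small-nepali-openslr",
    chunk_length_s=30,  # split long audio into 30 s windows
)

# Force Nepali transcription instead of relying on language auto-detection
result = transcriber(
    "path_to_nepali_audio.mp3",
    generate_kwargs={"language": "nepali", "task": "transcribe"},
)
print(result["text"])
```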
## Training Data
The model was trained on the OpenSLR 54 (Nepali Speech Corpus).

- Total audio duration: ~154 hours
- Total utterances: 157,905
- Sampling rate: 16 kHz
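Whisper's feature extractor expects 16 kHz mono input, which matches the corpus. A minimal preparation sketch, assuming the corpus has been downloaded locally and is loaded as a Hugging Face `datasets` audio folder (the directory layout and the `transcription` column name are assumptions, not part of the official release):

```python
from datasets import Audio, load_dataset
from transformers import WhisperProcessor

# Hypothetical local layout: audio files plus a metadata.csv with transcripts
ds = load_dataset("audiofolder", data_dir="openslr54_nepali", split="train")

# Decode/resample every clip to 16 kHz, the rate Whisper expects
ds = ds.cast_column("audio", Audio(sampling_rate=16000))

processor = WhisperProcessor.from_pretrained("openai/whisper-small")

def prepare(batch):
    audio = batch["audio"]
    # Log-Mel input features for the encoder
    batch["input_features"] = processor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    # Token IDs of the transcript serve as decoder labels
    batch["labels"] = processor.tokenizer(batch["transcription"]).input_ids
    return batch

ds = ds.map(prepare, remove_columns=ds.column_names)
```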
## Training Procedure

### Training Hyperparameters

The following hyperparameters were used during training:
- Learning rate: 1e-05
- Train batch size: 8
- Eval batch size: 8
- Gradient accumulation steps: 4 (effective batch size: 32)
- Optimizer: AdamW
- LR scheduler: linear decay with 500 warmup steps
- Training steps: 10,000
- Mixed precision: FP16
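For reference, here is how these settings map onto `transformers.Seq2SeqTrainingArguments`. The output directory and any options not listed above (e.g., evaluation cadence) are placeholders, not the exact training configuration:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-nepali",  # placeholder path
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,   # effective batch size: 8 * 4 = 32
    lr_scheduler_type="linear",      # linear decay after warmup
    warmup_steps=500,
    max_steps=10_000,
    fp16=True,                       # mixed-precision training
    predict_with_generate=True,      # decode during eval to compute WER
)
```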
## Evaluation Results

The model was evaluated on the unseen test split of the OpenSLR 54 dataset (1,580 samples).

| Metric                | Score  |
|-----------------------|--------|
| Word Error Rate (WER) | 26.69% |
| Validation loss       | 0.210  |
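The WER can be reproduced with the `evaluate` library; the reference and prediction lists below are toy placeholders standing in for the 1,580 decoded test transcripts:

```python
import evaluate

wer_metric = evaluate.load("wer")

# Placeholder data; in practice, decode the full test split first
references = ["नमस्ते संसार"]
predictions = ["नमस्ते संसार"]

wer = 100 * wer_metric.compute(references=references, predictions=predictions)
print(f"WER: {wer:.2f}%")
```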
## Limitations

- The model performs best on high-quality read speech.
- It may struggle with very fast conversational speech or heavy background noise compared to models trained on more acoustically diverse data.
- Some phonetic spelling variations may occur (e.g., short vs. long vowels), since these sound identical in spoken Nepali.