# Whisper Small Fine-tuned on Nepali (OpenSLR 54)
This model is a fine-tuned version of openai/whisper-small on the OpenSLR 54 (Nepali Speech Corpus) dataset. Trained on roughly 154 hours of Nepali speech, it reaches a word error rate (WER) of 26.69% on the held-out test split, state-of-the-art among open-source models of this size on this benchmark.
## Model Details

### Model Description
- Model architecture: Whisper Small (244M parameters)
- Language: Nepali (ne)
- Task: Automatic Speech Recognition (Transcription)
- Dataset: OpenSLR 54 (~157,000 utterances)
- Fine-tuning Hardware: NVIDIA A100 80GB
## Usage
```python
from transformers import pipeline

# Load the fine-tuned checkpoint as an ASR pipeline
transcriber = pipeline(
    "automatic-speech-recognition",
    model="fnawaraj/whisper-small-nepali-openslr",
)

# Transcribe an audio file (any format ffmpeg can decode)
transcription = transcriber("path_to_nepali_audio.mp3")
print(transcription["text"])
```
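For longer recordings, the pipeline can chunk the audio, and the decoder can be pinned to Nepali so Whisper's built-in language detection is bypassed. A minimal sketch, assuming a recent `transformers` version that accepts `generate_kwargs` on the ASR pipeline:

```python
from transformers import pipeline

transcriber = pipeline(
    "automatic-speech-recognition",
    model="fnawaraj/whisper-small-nepali-openslr",
    chunk_length_s=30,  # split long audio into 30 s windows
)

# Force Nepali transcription instead of relying on language auto-detection
result = transcriber(
    "path_to_nepali_audio.mp3",
    generate_kwargs={"language": "nepali", "task": "transcribe"},
)
print(result["text"])
```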
## Training Data
The model was trained on the OpenSLR 54 (Nepali Speech Corpus).

- Total audio duration: ~154 hours
- Total utterances: 157,905
- Sampling rate: 16 kHz
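Whisper's feature extractor expects 16 kHz mono input, which matches the corpus. A minimal preparation sketch, assuming the corpus has been downloaded locally and is loaded as a Hugging Face `datasets` audio folder (the directory layout and the `transcription` column name are assumptions, not part of the official release):

```python
from datasets import Audio, load_dataset
from transformers import WhisperProcessor

# Hypothetical local layout: audio files plus a metadata.csv with transcripts
ds = load_dataset("audiofolder", data_dir="openslr54_nepali", split="train")

# Decode/resample every clip to 16 kHz, the rate Whisper expects
ds = ds.cast_column("audio", Audio(sampling_rate=16000))

processor = WhisperProcessor.from_pretrained("openai/whisper-small")

def prepare(batch):
    audio = batch["audio"]
    # Log-Mel input features for the encoder
    batch["input_features"] = processor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    # Token IDs of the transcript serve as decoder labels
    batch["labels"] = processor.tokenizer(batch["transcription"]).input_ids
    return batch

ds = ds.map(prepare, remove_columns=ds.column_names)
```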
## Training Procedure

### Training Hyperparameters

The following hyperparameters were used during training:
- Learning rate: 1e-05
- Train batch size: 8
- Eval batch size: 8
- Gradient accumulation steps: 4 (effective batch size: 32)
- Optimizer: AdamW
- LR scheduler: linear decay with 500 warmup steps
- Training steps: 10,000
- Mixed precision: FP16
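For reference, here is how these settings map onto `transformers.Seq2SeqTrainingArguments`. The output directory and any options not listed above (e.g., evaluation cadence) are placeholders, not the exact training configuration:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-nepali",  # placeholder path
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,   # effective batch size: 8 * 4 = 32
    lr_scheduler_type="linear",      # linear decay after warmup
    warmup_steps=500,
    max_steps=10_000,
    fp16=True,                       # mixed-precision training
    predict_with_generate=True,      # decode during eval to compute WER
)
```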
## Evaluation Results

The model was evaluated on the unseen test split of the OpenSLR 54 dataset (1,580 samples).

| Metric                | Score  |
|-----------------------|--------|
| Word Error Rate (WER) | 26.69% |
| Validation loss       | 0.210  |
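The WER can be reproduced with the `evaluate` library; the reference and prediction lists below are toy placeholders standing in for the 1,580 decoded test transcripts:

```python
import evaluate

wer_metric = evaluate.load("wer")

# Placeholder data; in practice, decode the full test split first
references = ["नमस्ते संसार"]
predictions = ["नमस्ते संसार"]

wer = 100 * wer_metric.compute(references=references, predictions=predictions)
print(f"WER: {wer:.2f}%")
```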
## Limitations

- The model performs best on high-quality read speech.
- It may struggle with very fast conversational speech or heavy background noise compared to models trained on more acoustically diverse data.
- Some phonetic spelling variations may occur (e.g., short vs. long vowels), since these sound identical in spoken Nepali.