vanshnawander/whisper-small-hindi-asr

This is a fine-tuned version of openai/whisper-small for Hindi automatic speech recognition (ASR).

Model Description

  • Base Model: openai/whisper-small
  • Language: Hindi (hi)
  • Task: Automatic Speech Recognition (transcribe)
  • Training Data: ai4bharat/Kathbath
  • Fine-tuning Framework: Transformers + Custom DALI Pipeline

Evaluation Results

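Evaluated on the LAHAJA benchmark, a multi-accent Hindi ASR benchmark with 12.5 hours of audio from 132 speakers across 83 districts of India. WER (word error rate) and CER (character error rate) are both lower-is-better, and either can exceed 100% when the output contains more errors, including inserted words or characters, than the reference has units.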

Model                  WER       CER       Improvement (relative WER reduction)
Base (whisper-small)   145.67%   101.57%   -
This Model             36.17%    11.36%    75.2%
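
To make the metrics concrete, here is a minimal sketch of how WER, CER, and a relative reduction can be computed from edit distance. It is not the evaluation script behind these numbers: the function names are illustrative, no text normalization is applied, and the example sentences are made up. The 75.2% in the last column is consistent with relative WER reduction, (145.67 - 36.17) / 145.67, which the last line reproduces.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference item missing from output
                curr[j - 1] + 1,     # insertion: extra item in output
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


def relative_reduction(baseline, improved):
    """Fraction by which an error rate dropped relative to the baseline."""
    return (baseline - improved) / baseline


# Made-up example: a 2-word reference against a 4-word output with no matches
# needs 2 substitutions + 2 insertions, so WER is 4 / 2 = 200%.
print(wer("नमस्ते दुनिया", "hello hello world world"))  # 2.0

# Relative WER reduction from the table above, in percent.
print(round(relative_reduction(145.67, 36.17) * 100, 1))  # 75.2
```

Because insertions count as errors but the denominator is the reference length, a model that emits much more text than was spoken can score above 100%, as the base model does here.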

Usage

Basic Usage

from transformers import WhisperProcessor, WhisperForConditionalGeneration
import librosa

# Load model and processor
processor = WhisperProcessor.from_pretrained("vanshnawander/whisper-small-hindi-asr")
model = WhisperForConditionalGeneration.from_pretrained("vanshnawander/whisper-small-hindi-asr")

# Load audio
audio, sr = librosa.load("audio.wav", sr=16000)

# Transcribe
input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features
generated_ids = model.generate(input_features, language="hi", task="transcribe")
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(transcription)
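
Whisper's feature extractor pads or truncates its input to a 30-second window by default, so the snippet above only transcribes the first 30 seconds of a longer file. One simple workaround, sketched below, is to split the waveform into consecutive 30-second pieces and transcribe each one. The helper name `split_into_chunks` is hypothetical, and the naive non-overlapping split can cut a word at a chunk boundary.

```python
def split_into_chunks(audio, sampling_rate=16000, chunk_seconds=30.0):
    """Split a 1-D sequence of samples into consecutive chunks.

    Works on anything that supports len() and slicing, such as a list
    or the NumPy array returned by librosa.load.
    """
    chunk_size = int(sampling_rate * chunk_seconds)
    if chunk_size <= 0:
        raise ValueError("sampling_rate and chunk_seconds must be positive")
    return [audio[start:start + chunk_size]
            for start in range(0, len(audio), chunk_size)]


# Stand-in for a 70-second recording at 16 kHz (silence).
samples = [0.0] * (70 * 16000)
chunks = split_into_chunks(samples)
print([len(c) / 16000 for c in chunks])  # [30.0, 30.0, 10.0]

# With the processor and model loaded as above, each chunk could then be
# transcribed and the pieces joined (left commented out so this block runs
# without downloading the model):
# texts = []
# for chunk in split_into_chunks(audio):
#     features = processor(chunk, sampling_rate=16000, return_tensors="pt").input_features
#     ids = model.generate(features, language="hi", task="transcribe")
#     texts.append(processor.batch_decode(ids, skip_special_tokens=True)[0])
# transcription = " ".join(texts)
```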

Using Pipeline

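from transformers import pipeline

# chunk_length_s=30 lets the pipeline process long recordings in 30-second windows
pipe = pipeline(
    "automatic-speech-recognition",
    model="vanshnawander/whisper-small-hindi-asr",
    chunk_length_s=30,
)

# Force Hindi transcription rather than relying on language auto-detection
result = pipe("audio.wav", generate_kwargs={"language": "hi", "task": "transcribe"})
print(result["text"])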

Limitations

  • Optimized for Hindi speech; may not perform well on other languages
  • Best performance on clear audio with minimal background noise
  • May struggle with very fast speech or heavy code-mixing

Model size: 0.2B params (Safetensors, tensor type F32)

Evaluation results (self-reported)

  • Word Error Rate on LAHAJA (Hindi Multi-accent): 36.170
  • Character Error Rate on LAHAJA (Hindi Multi-accent): 11.360