Commit 75040bd (parent: 1ca7336): Update README.md

README.md (changed):
provided [training code](https://github.com/huggingface/distil-whisper/tree/main/training). We will update the
[Distil-Whisper repository](https://github.com/huggingface/distil-whisper/) with multilingual checkpoints when ready!

### Why is distil-small.en slower than distil-large-v2?

While [distil-medium.en](https://huggingface.co/distil-whisper/distil-medium.en) and [distil-large-v2](https://huggingface.co/distil-whisper/distil-large-v2)
use two decoder layers each, distil-small.en uses four. Using more decoder layers improves the WER performance of the
model, but at the cost of slower inference speed.
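The speed cost of those extra layers comes from autoregressive generation, which runs the full decoder stack once per emitted token, so per-token latency scales roughly linearly with decoder depth. A back-of-the-envelope sketch with hypothetical per-layer timings (illustrative numbers, not measurements):

```python
# Autoregressive decoding runs every decoder layer once per generated token,
# so total decode time grows linearly with the number of decoder layers.
def decode_time_ms(decoder_layers: int, ms_per_layer: float, new_tokens: int) -> float:
    return decoder_layers * ms_per_layer * new_tokens

# Hypothetical per-layer latency: only the ratio between the models matters.
four_layer = decode_time_ms(decoder_layers=4, ms_per_layer=1.0, new_tokens=128)
two_layer = decode_time_ms(decoder_layers=2, ms_per_layer=1.0, new_tokens=128)
print(four_layer / two_layer)  # → 2.0 (all else being equal)
```

All else is not equal between the two checkpoints (their layer widths differ), so treat this as intuition for the trend rather than a prediction of the real gap.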

specify it as the "assistant model" for generation:

```python
from transformers import pipeline, AutoModelForSpeechSeq2Seq, AutoProcessor
import torch
from datasets import load_dataset
```
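The mechanism behind assisted generation, speculative decoding, can be illustrated without any real models. Below is a toy greedy-decoding sketch (the `main_next`/`draft_next` callables are hypothetical stand-ins, not transformers APIs): the assistant drafts a chunk of candidate tokens, and the main model keeps the longest agreeing prefix, substituting its own token at the first mismatch.

```python
def speculative_decode(main_next, draft_next, prompt, max_new, chunk=4):
    """Greedy speculative decoding: output is identical to decoding with
    main_next alone; the draft only changes how fast tokens are accepted."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new:
        # 1. The cheap assistant drafts `chunk` candidate tokens.
        draft = []
        for _ in range(chunk):
            draft.append(draft_next(tokens + draft))
        # 2. The main model checks each candidate; at the first disagreement
        #    its own token is kept and the rest of the draft is discarded.
        for candidate in draft:
            if len(tokens) - len(prompt) >= max_new:
                break
            token = main_next(tokens)
            tokens.append(token)
            if token != candidate:
                break
    return tokens

# Toy "models": the next token is the previous one plus 1. The draft agrees
# perfectly here, so every chunk is accepted in full.
main = lambda toks: toks[-1] + 1
print(speculative_decode(main, main, [0], max_new=5))  # → [0, 1, 2, 3, 4, 5]
```

In a real implementation the main model verifies the entire draft in one batched forward pass, so accepted tokens cost far fewer sequential decoder steps while the output provably matches decoding with the main model alone.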

### Running Distil-Whisper in `openai-whisper`

To use the model in the original Whisper format, first ensure you have the [`openai-whisper`](https://pypi.org/project/openai-whisper/) package installed:

```bash
pip install --upgrade openai-whisper
```
The following code-snippet loads the model and runs transcription on a sample from the LibriSpeech dataset:

```python
from huggingface_hub import hf_hub_download
from datasets import load_dataset
from whisper import load_model, transcribe

distil_small_en = hf_hub_download(repo_id="distil-whisper/distil-small.en", filename="original-model.bin")
model = load_model(distil_small_en)

dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = dataset[0]["audio"]["array"]

pred_out = transcribe(model, audio=sample)
print(pred_out["text"])
```

Note that the model weights will be downloaded and saved to your cache the first time you run the example. Subsequently,
you can re-use the same example, and the weights will be loaded directly from your cache without having to download them
again.

To transcribe a local audio file, simply pass the path to the audio file as the `audio` argument to `transcribe`:

```python
pred_out = transcribe(model, audio="audio.mp3")
```

### Whisper.cpp

Distil-Whisper can be run from the [Whisper.cpp](https://github.com/ggerganov/whisper.cpp) repository with the original
sequential long-form transcription algorithm. In a [provisional benchmark](https://github.com/ggerganov/whisper.cpp/pull/1424#issuecomment-1793513399)
on Mac M1, `distil-small.en` is over 4x faster than `large-v2`, while performing to within 1.4% WER over long-form audio.
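WER (word error rate) in the benchmark above is the word-level edit distance between the model transcript and the reference, divided by the reference length. A minimal pure-Python sketch, with hypothetical example strings:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[-1][-1] / len(ref)

# 1 substitution over 7 reference words ≈ 0.143
print(wer("ask not what your country can do", "ask not what your country could do"))
```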
Steps for getting started:

1. Clone the Whisper.cpp repository:

```bash
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
```
2. Download the ggml weights for `distil-small.en` from the Hugging Face Hub:

```bash
python -c "from huggingface_hub import hf_hub_download; hf_hub_download(repo_id='distil-whisper/distil-small.en', filename='ggml-distil-small.en.bin', local_dir='./models')"
```

Note that if you do not have the `huggingface_hub` package installed, you can also download the weights with `wget`:

```bash
wget https://huggingface.co/distil-whisper/distil-small.en/resolve/main/ggml-distil-small.en.bin -P ./models
```

3. Run inference using the provided sample audio:

```bash
make -j && ./main -m models/ggml-distil-small.en.bin -f samples/jfk.wav
```

### Transformers.js

```js
```