Commit 75040bd (parent: 1ca7336): Update README.md

README.md (changed):
provided [training code](https://github.com/huggingface/distil-whisper/tree/main/training). We will update the
[Distil-Whisper repository](https://github.com/huggingface/distil-whisper/) with multilingual checkpoints when ready!

### Why is distil-small.en slower than distil-large-v2?

While [distil-medium.en](https://huggingface.co/distil-whisper/distil-medium.en) and [distil-large-v2](https://huggingface.co/distil-whisper/distil-large-v2)
use two decoder layers each, distil-small.en uses four. Using more decoder layers improves the WER performance of the
model, but at the cost of slower inference speed.
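The speed cost of those extra layers comes from autoregressive generation, which runs the full decoder stack once per emitted token, so per-token latency scales roughly linearly with decoder depth. A back-of-the-envelope sketch with hypothetical per-layer timings (illustrative numbers, not measurements):

```python
# Autoregressive decoding runs every decoder layer once per generated token,
# so total decode time grows linearly with the number of decoder layers.
def decode_time_ms(decoder_layers: int, ms_per_layer: float, new_tokens: int) -> float:
    return decoder_layers * ms_per_layer * new_tokens

# Hypothetical per-layer latency: only the ratio between the models matters.
four_layer = decode_time_ms(decoder_layers=4, ms_per_layer=1.0, new_tokens=128)
two_layer = decode_time_ms(decoder_layers=2, ms_per_layer=1.0, new_tokens=128)
print(four_layer / two_layer)  # → 2.0 (all else being equal)
```

All else is not equal between the two checkpoints (their layer widths differ), so treat this as intuition for the trend rather than a prediction of the real gap.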

specify it as the "assistant model" for generation:

```python
from transformers import pipeline, AutoModelForSpeechSeq2Seq, AutoProcessor
import torch
from datasets import load_dataset
```
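The mechanism behind assisted generation, speculative decoding, can be illustrated without any real models. Below is a toy greedy-decoding sketch (the `main_next`/`draft_next` callables are hypothetical stand-ins, not transformers APIs): the assistant drafts a chunk of candidate tokens, and the main model keeps the longest agreeing prefix, substituting its own token at the first mismatch.

```python
def speculative_decode(main_next, draft_next, prompt, max_new, chunk=4):
    """Greedy speculative decoding: output is identical to decoding with
    main_next alone; the draft only changes how fast tokens are accepted."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new:
        # 1. The cheap assistant drafts `chunk` candidate tokens.
        draft = []
        for _ in range(chunk):
            draft.append(draft_next(tokens + draft))
        # 2. The main model checks each candidate; at the first disagreement
        #    its own token is kept and the rest of the draft is discarded.
        for candidate in draft:
            if len(tokens) - len(prompt) >= max_new:
                break
            token = main_next(tokens)
            tokens.append(token)
            if token != candidate:
                break
    return tokens

# Toy "models": the next token is the previous one plus 1. The draft agrees
# perfectly here, so every chunk is accepted in full.
main = lambda toks: toks[-1] + 1
print(speculative_decode(main, main, [0], max_new=5))  # → [0, 1, 2, 3, 4, 5]
```

In a real implementation the main model verifies the entire draft in one batched forward pass, so accepted tokens cost far fewer sequential decoder steps while the output provably matches decoding with the main model alone.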

### Running Distil-Whisper in `openai-whisper`

To use the model in the original Whisper format, first ensure you have the [`openai-whisper`](https://pypi.org/project/openai-whisper/) package installed:

```bash
pip install --upgrade openai-whisper
```
The following code-snippet loads the model and runs transcription on a sample from the LibriSpeech dataset:

```python
from huggingface_hub import hf_hub_download
from datasets import load_dataset
from whisper import load_model, transcribe

distil_small_en = hf_hub_download(repo_id="distil-whisper/distil-small.en", filename="original-model.bin")
model = load_model(distil_small_en)

dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = dataset[0]["audio"]["array"]

pred_out = transcribe(model, audio=sample)
print(pred_out["text"])
```

Note that the model weights will be downloaded and saved to your cache the first time you run the example. Subsequently,
you can re-use the same example, and the weights will be loaded directly from your cache without having to download them
again.

To transcribe a local audio file, simply pass the path to the audio file as the `audio` argument to `transcribe`:

```python
pred_out = transcribe(model, audio="audio.mp3")
```

### Whisper.cpp

Distil-Whisper can be run from the [Whisper.cpp](https://github.com/ggerganov/whisper.cpp) repository with the original
sequential long-form transcription algorithm. In a [provisional benchmark](https://github.com/ggerganov/whisper.cpp/pull/1424#issuecomment-1793513399)
on Mac M1, `distil-small.en` is over 4x faster than `large-v2`, while performing to within 1.4% WER over long-form audio.
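WER (word error rate) in the benchmark above is the word-level edit distance between the model transcript and the reference, divided by the reference length. A minimal pure-Python sketch, with hypothetical example strings:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[-1][-1] / len(ref)

# 1 substitution over 7 reference words ≈ 0.143
print(wer("ask not what your country can do", "ask not what your country could do"))
```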
Steps for getting started:

1. Clone the Whisper.cpp repository:

```bash
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
```
2. Download the ggml weights for `distil-small.en` from the Hugging Face Hub:

```bash
python -c "from huggingface_hub import hf_hub_download; hf_hub_download(repo_id='distil-whisper/distil-small.en', filename='ggml-distil-small.en.bin', local_dir='./models')"
```

Note that if you do not have the `huggingface_hub` package installed, you can also download the weights with `wget`:

```bash
wget https://huggingface.co/distil-whisper/distil-small.en/resolve/main/ggml-distil-small.en.bin -P ./models
```

3. Run inference using the provided sample audio:

```bash
make -j && ./main -m models/ggml-distil-small.en.bin -f samples/jfk.wav
```

### Transformers.js

```js
```