It seems the model does not support overlapped speech recognition?

by scutrandom - opened 13 days ago

Discussion

scutrandom

13 days ago

I have tried several audio clips with overlapping speech, and this model has a hard time recognizing them.

TintinChess

12 days ago

agree

Lakoc

8 days ago

•

edited 8 days ago

Hello, you might consider trying SE-DiCoW, which is specifically designed to handle overlapping speech.
We recently released a new version here: https://huggingface.co/BUT-FIT/SE-DiCoW
The code and instructions are available at: https://github.com/BUTSpeechFIT/DiCoW and https://github.com/BUTSpeechFIT/TS-ASR-Whisper/tree/se_dicow
It outperforms this model on several benchmarks.

scutrandom

8 days ago

Thanks, I will try it. By the way, how many languages are supported?

Lakoc

8 days ago

More than 90: every language supported by Whisper v3-turbo.

Lakoc

8 days ago

@scutrandom you can also try our demo at https://pcspeech-demo.fit.vutbr.cz/dicow/

scutrandom

7 days ago

Hi, I've tested the demo and overall it works quite well. However, when I tried it with Chinese audio, some sentences were incorrectly transcribed in English. Is there a way to configure the model to prevent this or lock it to Chinese?

Lakoc

7 days ago

Glad to hear that. You can enforce the language via forced_decoder_ids (see here: https://github.com/BUTSpeechFIT/TS-ASR-Whisper/blob/e869be0e7b70d1600d777041bd95d99ad54bc1ed/src/data/collators.py#L171). I can add that feature sometime this weekend, but feel free to open a PR if you want to try it yourself.
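
For anyone who wants to try this before the feature lands, here is a rough sketch of what forced_decoder_ids looks like in the Whisper convention: a list of (decoder position, token ID) pairs that pins the language, task, and timestamp tokens right after the start-of-transcript token. The helper name build_forced_decoder_ids and the toy vocabulary are hypothetical and the IDs are placeholders, not real Whisper token IDs. Whether the SE-DiCoW generation code accepts the pairs in exactly this form is an assumption to check against the linked collator.

```python
from typing import Dict, List, Tuple


def build_forced_decoder_ids(
    token_to_id: Dict[str, int],
    language: str = "zh",
    task: str = "transcribe",
    timestamps: bool = False,
) -> List[Tuple[int, int]]:
    """Build (decoder_position, token_id) pairs that pin the Whisper prompt.

    Hypothetical helper for illustration only; it is not part of any library.
    """
    # Whisper's prompt order is: language token, task token, optional
    # no-timestamps token.
    tokens = [f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        tokens.append("<|notimestamps|>")

    missing = [tok for tok in tokens if tok not in token_to_id]
    if missing:
        raise KeyError(f"tokens not in vocabulary: {missing}")

    # Position 0 holds <|startoftranscript|>, so forced tokens start at 1.
    return [(pos, token_to_id[tok]) for pos, tok in enumerate(tokens, start=1)]


# Toy vocabulary with PLACEHOLDER IDs (assumption: real IDs come from the
# model's tokenizer and differ between Whisper versions).
TOY_VOCAB = {
    "<|startoftranscript|>": 0,
    "<|en|>": 1,
    "<|zh|>": 2,
    "<|transcribe|>": 3,
    "<|translate|>": 4,
    "<|notimestamps|>": 5,
}

forced = build_forced_decoder_ids(TOY_VOCAB, language="zh")
print(forced)  # [(1, 2), (2, 3), (3, 5)]
```

With Hugging Face Transformers, WhisperProcessor.get_decoder_prompt_ids(language=..., task=...) returns pairs in this same format using the real token IDs, so in practice you would take the IDs from the model's own processor rather than build the mapping by hand.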
