It seems that the model does not support overlapped speech recognition?
I have tried several audio clips, and the model struggles to recognize them.
Agreed.
Hello, you might consider trying SE-DiCoW, which is specifically designed to handle overlapping speech.
We recently released a new version here: https://huggingface.co/BUT-FIT/SE-DiCoW
The code and instructions are available at: https://github.com/BUTSpeechFIT/DiCoW and https://github.com/BUTSpeechFIT/TS-ASR-Whisper/tree/se_dicow
It outperforms this model on several benchmarks.
Thanks, I will try it. By the way, how many languages are supported?
90+, all that are supported by Whisper v3-turbo
@scutrandom you can also try our demo at https://pcspeech-demo.fit.vutbr.cz/dicow/
Hi, I've tested the demo and overall it works quite well. However, when I tried it with Chinese audio, some sentences were incorrectly transcribed in English. Is there a way to configure the model to prevent this or lock it to Chinese?
Glad to hear that. You can enforce the language via forced_decoder_ids (see here: https://github.com/BUTSpeechFIT/TS-ASR-Whisper/blob/e869be0e7b70d1600d777041bd95d99ad54bc1ed/src/data/collators.py#L171). I can add that feature sometime this weekend, but feel free to open a PR if you want to try it yourself.
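For anyone who wants to try this before the feature lands, here is a minimal sketch of what a forced_decoder_ids list looks like for Whisper-style decoding. It is not the code from the linked collator. The helper name `build_forced_decoder_ids` and the toy vocabulary with made-up token IDs are assumptions for illustration only; with a real tokenizer you would pass something like its `convert_tokens_to_ids` method instead. Whether the DiCoW generate call accepts the resulting list unchanged is also an assumption, so check the repository first.

```python
from typing import Callable, List, Tuple


def build_forced_decoder_ids(
    token_to_id: Callable[[str], int],
    language: str = "zh",
    task: str = "transcribe",
    timestamps: bool = False,
) -> List[Tuple[int, int]]:
    """Hypothetical helper: build (position, token_id) pairs that pin the
    language and task tokens at the start of a Whisper-style decoder prompt."""
    # Whisper-style special tokens: language tag, then task tag.
    tokens = [f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        tokens.append("<|notimestamps|>")
    # Position 0 holds <|startoftranscript|>, so forced positions start at 1.
    return [(pos, token_to_id(tok)) for pos, tok in enumerate(tokens, start=1)]


# Toy vocabulary with MADE-UP IDs, only so the sketch runs on its own.
toy_vocab = {
    "<|zh|>": 101,
    "<|en|>": 102,
    "<|transcribe|>": 201,
    "<|translate|>": 202,
    "<|notimestamps|>": 301,
}

forced = build_forced_decoder_ids(toy_vocab.__getitem__, language="zh")
print(forced)  # [(1, 101), (2, 201), (3, 301)]
```

Because the language token is fixed at position 1, the decoder cannot pick `<|en|>` on its own for a segment, which is the behaviour that causes Chinese audio to come out as English text.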