Streaming FastConformer-Hybrid Large (Ja)
This collection contains large-size versions of cache-aware FastConformer-Hybrid (around 120M parameters) trained on Japanese speech. These models are trained for streaming ASR with a look-ahead of 1040 ms, which can be used for very low-latency streaming applications. The model is a hybrid with both Transducer and CTC decoders.
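As a rough sketch of how such a checkpoint might be used with NVIDIA NeMo, the snippet below loads a hybrid model, switches between the two decoders, and transcribes a file. The model name and audio path are placeholders, since this card does not state the exact checkpoint identifier.

```python
# Minimal sketch of loading a cache-aware hybrid checkpoint with NVIDIA NeMo.
# The model name below is hypothetical -- substitute the actual checkpoint name
# or a local .nemo path for this model.
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.EncDecHybridRNNTCTCBPEModel.from_pretrained(
    model_name="stt_ja_fastconformer_hybrid_large_streaming_1040ms"  # placeholder name
)

# The hybrid model decodes with the Transducer (RNNT) head by default;
# switch to the CTC head if cheaper decoding is preferred.
asr_model.change_decoding_strategy(decoder_type="ctc")

transcripts = asr_model.transcribe(["sample_ja.wav"])  # illustrative audio path
print(transcripts[0])
```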
Model Architecture
These models are cache-aware versions of Hybrid FastConformer trained for streaming ASR. You can find more information on cache-aware models here: Cache-aware Streaming Conformer. The models are trained with multiple look-aheads, which enables them to support different latencies. To learn how to switch between look-aheads, see the documentation on cache-aware models; a brief sketch follows.
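Continuing from the loading snippet above, and assuming the checkpoint was indeed trained with multiple attention-context sizes, something along these lines could select a different look-ahead before transcribing. The `set_default_att_context_size` call and the example context values are assumptions based on NeMo's cache-aware Conformer encoder, not values taken from this card.

```python
# Sketch: pick a smaller right (future) context for lower latency, assuming the
# encoder supports several [left_frames, right_frames] attention-context sizes.
# The value below is illustrative; valid values are listed in the encoder config.
asr_model.encoder.set_default_att_context_size([70, 1])  # hypothetical context size
transcripts = asr_model.transcribe(["sample_ja.wav"])     # placeholder audio path
```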
Datasets
The model in this collection is trained on two datasets comprising approximately 20,000 hours of Japanese speech:
- Mozilla Common Voice Ja (v23.0)
- AsrSet_Ja
Performance
The following table summarizes the performance of this model in terms of Character Error Rate (CER%).
For the CER calculation, punctuation marks and non-alphabetic characters are removed, and numbers are converted to words using the num2words library; a minimal scoring sketch follows the table below.
| Version | Decoder | JSUT basic5000 | MCV16.1 test |
|---|---|---|---|
| 1.1.0 | CTC | 10.53 | 19.0 |
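For illustration, here is a self-contained sketch of the scoring procedure described above: digits are expanded with num2words (the `lang="ja"` code is an assumption), punctuation and non-letter characters are stripped, and CER is computed as character-level edit distance over the reference length. This is not the exact evaluation script behind the numbers in the table.

```python
# Sketch of the normalization + CER scoring described above.
import re
from num2words import num2words

def normalize(text: str) -> str:
    # Expand digit sequences to words (lang code is an assumption for this card).
    text = re.sub(r"\d+", lambda m: num2words(int(m.group()), lang="ja"), text)
    # Drop punctuation, symbols, and whitespace; keep letters (incl. kana/kanji).
    return re.sub(r"[^\w]", "", text)

def cer(ref: str, hyp: str) -> float:
    # Character-level Levenshtein distance divided by reference length.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[len(hyp)] / max(len(ref), 1)

print(cer(normalize("答えは42です。"), normalize("答えは四十二です")))
```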