Streaming FastConformer-Hybrid Large (Ja)

This collection contains large versions of cache-aware FastConformer-Hybrid (around 120M parameters) trained on Japanese speech. These models are trained for streaming ASR with a look-ahead of 1040 ms, which can be used for very low-latency streaming applications. The model is hybrid, with both Transducer and CTC decoders.
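
As a minimal sketch of using such a checkpoint with NVIDIA NeMo and switching between the two decoders, consider the snippet below; the model name and audio path are placeholders rather than values taken from this card.

```python
# A minimal sketch, assuming NVIDIA NeMo is installed.
# The model name and audio path below are placeholders.
import nemo.collections.asr as nemo_asr

# Load a cache-aware FastConformer-Hybrid checkpoint (placeholder name).
asr_model = nemo_asr.models.ASRModel.from_pretrained(
    model_name="stt_ja_fastconformer_hybrid_large_streaming"  # placeholder
)

# The hybrid model decodes with the Transducer branch by default;
# switch to the CTC decoder if preferred.
asr_model.change_decoding_strategy(decoder_type="ctc")

# Offline transcription of a local audio file (illustrative path).
print(asr_model.transcribe(["sample_ja.wav"]))
```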

Model Architecture

These models are cache-aware versions of Hybrid FastConformer trained for streaming ASR. You can find more information on cache-aware models here: Cache-aware Streaming Conformer. The models are trained with multiple look-aheads, which enables them to support different latencies. To learn how to switch between different look-aheads, see the documentation on cache-aware models.
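
The sketch below illustrates selecting a look-ahead at inference time by setting the encoder's attention context size; the [left, right] values are assumptions based on typical multi-lookahead cache-aware FastConformer configurations (roughly 1040 ms of right context), so consult this checkpoint's config for the sizes it actually supports.

```python
# A minimal sketch of selecting the streaming look-ahead, assuming the
# checkpoint exposes NeMo's multi-lookahead cache-aware encoder. The
# [left, right] context values are assumptions; consult the model config
# for the sizes this checkpoint was actually trained with.
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.ASRModel.from_pretrained(
    model_name="stt_ja_fastconformer_hybrid_large_streaming"  # placeholder
)

# A larger right context means more look-ahead and higher latency.
asr_model.encoder.set_default_att_context_size([70, 13])  # assumed ~1040 ms look-ahead
```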

Datasets

The model in this collection is trained on two datasets comprising approximately 20,000 hours of Japanese speech:

  • Mozilla Common Voice Ja (v23.0)
  • AsrSet_Ja

Performance

The following table summarizes the performance of this model in terms of Character Error Rate (CER%).

For the CER calculation, punctuation marks and non-letter characters are removed, and numbers are converted to words using the num2words library.

| Version | Decoder | JSUT basic5000 | MCV 16.1 test |
|---------|---------|----------------|---------------|
| 1.1.0   | CTC     | 10.53          | 19.0          |
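
The snippet below is a rough sketch of the scoring-time normalization described above, assuming jiwer is used for the CER computation and that num2words supports Japanese (lang="ja"); the regular expression and example sentences are illustrative, not the exact evaluation pipeline.

```python
# A rough sketch of the CER normalization, assuming jiwer for scoring and
# Japanese support in num2words (lang="ja"). Regex and examples are illustrative.
import re

import jiwer
from num2words import num2words

def normalize(text: str) -> str:
    # Spell out digit sequences as Japanese number words.
    text = re.sub(r"\d+", lambda m: num2words(int(m.group()), lang="ja"), text)
    # Drop punctuation and other non-letter symbols (keeps kana, kanji, Latin letters).
    return re.sub(r"[^\w]", "", text)

ref = normalize("今日は12時に会いましょう。")
hyp = normalize("今日は12時に会いましょ")
print(f"CER: {jiwer.cer(ref, hyp):.3f}")
```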
