Streaming FastConformer-Hybrid Large (Ja)
This collection contains large-size versions of cache-aware FastConformer-Hybrid (around 120M parameters) trained on Japanese speech. These models are trained for streaming ASR with a look-ahead of 1040 ms, which can be used for very low-latency streaming applications. The model is a hybrid with both Transducer and CTC decoders.
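As a rough sketch of how such a checkpoint might be used with NVIDIA NeMo, the snippet below loads a hybrid model, switches between the two decoders, and transcribes a file. The model name and audio path are placeholders, since this card does not state the exact checkpoint identifier.

```python
# Minimal sketch of loading a cache-aware hybrid checkpoint with NVIDIA NeMo.
# The model name below is hypothetical -- substitute the actual checkpoint name
# or a local .nemo path for this model.
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.EncDecHybridRNNTCTCBPEModel.from_pretrained(
    model_name="stt_ja_fastconformer_hybrid_large_streaming_1040ms"  # placeholder name
)

# The hybrid model decodes with the Transducer (RNNT) head by default;
# switch to the CTC head if cheaper decoding is preferred.
asr_model.change_decoding_strategy(decoder_type="ctc")

transcripts = asr_model.transcribe(["sample_ja.wav"])  # illustrative audio path
print(transcripts[0])
```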
Model Architecture
These models are cache-aware versions of Hybrid FastConformer trained for streaming ASR. You can find more information on cache-aware models here: Cache-aware Streaming Conformer. The models are trained with multiple look-aheads, which enables them to support different latencies. To learn how to switch between look-aheads, see the documentation on cache-aware models; a brief sketch follows.
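Continuing from the loading snippet above, and assuming the checkpoint was indeed trained with multiple attention-context sizes, something along these lines could select a different look-ahead before transcribing. The `set_default_att_context_size` call and the example context values are assumptions based on NeMo's cache-aware Conformer encoder, not values taken from this card.

```python
# Sketch: pick a smaller right (future) context for lower latency, assuming the
# encoder supports several [left_frames, right_frames] attention-context sizes.
# The value below is illustrative; valid values are listed in the encoder config.
asr_model.encoder.set_default_att_context_size([70, 1])  # hypothetical context size
transcripts = asr_model.transcribe(["sample_ja.wav"])     # placeholder audio path
```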
Datasets
The model in this collection is trained on two datasets comprising approximately 20,000 hours of Japanese speech:
- Mozilla Common Voice Ja (v23.0)
- AsrSet_Ja
Performance
The following table summarizes the performance of this model in terms of Character Error Rate (CER%).
For the CER calculation, punctuation marks and non-alphabetic characters are removed, and numbers are converted to words using the num2words library; a minimal scoring sketch follows the table below.
| Version | Decoder | JSUT basic5000 | MCV16.1 test |
|---|---|---|---|
| 1.1.0 | CTC | 10.53 | 19.0 |
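For illustration, here is a self-contained sketch of the scoring procedure described above: digits are expanded with num2words (the `lang="ja"` code is an assumption), punctuation and non-letter characters are stripped, and CER is computed as character-level edit distance over the reference length. This is not the exact evaluation script behind the numbers in the table.

```python
# Sketch of the normalization + CER scoring described above.
import re
from num2words import num2words

def normalize(text: str) -> str:
    # Expand digit sequences to words (lang code is an assumption for this card).
    text = re.sub(r"\d+", lambda m: num2words(int(m.group()), lang="ja"), text)
    # Drop punctuation, symbols, and whitespace; keep letters (incl. kana/kanji).
    return re.sub(r"[^\w]", "", text)

def cer(ref: str, hyp: str) -> float:
    # Character-level Levenshtein distance divided by reference length.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[len(hyp)] / max(len(ref), 1)

print(cer(normalize("答えは42です。"), normalize("答えは四十二です")))
```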