arxiv:1806.05059

Multilingual End-to-End Speech Recognition with A Single Transformer on Low-Resource Languages

Published on Jun 12, 2018

Authors:

Abstract

Multilingual automatic speech recognition using a single Transformer model with sub-word units and language symbols shows improved performance over traditional LSTM approaches.

AI-generated summary

Sequence-to-sequence attention-based models integrate an acoustic, pronunciation and language model into a single neural network, which makes them well suited to multilingual automatic speech recognition (ASR). In this paper, we are concerned with multilingual speech recognition on low-resource languages by a single Transformer, one such sequence-to-sequence attention-based model. Sub-words are employed as the multilingual modeling unit without using any pronunciation lexicon. First, we show that a single multilingual ASR Transformer performs well on low-resource languages despite some language confusion. We then look at incorporating language information into the model by inserting the language symbol at the beginning or at the end of the original sub-word sequence, given that the language information is known during training. Experiments on CALLHOME datasets demonstrate that the multilingual ASR Transformer with the language symbol at the end performs better and obtains a 10.5% relative reduction in average word error rate (WER) compared to SHL-MLSTM with residual learning. We go on to show that, assuming the language information is known during both training and testing, a relative average WER reduction of about 12.4% over SHL-MLSTM with residual learning can be observed by giving the language symbol as the sentence start token.
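The three ways of injecting language information described in the abstract can be illustrated with a minimal sketch of how a target sequence might be assembled. This is not the authors' code: the function name `build_target`, the mode names, the token spellings (`<S>`, `</S>`, `<EN>`, and so on) and the example sub-words are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical token spellings; the paper's actual symbols are not given here.
SOS, EOS = "<S>", "</S>"
LANG_SYMBOLS: Dict[str, str] = {"en": "<EN>", "ja": "<JA>", "zh": "<ZH>"}


def build_target(subwords: List[str], lang: str, mode: str) -> List[str]:
    """Assemble a decoder target sequence for one utterance.

    mode:
      "none"        - plain multilingual target, no language symbol
      "begin"       - language symbol before the sub-words
      "end"         - language symbol after the sub-words
      "start_token" - language symbol replaces the sentence start token
                      (requires the language to be known at test time too)
    """
    symbol = LANG_SYMBOLS[lang]
    if mode == "none":
        return [SOS, *subwords, EOS]
    if mode == "begin":
        return [SOS, symbol, *subwords, EOS]
    if mode == "end":
        return [SOS, *subwords, symbol, EOS]
    if mode == "start_token":
        return [symbol, *subwords, EOS]
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    pieces = ["_hel", "lo", "_world"]  # made-up sub-word segmentation
    for mode in ("none", "begin", "end", "start_token"):
        print(mode, build_target(pieces, "en", mode))
```

In the "begin" and "end" variants the model has to predict the language symbol itself, so the language only needs to be known during training. In the "start_token" variant the symbol is supplied as the first decoder input, which is why that setting assumes the language is also known at test time.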
