INDEPENDENT LANGUAGE MODELING ARCHITECTURE FOR END-TO-END ASR
Authors
Pham, Van Tung
Xu, Haihua
Khassanov, Yerbolat
Zeng, Zhiping
Chng, Eng Siong
Ni, Chongjia
Ma, Bin
Li, Haizhou
Publisher
arXiv
Abstract
The attention-based end-to-end (E2E) automatic speech
recognition (ASR) architecture allows for joint optimization
of acoustic and language models within a single network.
However, in a vanilla E2E ASR architecture, the decoder
sub-network (subnet), which takes on the role of the language
model (LM), is conditioned on the encoder output. This means
that the acoustic encoder and the language model are entangled,
which prevents the language model from being trained separately
on external text data. To address this problem, in this work,
we propose a new architecture that separates the decoder subnet
from the encoder output. In this way, the decoupled subnet
becomes an independently trainable LM subnet, which can easily
be updated using external text data. We study two strategies
for updating the new architecture. Experimental results show
that, 1) the independent LM architecture benefits from external
text data, achieving 9.3% and 22.8% relative character and word
error rate reductions on the Mandarin HKUST and English NSC
datasets, respectively; 2) the proposed architecture works well
with an external LM and generalizes to different amounts of
labelled data.
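
The abstract gives no implementation details, so the following PyTorch sketch is only illustrative of the core idea: the decoder's LM subnet is conditioned on previous tokens alone (never on the encoder output), and the acoustic context is fused in afterwards. The class names, the LSTM-based LM, and the late-fusion attention mechanism here are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class IndependentLMSubnet(nn.Module):
    """Hypothetical LM subnet: it sees only previous tokens, so it can be
    trained or updated on external text alone, with no acoustic input."""
    def __init__(self, vocab_size, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.rnn = nn.LSTM(d_model, d_model, batch_first=True)
        self.proj = nn.Linear(d_model, vocab_size)

    def forward(self, prev_tokens, return_hidden=False):
        h, _ = self.rnn(self.embed(prev_tokens))      # (B, T, d_model)
        return h if return_hidden else self.proj(h)   # logits for LM training

class DecoupledASRDecoder(nn.Module):
    """Illustrative decoupled decoder: the encoder output enters only through
    an attention context that is fused with the text-only LM state at the
    output layer, instead of being fed into the LM subnet itself."""
    def __init__(self, vocab_size, d_model=256):
        super().__init__()
        self.lm = IndependentLMSubnet(vocab_size, d_model)
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.out = nn.Linear(2 * d_model, vocab_size)

    def forward(self, prev_tokens, enc_out):
        lm_state = self.lm(prev_tokens, return_hidden=True)  # text-only path
        ctx, _ = self.attn(lm_state, enc_out, enc_out)       # acoustic path
        return self.out(torch.cat([lm_state, ctx], dim=-1))

# Usage sketch: the full decoder needs paired speech-text data, but the LM
# subnet alone can be updated with a standard cross-entropy loss on text.
dec = DecoupledASRDecoder(vocab_size=5000)
enc_out = torch.randn(2, 100, 256)        # dummy acoustic encoder output
prev = torch.randint(0, 5000, (2, 20))    # previous token ids
asr_logits = dec(prev, enc_out)           # joint acoustic + LM prediction
lm_logits = dec.lm(prev)                  # independent LM path, text only
```

Because `dec.lm` never consumes `enc_out`, its parameters can be refined on external text corpora and the updated subnet plugged back into the joint model, which is the property the paper's architecture is designed to provide.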
Creative Commons license
Except where otherwise noted, this item's license is described as Attribution-NonCommercial-ShareAlike 3.0 United States
