Pham, Van Tung; Xu, Haihua; Khassanov, Yerbolat; Zeng, Zhiping; Chng, Eng Siong; Ni, Chongjia; Ma, Bin; Li, Haizhou
(arXiv, 2019)
The attention-based end-to-end (E2E) automatic speech
recognition (ASR) architecture allows for joint optimization
of acoustic and language models within a single network.
However, in a vanilla E2E ASR architecture, the ...