MULTILINGUAL TEXT-TO-SPEECH ENGINE

Isturlayeva, Aidana

Files

Thesis - Aidana Isturlayeva.pdf (4.14 MB)

Date

2021-07

Authors

Isturlayeva, Aidana

Publisher

Nazarbayev University School of Engineering and Digital Sciences

Abstract

Serenity and fluency are the most important synthesis qualities expected from a text-to-speech system. This project introduces a multilingual text-to-speech (TTS) engine capable of producing high-quality speech in English, Kazakh, and Russian. The main idea is to address a limitation of existing TTS systems, which typically offer one voice in one language; our engine supports three languages at once. A text-to-speech synthesis system usually consists of several stages: a text analysis front end, an acoustic model, and a sound synthesis module. For synthesis, we use Tacotron, an end-to-end generative text-to-speech model that synthesizes speech directly from characters. We also describe a high-quality speech dataset for Kazakh, Russian, and English. The dataset contains 40 hours per language of transcribed audio recordings spoken by a professional female speaker. This large-scale, publicly available resource was developed to promote multilingual TTS applications in academia and industry. This thesis outlines our experience by describing the dataset development procedures, the challenges we faced, and important future directions. To evaluate the resulting system, we conducted subjective assessment tests based on the Likert scale.

Keywords

text-to-speech, TTS, Research Subject Categories::TECHNOLOGY, Type of access: Gated Access, speech recognition

Citation

Isturlayeva, A. (2021). Multilingual Text-To-Speech Engine (Unpublished master's thesis). Nazarbayev University, Nur-Sultan, Kazakhstan

URI

http://nur.nu.edu.kz/handle/123456789/5620

Collections

02. Master's Thesis

Creative Commons license

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-ShareAlike 3.0 United States
