KazakhTTS: An Open-Source Kazakh Text-to-Speech Synthesis Dataset

Abstract

This paper introduces a high-quality open-source speech synthesis dataset for Kazakh, a low-resource language spoken by over 13 million people worldwide. The dataset consists of about 93 hours of transcribed audio recordings spoken by two professional speakers (female and male). It is the first publicly available large-scale dataset developed to promote Kazakh text-to-speech (TTS) applications in both academia and industry. In this paper, we share our experience by describing the dataset development procedures and the challenges we faced, and we discuss important future directions. To demonstrate the reliability of our dataset, we built baseline end-to-end TTS models and evaluated them using the subjective mean opinion score (MOS) measure. Evaluation results show that the best TTS models trained on our dataset achieve a MOS above 4 for both speakers, which makes them applicable for practical use. The dataset, training recipe, and pretrained TTS models are freely available.

Citation

Mussakhojayeva Saida, Janaliyeva Aigerim, Mirzakhmetov Almas, Khassanov Yerbolat, Varol Huseyin Atakan. (2021). KazakhTTS: An Open-Source Kazakh Text-to-Speech Synthesis Dataset. Interspeech 2021. https://doi.org/10.21437/interspeech.2021-2124
