KazakhTTS: An Open-Source Kazakh Text-to-Speech Synthesis Dataset

dc.contributor.author: Saida Mussakhojayeva
dc.contributor.author: Aigerim Janaliyeva
dc.contributor.author: Almas Mirzakhmetov
dc.contributor.author: Yerbolat Khassanov
dc.contributor.author: Hüseyin Atakan Varol
dc.date.accessioned: 2025-08-21T09:42:23Z
dc.date.available: 2025-08-21T09:42:23Z
dc.date.issued: 2021-08-27
dc.description.abstract: This paper introduces a high-quality open-source speech synthesis dataset for Kazakh, a low-resource language spoken by over 13 million people worldwide. The dataset consists of about 93 hours of transcribed audio recordings spoken by two professional speakers (female and male). It is the first publicly available large-scale dataset developed to promote Kazakh text-to-speech (TTS) applications in both academia and industry. In this paper, we share our experience by describing the dataset development procedures and faced challenges, and discuss important future directions. To demonstrate the reliability of our dataset, we built baseline end-to-end TTS models and evaluated them using the subjective mean opinion score (MOS) measure. Evaluation results show that the best TTS models trained on our dataset achieve MOS above 4 for both speakers, which makes them applicable for practical use. The dataset, training recipe, and pretrained TTS models are freely available. [en]
dc.identifier.citation: Mussakhojayeva Saida, Janaliyeva Aigerim, Mirzakhmetov Almas, Khassanov Yerbolat, Varol Huseyin Atakan. (2021). KazakhTTS: An Open-Source Kazakh Text-to-Speech Synthesis Dataset. Interspeech 2021. https://doi.org/10.21437/interspeech.2021-2124 [en]
dc.identifier.doi: 10.21437/interspeech.2021-2124
dc.identifier.uri: https://doi.org/10.21437/interspeech.2021-2124
dc.identifier.uri: https://nur.nu.edu.kz/handle/123456789/9791
dc.language.iso: en
dc.publisher: ISCA
dc.relation.ispartof: Interspeech 2021 [en]
dc.source: Interspeech 2021, (2021) [en]
dc.subject: Computer science [en]
dc.subject: Speech synthesis [en]
dc.subject: Kazakh [en]
dc.subject: Baseline (sea) [en]
dc.subject: Mean opinion score [en]
dc.subject: Reliability (semiconductor) [en]
dc.subject: Natural language processing [en]
dc.subject: Speech recognition [en]
dc.subject: Artificial intelligence [en]
dc.subject: Measure (data warehouse) [en]
dc.subject: Quality (philosophy) [en]
dc.subject: Open source [en]
dc.subject: Metric (unit) [en]
dc.subject: Data mining [en]
dc.subject: Software [en]
dc.subject: Linguistics [en]
dc.subject: Engineering [en]
dc.subject: Philosophy [en]
dc.subject: Oceanography [en]
dc.subject: Operations management [en]
dc.subject: Power (physics) [en]
dc.subject: Physics [en]
dc.subject: Quantum mechanics [en]
dc.subject: Geology [en]
dc.subject: Epistemology [en]
dc.subject: Programming language [en]
dc.subject: type of access: open access [en]
dc.title: KazakhTTS: An Open-Source Kazakh Text-to-Speech Synthesis Dataset [en]
dc.type: article [en]

Files

Original bundle

Name: KazakhTTS__An_Open-Source_Kazakh_Text-to-Speech_Synthesis_Dataset__35296a94.pdf
Size: 285.4 KB
Format: Adobe Portable Document Format