DSpace Repository

On Various Approaches to Machine Translation from Russian to Kazakh

dc.contributor.author Makazhanov, Aibek
dc.contributor.author Myrzakhmetov, Bagdat
dc.contributor.author Kozhirbayev, Zhanibek
dc.contributor.editor Suleymanov, Dzhavdet
dc.contributor.editor Gatiatullin, Ayrat
dc.date.accessioned 2018-05-02T09:46:08Z
dc.date.available 2018-05-02T09:46:08Z
dc.date.issued 2017-10-21
dc.identifier.isbn 978-5-9690-0406-1
dc.identifier.uri http://nur.nu.edu.kz/handle/123456789/3167
dc.description.abstract In this work we compare a number of approaches to machine translation (MT) from Russian to Kazakh. We focus specifically on this pair of languages for a number of reasons. First, these languages are relatively understudied in terms of MT research, as well as natural language processing (NLP) research in general. Kazakh, in particular, has been actively studied with modern methods for less than a decade. Second, this pair of languages poses several processing challenges rooted in their nature: both languages are morphologically complex and tend to have free constituent order, which makes long-term dependencies rather frequent. From the perspective of data-driven approaches to NLP, that means increased data sparseness and high out-of-vocabulary (OOV) rates. Lastly, apart from scientific curiosity, there is a strong practical demand for high-quality MT between the languages in question. Kazakh is the state language of Kazakhstan, while Russian, due to a strong Soviet heritage, largely remains a language of professional communication and conduct. This frequently results in paperwork being initially prepared in Russian and then translated into Kazakh. Thus, high-quality MT systems are in demand, as they would greatly reduce the manual labor of professional translators. We categorize the approaches that we compare into data-driven, linguistically motivated and hybrid ones. In the first category we compare phrase-based statistical MT (SMT) and neural MT (NMT) approaches. For the latter we experiment with three different neural architectures. As a result of this comparison we conclude that while NMT is a promising research direction, one needs a lot more computational resources and, perhaps, even more data to achieve the level of accuracy offered by SMT.
As for linguistically motivated and hybrid approaches, we compare a rule-based approach with a so-called factored model, which is essentially an SMT model that takes into account various linguistic factors, such as parts of speech, lemmata, morphology, etc. Although this comparison has shown that factored models should be strongly favored, we must note that the Russian-Kazakh pair for the rule-based system that was used in the experiment is still a work in progress. Lastly, a final comparison between the best-performing models from each category, i.e. a purely data-driven SMT model and a hybrid factored model, has favored the former. While we acknowledge that the present work makes no significant contribution to NLP research in general, we want to point out that, to the best of our knowledge, experiments on NMT and factored SMT have never been performed before for the particular language pair considered herein. We speculate that one possible reason for this is the absence of an accessible Russian-Kazakh parallel corpus that is suitable for those experiments in terms of both size and quality. With this in mind, we also provide a detailed description of the parallel data set that we used for our experiments and which we plan to make available in the future. en_US
dc.language.iso en en_US
dc.publisher Tatarstan Academy of Sciences en_US
dc.rights Attribution-NonCommercial-ShareAlike 3.0 United States *
dc.rights.uri http://creativecommons.org/licenses/by-nc-sa/3.0/us/ *
dc.subject Machine translation; RBMT; NMT; SMT; factored SMT en_US
dc.title On Various Approaches to Machine Translation from Russian to Kazakh en_US
dc.type Conference Paper en_US
workflow.import.source science


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-ShareAlike 3.0 United States