Conference papers
Browsing Conference papers by Author "Assylbekov, Zhenisbek"
Open Access: A free/open-source hybrid morphological disambiguation tool for Kazakh (DOI: 10.13140/RG.2.2.12467.43045, 2016-04)
Assylbekov, Zhenisbek; Washington, Jonathan; Tyers, Francis; Nurkas, Assulan; Sundetova, Aida; Karibayeva, Aidana; Abduali, Balzhan; Amirova, Dina
This paper presents the results of developing a morphological disambiguation tool for Kazakh. Starting from a previously developed rule-based approach, we tried to cope with the complex morphology of Kazakh by splitting lexical forms at their derivational boundaries into inflectional groups and modeling the behavior of these groups with statistical methods. The resulting hybrid rule-based/statistical approach appears to benefit morphological disambiguation, achieving a per-token accuracy of 91% on running text.

Open Access: Experiments with Russian to Kazakh sentence alignment (The 4th International Conference on Computer Processing of Turkic Languages “TurkLang 2016”, 2016)
Assylbekov, Zhenisbek; Myrzakhmetov, Bagdat; Makazhanov, Aibek
Sentence alignment is the final step in building a parallel corpus. It arguably has the greatest impact on the quality of the resulting corpus and on the accuracy of machine translation systems trained on it. However, the quality of sentence alignment itself depends on a number of factors. In this paper we investigate how several data processing techniques affect sentence alignment quality. We develop and use a number of automatic evaluation metrics. We also provide empirical evidence that applying all of the considered techniques yields bitexts with the lowest ratio of noise and the highest ratio of parallel sentences.