DSpace Repository

Named Entity Recognition for Kazakh Using Conditional Random Fields / Извлечение именованных сущностей из текста на Казахском языке с использованием условных случайных полей

dc.contributor.author Gulmira, Tolegen
dc.contributor.author Alymzhan, Toleu
dc.contributor.author Zheng, Xiaoqing
dc.date.accessioned 2017-01-11T04:08:15Z
dc.date.available 2017-01-11T04:08:15Z
dc.date.issued 2016
dc.identifier.citation Gulmira, Tolegen., Alymzhan, Toleu., Zheng, Xiaoqing. (2016) Named Entity Recognition for Kazakh Using Conditional Random Fields / Извлечение именованных сущностей из текста на Казахском языке с использованием условных случайных полей. The 4-th International Conference on Computer Processing of Turkic Languages “TurkLang 2016”. http://turklang.kz/en/index.php ru_RU
dc.identifier.uri http://nur.nu.edu.kz/handle/123456789/2234
dc.description.abstract We addressed the Named Entity Recognition (NER) problem for the Kazakh language using conditional random fields (CRFs). Kazakh is a typical agglutinative language in which thousands of word forms can be generated by adding prefixes and suffixes to the same root, which gives rise to a serious data sparsity problem for many NLP tasks. To reduce data sparsity, a necessary preprocessing step is to split words into their roots and morphemes by morphological analysis. In this study, we designed a CRF-based NER system for Kazakh that leveraged features derived from the output of a newly developed morphological analyzer, and found that introducing these derived features improved performance. Moreover, we assembled an NER corpus that was manually annotated with location, organization, and person names. ru_RU
dc.language.iso en ru_RU
dc.publisher The 4-th International Conference on Computer Processing of Turkic Languages “TurkLang 2016” ru_RU
dc.rights Attribution-NonCommercial-ShareAlike 3.0 United States *
dc.rights.uri http://creativecommons.org/licenses/by-nc-sa/3.0/us/ *
dc.subject Kazakh language ru_RU
dc.subject agglutinative language ru_RU
dc.subject named entity ru_RU
dc.subject NER ru_RU
dc.subject CRF ru_RU
dc.subject Research Subject Categories::SOCIAL SCIENCES::Statistics, computer and systems science::Informatics, computer and systems science ru_RU
dc.title Named Entity Recognition for Kazakh Using Conditional Random Fields / Извлечение именованных сущностей из текста на Казахском языке с использованием условных случайных полей ru_RU
dc.type Article ru_RU


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-ShareAlike 3.0 United States.