Named Entity Recognition for Kazakh Using Conditional Random Fields / Извлечение именованных сущностей из текста на Казахском языке с использованием условных случайных полей
dc.contributor.author | Gulmira, Tolegen | |
dc.contributor.author | Alymzhan, Toleu | |
dc.contributor.author | Zheng, Xiaoqing | |
dc.date.accessioned | 2017-01-11T04:08:15Z | |
dc.date.available | 2017-01-11T04:08:15Z | |
dc.date.issued | 2016 | |
dc.description.abstract | We addressed the Named Entity Recognition (NER) problem for the Kazakh language using conditional random fields (CRFs). Kazakh is a typical agglutinative language in which thousands of word forms can be generated by adding prefixes and suffixes to the same root, which gives rise to a serious data sparsity problem for many NLP tasks. To mitigate this sparsity, a necessary preprocessing step is to split words into their roots and morphemes through morphological analysis. In this study, we designed a CRF-based NER system for Kazakh that leverages features derived from the output of a newly developed morphological analyzer, and found that introducing such derived features improves performance. Moreover, we assembled an NER corpus manually annotated with location, organization, and person names. | ru_RU |
dc.identifier.citation | Gulmira, Tolegen; Alymzhan, Toleu; Zheng, Xiaoqing (2016). Named Entity Recognition for Kazakh Using Conditional Random Fields / Извлечение именованных сущностей из текста на Казахском языке с использованием условных случайных полей. The 4th International Conference on Computer Processing of Turkic Languages “TurkLang 2016”. http://turklang.kz/en/index.php | ru_RU |
dc.identifier.uri | http://nur.nu.edu.kz/handle/123456789/2234 | |
dc.language.iso | en | ru_RU |
dc.publisher | The 4th International Conference on Computer Processing of Turkic Languages “TurkLang 2016” | ru_RU |
dc.rights | Attribution-NonCommercial-ShareAlike 3.0 United States | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/3.0/us/ | * |
dc.subject | Kazakh language | ru_RU |
dc.subject | agglutinative language | ru_RU |
dc.subject | named entity | ru_RU |
dc.subject | NER | ru_RU |
dc.subject | CRF | ru_RU |
dc.subject | Research Subject Categories::SOCIAL SCIENCES::Statistics, computer and systems science::Informatics, computer and systems science | ru_RU |
dc.title | Named Entity Recognition for Kazakh Using Conditional Random Fields / Извлечение именованных сущностей из текста на Казахском языке с использованием условных случайных полей | ru_RU |
dc.type | Article | ru_RU |