Named Entity Recognition for Kazakh Using Conditional Random Fields / Извлечение именованных сущностей из текста на Казахском языке с использованием условных случайных полей

dc.contributor.author: Gulmira, Tolegen
dc.contributor.author: Alymzhan, Toleu
dc.contributor.author: Zheng, Xiaoqing
dc.date.accessioned: 2017-01-11T04:08:15Z
dc.date.available: 2017-01-11T04:08:15Z
dc.date.issued: 2016
dc.description.abstract: We addressed the Named Entity Recognition (NER) problem for the Kazakh language by using conditional random fields (CRFs). Kazakh is a typical agglutinative language in which thousands of word forms can be generated by adding prefixes and suffixes to the same root, which gives rise to a serious data sparsity problem for many NLP tasks. To reduce data sparsity, a necessary preprocessing step is to split words into their roots and morphemes by morphological analysis. In this study, we designed a CRF-based NER system for Kazakh that leverages features derived from the output of a newly developed morphological analyzer, and found that introducing these derived features improves performance. Moreover, we assembled an NER corpus that was manually annotated with location, organization and person names. [ru_RU]
dc.identifier.citation: Gulmira, Tolegen., Alymzhan, Toleu., Zheng, Xiaoqing. (2016) Named Entity Recognition for Kazakh Using Conditional Random Fields / Извлечение именованных сущностей из текста на Казахском языке с использованием условных случайных полей. The 4-th International Conference on Computer Processing of Turkic Languages “TurkLang 2016”. http://turklang.kz/en/index.php [ru_RU]
dc.identifier.uri: http://nur.nu.edu.kz/handle/123456789/2234
dc.language.iso: en [ru_RU]
dc.publisher: The 4-th International Conference on Computer Processing of Turkic Languages “TurkLang 2016” [ru_RU]
dc.rights: Attribution-NonCommercial-ShareAlike 3.0 United States [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/3.0/us/ [*]
dc.subject: Kazakh language [ru_RU]
dc.subject: agglutinative language [ru_RU]
dc.subject: named entity [ru_RU]
dc.subject: NER [ru_RU]
dc.subject: CRF [ru_RU]
dc.subject: Research Subject Categories::SOCIAL SCIENCES::Statistics, computer and systems science::Informatics, computer and systems science [ru_RU]
dc.subject: казахский язык [ru_RU]
dc.subject: агглютинативный язык [ru_RU]
dc.subject: именованные сущности [ru_RU]
dc.title: Named Entity Recognition for Kazakh Using Conditional Random Fields / Извлечение именованных сущностей из текста на Казахском языке с использованием условных случайных полей [ru_RU]
dc.type: Article [ru_RU]

Files

Original bundle
Name: NER.pdf
Size: 770.85 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 6.22 KB
Format: Item-specific license agreed upon to submission