Named Entity Recognition for Kazakh Using Conditional Random Fields / Извлечение именованных сущностей из текста на Казахском языке с использованием условных случайных полей

Date

2016

Authors

Gulmira, Tolegen
Alymzhan, Toleu
Zheng, Xiaoqing

Publisher

The 4-th International Conference on Computer Processing of Turkic Languages “TurkLang 2016”

Abstract

We addressed the named entity recognition (NER) problem for the Kazakh language using conditional random fields (CRFs). Kazakh is a typical agglutinative language in which thousands of word forms can be generated by adding prefixes and suffixes to the same root, which gives rise to a serious data sparsity problem for many NLP tasks. To reduce data sparsity, a necessary preprocessing step is to split words into their roots and morphemes through morphological analysis. In this study, we designed a CRF-based NER system for Kazakh that leverages features derived from the output of a newly developed morphological analyzer, and found that introducing these derived features improves performance. In addition, we assembled an NER corpus manually annotated with location, organization, and person names.
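
As a rough illustration of the idea in the abstract, the sketch below builds per-token feature dictionaries of the kind commonly fed to a CRF tagger, adding root and suffix features obtained from a morphological segmentation. Everything named here is hypothetical and not taken from the paper: the functions `segment`, `token_features`, and `sentence_features`, the toy suffix list, and the feature names are assumptions for illustration only, and the greedy suffix stripper merely stands in for a real morphological analyzer.

```python
# Toy suffix inventory, assumed for illustration only (not the paper's analyzer).
TOY_SUFFIXES = ["лар", "лер", "да", "де", "ның", "нің"]


def segment(word):
    """Greedy right-to-left suffix stripping; returns (root, [suffixes])."""
    suffixes = []
    stripped = True
    while stripped:
        stripped = False
        for suf in TOY_SUFFIXES:
            # Keep at least two characters as the root.
            if word.endswith(suf) and len(word) - len(suf) >= 2:
                suffixes.insert(0, suf)
                word = word[: -len(suf)]
                stripped = True
                break
    return word, suffixes


def token_features(tokens, i):
    """Feature dict for token i: surface features plus morphology-derived ones."""
    word = tokens[i]
    root, suffixes = segment(word.lower())
    feats = {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        # Morphology-derived features: many surface forms share one root,
        # which is how segmentation can reduce data sparsity.
        "root": root,
        "suffixes": "+".join(suffixes) or "NONE",
        "n_suffixes": len(suffixes),
    }
    if i > 0:
        feats["prev.root"] = segment(tokens[i - 1].lower())[0]
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        feats["next.root"] = segment(tokens[i + 1].lower())[0]
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_features(tokens):
    """One feature dict per token, the usual input shape for CRF toolkits."""
    return [token_features(tokens, i) for i in range(len(tokens))]
```

With this toy segmenter, inflected forms such as "қалаларда" and "қалалар" both map to the root "қала", so a tagger sees one shared `root` feature instead of several rare surface forms.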

Keywords

Kazakh language, agglutinative language, named entity, NER, CRF, Research Subject Categories::SOCIAL SCIENCES::Statistics, computer and systems science::Informatics, computer and systems science

Citation

Gulmira, Tolegen., Alymzhan, Toleu., Zheng, Xiaoqing. (2016) Named Entity Recognition for Kazakh Using Conditional Random Fields / Извлечение именованных сущностей из текста на Казахском языке с использованием условных случайных полей. The 4-th International Conference on Computer Processing of Turkic Languages “TurkLang 2016”. http://turklang.kz/en/index.php