STATISTICAL METHODS IN NATURAL LANGUAGE PROCESSING
Date
2024-04-28
Authors
Nurkhan, Laiyk
Publisher
Nazarbayev University School of Sciences and Humanities
Abstract
This capstone project explores the application of statistical methodologies
to two distinct natural language processing (NLP) tasks: machine translation
between Ukrainian and Russian, and the classification of comments for hate
speech detection. The study shows that the strategic integration of
statistical approaches can improve performance on both the machine
translation and the text classification tasks. For machine translation,
linear regression with an added orthogonality constraint on the weight
vectors resulted in higher precision scores. For the classification of hate
speech in textual comments, logistic regression with TF-IDF features was
identified as the most effective model in terms of the AUC-ROC metric.
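
For readers unfamiliar with the orthogonality-constrained regression mentioned in the abstract, one common formulation is the orthogonal Procrustes problem, sketched below. This is an illustrative assumption, not necessarily the exact setup used in the project: the symbols X and Y (row-wise source and target word vectors for n paired entries), W (the d-by-d mapping), and the choice to constrain the whole weight matrix are not taken from the abstract.

```latex
% Assumed setup: X, Y \in \mathbb{R}^{n \times d} hold paired source/target vectors.
% Unconstrained linear regression would minimize the same loss over all W.
\min_{W \in \mathbb{R}^{d \times d}} \; \lVert XW - Y \rVert_F^2
\quad \text{subject to} \quad W^\top W = I_d

% Closed-form solution via the singular value decomposition of X^\top Y:
X^\top Y = U \Sigma V^\top
\quad \Longrightarrow \quad
W^\ast = U V^\top
```

Under this formulation the mapping preserves vector norms and inner products, which is one commonly cited reason such a constraint can help when aligning two embedding spaces.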
Keywords
machine translation, Type of access: Restricted
Citation
Nurkhan, Laiyk. (2024). Statistical Methods in Natural Language Processing. Nazarbayev University School of Sciences and Humanities