KAZAKH LARGE LANGUAGE MODEL (KAZ-LLM) FOR SENTIMENT ANALYSIS IN ONLINE MEDIA

dc.contributor.author: Nurgazy, Abzal
dc.date.accessioned: 2025-05-16T10:44:03Z
dc.date.available: 2025-05-16T10:44:03Z
dc.date.issued: 2025-05-05
dc.description.abstract: The increasing volume of digital content, especially textual content with diverse multilingual and cultural features, has highlighted the need for efficient analysis models. Traditional machine learning approaches, such as support vector machines, simple neural networks, and Bayesian models, struggle to capture the complexities of natural language. Recent advances in natural language processing (NLP) and the availability of diverse text data have made it possible for transformer-based models to perform more in-depth analyses. Despite these advances, however, large language models (LLMs) still face challenges in applications involving under-resourced languages, including Kazakh. This work explores the potential of integrating Retrieval-Augmented Generation (RAG) systems with standalone LLMs to enhance their performance on sentiment analysis tasks. By leveraging embedding models and fine-tuning multilingual models for specific objectives, the proposed approach aims to improve the accuracy and relevance of the final generated responses. Since the performance of RAG systems depends directly on the characteristics of the underlying embedding models, this work evaluates embedding models and compares their metrics with those of baseline models. In addition, a synthetic dataset for both training and evaluation is created from data parsed from news feeds covering a wide range of topics. Performance is evaluated with metrics such as accuracy, precision, and recall, among others, providing insight into the model's applicability in real-world settings.
dc.identifier.citation: Nurgazy, A. (2025). Kazakh Large Language Model (Kaz-LLM) for Sentiment Analysis in Online Media. Nazarbayev University School of Engineering and Digital Sciences
dc.identifier.uri: https://nur.nu.edu.kz/handle/123456789/8512
dc.language.iso: en
dc.publisher: Nazarbayev University School of Engineering and Digital Sciences
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 United States (en)
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/us/
dc.subject: Natural Language Processing (NLP)
dc.subject: Retrieval-Augmented Generation (RAG)
dc.subject: Embedding Models
dc.subject: Fine-tuning
dc.subject: Sentiment Analysis
dc.subject: Synthetic Dataset
dc.subject: type of access: open access
dc.title: KAZAKH LARGE LANGUAGE MODEL (KAZ-LLM) FOR SENTIMENT ANALYSIS IN ONLINE MEDIA
dc.type: Master's thesis

Files

Original bundle

Name: AbzalNurgazy_master_thesis.pdf
Size: 2.86 MB
Format: Adobe Portable Document Format
Description: Master's thesis