KAZAKH LARGE LANGUAGE MODEL (KAZ-LLM) FOR SENTIMENT ANALYSIS IN ONLINE MEDIA
Publisher
Nazarbayev University School of Engineering and Digital Sciences
Abstract
The increasing volume of digital content, especially text with diverse multilingual and cultural features, has highlighted the need for efficient analysis models. Traditional machine learning approaches such as support vector machines, simple neural networks, and Bayesian models struggle to capture the complexities of natural language. Recent advances in natural language processing (NLP) and the availability of large and varied text corpora have enabled transformer-based models to perform more in-depth analyses. Despite these advances, large language models (LLMs) still face challenges when applied to under-resourced languages, including Kazakh. This work explores the potential of integrating Retrieval-Augmented Generation (RAG) systems with standalone LLMs to enhance their performance in sentiment analysis tasks. By leveraging embedding models and fine-tuning multilingual models with task-specific objectives, the proposed approach aims to improve the accuracy and relevance of the final generated responses. Since the performance of RAG systems depends directly on the characteristics of the embedding models, this work will evaluate these models and compare their metrics against baseline models. Moreover, a synthetic dataset for both training and evaluation will be created from data parsed from news feeds covering a wide range of topics. Performance evaluation includes metrics such as accuracy, precision, and recall, among others, providing insight into the applicability of the model in real-world settings.
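As an illustration of the evaluation metrics named in the abstract, the following is a minimal sketch, not the author's evaluation code. It assumes three sentiment classes (positive, negative, neutral) and macro-averaging across classes; the abstract specifies neither. The function name `classification_metrics` and the toy labels are hypothetical.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision and recall.

    Macro-averaging is an assumption made for this sketch; the abstract
    does not say how per-class scores are combined.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    true_count = Counter(y_true)   # how often each label is the gold label
    pred_count = Counter(y_pred)   # how often each label is predicted
    tp = Counter()                 # correct predictions per label
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1

    accuracy = sum(tp.values()) / len(y_true)
    # A label that is never predicted (or never occurs) contributes 0.0.
    precision = sum(
        tp[label] / pred_count[label] if pred_count[label] else 0.0
        for label in labels
    ) / len(labels)
    recall = sum(
        tp[label] / true_count[label] if true_count[label] else 0.0
        for label in labels
    ) / len(labels)
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical toy labels, not data from the thesis.
y_true = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
y_pred = ["positive", "negative", "positive", "positive", "neutral", "neutral"]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Macro-averaging weights every class equally, which can matter when a sentiment class such as neutral dominates a news corpus; a weighted or micro average would be an equally plausible choice.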
Citation
Nurgazy, A. (2025). Kazakh Large Language Model (Kaz-LLM) for Sentiment Analysis in Online Media. Nazarbayev University School of Engineering and Digital Sciences
Creative Commons license
Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 United States
