MULTIMODAL MACHINE LEARNING FOR EMOTION RECOGNITION

Publisher

Nazarbayev University School of Engineering and Digital Sciences

Abstract

Emotion recognition has become a popular research area in recent years due to the abundance of useful applications. The technology has been applied in a variety of areas, including social media, crowd monitoring, live streaming, and human-robot interaction. Recent approaches to emotion recognition have used neural networks such as transformers, LSTMs, convolutional neural networks, and multimodal classifiers. This research has been facilitated by publicly available datasets containing videos of people labeled with the dominant emotion of the given scene. In this work, a multimodal technique is used to classify scenes from such videos by their emotional expressions, extracting video frames, audio, and transcribed text. We investigate ways to improve performance and efficiency at each stage of the classification process, focusing on developing and refining the preprocessing stage for each input modality. This approach achieves 89% accuracy on a commonly used dataset, using a combination of video, audio, and text.
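
To illustrate the kind of multimodal setup the abstract describes, the sketch below combines precomputed video, audio, and text features with a simple late-fusion classifier. The feature dimensions, hidden size, number of emotion classes, and the late-fusion strategy itself are assumptions made for illustration; they are not taken from the thesis.

    # Minimal late-fusion sketch for video/audio/text emotion classification.
    # All sizes and the fusion scheme are illustrative assumptions, not the
    # architecture used in this work.
    import torch
    import torch.nn as nn

    class MultimodalEmotionClassifier(nn.Module):
        def __init__(self, video_dim=512, audio_dim=128, text_dim=768,
                     hidden_dim=256, num_emotions=7):
            super().__init__()
            # One projection head per modality; features are assumed to be
            # precomputed (e.g. frame embeddings, audio embeddings, text embeddings).
            self.video_head = nn.Sequential(nn.Linear(video_dim, hidden_dim), nn.ReLU())
            self.audio_head = nn.Sequential(nn.Linear(audio_dim, hidden_dim), nn.ReLU())
            self.text_head = nn.Sequential(nn.Linear(text_dim, hidden_dim), nn.ReLU())
            # Late fusion: concatenate the modality embeddings and classify.
            self.classifier = nn.Linear(3 * hidden_dim, num_emotions)

        def forward(self, video_feat, audio_feat, text_feat):
            fused = torch.cat([self.video_head(video_feat),
                               self.audio_head(audio_feat),
                               self.text_head(text_feat)], dim=-1)
            return self.classifier(fused)  # per-scene emotion logits

    # Usage with random stand-in features for a batch of 4 scenes.
    model = MultimodalEmotionClassifier()
    logits = model(torch.randn(4, 512), torch.randn(4, 128), torch.randn(4, 768))
    predicted = logits.argmax(dim=-1)  # index of the dominant emotion per scene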

Citation

Kazikhan, M. (2025). Multimodal Machine Learning for Emotion Recognition. Nazarbayev University School of Engineering and Digital Sciences.

Creative Commons license

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-ShareAlike 3.0 United States.