ENHANCED MULTIMODAL EMOTION RECOGNITION SYSTEM WITH DEEP LEARNING AND HYBRID FUSION

Files

Access status: Embargo until 2028-05-27, Senior_Project_Report_nur.pdf (2.54 MB)

Publisher

Nazarbayev University School of Engineering and Digital Sciences

Abstract

This paper presents a multimodal emotion recognition (MER) system that combines deep learning and hybrid transformer fusion to analyze emotional expressions from raw user-uploaded videos. The system processes and fuses information from three modalities (video, audio, and text) to identify six primary emotions: happiness, sadness, anger, frustration, excited, and neutral. Leveraging transformers and pre-trained encoders such as RoBERTa and VGG, our architecture captures intra- and inter-modal dependencies to improve classification performance. The MER model was integrated into a user-friendly web application, enabling real-time emotion inference via a React-based frontend and a Flask backend. Evaluated on the IEMOCAP dataset, the system achieved a validation weighted F1-score of 0.5128 and showed strong recognition of expressive emotions such as sadness and anger. This work demonstrates the potential of multimodal deep learning approaches in emotion-aware systems and offers a scalable foundation for future applications in mental health monitoring, human-computer interaction, and affective computing.
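
The hybrid transformer fusion described in the abstract can be made concrete with a minimal sketch. The module below is illustrative only, not the authors' implementation: the feature dimensions, the single encoder layer per modality for intra-modal context, the shared encoder over the concatenated sequence for inter-modal fusion, and mean pooling are all assumptions chosen for readability, and random tensors stand in for pre-extracted RoBERTa, audio, and VGG features in the usage example.

    # Illustrative sketch of a hybrid transformer fusion classifier for three
    # modalities; dimensions and layer layout are assumptions, not the report's code.
    import torch
    import torch.nn as nn

    EMOTIONS = ["happiness", "sadness", "anger", "frustration", "excited", "neutral"]

    class HybridFusionClassifier(nn.Module):
        def __init__(self, text_dim=768, audio_dim=128, video_dim=512,
                     d_model=256, n_heads=4):
            super().__init__()
            # Project each modality's pre-extracted features (e.g. RoBERTa token
            # embeddings, audio features, VGG frame features) to a shared width.
            self.text_proj = nn.Linear(text_dim, d_model)
            self.audio_proj = nn.Linear(audio_dim, d_model)
            self.video_proj = nn.Linear(video_dim, d_model)
            # Intra-modal encoding: one transformer encoder layer per modality.
            enc = lambda: nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.text_enc, self.audio_enc, self.video_enc = enc(), enc(), enc()
            # Inter-modal fusion: self-attention over the concatenated sequence
            # lets tokens from different modalities attend to one another.
            self.fusion_enc = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.classifier = nn.Linear(d_model, len(EMOTIONS))

        def forward(self, text_feats, audio_feats, video_feats):
            # Each input has shape (batch, seq_len, modality_dim).
            t = self.text_enc(self.text_proj(text_feats))
            a = self.audio_enc(self.audio_proj(audio_feats))
            v = self.video_enc(self.video_proj(video_feats))
            fused = self.fusion_enc(torch.cat([t, a, v], dim=1))  # cross-modal attention
            pooled = fused.mean(dim=1)                            # simple mean pooling
            return self.classifier(pooled)                        # (batch, 6) logits

    # Example forward pass with random features standing in for encoder outputs.
    model = HybridFusionClassifier()
    logits = model(torch.randn(2, 20, 768), torch.randn(2, 50, 128), torch.randn(2, 16, 512))
    print(logits.shape)  # torch.Size([2, 6])

In the deployed system described in the abstract, a model of this kind would sit behind the Flask backend, with the React frontend uploading a video and receiving the predicted emotion label.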

Citation

Rymkan, A., Sarsengaliyev, D., Khamiyev, A., Baidussenov, A., & Mukhametzhanov, M. (2024). Enhanced multimodal emotion recognition system with deep learning and hybrid fusion. Nazarbayev University School of Engineering and Digital Sciences.

Creative Commons license

Except where otherwise noted, this item's license is described as Attribution 3.0 United States.