ENHANCED MULTIMODAL EMOTION RECOGNITION SYSTEM WITH DEEP LEARNING AND HYBRID FUSION

dc.contributor.author: Baidussenov, Alikhan
dc.contributor.author: Khamiyev, Assylzhan
dc.contributor.author: Sarsengaliyev, Damir
dc.contributor.author: Rymkan, Alisher
dc.contributor.author: Mukhametzhanov, Meiram
dc.date.accessioned: 2025-06-13T10:08:52Z
dc.date.available: 2025-06-13T10:08:52Z
dc.date.issued: 2025-05-25
dc.description.abstract: This paper presents a multimodal emotion recognition (MER) system that combines deep learning and hybrid transformer fusion to analyze emotional expressions from raw user-uploaded videos. The system processes and fuses information from three modalities (video, audio, and text) to identify six primary emotions: happiness, sadness, anger, frustration, excitement, and neutral. Leveraging transformers and pre-trained encoders such as RoBERTa and VGG, our architecture captures intra- and inter-modal dependencies to improve classification performance. The MER model was integrated into a user-friendly web application, enabling real-time emotion inference via a React-based frontend and a Flask backend. Evaluated on the IEMOCAP dataset, the system achieved a validation weighted F1-score of 0.5128 and showed strong recognition of expressive emotions such as sadness and anger. This work demonstrates the potential of multimodal deep learning approaches in emotion-aware systems and offers a scalable foundation for future applications in mental health monitoring, human-computer interaction, and affective computing.
dc.identifier.citation: Rymkan, A., Sarsengaliyev, D., Khamiyev, A., Baidussenov, A., & Mukhametzhanov, M. (2024). Enhanced multimodal emotion recognition system with deep learning and hybrid fusion. Nazarbayev University School of Engineering and Digital Sciences.
dc.identifier.uri: https://nur.nu.edu.kz/handle/123456789/8967
dc.language.iso: en
dc.publisher: Nazarbayev University School of Engineering and Digital Sciences
dc.rights: Attribution 3.0 United States
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/us/
dc.subject: Multimodal Emotion Recognition
dc.subject: Deep Learning
dc.subject: Transformer Fusion
dc.subject: RoBERTa
dc.subject: VGG
dc.subject: IEMOCAP Dataset
dc.subject: PyTorch
dc.subject: Whisper
dc.subject: React.js
dc.subject: Emotion Classification
dc.subject: type of access: embargo
dc.title: ENHANCED MULTIMODAL EMOTION RECOGNITION SYSTEM WITH DEEP LEARNING AND HYBRID FUSION
dc.type: Bachelor's Capstone project

Files

Original bundle

Name: Senior_Project_Report_nur.pdf
Size: 2.54 MB
Format: Adobe Portable Document Format
Description: Bachelor's Capstone project
Access status: Embargo until 2028-05-27