MULTIMODAL EMOTION RECOGNITION WITH EEG, AUDIO, AND VIDEO USING TRANSFORMER ENCODER FOR INTERMEDIATE FUSION

dc.contributor.author: Chokushev, Nursultan
dc.contributor.author: Darigulov, Bolatbek
dc.contributor.author: Akhmurzin, Galissan
dc.contributor.author: Turtkarayeva, Aidana
dc.date.accessioned: 2025-06-12T04:57:18Z
dc.date.available: 2025-06-12T04:57:18Z
dc.date.issued: 2025-04
dc.description.abstract: Multimodal Emotion Recognition (MER) increasingly relies on integrating diverse data sources such as audio, video, and electroencephalogram (EEG). Despite advances, effectively fusing these modalities remains a challenging problem. In this paper, we propose a novel intermediate fusion framework that uses custom convolutional neural networks (CNNs) tailored to each modality (audio, video, and EEG), combined with transformer-based fusion blocks employing multi-head attention mechanisms. Our approach integrates modality-specific features through intermediate fusion layers, allowing better emphasis on critical emotional cues. We evaluate our model on the EAV dataset and compare it against a recently proposed model that uses the same dataset, demonstrating that the proposed intermediate fusion improves emotion recognition performance over unimodal and recent baselines.
dc.identifier.citation: Chokushev, N., Darigulov, B., Akhmurzin, G., Turtkarayeva, A. (2025). Multimodal Emotion Recognition with EEG, Audio, and Video using Transformer Encoder for Intermediate Fusion. Nazarbayev University School of Engineering and Digital Sciences.
dc.identifier.uri: https://nur.nu.edu.kz/handle/123456789/8880
dc.language.iso: en
dc.publisher: Nazarbayev University School of Engineering and Digital Sciences
dc.rights: CC0 1.0 Universal
dc.rights.uri: http://creativecommons.org/publicdomain/zero/1.0/
dc.subject: multimodal learning
dc.subject: emotion recognition
dc.subject: intermediate fusion
dc.subject: electroencephalogram
dc.subject: multi-head attention
dc.subject: type of access: embargo
dc.title: MULTIMODAL EMOTION RECOGNITION WITH EEG, AUDIO, AND VIDEO USING TRANSFORMER ENCODER FOR INTERMEDIATE FUSION
dc.type: Bachelor's thesis

Files

Original bundle

Name: Multimodal Emotion Recognition with EEG, Audio, and Video using Transformer Encoder for Intermediate Fusion
Size: 4.7 MB
Format: Adobe Portable Document Format
Description: Bachelor's thesis
Access status: Embargo until 2026-01-01