MULTIMODAL EMOTION RECOGNITION WITH EEG, AUDIO, AND VIDEO USING TRANSFORMER ENCODER FOR INTERMEDIATE FUSION
| dc.contributor.author | Chokushev, Nursultan | |
| dc.contributor.author | Darigulov, Bolatbek | |
| dc.contributor.author | Akhmurzin, Galissan | |
| dc.contributor.author | Turtkarayeva, Aidana | |
| dc.date.accessioned | 2025-06-12T04:57:18Z | |
| dc.date.available | 2025-06-12T04:57:18Z | |
| dc.date.issued | 2025-04 | |
| dc.description.abstract | Multimodal Emotion Recognition (MER) has increasingly relied on integrating diverse data sources such as audio, video, and electroencephalogram (EEG). Despite advances, effectively fusing these modalities remains a challenging problem. In this paper, we propose a novel intermediate fusion framework that uses custom convolutional neural networks (CNNs) tailored to each modality (audio, video, and EEG), combined with transformer-based fusion blocks that employ multi-head attention. Our approach integrates modality-specific features through intermediate fusion layers, allowing better emphasis on critical emotional cues. We evaluate our model on the EAV dataset and compare it with a recently proposed model that uses this dataset, demonstrating that our proposed intermediate fusion improves emotion recognition performance compared with unimodal models and recent baselines. | |
| dc.identifier.citation | Chokushev, N., Darigulov, B., Akhmurzin, G., Turtkarayeva, A. (2025). Multimodal Emotion Recognition with EEG, Audio, and Video using Transformer Encoder for Intermediate Fusion. Nazarbayev University School of Engineering and Digital Sciences. | |
| dc.identifier.uri | https://nur.nu.edu.kz/handle/123456789/8880 | |
| dc.language.iso | en | |
| dc.publisher | Nazarbayev University School of Engineering and Digital Sciences | |
| dc.rights | CC0 1.0 Universal | en |
| dc.rights.uri | http://creativecommons.org/publicdomain/zero/1.0/ | |
| dc.subject | multimodal learning | |
| dc.subject | emotion recognition | |
| dc.subject | intermediate fusion | |
| dc.subject | electroencephalogram | |
| dc.subject | multi-head attention | |
| dc.subject | type of access: embargo | |
| dc.title | MULTIMODAL EMOTION RECOGNITION WITH EEG, AUDIO, AND VIDEO USING TRANSFORMER ENCODER FOR INTERMEDIATE FUSION | |
| dc.type | Bachelor's thesis |
Files
Original bundle
- Name: Multimodal Emotion Recognition with EEG, Audio, and Video using Transformer Encoder for Intermediate Fusion
- Size: 4.7 MB
- Format: Adobe Portable Document Format
- Description: Bachelor's thesis