MULTIMODAL EMOTION RECOGNITION WITH EEG, AUDIO, AND VIDEO USING TRANSFORMER ENCODER FOR INTERMEDIATE FUSION

Chokushev, Nursultan; Darigulov, Bolatbek; Akhmurzin, Galissan; Turtkarayeva, Aidana

dc.contributor.author	Chokushev, Nursultan
dc.contributor.author	Darigulov, Bolatbek
dc.contributor.author	Akhmurzin, Galissan
dc.contributor.author	Turtkarayeva, Aidana
dc.date.accessioned	2025-06-12T04:57:18Z
dc.date.available	2025-06-12T04:57:18Z
dc.date.issued	2025-04
dc.description.abstract	Multimodal Emotion Recognition (MER) has increasingly relied on integrating diverse data sources such as audio, video, and electroencephalogram (EEG) signals. Despite advances, effectively fusing these modalities remains a challenging problem. In this paper, we propose a novel intermediate fusion framework that uses custom convolutional neural networks (CNNs) tailored to each modality (audio, video, and EEG), combined with transformer-based fusion blocks employing multi-head attention. Our approach integrates modality-specific features through intermediate fusion layers, allowing better emphasis on critical emotional cues. We benchmark our model on the EAV dataset against a recently proposed model that uses this dataset, demonstrating that the proposed intermediate fusion improves emotion recognition performance compared to unimodal and recent baselines.
dc.identifier.citation	Chokushev, N., Darigulov, B., Akhmurzin, G., Turtkarayeva, A. (2025). Multimodal Emotion Recognition with EEG, Audio, and Video using Transformer Encoder for Intermediate Fusion. Nazarbayev University School of Engineering and Digital Sciences
dc.identifier.uri	https://nur.nu.edu.kz/handle/123456789/8880
dc.language.iso	en
dc.publisher	Nazarbayev University School of Engineering and Digital Sciences
dc.rights	CC0 1.0 Universal	en
dc.rights.uri	http://creativecommons.org/publicdomain/zero/1.0/
dc.subject	multimodal learning
dc.subject	emotion recognition
dc.subject	intermediate fusion
dc.subject	electroencephalogram
dc.subject	multi-head attention
dc.subject	type of access: embargo
dc.title	MULTIMODAL EMOTION RECOGNITION WITH EEG, AUDIO, AND VIDEO USING TRANSFORMER ENCODER FOR INTERMEDIATE FUSION
dc.type	Bachelor's thesis

Files

Original bundle

Name: Multimodal Emotion Recognition with EEG, Audio, and Video using Transformer Encoder for Intermediate Fusion
Size: 4.7 MB
Format: Adobe Portable Document Format
Description: Bachelor's thesis

Collections

03. Bachelor's Thesis