DSpace Repository

MULTIMODAL EMOTION RECOGNITION WITH DEEP LEARNING AND FUSION MECHANISM


dc.contributor.author Myrzakhmet, Ayan
dc.contributor.author Kanafin, Ali
dc.contributor.author Kuanysh, Zhaksylyk
dc.date.accessioned 2024-06-15T06:27:15Z
dc.date.available 2024-06-15T06:27:15Z
dc.date.issued 2024-04-21
dc.identifier.citation Myrzakhmet, A., Kanafin, A. & Kuanysh, Z. (2024). Multimodal Emotion Recognition with Deep Learning and Fusion Mechanism. Nazarbayev University School of Engineering and Digital Sciences en_US
dc.identifier.uri http://nur.nu.edu.kz/handle/123456789/7872
dc.description.abstract The increasing interest in multimodal techniques for emotion recognition has led to the incorporation of both physiological and non-physiological data. This study aims to develop an advanced emotion recognition model that combines electroencephalography (EEG), video, and audio data using a hybrid fusion strategy to enhance system performance. We investigated various fusion methods, including diverse algorithmic strategies and intermediate attention, which significantly boosted emotion recognition accuracy. Our comprehensive audio-video-EEG model, equipped with intermediate attention and hybrid multimodal fusion techniques, was successfully tested on the RAVDESS dataset and further tailored to a new dataset collected at Nazarbayev University's Multimedia Lab, which includes video, audio speech, and EEG data. This thorough integration effectively tackled multimodal signal fusion challenges, improving both model sophistication and accuracy. Experimental findings indicate that integrating physiological and non-physiological data substantially enhances the accuracy of emotion detection systems. On our tailored dataset, the top-performing model achieved an accuracy of 68% for 21-second inputs and 57% for 3.6-second inputs across five categories. Despite challenges such as variable temporal frames and data segmentation, the results highlight the effectiveness of multimodal fusion in advancing emotion recognition. This research not only paves the way for future advancements but also emphasizes the need for ongoing optimization and further exploration of signal integration techniques. en_US
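The abstract does not specify the fusion architecture in detail; purely as an illustration of the attention-weighted fusion idea it describes, the sketch below combines three equal-length modality embeddings (EEG, audio, video) by softmax-weighted summation. All names are hypothetical, and the query is taken as the mean embedding for simplicity, whereas in a trained model it would be a learned parameter.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fusion(modality_feats):
    """Fuse equal-length per-modality feature vectors (e.g. EEG, audio,
    video embeddings) by attention-weighted summation.

    Assumes each modality has already been projected to a common
    dimension by its own encoder (not shown here)."""
    dim = len(modality_feats[0])
    # Query vector: mean of the modality embeddings (a simplification).
    query = [sum(f[i] for f in modality_feats) / len(modality_feats)
             for i in range(dim)]
    # Score each modality by its dot product with the query.
    scores = [sum(f[i] * query[i] for i in range(dim))
              for f in modality_feats]
    weights = softmax(scores)
    # Fused representation: weighted sum of the modality embeddings.
    fused = [sum(w * f[i] for w, f in zip(weights, modality_feats))
             for i in range(dim)]
    return fused, weights

# Toy example with 4-dimensional embeddings for three modalities.
eeg   = [0.2, -0.1, 0.5, 0.3]
audio = [0.4,  0.0, 0.1, -0.2]
video = [-0.3, 0.6, 0.2, 0.1]
fused, weights = attention_fusion([eeg, audio, video])
```

The fused vector would then feed a downstream classifier head; the attention weights indicate how much each modality contributes to a given prediction.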
dc.language.iso en en_US
dc.publisher Nazarbayev University School of Engineering and Digital Sciences en_US
dc.rights Attribution-NonCommercial-ShareAlike 3.0 United States *
dc.rights.uri http://creativecommons.org/licenses/by-nc-sa/3.0/us/ *
dc.subject Type of access: Open access en_US
dc.title MULTIMODAL EMOTION RECOGNITION WITH DEEP LEARNING AND FUSION MECHANISM en_US
dc.type Bachelor's thesis en_US
workflow.import.source science

