SEQUENTIAL MODELING AND CROSS-MODAL ATTENTION FOR EMOTION RECOGNITION IN HUMAN INTERACTIONS
| dc.contributor.author | Mukhamadiyeva, Aigerim | |
| dc.date.accessioned | 2025-06-02T10:10:41Z | |
| dc.date.available | 2025-06-02T10:10:41Z | |
| dc.date.issued | 2025-05-08 | |
| dc.description.abstract | Emotion recognition in human interactions is a central research direction in affective computing and is vital to developing emotionally intelligent systems. This paper presents a context-aware multimodal emotion recognition framework that combines sequential modeling with cross-modal attention to integrate textual, audio, and visual data more effectively. The methodology employs modality-specific Bidirectional LSTM (BiLSTM) encoders to learn temporal dependencies within each modality. Text is embedded with Word2Vec, audio features are extracted with Librosa, and visual data is encoded with OpenFace. To account for the varying relevance of each modality across emotional contexts, a learnable cross-modal attention mechanism dynamically fuses the modality-specific embeddings, attending to the most informative cues in each interaction (an illustrative sketch of this architecture follows the metadata record below). The model was trained and evaluated on the IEMOCAP and MELD datasets, both of which consist of multimodal conversations spanning a variety of emotions. Experimental results show a performance increase over baseline fusion strategies while retaining computational efficiency and interpretability. By integrating sequential modeling with cross-modal attention, the framework provides a strong yet scalable solution for emotion recognition in real-world human interactions. | |
| dc.identifier.citation | Mukhamadiyeva, A. (2025). Sequential Modeling and Cross-Modal Attention for Emotion Recognition in Human Interactions. Nazarbayev University School of Engineering and Digital Sciences | |
| dc.identifier.uri | https://nur.nu.edu.kz/handle/123456789/8688 | |
| dc.language.iso | en | |
| dc.publisher | Nazarbayev University School of Engineering and Digital Sciences | |
| dc.rights | Attribution 3.0 United States | en |
| dc.rights.uri | http://creativecommons.org/licenses/by/3.0/us/ | |
| dc.subject | Multimodal emotion recognition | |
| dc.subject | Cross-Modal Attention | |
| dc.subject | IEMOCAP | |
| dc.subject | MELD | |
| dc.subject | Sequential Modeling | |
| dc.subject | BiLSTM | |
| dc.subject | Deep Learning | |
| dc.subject | type of access: embargo | |
| dc.title | SEQUENTIAL MODELING AND CROSS-MODAL ATTENTION FOR EMOTION RECOGNITION IN HUMAN INTERACTIONS | |
| dc.type | Master's thesis |
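The abstract describes the architecture at a high level: per-modality BiLSTM encoders over Word2Vec text embeddings, Librosa audio features, and OpenFace visual features, fused by a learnable cross-modal attention mechanism. Below is a minimal PyTorch sketch of that design, not the thesis's actual implementation: the feature dimensions (300-d Word2Vec vectors, 40 MFCCs, 128-d visual vectors), the mean-pooling over time, the softmax scoring of modalities, and the six-class output are all illustrative assumptions, and the thesis's exact attention formulation may differ.

```python
import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    """BiLSTM over one modality's feature sequence, mean-pooled to a vector."""
    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        self.bilstm = nn.LSTM(input_dim, hidden_dim,
                              batch_first=True, bidirectional=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, input_dim)
        out, _ = self.bilstm(x)   # (batch, seq_len, 2 * hidden_dim)
        return out.mean(dim=1)    # pool over time -> (batch, 2 * hidden_dim)

class CrossModalAttentionFusion(nn.Module):
    """Learns a relevance score per modality and fuses the embeddings as a
    weighted sum, so the most informative modality dominates per example."""
    def __init__(self, embed_dim: int):
        super().__init__()
        self.score = nn.Linear(embed_dim, 1)

    def forward(self, modality_embs: torch.Tensor) -> torch.Tensor:
        # modality_embs: (batch, n_modalities, embed_dim)
        weights = torch.softmax(self.score(modality_embs), dim=1)
        return (weights * modality_embs).sum(dim=1)  # (batch, embed_dim)

class EmotionRecognizer(nn.Module):
    # All dimensions below are illustrative assumptions, not thesis values.
    def __init__(self, text_dim=300, audio_dim=40, visual_dim=128,
                 hidden_dim=128, n_classes=6):
        super().__init__()
        self.text_enc = ModalityEncoder(text_dim, hidden_dim)
        self.audio_enc = ModalityEncoder(audio_dim, hidden_dim)
        self.visual_enc = ModalityEncoder(visual_dim, hidden_dim)
        self.fusion = CrossModalAttentionFusion(2 * hidden_dim)
        self.classifier = nn.Linear(2 * hidden_dim, n_classes)

    def forward(self, text, audio, visual):
        # Each input: (batch, seq_len, feature_dim) for its modality.
        embs = torch.stack([self.text_enc(text),
                            self.audio_enc(audio),
                            self.visual_enc(visual)], dim=1)
        return self.classifier(self.fusion(embs))  # (batch, n_classes) logits

# Smoke test with random sequences standing in for Word2Vec / Librosa /
# OpenFace outputs.
model = EmotionRecognizer()
logits = model(torch.randn(2, 20, 300),   # 20 text tokens
               torch.randn(2, 50, 40),    # 50 audio frames
               torch.randn(2, 30, 128))   # 30 video frames
print(logits.shape)  # torch.Size([2, 6])
```

The per-example softmax weights here stand in for the learnable cross-modal attention the abstract names; a query-key formulation across modalities would be a drop-in replacement for the single linear scoring layer.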
Files
Original bundle
- Name: Thesis_Aigerim_Mukhamadiyeva.pdf
- Size: 5.41 MB
- Format: Adobe Portable Document Format
- Description: Master's thesis