EFFICIENT MULTIMODAL HUMAN ACTIVITY RECOGNITION (HAR) SOLUTIONS USING VIDEO, AUDIO, AND SENSORS


Publisher

Nazarbayev University School of Engineering and Digital Sciences

Abstract

Research on Human Activity Recognition (HAR) in smart home settings has become essential, opening up possibilities for automation, security, and medical applications. To improve recognition accuracy and real-time feasibility, this study integrates video, audio, and sensor data into a robust multimodal HAR framework using the YouHome ADL dataset. To combine the different data modalities successfully, the methodology uses advanced feature extraction, privacy-preserving preprocessing, and a late fusion approach. To ensure effective representation learning for each modality, the HAR system uses a YOLOv8n+MobileNetV2+BiLSTM pipeline for video, a BiLSTM+GRU+TCN model for ambient sensor data, a CNN+GRU model for inertial sensor data, and a BiLSTM for audio-based feature classification. Class imbalance was addressed by merging classes and balancing the dataset with SMOTE, which greatly increased classification accuracy and decreased bias toward overrepresented activities. According to experimental evaluations, the final multimodal HAR model performed significantly better than baseline techniques. In addition, several ablation studies evaluated the trade-offs between privacy, computational cost, and performance. Although real-time viability on resource-constrained devices was not evaluated directly, the results indicate that real-time execution of the proposed systems is feasible. These findings demonstrate how crucial the fusion approach, model optimization, and data preprocessing are to building an efficient multimodal HAR system.
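The late fusion step described in the abstract can be sketched as follows. This is a minimal illustration in plain Python, not code from the thesis: the function name, the weighting scheme (simple averaging), the three-class setup, and the probability values are all hypothetical, standing in for the softmax outputs of the per-modality models.

```python
def late_fusion(prob_vectors, weights=None):
    """Fuse class-probability vectors from several modality-specific
    models (e.g. video, audio, sensor) by weighted averaging, and
    return the index of the winning class plus the fused vector."""
    n_classes = len(prob_vectors[0])
    if weights is None:
        # default: equal weight per modality
        weights = [1.0 / len(prob_vectors)] * len(prob_vectors)
    fused = [0.0] * n_classes
    for w, probs in zip(weights, prob_vectors):
        for i, p in enumerate(probs):
            fused[i] += w * p
    return max(range(n_classes), key=fused.__getitem__), fused

# Hypothetical softmax outputs for three activity classes
video_probs  = [0.7, 0.2, 0.1]
audio_probs  = [0.4, 0.5, 0.1]
sensor_probs = [0.6, 0.3, 0.1]

label, fused = late_fusion([video_probs, audio_probs, sensor_probs])
# label -> 0 (the class favored by video and sensor outweighs audio)
```

Because each modality contributes a full probability distribution, a weighted average remains a valid distribution, and per-modality weights can be tuned on a validation set to reflect each stream's reliability.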

Citation

Imankulov, T. (2025). Efficient Multimodal Human Activity Recognition (HAR) solutions using Video, Audio, and Sensors. Nazarbayev University School of Engineering and Digital Sciences


Creative Commons license

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 United States