EFFICIENT MULTIMODAL HUMAN ACTIVITY RECOGNITION (HAR) SOLUTIONS USING VIDEO, AUDIO, AND SENSORS


Publisher

Nazarbayev University School of Engineering and Digital Sciences

Abstract

Research on Human Activity Recognition (HAR) in smart home settings has become essential, opening up possibilities for automation, security, and medical applications. To improve recognition accuracy and real-time feasibility, this study integrates video, audio, and sensor data into a robust multimodal HAR framework using the YouHome ADL dataset. To combine the different data modalities successfully, the methodology uses advanced feature extraction, privacy-preserving preprocessing, and a late fusion approach. To ensure effective representation learning for each modality, the HAR system uses a YOLOv8n+MobileNetV2+BiLSTM pipeline for video, a BiLSTM+GRU+TCN model for ambient sensor data, a CNN+GRU model for inertial sensor data, and a BiLSTM for audio-based feature classification. Class imbalance was addressed by merging classes and balancing the dataset with SMOTE, which greatly increased classification accuracy and decreased bias toward overrepresented activities. According to experimental evaluations, the final multimodal HAR model performed significantly better than baseline techniques. In addition, several ablation studies evaluated the trade-offs between privacy, computational cost, and performance. Although real-time viability on resource-constrained devices was not evaluated directly, the results indicate that real-time execution of the proposed systems is feasible. These findings demonstrate how crucial the fusion approach, model optimization, and data preprocessing are to building an efficient multimodal HAR system.
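The late fusion step described in the abstract can be sketched as follows. This is a minimal illustration in plain Python, not code from the thesis: the function name, the weighting scheme (simple averaging), the three-class setup, and the probability values are all hypothetical, standing in for the softmax outputs of the per-modality models.

```python
def late_fusion(prob_vectors, weights=None):
    """Fuse class-probability vectors from several modality-specific
    models (e.g. video, audio, sensor) by weighted averaging, and
    return the index of the winning class plus the fused vector."""
    n_classes = len(prob_vectors[0])
    if weights is None:
        # default: equal weight per modality
        weights = [1.0 / len(prob_vectors)] * len(prob_vectors)
    fused = [0.0] * n_classes
    for w, probs in zip(weights, prob_vectors):
        for i, p in enumerate(probs):
            fused[i] += w * p
    return max(range(n_classes), key=fused.__getitem__), fused

# Hypothetical softmax outputs for three activity classes
video_probs  = [0.7, 0.2, 0.1]
audio_probs  = [0.4, 0.5, 0.1]
sensor_probs = [0.6, 0.3, 0.1]

label, fused = late_fusion([video_probs, audio_probs, sensor_probs])
# label -> 0 (the class favored by video and sensor outweighs audio)
```

Because each modality contributes a full probability distribution, a weighted average remains a valid distribution, and per-modality weights can be tuned on a validation set to reflect each stream's reliability.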

Citation

Imankulov, T. (2025). Efficient Multimodal Human Activity Recognition (HAR) solutions using Video, Audio, and Sensors. Nazarbayev University School of Engineering and Digital Sciences


Creative Commons license

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 United States