EFFICIENT MULTIMODAL HUMAN ACTIVITY RECOGNITION (HAR) SOLUTIONS USING VIDEO, AUDIO, AND SENSORS
Publisher
Nazarbayev University School of Engineering and Digital Sciences
Abstract
Research on Human Activity Recognition (HAR) in smart home settings has become essential, opening up possibilities for automation, security, and medical applications. To improve recognition accuracy and real-time feasibility, this study integrates video, audio, and sensor data into a robust multimodal HAR framework using the YouHome ADL dataset. To successfully integrate the various data modalities, the methodology uses advanced feature extraction, privacy-preserving preprocessing, and a late fusion approach.
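A minimal sketch of the late fusion idea, assuming each modality branch outputs a softmax probability vector over the same activity classes and the fused prediction is a (possibly weighted) average of those vectors; the weights and example values here are illustrative, not taken from the thesis:

```python
import numpy as np

def late_fusion(prob_list, weights=None):
    """Combine per-modality class-probability vectors by weighted averaging.

    prob_list: list of 1-D arrays, one softmax output per modality.
    weights:   optional per-modality weights (uniform if None).
    """
    probs = np.stack(prob_list)                      # (n_modalities, n_classes)
    if weights is None:
        weights = np.full(len(prob_list), 1.0 / len(prob_list))
    fused = np.average(probs, axis=0, weights=weights)
    return int(np.argmax(fused)), fused

# Hypothetical softmax outputs from the video, audio, and sensor branches
video  = np.array([0.7, 0.2, 0.1])
audio  = np.array([0.4, 0.5, 0.1])
sensor = np.array([0.6, 0.3, 0.1])
label, fused = late_fusion([video, audio, sensor])   # label is 0 here
```

Because fusion happens at the decision level, each branch can be trained and optimized independently, which matches the per-modality model choices described below.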
To ensure effective representation learning across modalities, the HAR system uses a YOLOv8n+MobileNetV2+BiLSTM model for video, a BiLSTM+GRU+TCN model for ambient sensor data, a CNN+GRU model for inertial sensor data, and a BiLSTM for audio-based feature classification. Class imbalance was addressed by class merging and dataset balancing using SMOTE, which greatly increased classification accuracy and decreased bias toward overrepresented activities.
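The SMOTE balancing step can be illustrated with a minimal NumPy sketch: new minority-class samples are synthesized by interpolating between a minority sample and one of its nearest minority-class neighbours. This is a simplified illustration of the interpolation idea (in practice a library such as imbalanced-learn would be used); the feature values and parameters below are hypothetical:

```python
import numpy as np

def smote_oversample(X_min, n_new, k=3, rng=None):
    """Minimal SMOTE-style oversampling: synthesize n_new minority samples
    by interpolating each chosen sample with one of its k nearest
    minority-class neighbours (illustrative sketch, not the full algorithm)."""
    rng = np.random.default_rng(rng)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # Euclidean distances from sample i to all other minority samples
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]          # skip the sample itself
        j = rng.choice(neighbours)
        gap = rng.random()                           # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)

# Hypothetical 2-D features for an under-represented activity class
X_minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_new = smote_oversample(X_minority, n_new=4, k=2, rng=0)
```

Because each synthetic point lies on a segment between two real minority samples, the oversampled class stays within its original feature region rather than duplicating exact samples.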
Experimental evaluations show that the final multimodal HAR model performed significantly better than baseline techniques. In addition, several ablation studies evaluated the trade-offs between privacy, computational cost, and performance. Although real-time deployment on resource-constrained devices was not evaluated, the results indicate that real-time execution of the proposed systems is feasible. These findings demonstrate how crucial the fusion approach, model optimization, and data preprocessing are to building an efficient multimodal HAR system.
Citation
Imankulov, T. (2025). Efficient Multimodal Human Activity Recognition (HAR) solutions using Video, Audio, and Sensors. Nazarbayev University School of Engineering and Digital Sciences.
Creative Commons license
Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 United States
