EFFICIENT MULTIMODAL HUMAN ACTIVITY RECOGNITION (HAR) SOLUTIONS USING VIDEO, AUDIO, AND SENSORS

dc.contributor.author: Imankulov, Tair
dc.date.accessioned: 2025-06-02T12:04:11Z
dc.date.available: 2025-06-02T12:04:11Z
dc.date.issued: 2025-05-11
dc.description.abstract: Research on Human Activity Recognition (HAR) in smart home settings has become essential, opening up possibilities for automation, security, and medical applications. To improve recognition accuracy and real-time feasibility, this study integrates video, audio, and sensor data into a robust multimodal HAR framework using the YouHome ADL dataset. To integrate the different data modalities, the methodology uses advanced feature extraction, privacy-preserving preprocessing, and a late fusion approach. To ensure effective representation learning for each modality, the HAR system uses YOLOv8n+MobileNetV2+BiLSTM for video, a BiLSTM+GRU+TCN model for ambient sensor data, CNN+GRU for inertial sensor data, and a BiLSTM for classifying audio-based features. Class imbalance was addressed by merging classes and balancing the dataset with SMOTE, which greatly increased classification accuracy and decreased bias toward overrepresented activities. According to the experimental evaluations, the final multimodal HAR model performed significantly better than baseline techniques. In addition, several ablation studies evaluated the trade-offs between privacy, computational cost, and performance. Although real-time viability on resource-constrained devices was not addressed, the results suggest that the proposed systems could run in real time. These findings demonstrate how crucial the fusion approach, model optimization, and data preprocessing are to building efficient multimodal HAR systems.
dc.identifier.citation: Imankulov, T. (2025). Efficient Multimodal Human Activity Recognition (HAR) solutions using Video, Audio, and Sensors. Nazarbayev University School of Engineering and Digital Sciences.
dc.identifier.uri: https://nur.nu.edu.kz/handle/123456789/8704
dc.language.iso: en
dc.publisher: Nazarbayev University School of Engineering and Digital Sciences
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 United States
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/us/
dc.subject: Multimodal HAR
dc.subject: Efficient HAR
dc.subject: Activity recognition
dc.subject: type of access: open access
dc.title: EFFICIENT MULTIMODAL HUMAN ACTIVITY RECOGNITION (HAR) SOLUTIONS USING VIDEO, AUDIO, AND SENSORS
dc.type: Master's thesis
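The abstract names late fusion as the way the per-modality models (video, ambient sensor, inertial sensor, audio) are combined. As an illustrative sketch only, assuming each modality model outputs a probability vector over the same ordered list of activity classes, a weighted-average late fusion could look like the Python below. The function names, weights, and class names are hypothetical and not taken from the thesis, which may use a different fusion rule (for example, majority voting or a learned meta-classifier).

```python
from typing import Dict, List, Optional, Sequence


def late_fusion(
    modality_probs: Dict[str, Optional[Sequence[float]]],
    weights: Dict[str, float],
) -> List[float]:
    """Weighted-average late fusion of per-modality class probabilities (hypothetical helper).

    Each modality model is assumed to output a probability vector over the same
    ordered list of activity classes. Modalities whose output is None (e.g. a
    missing stream) or whose weight is not positive are skipped, and the result
    is divided by the sum of the weights actually used, so the fused vector still
    sums to 1 when each input vector does.
    """
    fused: Optional[List[float]] = None
    used_weight = 0.0
    for name, probs in modality_probs.items():
        weight = weights.get(name, 0.0)
        if probs is None or weight <= 0.0:
            continue
        if fused is None:
            fused = [0.0] * len(probs)
        elif len(probs) != len(fused):
            raise ValueError(
                f"modality {name!r} has {len(probs)} classes, expected {len(fused)}"
            )
        for i, p in enumerate(probs):
            fused[i] += weight * p
        used_weight += weight
    if fused is None:
        raise ValueError("no modality produced a usable prediction")
    return [score / used_weight for score in fused]


def predict_label(fused: Sequence[float], class_names: Sequence[str]) -> str:
    """Return the class name with the highest fused score."""
    best = max(range(len(fused)), key=lambda i: fused[i])
    return class_names[best]


# Hypothetical example: three activity classes, inertial stream missing.
CLASSES = ["cooking", "sleeping", "watching_tv"]
outputs = {
    "video": [0.6, 0.3, 0.1],
    "audio": [0.2, 0.2, 0.6],
    "ambient": [0.5, 0.4, 0.1],
    "inertial": None,
}
WEIGHTS = {"video": 0.4, "audio": 0.2, "ambient": 0.2, "inertial": 0.2}

fused = late_fusion(outputs, WEIGHTS)
print(predict_label(fused, CLASSES))  # prints "cooking"
```

In this sketch the fused scores are 0.475, 0.3, and 0.225, so "cooking" wins. A design along these lines lets each modality model be trained and replaced independently and tolerates a missing modality at inference time; in practice the weights would presumably be tuned on validation data rather than fixed by hand.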

Files

Original bundle

Name: Efficient Multimodal Human Activity Recognition (HAR) solutions using Video, Audio, and Sensors
Size: 4.04 MB
Format: Adobe Portable Document Format
Description: Master's thesis