EFFICIENT MULTIMODAL HUMAN ACTIVITY RECOGNITION (HAR) SOLUTIONS USING VIDEO, AUDIO, AND SENSORS
| dc.contributor.author | Imankulov, Tair | |
| dc.date.accessioned | 2025-06-02T12:04:11Z | |
| dc.date.available | 2025-06-02T12:04:11Z | |
| dc.date.issued | 2025-05-11 | |
| dc.description.abstract | Research on Human Activity Recognition (HAR) in smart home settings has become essential, opening up possibilities for automation, security, and medical applications. To improve recognition accuracy and real-time feasibility, this study integrates video, audio, and sensor data into a robust multimodal HAR framework using the YouHome ADL dataset. To successfully integrate the different data modalities, the methodology uses advanced feature extraction, privacy-preserving preprocessing, and a late fusion approach. To ensure effective representation learning across modalities, the HAR system uses a YOLOv8n+MobileNetV2+BiLSTM model for video, a BiLSTM+GRU+TCN model for ambient sensor data, a CNN+GRU model for inertial sensor data, and a BiLSTM for audio-based feature classification. Class imbalances were addressed by applying class merging and dataset balancing with SMOTE, which greatly increased classification accuracy and reduced bias toward overrepresented activities. Experimental evaluations show that the final multimodal HAR model performed significantly better than baseline techniques. In addition, several ablation studies evaluated the trade-offs between privacy, computational cost, and performance. Although real-time viability on resource-constrained devices was not addressed, the results confirm the potential for real-time execution of the proposed systems. These findings demonstrate how crucial the fusion approach, model optimization, and data preprocessing are to building efficient multimodal HAR systems. | |
| dc.identifier.citation | Imankulov, T. (2025). Efficient Multimodal Human Activity Recognition (HAR) solutions using Video, Audio, and Sensors. Nazarbayev University School of Engineering and Digital Sciences | |
| dc.identifier.uri | https://nur.nu.edu.kz/handle/123456789/8704 | |
| dc.language.iso | en | |
| dc.publisher | Nazarbayev University School of Engineering and Digital Sciences | |
| dc.rights | Attribution-NonCommercial-NoDerivs 3.0 United States | en |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/us/ | |
| dc.subject | Multimodal HAR | |
| dc.subject | Efficient HAR | |
| dc.subject | Activity recognition | |
| dc.subject | type of access: open access | |
| dc.title | EFFICIENT MULTIMODAL HUMAN ACTIVITY RECOGNITION (HAR) SOLUTIONS USING VIDEO, AUDIO, AND SENSORS | |
| dc.type | Master's thesis |
Files
Original bundle
- Name: Efficient Multimodal Human Activity Recognition (HAR) solutions using Video, Audio, and Sensors
- Size: 4.04 MB
- Format: Adobe Portable Document Format
- Description: Master's thesis
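
The abstract above describes a late fusion of per-modality classifiers (video, ambient sensors, inertial sensors, audio). As a purely illustrative sketch, the snippet below shows one common way such a fusion step can be realized: weighted averaging of per-modality class probabilities. The function name, modality labels, weights, and class count are assumptions for illustration, not details taken from the thesis.

```python
# Minimal late-fusion sketch (illustrative only; not the thesis implementation).
# Each modality-specific model is assumed to output a class-probability vector.
import numpy as np

def late_fusion(prob_by_modality: dict[str, np.ndarray],
                weights: dict[str, float] | None = None) -> np.ndarray:
    """Fuse per-modality probability vectors (each of shape [n_classes])
    by a weighted average; equal weights are used if none are given."""
    names = list(prob_by_modality)
    if weights is None:
        weights = {m: 1.0 / len(names) for m in names}
    fused = sum(weights[m] * prob_by_modality[m] for m in names)
    return fused / sum(weights[m] for m in names)  # renormalize the weights

# Hypothetical example: four modality classifiers over 5 activity classes.
rng = np.random.default_rng(0)
probs = {m: rng.dirichlet(np.ones(5))
         for m in ("video", "ambient", "inertial", "audio")}
fused = late_fusion(probs)
print("predicted activity class:", int(np.argmax(fused)))
```

In this kind of scheme, the per-modality weights could in principle be tuned on a validation split; the thesis itself reports using a late fusion approach but the weighting details here are assumed.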