ENHANCING AMBIENT ASSISTED LIVING WITH MULTI-MODAL VISION AND LANGUAGE MODELS: A NOVEL APPROACH FOR REAL-TIME ABNORMAL BEHAVIOR DETECTION AND EMERGENCY RESPONSE

dc.contributor.author: Zhiyenbayev, Adil
dc.date.accessioned: 2024-06-23T20:17:44Z
dc.date.available: 2024-06-23T20:17:44Z
dc.date.issued: 2024-04-28
dc.description.abstract: Global demographic forecasts predict that the population of older adults will surge past 1.9 billion by 2050, escalating the demand for efficient healthcare delivery, particularly for the elderly and disabled, who frequently require caregiving due to prevalent mental and physical health issues. This demographic trend underscores the critical need for robust long-term care services and continuous monitoring systems. However, the efficacy of these solutions is often compromised by caregiver overload, financial constraints, and logistical challenges in transportation, necessitating advanced technological interventions. In response, researchers have been refining ambient assisted living (AAL) environments through the integration of human activity recognition (HAR) built on advanced machine learning (ML) and deep learning (DL) techniques. These methods aim to reduce emergency incidents and enhance early detection and intervention. Traditional sensor-based HAR systems, despite their utility, suffer from significant limitations, including high data variability, environmental interference, and inadequate contextual understanding. To address these issues, vision-language models (VLMs) improve detection accuracy by interpreting scene context via caption generation, visual question answering (VQA), commonsense reasoning, and action recognition. However, VLMs face challenges in real-time application scenarios due to language ambiguity and occlusions, which can degrade detection accuracy. Large language models (LLMs) combined with text-to-speech (TTS) and speech-to-text (STT) technologies can facilitate direct communication with the individual and enable real-time interactive assessment of a situation. Integrating real-time conversational capabilities via LLM, TTS, and STT into the VLM framework significantly improves the detection of abnormal behavior by leveraging comprehensive scene understanding and direct patient feedback, thus enhancing the system's reliability. A qualitative evaluation during real-time experiments with participants showed high system usability in a subjective questionnaire. A quantitative evaluation of the developed system demonstrated high performance, achieving detection accuracy and recall rates of 93.44% and 95%, respectively, and a specificity rate of 88.88% across various emergency scenarios before interaction. After the interaction stage, performance rose to 100% accuracy owing to the added context from users' responses. Furthermore, the system not only identifies emergencies effectively but also provides contextual summaries and actionable recommendations to caregivers and patients. The research introduces a multimodal framework that combines VLMs, LLMs, TTS, and STT for real-time abnormal behavior detection and assistance. This study aims to develop a comprehensive framework that overcomes traditional HAR and AAL limitations by integrating instruction-driven VLM, LLM, human detection, TTS, and STT modules to enhance emergency response efficiency in home environments. This approach promises substantial advances in the field of AAL by providing timely, context-aware detection and response in emergencies.
dc.identifier.citation: Zhiyenbayev, A. (2024). Enhancing Ambient Assisted Living: Multi-Modal Vision and Language Models for Real-Time Emergency Response. Nazarbayev University School of Engineering and Digital Sciences
dc.identifier.uri: http://nur.nu.edu.kz/handle/123456789/7973
dc.language.iso: en
dc.publisher: Nazarbayev University School of Engineering and Digital Sciences
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 United States
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/us/
dc.subject: Type of access: Restricted
dc.subject: Ambient assisted living
dc.subject: Human activity recognition
dc.subject: Vision-language models
dc.subject: Large language models
dc.subject: Speech models
dc.subject: Prompt engineering
dc.title: ENHANCING AMBIENT ASSISTED LIVING WITH MULTI-MODAL VISION AND LANGUAGE MODELS: A NOVEL APPROACH FOR REAL-TIME ABNORMAL BEHAVIOR DETECTION AND EMERGENCY RESPONSE
dc.type: Master's thesis
workflow.import.source: science
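
For illustration, below is a minimal sketch of the detection-and-interaction loop the abstract describes: human detection gates an instruction-driven VLM, whose scene context feeds an LLM that converses with the user through TTS and STT before making a final decision. Every callable here (detect, caption, vqa, ask_llm, speak, listen) is a hypothetical stub, not the thesis's actual modules or APIs.

    # Hypothetical sketch of the VLM + LLM + TTS/STT pipeline described above.
    # All callables are stand-in stubs, not the thesis implementation.

    def monitor_frame(frame, detect, caption, vqa, ask_llm, speak, listen):
        """Run one detection-and-interaction pass over a single camera frame."""
        # Stage 1: human detection gates the heavier vision-language models.
        if not detect(frame):
            return {"status": "no_person"}

        # Stage 2: an instruction-driven VLM supplies scene context (caption)
        # and a targeted VQA verdict on abnormal behavior such as a fall.
        scene = caption(frame)
        if "yes" not in vqa(frame, "Is the person showing abnormal behavior?").lower():
            return {"status": "normal", "scene": scene}

        # Stage 3: the LLM composes a spoken check-in; TTS asks it aloud and
        # STT transcribes the reply, adding direct user feedback as context.
        question = ask_llm(f"Scene: {scene}. Write one short question asking "
                           "whether the person needs help.")
        speak(question)
        reply = listen(timeout_s=10)

        # Stage 4: the LLM fuses scene context and the reply into a final
        # decision plus a summary and recommendation for the caregiver.
        decision = ask_llm(f"Scene: {scene}. Reply: {reply!r}. Decide whether "
                           "this is an emergency and recommend an action.")
        return {"status": "assessed", "scene": scene, "decision": decision}

    # Usage with trivial stubs (real models would replace each lambda):
    result = monitor_frame(
        frame=None,
        detect=lambda f: True,
        caption=lambda f: "an elderly man lying motionless on the kitchen floor",
        vqa=lambda f, q: "yes",
        ask_llm=lambda prompt: ("Are you okay? Do you need help?"
                                if "question" in prompt
                                else "Emergency: likely fall; notify the caregiver."),
        speak=print,
        listen=lambda timeout_s: "I fell and I cannot get up.",
    )
    print(result["status"], "->", result["decision"])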

Files

Original bundle
Name: Thesis_A.Z..pdf
Size: 2.92 MB
Format: Adobe Portable Document Format
Description: Master's thesis
License bundle
Name: license.txt
Size: 6.28 KB
Description: Item-specific license agreed upon to submission