ENHANCING EMERGENCY RESPONSE: THE ROLE OF INTEGRATED VISION-LANGUAGE MODELS IN IN-HOME HEALTHCARE AND EFFICIENT MULTIMEDIA RETRIEVAL

Abdrakhmanov, Rakhat

dc.contributor.author	Abdrakhmanov, Rakhat
dc.date.accessioned	2024-07-09T05:40:02Z
dc.date.available	2024-07-09T05:40:02Z
dc.date.issued	2024-06
dc.description.abstract	In-home injuries and sudden critical health conditions are relatively common and require prompt medical expertise. This study introduces a use of vision-language models (VLMs) to improve human healthcare through better emergency recognition and efficient multimedia search. By combining the strengths of large language models (LLMs) and vision transformers (ViTs), it enhances the analysis of both visual and textual information. We propose a framework that uses the PrismerZ VLM in both its Base and Large forms, along with a key frame selection (KFS) algorithm, to pinpoint and examine pertinent images within video streams. This allows for the creation of enriched datasets in which images are paired with descriptive captions and with answers obtained from visual question answering (VQA). By integrating the CLIP-ViT-L-14 model with the MongoDB Atlas cloud database, we developed a multimodal retrieval system that handles complex queries and improves the user experience. Additionally, this research undertakes data collection to assess the system’s adaptability, providing proof of concept and refining the framework. The results show the system’s robustness, with accuracy of 86.5% in image captioning and 92.5% in VQA tasks on the Kinetics dataset. When tested with human subject data, the PrismerZ Large model achieved 85.8% accuracy in image captioning and 87.5% in VQA tasks. This performance was further enhanced through fine-tuning with the GPT-4 based ChatGPT, one of the largest language assistants, leading to a 20% improvement in semantic text similarity as measured by the BERT model. The PrismerZ models also stand out for their speed, with the Base and Large versions processing image captioning and VQA tasks in just seconds, even on the NVIDIA Jetson Orin NX edge device.
These findings confirm the system’s reliability in real-life scenarios. The multimodal retrieval system achieved a mean average precision at k (MAP@k) of 93% and a mean reciprocal rank (MRR) of 94.79% on the Kinetics dataset, with an average search latency of 0.33 seconds for text queries. This research advances the fields of human activity recognition (HAR) and emergency detection and opens new paths for anomaly detection and enriched multimedia understanding. Our objective in integrating the VLM with multimedia information retrieval is to establish new benchmarks for human care, improving its timeliness and comprehensiveness as well as the efficiency of access to multimedia data.	en_US
dc.identifier.citation	Abdrakhmanov, R. (2024). Enhancing Emergency Response: The Role of Integrated Vision-Language Models in In-Home Healthcare and Efficient Multimedia Retrieval. Nazarbayev University School of Engineering and Digital Sciences	en_US
dc.identifier.uri	http://nur.nu.edu.kz/handle/123456789/8096
dc.language.iso	en	en_US
dc.publisher	Nazarbayev University School of Engineering and Digital Sciences	en_US
dc.rights	Attribution-NonCommercial-ShareAlike 3.0 United States	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/3.0/us/	*
dc.subject	type of access: restricted access	en_US
dc.title	ENHANCING EMERGENCY RESPONSE: THE ROLE OF INTEGRATED VISION-LANGUAGE MODELS IN IN-HOME HEALTHCARE AND EFFICIENT MULTIMEDIA RETRIEVAL	en_US
dc.type	Master's thesis	en_US
workflow.import.source	science
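
The abstract reports retrieval quality as MAP@k and MRR. As a reading aid, the sketch below shows one common way these two metrics are computed for a set of ranked result lists. It is not taken from the thesis: the function names, the toy result lists, and the choice to normalize average precision by min(k, number of relevant items) are assumptions made for illustration, and the thesis may define MAP@k differently.

```python
from typing import Sequence, Set


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Return 1/rank of the first relevant item, or 0.0 if none is retrieved."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Average precision over the top-k results.

    Assumes ranked items are distinct. Normalizes by min(k, len(relevant)),
    which is one common convention (an assumption here, not the thesis's).
    """
    if not relevant or k <= 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for position, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / position  # precision at this cut-off
    return precision_sum / min(k, len(relevant))


def mean(values: Sequence[float]) -> float:
    """Arithmetic mean, returning 0.0 for an empty sequence."""
    return sum(values) / len(values) if values else 0.0


# Hypothetical toy queries: (ranked result ids, set of relevant ids).
queries = [
    (["a", "b", "c"], {"a", "c"}),
    (["x", "y", "z"], {"y"}),
]

mrr = mean([reciprocal_rank(r, rel) for r, rel in queries])
map_at_3 = mean([average_precision_at_k(r, rel, k=3) for r, rel in queries])
```

MRR rewards placing the first relevant item early, while MAP@k also accounts for how many relevant items appear in the top k and where, which is why the two scores can differ for the same result lists.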

Files

Original bundle

Name: RakhatAbdrakhmanov_ThesisManuscript.docx.pdf
Size: 769.05 KB
Format: Adobe Portable Document Format
Description: Master's thesis

Collections

02. Master's Thesis