
VISION-LANGUAGE MODELS ON THE EDGE: AN ASSISTIVE TECHNOLOGY FOR THE VISUALLY IMPAIRED



dc.contributor.author Arystanbekov, Batyr
dc.date.accessioned 2024-07-04T10:41:07Z
dc.date.available 2024-07-04T10:41:07Z
dc.date.issued 2024-06
dc.identifier.citation Arystanbekov, B. (2024). Vision-Language Models on the Edge: An Assistive Technology for the Visually Impaired. Nazarbayev University School of Engineering and Digital Sciences en_US
dc.identifier.uri http://nur.nu.edu.kz/handle/123456789/8082
dc.description.abstract Vision-Language Models, or VLMs, are deep learning models at the intersection of Computer Vision and Natural Language Processing. They effectively combine image understanding and language generation capabilities and are widely used in various assistive tasks today. Nevertheless, the application of VLMs to assist visually impaired and blind people remains an underexplored area in the field. Existing approaches to developing assistive technology for the visually impaired have a substantial limitation: the computation is usually performed on the cloud, which makes the systems heavily dependent on an internet connection and the state of the remote server. This makes the systems unreliable, which limits their practical usage in everyday tasks. In our work, to address the issues of the previous approaches, we propose utilizing VLMs on embedded systems, ensuring real-time efficiency and autonomy of the assistive module. We present an end-to-end workflow for developing the system, extensively covering hardware and software architecture and integration with speech recognition and text-to-speech technologies. The developed system possesses comprehensive scene interpretation and user navigation capabilities necessary for visually impaired individuals to enhance their day-to-day activities. Moreover, we confirm the practical application of the wearable assistive module by conducting experiments with actual human participants and provide subjective as well as objective results from the system’s assessment. en_US
dc.language.iso en en_US
dc.publisher Nazarbayev University School of Engineering and Digital Sciences en_US
dc.rights Attribution-NonCommercial-ShareAlike 3.0 United States
dc.rights.uri http://creativecommons.org/licenses/by-nc-sa/3.0/us/
dc.subject Type of access: Restricted en_US
dc.title VISION-LANGUAGE MODELS ON THE EDGE: AN ASSISTIVE TECHNOLOGY FOR THE VISUALLY IMPAIRED en_US
dc.type Master's thesis en_US
workflow.import.source science


