
VISION-LANGUAGE MODELS ON THE EDGE: AN ASSISTIVE TECHNOLOGY FOR THE VISUALLY IMPAIRED



dc.contributor.author Arystanbekov, Batyr
dc.date.accessioned 2024-07-04T10:41:07Z
dc.date.available 2024-07-04T10:41:07Z
dc.date.issued 2024-06
dc.identifier.citation Arystanbekov, B. (2024). Vision-Language Models on the Edge: An Assistive Technology for the Visually Impaired. Nazarbayev University School of Engineering and Digital Sciences en_US
dc.identifier.uri http://nur.nu.edu.kz/handle/123456789/8082
dc.description.abstract Vision-Language Models, or VLMs, are deep learning models at the intersection of Computer Vision and Natural Language Processing. They effectively combine image understanding and language generation capabilities and are widely used in various assistive tasks today. Nevertheless, the application of VLMs to assist visually impaired and blind people remains an underexplored area in the field. Existing approaches to developing assistive technology for the visually impaired have a substantial limitation: the computation is usually performed on the cloud, which makes the systems heavily dependent on an internet connection and the state of the remote server. This makes the systems unreliable, which limits their practical usage in everyday tasks. In our work, to address the issues of the previous approaches, we propose utilizing VLMs on embedded systems, ensuring real-time efficiency and autonomy of the assistive module. We present an end-to-end workflow for developing the system, extensively covering hardware and software architecture and integration with speech recognition and text-to-speech technologies. The developed system possesses comprehensive scene interpretation and user navigation capabilities necessary for visually impaired individuals to enhance their day-to-day activities. Moreover, we confirm the practical application of the wearable assistive module by conducting experiments with actual human participants and provide subjective as well as objective results from the system’s assessment. en_US
dc.language.iso en en_US
dc.publisher Nazarbayev University School of Engineering and Digital Sciences en_US
dc.rights Attribution-NonCommercial-ShareAlike 3.0 United States
dc.rights.uri http://creativecommons.org/licenses/by-nc-sa/3.0/us/
dc.subject Type of access: Restricted en_US
dc.title VISION-LANGUAGE MODELS ON THE EDGE: AN ASSISTIVE TECHNOLOGY FOR THE VISUALLY IMPAIRED en_US
dc.type Master's thesis en_US
workflow.import.source science


