VISION-LANGUAGE MODELS ON THE EDGE: AN ASSISTIVE TECHNOLOGY FOR THE VISUALLY IMPAIRED

dc.contributor.authorArystanbekov, Batyr
dc.date.accessioned2024-07-04T10:41:07Z
dc.date.available2024-07-04T10:41:07Z
dc.date.issued2024-06
dc.description.abstractVision-Language Models, or VLMs, are deep learning models at the intersection of Computer Vision and Natural Language Processing. They effectively combine image understanding and language generation capabilities and are widely used in various assistive tasks today. Nevertheless, the application of VLMs to assist visually impaired and blind people remains an underexplored area in the field. Existing approaches to developing assistive technology for the visually impaired have a substantial limitation: the computation is usually performed on the cloud, which makes the systems heavily dependent on an internet connection and the state of the remote server. This makes the systems unreliable, which limits their practical usage in everyday tasks. In our work, to address the issues of the previous approaches, we propose utilizing VLMs on embedded systems, ensuring real-time efficiency and autonomy of the assistive module. We present an end-to-end workflow for developing the system, extensively covering hardware and software architecture and integration with speech recognition and text-to-speech technologies. The developed system possesses comprehensive scene interpretation and user navigation capabilities necessary for visually impaired individuals to enhance their day-to-day activities. Moreover, we confirm the practical application of the wearable assistive module by conducting experiments with actual human participants and provide subjective as well as objective results from the system’s assessment.en_US
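
The abstract describes an on-device pipeline of the form camera frame → VLM scene description → spoken output. Below is a minimal illustrative sketch of one iteration of such a loop, not the thesis's actual implementation: the BLIP captioning model, OpenCV capture, and pyttsx3 offline text-to-speech used here are assumptions for demonstration, and the speech-recognition front end is omitted.

import cv2
import pyttsx3
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load a small image-captioning VLM once at startup (assumed model choice).
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
tts = pyttsx3.init()  # offline text-to-speech engine

camera = cv2.VideoCapture(0)  # wearable/embedded camera
ok, frame = camera.read()     # grab a single frame
camera.release()

if ok:
    # OpenCV returns BGR; the VLM processor expects an RGB image.
    image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    inputs = processor(images=image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=30)
    caption = processor.decode(output_ids[0], skip_special_tokens=True)

    tts.say(caption)  # speak the scene description aloud
    tts.runAndWait()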
dc.identifier.citationArystanbekov, B. (2024). Vision-Language Models on the Edge: An Assistive Technology for the Visually Impaired. Nazarbayev University School of Engineering and Digital Sciencesen_US
dc.identifier.urihttp://nur.nu.edu.kz/handle/123456789/8082
dc.language.isoenen_US
dc.publisherNazarbayev University School of Engineering and Digital Sciencesen_US
dc.rightsAttribution-NonCommercial-ShareAlike 3.0 United States*
dc.rights.urihttp://creativecommons.org/licenses/by-nc-sa/3.0/us/*
dc.subjecttype of access: restricted accessen_US
dc.titleVISION-LANGUAGE MODELS ON THE EDGE: AN ASSISTIVE TECHNOLOGY FOR THE VISUALLY IMPAIREDen_US
dc.typeMaster's thesisen_US
workflow.import.sourcescience

Files

Original bundle

Name:
Batyr_Arystanbekov_Thesis.pdf
Size:
4.74 MB
Format:
Adobe Portable Document Format
Description:
Master's thesis