dc.contributor.author | Nurgazin, Maxat | |
dc.date.accessioned | 2023-06-13T09:41:51Z | |
dc.date.available | 2023-06-13T09:41:51Z | |
dc.date.issued | 2023 | |
dc.identifier.citation | Nurgazin, M. (2023). Few-shot Medical Image Classification using Vision Transformers. School of Engineering and Digital Sciences | en_US |
dc.identifier.uri | http://nur.nu.edu.kz/handle/123456789/7219 | |
dc.description.abstract | The analysis of medical imaging is crucial to improve and facilitate the diagnosis of human diseases. Recently, Vision Transformers have been used successfully for this task. However, large amounts of data are needed to train such models to satisfactory results. This can be a problem in medical imaging, as some diseases are rare and scarcely represented in datasets, while manual labeling is expensive because it requires professional expertise. Few-shot learning methods address this, as they are designed to learn from only a few examples. This research therefore investigates the use of different Vision Transformer architectures for medical image classification in a few-shot learning scenario using two few-shot learning algorithms, ProtoNet and Reptile. This work also proposes a new ViT architecture that combines ConViT with a Squeeze-and-Excitation block. In addition to the main experiments, we tested the Cutout, Mixup, and CutMix data augmentation techniques to evaluate their impact on performance. Our findings indicate that Vision Transformers used with ProtoNets consistently outperform similarly sized CNNs in the tested scenarios. Additionally, ViT-Small outperformed PFEMed, a specialized model for few-shot learning, on the ISIC 2018 dataset in all tasks and on the BreakHis x100 dataset in 2-shot-10-way and all 3-way tasks, despite being significantly smaller. Our proposed model did not perform better than a standard ConViT; however, this is a preliminary result from pre-training on a small dataset. The advanced input augmentation techniques did not yield significant performance improvements over the standard approach. In fact, most of these techniques led to worse results, with the exception of Mixup, which demonstrated some positive effects on model performance. | en_US |
dc.language.iso | en | en_US |
dc.publisher | School of Engineering and Digital Sciences | en_US |
dc.rights | Attribution-NonCommercial-ShareAlike 3.0 United States | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/3.0/us/ | * |
dc.subject | Type of access: Embargo | en_US |
dc.subject | Vision Transformers | en_US |
dc.subject | Medical Image Classification | en_US |
dc.title | FEW-SHOT MEDICAL IMAGE CLASSIFICATION USING VISION TRANSFORMERS | en_US |
dc.type | Master's thesis | en_US |
workflow.import.source | science | |
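The abstract's ProtoNet setup can be illustrated with a minimal sketch: class prototypes are the mean support-set embeddings, and queries are assigned to the nearest prototype by squared Euclidean distance. This is not the thesis code — NumPy arrays stand in for the ViT embeddings, and the function name is hypothetical.

```python
import numpy as np

def protonet_classify(support, support_labels, query):
    """Assign each query embedding to the nearest class prototype.

    support: (n_support, d) array of support-set embeddings
    support_labels: (n_support,) integer class labels
    query: (n_query, d) array of query embeddings
    Returns an array of predicted labels, one per query embedding.
    """
    classes = np.unique(support_labels)
    # Prototype = mean embedding of each class's support examples
    prototypes = np.stack(
        [support[support_labels == c].mean(axis=0) for c in classes]
    )
    # Squared Euclidean distance from every query to every prototype
    dists = ((query[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    # Nearest prototype wins
    return classes[dists.argmin(axis=1)]
```

In an N-way K-shot episode, `support` would hold the N×K embeddings produced by the backbone (here a ViT) and `query` the held-out examples to classify.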