DSpace Repository

FEW-SHOT MEDICAL IMAGE CLASSIFICATION USING VISION TRANSFORMERS

Show simple item record

dc.contributor.author Nurgazin, Maxat
dc.date.accessioned 2023-06-13T09:41:51Z
dc.date.available 2023-06-13T09:41:51Z
dc.date.issued 2023
dc.identifier.citation Nurgazin, M. (2023). Few-shot Medical Image Classification using Vision Transformers. School of Engineering and Digital Sciences en_US
dc.identifier.uri http://nur.nu.edu.kz/handle/123456789/7219
dc.description.abstract The analysis of medical images is crucial for improving and facilitating the diagnosis of human diseases. Recently, Vision Transformers have been applied successfully to this task. However, large amounts of data are needed to train such models to satisfactory performance, which is a problem in medical imaging: some diseases are rare and scarcely represented in datasets, and manual labeling is expensive because it requires professional expertise. Few-shot learning methods address this, as they deal with learning from only a few examples. This research therefore investigates different Vision Transformer architectures for medical image classification in a few-shot learning scenario, using two few-shot learning algorithms, ProtoNet and Reptile. This work also proposes a new ViT architecture that combines ConViT with a Squeeze-and-Excitation block. In addition to the main experiments, we tested the Cutout, Mixup, and CutMix data augmentation techniques to evaluate their impact on performance. Our findings indicate that Vision Transformers used with ProtoNets consistently outperform similarly sized CNNs in the tested scenarios. Additionally, ViT-Small outperformed PFEMed, a specialized few-shot learning model, on the ISIC 2018 dataset in all tasks and on the BreakHis x100 dataset in 2-shot 10-way and all 3-way tasks, despite being significantly smaller. Our proposed model did not perform better than a standard ConViT; however, this is a preliminary result from pre-training on a small dataset. The advanced input augmentation techniques did not yield significant performance improvements over the standard approach. In fact, most of these techniques led to worse results, with the exception of Mixup, which demonstrated some positive effect on model performance. en_US
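The ProtoNet classification step mentioned in the abstract can be sketched as follows. This is a minimal illustration of prototypical networks, not the thesis code: it assumes pre-computed embeddings and uses squared Euclidean distance to the per-class mean, as in the original ProtoNet formulation; all function and variable names are assumptions.

```python
import numpy as np

def prototypes(support_embeddings, support_labels, n_way):
    # Each class prototype is the mean of that class's support embeddings.
    return np.stack([support_embeddings[support_labels == c].mean(axis=0)
                     for c in range(n_way)])

def classify(query_embeddings, protos):
    # Assign each query to the class of its nearest prototype
    # (squared Euclidean distance in embedding space).
    dists = ((query_embeddings[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)
```

In an N-way K-shot episode, `prototypes` would be applied to the K support embeddings per class produced by the ViT backbone, and `classify` labels the query set.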
dc.language.iso en en_US
dc.publisher School of Engineering and Digital Sciences en_US
dc.rights Attribution-NonCommercial-ShareAlike 3.0 United States *
dc.rights.uri http://creativecommons.org/licenses/by-nc-sa/3.0/us/ *
dc.subject Type of access: Embargo en_US
dc.subject Vision Transformers en_US
dc.subject Medical Image Classification en_US
dc.title FEW-SHOT MEDICAL IMAGE CLASSIFICATION USING VISION TRANSFORMERS en_US
dc.type Master's thesis en_US
workflow.import.source science



Attribution-NonCommercial-ShareAlike 3.0 United States Except where otherwise noted, this item's license is described as Attribution-NonCommercial-ShareAlike 3.0 United States