dc.contributor.author | Nurgazin, Maxat | |
dc.date.accessioned | 2023-06-13T09:41:51Z | |
dc.date.available | 2023-06-13T09:41:51Z | |
dc.date.issued | 2023 | |
dc.identifier.citation | Nurgazin, M. (2023). Few-shot Medical Image Classification using Vision Transformers. School of Engineering and Digital Sciences | en_US |
dc.identifier.uri | http://nur.nu.edu.kz/handle/123456789/7219 | |
dc.description.abstract | The analysis of medical imaging is crucial to improve and facilitate the diagnosis of human diseases. Recently, Vision Transformers have been used successfully for this task. However, large amounts of data are needed to train such models to satisfactory results. This can be a problem in medical imaging, as some diseases are rare and scarcely represented in datasets, while manual labeling is expensive because it requires professional expertise. Few-shot learning methods address this, as they are designed to learn from only a few examples. This research therefore investigates the use of different Vision Transformer architectures for medical image classification in a few-shot learning scenario using two few-shot learning algorithms, ProtoNet and Reptile. This work also proposes a new ViT architecture that combines ConViT with a Squeeze-and-Excitation block. In addition to the main experiments, we tested the Cutout, Mixup, and CutMix data augmentation techniques to evaluate their impact on performance. Our findings indicate that Vision Transformers used with ProtoNets consistently outperform similarly sized CNNs in the tested scenarios. Additionally, ViT-Small outperformed PFEMed, a specialized model for few-shot learning, on the ISIC 2018 dataset in all tasks and on the BreakHis x100 dataset in 2-shot-10-way and all 3-way tasks, despite being significantly smaller. Our proposed model did not perform better than a standard ConViT; however, this is a preliminary result from pre-training on a small dataset. The advanced input augmentation techniques did not yield significant performance improvements over the standard approach. In fact, most of these techniques led to worse results, with the exception of Mixup, which demonstrated some positive effects on model performance. | en_US |
dc.language.iso | en | en_US |
dc.publisher | School of Engineering and Digital Sciences | en_US |
dc.rights | Attribution-NonCommercial-ShareAlike 3.0 United States | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/3.0/us/ | * |
dc.subject | Type of access: Embargo | en_US |
dc.subject | Vision Transformers | en_US |
dc.subject | Medical Image Classification | en_US |
dc.title | FEW-SHOT MEDICAL IMAGE CLASSIFICATION USING VISION TRANSFORMERS | en_US |
dc.type | Master's thesis | en_US |
workflow.import.source | science | |
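The abstract's ProtoNet setup can be illustrated with a minimal sketch: class prototypes are the mean support-set embeddings, and queries are assigned to the nearest prototype by squared Euclidean distance. This is not the thesis code — NumPy arrays stand in for the ViT embeddings, and the function name is hypothetical.

```python
import numpy as np

def protonet_classify(support, support_labels, query):
    """Assign each query embedding to the nearest class prototype.

    support: (n_support, d) array of support-set embeddings
    support_labels: (n_support,) integer class labels
    query: (n_query, d) array of query embeddings
    Returns an array of predicted labels, one per query embedding.
    """
    classes = np.unique(support_labels)
    # Prototype = mean embedding of each class's support examples
    prototypes = np.stack(
        [support[support_labels == c].mean(axis=0) for c in classes]
    )
    # Squared Euclidean distance from every query to every prototype
    dists = ((query[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    # Nearest prototype wins
    return classes[dists.argmin(axis=1)]
```

In an N-way K-shot episode, `support` would hold the N×K embeddings produced by the backbone (here a ViT) and `query` the held-out examples to classify.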