COVID-19 CLASSIFICATION IN CT IMAGES WITH CONVOLUTIONAL NEURAL NETWORK-BASED ENSEMBLE LEARNING

Author: Kushenchirekova, Dina
Institution: Nazarbayev University School of Engineering and Digital Sciences
Type: Master's thesis
Dates: 2022-06-21; 2022-04
Language: en
Access: open access
Keywords: SARS-COV-2 CT dataset; COVID-19; CNN; deep learning; convolutional neural networks
Subject: Research Subject Categories::TECHNOLOGY
Citation: Kushenchirekova, D. (2022). COVID-19 CLASSIFICATION IN CT IMAGES WITH CONVOLUTIONAL NEURAL NETWORK-BASED ENSEMBLE LEARNING (Unpublished master's thesis). Nazarbayev University, Nur-Sultan, Kazakhstan
URL: http://nur.nu.edu.kz/handle/123456789/6287
License: Attribution-NonCommercial-ShareAlike 3.0 United States, http://creativecommons.org/licenses/by-nc-sa/3.0/us/

The coronavirus infection has spread rapidly across the world, and the virus continues to spread and mutate. COVID-19, the infection that caused the pandemic, has been a major problem worldwide. Deep learning plays an important role in medical image analysis, and in this work we use deep learning with convolutional neural network (CNN) methods. Deep learning is the field of artificial intelligence that handles classification problems such as recognizing COVID-19 infection in computed tomography (CT) images of the lungs. A CNN helps us classify the formations in these images, since it is an effective tool for image classification. In this study, we apply several of the most popular convolutional neural networks and evaluate them using common metrics. Among the 8 CNN architectures we used (VGG-19, VGG-16, MobileNetV2, Xception, ResNet50V2, DenseNet201, Inception-V3, and EfficientNetB3), the best performer was VGG-19, which achieved the highest accuracy score. The VGG-16 CNN architecture reached an accuracy of 0.97 on the CovidX CT dataset, 0.95 on the SARS-CoV-2 CT dataset, and 0.94 on the UCSD COVID-CT dataset.
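To make the evaluation step concrete, the following is a minimal sketch of how accuracy and related scores can be computed for a binary COVID / non-COVID classifier. Apart from accuracy, the abstract does not name the metrics used, so precision, recall, and F1 are assumptions here, and the labels and predictions are made-up toy values rather than results from the thesis.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 as a dictionary."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (1 = COVID, 0 = non-COVID); illustrative values only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Reporting precision and recall alongside accuracy is a common choice for medical screening tasks, where a missed positive case and a false alarm carry different costs.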
The question that arises is how to use data mining properly to build an efficient detection system and mining framework. To answer it, we use ensemble learning, which integrates fusion, modeling, and mining into a single model. Our proposal is an ensemble learning algorithm that essentially stacks several neural network architectures into one. The idea is to extract features from the images with several of the above-mentioned models and combine those features into a "stack". The results suggest that the method performs better than each individual architecture. Because the ensemble model considers the features and losses provided by all of the models, the resulting loss is lower, which leads to a higher accuracy score. In this way, the ensemble model achieved an accuracy of 0.9867 on the UCSD COVID-CT dataset, while the highest accuracy of an individual model was 0.945. With the alternative, SVM-integrated methodology, the ensemble model reached an accuracy of 0.982 on the SARS-CoV-2 CT dataset.
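The feature-stacking idea can be sketched in a few lines. This is an illustration under stated assumptions, not the thesis implementation: `backbone_a` and `backbone_b` are hypothetical stand-ins for pretrained CNN feature extractors, the images are tiny toy grids, and a nearest-centroid rule is swapped in for the classification head (or the SVM mentioned above) so that the example runs with the standard library only.

```python
import math


def backbone_a(image):
    """Hypothetical extractor 1: mean and maximum pixel intensity."""
    flat = [px for row in image for px in row]
    return [sum(flat) / len(flat), max(flat)]


def backbone_b(image):
    """Hypothetical extractor 2: intensity variance and minimum."""
    flat = [px for row in image for px in row]
    mean = sum(flat) / len(flat)
    variance = sum((px - mean) ** 2 for px in flat) / len(flat)
    return [variance, min(flat)]


def stack_features(image, backbones):
    """Concatenate the feature vectors of all backbones into one 'stack'."""
    stacked = []
    for extract in backbones:
        stacked.extend(extract(image))
    return stacked


def fit_centroids(features, labels):
    """Stand-in classifier: one mean feature vector per class."""
    grouped = {}
    for vec, label in zip(features, labels):
        grouped.setdefault(label, []).append(vec)
    return {label: [sum(col) / len(col) for col in zip(*vecs)]
            for label, vecs in grouped.items()}


def predict(centroids, vec):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


# Toy 2x2 "scans" (illustrative values only): 0 = non-COVID, 1 = COVID.
train_images = [
    [[0.1, 0.2], [0.1, 0.2]],
    [[0.2, 0.1], [0.2, 0.2]],
    [[0.8, 0.9], [0.7, 0.9]],
    [[0.9, 0.8], [0.9, 0.7]],
]
train_labels = [0, 0, 1, 1]
backbones = [backbone_a, backbone_b]

stacked_train = [stack_features(img, backbones) for img in train_images]
centroids = fit_centroids(stacked_train, train_labels)

test_image = [[0.85, 0.9], [0.8, 0.75]]
prediction = predict(centroids, stack_features(test_image, backbones))
print("stacked feature length:", len(stacked_train[0]))
print("predicted class:", prediction)
```

In a real pipeline each backbone would be a CNN with its classification layers removed, and the concatenated vector would feed a trainable classifier. The point of the sketch is only that the final decision sees the features of every model at once.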