Abstract:
Automatic heart sound classification is an integral part of the early diagnosis
of cardiovascular diseases (CVDs). Even though advances in medical technology
allow us to diagnose many CVDs, they remain among the leading causes of death
worldwide, largely because symptoms are absent at the initial stages. Thus, there is a strong
demand for methods of identifying heart sound abnormalities that are
less expensive, simpler, and widely applicable. Several audio feature extraction methods, in
combination with classification models, have been developed over time. However,
existing feature extraction methods are sensitive to noise, which negatively impacts
the performance of the heart sound classification model. In addition, there is a strong
need to develop models that are more sensitive to heart sound abnormalities in patients. In this
work, we address the limitations of extracted features by using spectrogram images
obtained via the Discrete Fourier Transform and feeding them to a Vision
Transformer model. Experiments on the benchmark PhysioNet Heart
Sound dataset show that the proposed method outperforms existing approaches,
achieving an accuracy of 0.925 and a sensitivity of 0.955.
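The pipeline the abstract describes, converting a heart sound recording into a spectrogram via the Discrete Fourier Transform and tokenizing it as image patches for a Vision Transformer, can be sketched as follows. This is a minimal illustration only: the window length, hop size, patch size, and the synthetic signal are assumptions for demonstration, not the settings or data used in the paper.

```python
import numpy as np

def spectrogram(signal, win_len=256, hop=128):
    """Magnitude spectrogram via a windowed Discrete Fourier Transform.
    win_len and hop are illustrative choices, not the paper's settings."""
    window = np.hanning(win_len)
    frames = []
    for start in range(0, len(signal) - win_len + 1, hop):
        frame = signal[start:start + win_len] * window
        # rfft returns the non-negative frequency bins of the DFT
        frames.append(np.abs(np.fft.rfft(frame)))
    # shape: (frequency_bins, time_frames), i.e. a spectrogram "image"
    return np.array(frames).T

def to_patches(image, patch=16):
    """Split a 2-D spectrogram into flattened non-overlapping patches,
    the token sequence a Vision Transformer consumes."""
    h, w = image.shape
    h, w = h - h % patch, w - w % patch  # crop to multiples of the patch size
    image = image[:h, :w]
    return (image.reshape(h // patch, patch, w // patch, patch)
                 .transpose(0, 2, 1, 3)
                 .reshape(-1, patch * patch))

# Toy four-second "recording" at 2 kHz (synthetic, for illustration only).
fs = 2000
t = np.arange(4 * fs) / fs
heart_like = np.sin(2 * np.pi * 40 * t) + 0.1 * np.random.randn(4 * fs)

spec = spectrogram(heart_like)       # (129, 61) frequency bins x time frames
tokens = to_patches(spec, patch=16)  # (24, 256) patch tokens for a ViT
```

In a full implementation, each flattened patch would be linearly projected to the transformer's embedding dimension, prepended with a class token, and passed through the encoder for classification.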