Abstract:
Facial expression recognition is an active area of research in computer vision and deep
learning that has gained popularity in recent decades. The results of these studies
are used in psychology, behavioral science, and human-computer interaction. Emotion
recognition is a difficult task because it requires overcoming challenges such as
the large number of images involved, head rotation, varying lighting conditions, and partial
occlusion of the face (glasses, mask, hand, etc.). To address these challenges, in this practical
study we apply different Vision Transformer (ViT) models to improve classification accuracy
on the publicly available CK+ and JAFFE datasets. The results show that our models
achieve high accuracy relative to state-of-the-art works while requiring fewer
computational resources for training.
Keywords— facial expression recognition, Vision Transformer, attention mechanism,
image classification