DEEP LEARNING APPROACHES FOR CLASSIFYING SIGN LANGUAGE GESTURES

Publisher

Nazarbayev University School of Engineering and Digital Sciences

Abstract

Communication is a basic human need and one of the cornerstones of human interaction, allowing people to express their feelings, needs, emotions, and opinions clearly. Many platforms offer real-time video messaging, but only for spoken language, so people with disabilities cannot use these technologies without difficulty. This work addresses that gap by training deep networks to recognize the 25-letter Russian Sign Language (RSL) fingerspelling alphabet from RGB photos. Using a 2,500-image Kaggle dataset, we apply standardised preprocessing (resizing to 224×224, flipping, rotation, and colour jitter) and evaluate five convolutional pipelines: a tailored lightweight CNN, VGG-16, ResNet-18, ResNet-50, and EfficientNet-B0. Fine-tuned VGG-16 achieves the best trade-off, with 92% macro-F1 and 91% accuracy on a stratified 15% test set; the tailored CNN lags by roughly 10 percentage points. Confusion-matrix analysis shows that errors are concentrated among visually similar hand configurations (e.g., А/Б), suggesting that multi-view capture or hand segmentation could yield further gains. These findings show that a moderate amount of data and standard vision networks suffice for consistent RSL letter recognition, providing a practical foundation for inclusive, real-time sign-to-text in video telephony and public-service kiosks.
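
The preprocessing steps named in the abstract map directly onto standard torchvision transforms. The sketch below is a minimal example assuming PyTorch is used; the rotation range, jitter strengths, and flip probability are illustrative assumptions, since the abstract names only the operations, and the normalisation constants are the conventional ImageNet values used when fine-tuning pretrained backbones.

    from torchvision import transforms

    # Training-time augmentation: resize to 224x224, random flip, small
    # rotation, and colour jitter, as listed in the abstract, followed by
    # ImageNet normalisation for ImageNet-pretrained backbones.
    train_transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.RandomHorizontalFlip(p=0.5),    # flip probability is an assumption
        transforms.RandomRotation(degrees=15),     # rotation range is an assumption
        transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    # Evaluation pipeline: deterministic resize and normalisation only.
    eval_transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])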
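
Likewise, fine-tuning VGG-16 for the 25 letter classes follows the usual transfer-learning recipe: load ImageNet weights and replace the final classifier layer. The sketch below is written under those assumptions; the abstract does not specify the optimiser, learning rate, or which layers were frozen, so those choices are hypothetical.

    import torch
    import torch.nn as nn
    from torchvision import models

    NUM_CLASSES = 25  # RSL fingerspelling letters, per the abstract

    # Load an ImageNet-pretrained VGG-16 and swap the last classifier layer
    # so the network predicts the 25 RSL letter classes.
    model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
    model.classifier[6] = nn.Linear(model.classifier[6].in_features, NUM_CLASSES)

    # Hypothetical training setup: all layers trainable, Adam with a small
    # learning rate, standard cross-entropy over the 25 classes.
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)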
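
For evaluation, macro-F1 (the unweighted mean of per-class F1 scores, so every letter counts equally) and the confusion-matrix analysis can be computed with scikit-learn; a stratified 15% test split of the kind described can likewise be produced with sklearn's train_test_split(stratify=...). The helper below is a sketch assuming integer labels and predictions collected from the test loader; its name and structure are hypothetical, not the authors' code.

    import numpy as np
    from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

    def report(labels, preds):
        """Accuracy, macro-F1, and the most-confused letter pair."""
        acc = accuracy_score(labels, preds)
        macro_f1 = f1_score(labels, preds, average="macro")

        # Row-normalised confusion matrix: entry (i, j) is the fraction of
        # class-i test images predicted as class j.
        cm = confusion_matrix(labels, preds).astype(float)
        cm /= np.maximum(cm.sum(axis=1, keepdims=True), 1)

        # The largest off-diagonal entry flags visually similar handshapes,
        # such as the А/Б pair noted in the abstract.
        np.fill_diagonal(cm, 0)
        i, j = np.unravel_index(cm.argmax(), cm.shape)
        print(f"accuracy={acc:.2%}  macro-F1={macro_f1:.2%}  "
              f"most confused: {i} -> {j} ({cm[i, j]:.1%})")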

Citation

Otegen, N. (2025). Deep learning approaches for classifying sign language gestures. Nazarbayev University School of Engineering and Digital Sciences

Creative Commons license

Except where otherwise noted, this item's license is described as Attribution-NoDerivs 3.0 United States