DEEP LEARNING APPROACHES FOR CLASSIFYING SIGN LANGUAGE GESTURES
Publisher
Nazarbayev University School of Engineering and Digital Sciences
Abstract
Communication is a basic human need and one of the cornerstones of human interaction, allowing people to express their feelings, needs, emotions, and opinions clearly. Many platforms offer real-time video messaging, but only for spoken language, so users who rely on sign language cannot communicate through these technologies without difficulty. This work addresses that gap by training deep networks to recognize the 25-letter Russian Sign Language (RSL) fingerspelling alphabet from RGB photos. Using a 2,500-image Kaggle dataset, we apply standardised preprocessing (resizing to 224×224, flipping, rotation, and colour jitter) and evaluate five convolutional pipelines: a tailored lightweight CNN, VGG-16, ResNet-18, ResNet-50, and EfficientNet-B0. A fine-tuned VGG-16 achieves the best trade-off, with 92% macro-F1 and 91% accuracy on a stratified 15% test set; the tailored CNN lags by roughly 10 percentage points. Confusion-matrix analysis shows that errors are concentrated among visually similar hand configurations (e.g., А/Б), suggesting that multi-view capture or hand segmentation could yield further gains. These findings show that moderate data and standard vision networks suffice for consistent RSL letter recognition, providing a practical foundation for inclusive, real-time sign-to-text in video telephony and public-service kiosks.
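To make the described pipeline concrete, below is a minimal sketch (not the authors' code) of the preprocessing and VGG-16 fine-tuning setup outlined in the abstract, written with PyTorch and torchvision. The dataset path, directory layout, rotation angle, jitter strengths, learning rate, and epoch count are all assumptions for illustration; only the 224×224 resize, the augmentations named above, and the 25-class output are taken from the abstract.

```python
# Hedged sketch of the abstract's pipeline: preprocess RGB photos
# (resize 224x224, flip, rotate, colour-jitter) and fine-tune an
# ImageNet-pretrained VGG-16 with a 25-way RSL fingerspelling head.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

train_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),           # rotation angle is an assumption
    transforms.ColorJitter(0.2, 0.2, 0.2),   # jitter strengths are assumptions
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

# Hypothetical ImageFolder layout: one sub-directory per RSL letter.
train_ds = datasets.ImageFolder("rsl_dataset/train", transform=train_tf)
train_dl = torch.utils.data.DataLoader(train_ds, batch_size=32, shuffle=True)

# Replace VGG-16's final classifier layer with a 25-class head.
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
model.classifier[6] = nn.Linear(model.classifier[6].in_features, 25)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # lr is an assumption

model.train()
for epoch in range(10):  # epoch count is an assumption
    for images, labels in train_dl:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```

One caveat on the design: random horizontal flipping mirrors hand shapes, which for some fingerspelled letters changes their identity; the abstract lists flipping among its augmentations, so it is reproduced here, but a real implementation would want to verify it does not confuse mirror-sensitive letters.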
Citation
Otegen, N. (2025). Deep learning approaches for classifying sign language gestures. Nazarbayev University School of Engineering and Digital Sciences.
Creative Commons license
Except where otherwise noted, this item's license is described as Attribution-NoDerivs 3.0 United States.
