Abstract
This review synthesizes research on static American Sign Language (ASL) alphabet recognition from images, comparing traditional machine learning pipelines, convolutional neural network (CNN) transfer learning, and hybrid or transformer-based models. The analysis spans studies from 2016 to 2025 that detail preprocessing, model design, and quantitative results on datasets such as the ASL Alphabet and Sign Language MNIST. Classical approaches using engineered features with classifiers such as Support Vector Machines (SVMs) or Random Forests perform well in controlled settings but rely on robust segmentation and handcrafted descriptors. Transfer learning on CNN backbones, including MobileNetV2, ResNet, EfficientNet, DenseNet, and the Visual Geometry Group (VGG) models, achieves near-perfect within-dataset accuracy; pure and modified Vision Transformers (ViTs) and CNN–transformer hybrids also reach ceiling-level performance with favorable speed–accuracy trade-offs. Most evaluations, however, remain closed-set and seldom report signer-independent splits, cross-dataset transfer, or deployment metrics.
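As a minimal illustrative sketch of the transfer-learning recipe surveyed above (not code from the chapter; the dataset path, epoch count, and head architecture are assumptions), one might freeze an ImageNet-pretrained MobileNetV2 backbone and train only a small classification head on the 29-class ASL Alphabet data:

```python
# Sketch: transfer learning for static ASL alphabet classification.
# Assumes an "asl_alphabet_train/" directory with one subfolder per class.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 29          # A-Z plus "space", "delete", "nothing"
IMG_SIZE = (224, 224)

# Integer labels are inferred from subfolder names.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "asl_alphabet_train",  # hypothetical path
    image_size=IMG_SIZE,
    batch_size=32,
)

# ImageNet-pretrained backbone with the original classifier head removed.
base = tf.keras.applications.MobileNetV2(
    input_shape=IMG_SIZE + (3,),
    include_top=False,
    weights="imagenet",
)
base.trainable = False     # freeze the backbone; train only the new head

model = models.Sequential([
    layers.Rescaling(1.0 / 127.5, offset=-1),  # MobileNetV2 expects [-1, 1]
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.2),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-3),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
model.fit(train_ds, epochs=5)
```

Note that a random split of such a dataset mixes images of the same signers across train and test; the signer-independent and cross-dataset evaluations the review calls for would instead hold out entire signers or datasets.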
| Original language | English |
|---|---|
| Title of host publication | Improving Quality of Life for People with Disabilities Through Smart Technologies |
| Publisher | IGI Global |
| Pages | 237-272 |
| Number of pages | 36 |
| ISBN (Electronic) | 9798337320359 |
| ISBN (Print) | 9798337320335 |
| DOIs | |
| Publication status | Published - Dec 2025 |