Abstract:
The research focuses on developing an image classification system that translates sign language for the deaf community using a modified VGG-19 neural network. Because communication barriers can isolate deaf and mute individuals from society, since many hearing people do not understand sign language, such a system is pivotal. Leveraging deep learning and transfer learning, the model aims to address challenges such as diverse backgrounds and real-world conditions. The adapted architecture adds layers on top of the VGG-19 base to improve feature extraction and classification: a global average pooling layer for dimensionality reduction; two dense layers with 1024 and 512 units, respectively, for further feature processing, each followed by a dropout layer for regularisation; and finally a dense output layer with a softmax activation function that produces probability scores across the classes. Data collection involves two approaches: downloading 1200 images per class from Kaggle and capturing 50 additional images per class against varied backgrounds with a camera, since the Kaggle images all have a black background. These additional images support the real-world system and bring the total to 1250 images per class. The dataset is curated to encompass various backgrounds and conditions, bolstering the model's robustness. Preprocessing involves manual cropping, resizing with OpenCV, and data augmentation. The model is trained with the Adam optimizer and cross-entropy loss on a balanced dataset of 43,750 images covering 35 alphanumeric signs, divided into training, validation, and testing sets in a 6:2:2 ratio. Initial training achieves 100% accuracy on the Kaggle dataset but drops to 58.9% on the more diverse dataset. Combining the datasets improves accuracy to 98.57%, and further fine-tuning raises it to 100%. The study underscores the real-time applicability of the approach and contributes an efficient system for classifying Indian Sign Language. The model's performance is substantiated through precision, recall, F1-score, and confusion-matrix evaluations, highlighting its potential to foster inclusive communication between deaf and non-deaf individuals.
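
To make the described architecture concrete, the following is a minimal sketch assuming a TensorFlow/Keras implementation; the input size, dropout rates, and freezing of the pretrained base are illustrative assumptions, not details stated in the abstract.

```python
# Sketch of the modified VGG-19 classifier described in the abstract.
# Framework (TensorFlow/Keras), input size (224x224), dropout rates (0.5),
# and freezing of the pretrained base are assumptions for illustration only.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG19

NUM_CLASSES = 35  # 35 alphanumeric Indian Sign Language signs

# Pretrained VGG-19 convolutional base (transfer learning), top removed.
base = VGG19(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # assumed: base frozen for initial training

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),                 # dimensionality reduction
    layers.Dense(1024, activation="relu"),           # further feature processing
    layers.Dropout(0.5),                             # regularisation (rate assumed)
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"), # class probability scores
])

# Adam optimizer with cross-entropy loss, as stated in the abstract.
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```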
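
Similarly, a rough sketch of the preprocessing step, assuming OpenCV for resizing and a Keras generator for augmentation; the target size and the specific augmentation parameters are hypothetical, since the abstract does not specify them.

```python
# Sketch of the preprocessing described in the abstract: OpenCV resizing plus
# simple data augmentation. Target size and augmentation settings are assumed.
import cv2
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def load_and_resize(path, size=(224, 224)):
    """Read a manually cropped sign image and resize it with OpenCV."""
    img = cv2.imread(path)                      # BGR image from disk
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # convert to RGB
    img = cv2.resize(img, size)                 # resize to the network input
    return img.astype(np.float32) / 255.0       # scale pixels to [0, 1]

# Augmentation generator; the exact transforms are not given in the abstract,
# so these parameters are illustrative assumptions.
augmenter = ImageDataGenerator(rotation_range=10,
                               width_shift_range=0.1,
                               height_shift_range=0.1,
                               zoom_range=0.1,
                               horizontal_flip=False)
```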