This is a follow-up to my previous post, Deep learning for sign language recognition. It is another exercise resulting from Coursera’s Convolutional Neural Networks course. This model differs from the previous one in that it uses a Convolutional Neural Network.
The Dataset
The dataset is obtained from Kaggle (https://www.kaggle.com/datamunge/sign-language-mnist). The training set has 27,455 examples and the test set has 7,172. Each example is a 784-dimensional (28×28) vector of grayscale pixel values between 0 and 255. There are 24 classes, one for each letter of the American Sign Language alphabet except J and Z (which involve motion).
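As a rough illustration, the CSV data can be loaded and preprocessed along the following lines. This is a minimal sketch: the file names, the label remapping, and the normalization are my assumptions about how the Kaggle CSVs are typically handled, not necessarily what the code in my repository does.

```python
import numpy as np
import pandas as pd

# The Kaggle dataset ships as CSV files; these file names are assumptions.
train_df = pd.read_csv("sign_mnist_train.csv")
test_df = pd.read_csv("sign_mnist_test.csv")

# The first column is the label; the remaining 784 columns are the
# flattened 28x28 grayscale pixels.
y_train = train_df["label"].values
X_train = train_df.drop(columns=["label"]).values

# Reshape to 28x28x1 images and scale pixel values to [0, 1].
X_train = X_train.reshape(-1, 28, 28, 1).astype(np.float32) / 255.0

# Labels run 0-24 with 9 (J) absent; remap to 24 contiguous class ids
# and one-hot encode them.
y_train = np.where(y_train > 9, y_train - 1, y_train)
Y_train = np.eye(24, dtype=np.float32)[y_train]
```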
An illustration of the sign language is shown here (image courtesy of Kaggle):
Grayscale images with pixel values in the range 0-255:
One example from the Sign Language MNIST dataset:
Convolutional Neural Network Architecture
My network architecture borrows ideas from the LeNet-5 model (http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf), which is a relatively simple and easy-to-train network. The architecture is as follows:
CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED -> FULLYCONNECTED -> FULLYCONNECTED
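In TensorFlow 1.x this stack of layers can be written roughly as below. It is a sketch only: the 5×5 filters, the 6 and 16 channels, and the 120/84-unit dense layers are LeNet-5-style assumptions for illustration, not necessarily the exact sizes used in my code.

```python
import tensorflow as tf  # TensorFlow 1.x

def forward_propagation(X):
    """X is a batch of 28x28x1 grayscale images; returns 24-way logits."""
    # LeNet-5-style filter shapes (assumed, for illustration).
    W1 = tf.get_variable("W1", [5, 5, 1, 6],
                         initializer=tf.contrib.layers.xavier_initializer())
    W2 = tf.get_variable("W2", [5, 5, 6, 16],
                         initializer=tf.contrib.layers.xavier_initializer())

    # CONV2D -> RELU -> MAXPOOL
    Z1 = tf.nn.conv2d(X, W1, strides=[1, 1, 1, 1], padding="SAME")
    A1 = tf.nn.relu(Z1)
    P1 = tf.nn.max_pool(A1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")

    # CONV2D -> RELU -> MAXPOOL
    Z2 = tf.nn.conv2d(P1, W2, strides=[1, 1, 1, 1], padding="SAME")
    A2 = tf.nn.relu(Z2)
    P2 = tf.nn.max_pool(A2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")

    # FLATTEN -> FULLYCONNECTED -> FULLYCONNECTED -> FULLYCONNECTED
    F = tf.contrib.layers.flatten(P2)
    FC1 = tf.contrib.layers.fully_connected(F, 120)
    FC2 = tf.contrib.layers.fully_connected(FC1, 84)
    Z3 = tf.contrib.layers.fully_connected(FC2, 24, activation_fn=None)  # logits
    return Z3
```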
The architecture is depicted below:
The hyperparameter values are learning_rate = 0.0001, num_epochs = 30, minibatch_size = 64, and optimizer = AdamOptimizer.
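With those hyperparameters, the cost and optimizer can be wired up roughly as follows (again a sketch, assuming 28×28×1 inputs, 24 output classes, and the forward_propagation function from the sketch above):

```python
import tensorflow as tf  # TensorFlow 1.x

# Placeholders for a minibatch of images and their one-hot labels.
X = tf.placeholder(tf.float32, [None, 28, 28, 1])
Y = tf.placeholder(tf.float32, [None, 24])

Z3 = forward_propagation(X)  # logits from the sketch above

# Softmax cross-entropy cost averaged over the minibatch.
cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=Y, logits=Z3))

# Adam optimizer with the learning rate quoted above.
optimizer = tf.train.AdamOptimizer(learning_rate=0.0001).minimize(cost)
```

Because tf.nn.softmax_cross_entropy_with_logits_v2 applies the softmax internally, the last fully connected layer returns raw logits rather than probabilities.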
The program is written in Python with TensorFlow 1.x.
The results:
Train Accuracy: 1.0
Test Accuracy: 0.89445066
Even though it is a relatively simple network, it achieved very good results. Compared to my previous deep learning model for the same task, this model is not only more accurate but also faster to train, requiring far fewer epochs.
Full source code can be found at https://github.com/minhthangdang/SignLanguageRecognitionCNN