This is a follow-up to my previous post, Deep learning for sign language recognition. It is another exercise resulting from Coursera’s Convolutional Neural Networks course. This model differs from the previous one in that it uses a Convolutional Neural Network.
The Dataset
The dataset is obtained from Kaggle (https://www.kaggle.com/datamunge/sign-language-mnist). The training set has 27,455 examples and the test set has 7,172. Each example is a 784-dimensional (28×28) vector of grayscale pixel values between 0 and 255. There are 24 classes, one for each letter of the American Sign Language alphabet except J and Z (which involve motion).
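As a rough illustration, the CSV data can be loaded and preprocessed along the following lines. This is a minimal sketch: the file names, the label remapping, and the normalization are my assumptions about how the Kaggle CSVs are typically handled, not necessarily what the code in my repository does.

```python
import numpy as np
import pandas as pd

# The Kaggle dataset ships as CSV files; these file names are assumptions.
train_df = pd.read_csv("sign_mnist_train.csv")
test_df = pd.read_csv("sign_mnist_test.csv")

# The first column is the label; the remaining 784 columns are the
# flattened 28x28 grayscale pixels.
y_train = train_df["label"].values
X_train = train_df.drop(columns=["label"]).values

# Reshape to 28x28x1 images and scale pixel values to [0, 1].
X_train = X_train.reshape(-1, 28, 28, 1).astype(np.float32) / 255.0

# Labels run 0-24 with 9 (J) absent; remap to 24 contiguous class ids
# and one-hot encode them.
y_train = np.where(y_train > 9, y_train - 1, y_train)
Y_train = np.eye(24, dtype=np.float32)[y_train]
```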
An illustration of the sign language is shown here (image courtesy of Kaggle):
Grayscale images with pixel values in the range 0-255:
One example from the Sign Language MNIST dataset:
Convolutional Neural Network Architecture
My network architecture borrows ideas from the LeNet-5 model (http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf), which is a relatively simple and easy-to-train network. The architecture is as follows:
CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED -> FULLYCONNECTED -> FULLYCONNECTED
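In TensorFlow 1.x this stack of layers can be written roughly as below. It is a sketch only: the 5×5 filters, the 6 and 16 channels, and the 120/84-unit dense layers are LeNet-5-style assumptions for illustration, not necessarily the exact sizes used in my code.

```python
import tensorflow as tf  # TensorFlow 1.x

def forward_propagation(X):
    """X is a batch of 28x28x1 grayscale images; returns 24-way logits."""
    # LeNet-5-style filter shapes (assumed, for illustration).
    W1 = tf.get_variable("W1", [5, 5, 1, 6],
                         initializer=tf.contrib.layers.xavier_initializer())
    W2 = tf.get_variable("W2", [5, 5, 6, 16],
                         initializer=tf.contrib.layers.xavier_initializer())

    # CONV2D -> RELU -> MAXPOOL
    Z1 = tf.nn.conv2d(X, W1, strides=[1, 1, 1, 1], padding="SAME")
    A1 = tf.nn.relu(Z1)
    P1 = tf.nn.max_pool(A1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")

    # CONV2D -> RELU -> MAXPOOL
    Z2 = tf.nn.conv2d(P1, W2, strides=[1, 1, 1, 1], padding="SAME")
    A2 = tf.nn.relu(Z2)
    P2 = tf.nn.max_pool(A2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")

    # FLATTEN -> FULLYCONNECTED -> FULLYCONNECTED -> FULLYCONNECTED
    F = tf.contrib.layers.flatten(P2)
    FC1 = tf.contrib.layers.fully_connected(F, 120)
    FC2 = tf.contrib.layers.fully_connected(FC1, 84)
    Z3 = tf.contrib.layers.fully_connected(FC2, 24, activation_fn=None)  # logits
    return Z3
```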
The architecture is depicted below:
The hyperparameter values are learning_rate = 0.0001, num_epochs = 30, minibatch_size = 64, and optimizer = AdamOptimizer.
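With those hyperparameters, the cost and optimizer can be wired up roughly as follows (again a sketch, assuming 28×28×1 inputs, 24 output classes, and the forward_propagation function from the sketch above):

```python
import tensorflow as tf  # TensorFlow 1.x

# Placeholders for a minibatch of images and their one-hot labels.
X = tf.placeholder(tf.float32, [None, 28, 28, 1])
Y = tf.placeholder(tf.float32, [None, 24])

Z3 = forward_propagation(X)  # logits from the sketch above

# Softmax cross-entropy cost averaged over the minibatch.
cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=Y, logits=Z3))

# Adam optimizer with the learning rate quoted above.
optimizer = tf.train.AdamOptimizer(learning_rate=0.0001).minimize(cost)
```

Because tf.nn.softmax_cross_entropy_with_logits_v2 applies the softmax internally, the last fully connected layer returns raw logits rather than probabilities.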
The program is written in Python with TensorFlow 1.x.
The results:
Train Accuracy: 1.0
Test Accuracy: 0.89445066
Even though it is a relatively simple network, it achieved very good results. Compared to my previous deep learning model for the same task, this model is not only more accurate but also faster to train, requiring far fewer epochs.
Full source code can be found at https://github.com/minhthangdang/SignLanguageRecognitionCNN