Deep CNN-Based Recognition of JSL Finger Spelling
In this paper, we present a framework for recognizing static finger spelling in Japanese Sign Language (JSL) from RGB images. The finger-spelled signs are recognized by an ensemble consisting of one ResNet-based convolutional neural network (CNN) and two ResNet-based quaternion convolutional neural networks. A 3D articulated hand model was used to generate synthetic finger spellings that extend a dataset of real hand gestures. Twelve different gesture realizations were prepared for each of the 41 signs, and ten images were rendered for each realization by interpolating between its starting and end poses. Experimental results demonstrate that, owing to a sufficient amount of training data, a high recognition rate can be attained on images from a single RGB camera. The ResNet quaternion convolutional neural network achieves better results than the ResNet CNN, and the ensemble achieves the best recognition results. The JSL-rend dataset is available for download.
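To make the data-generation and ensemble steps concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a pose is a flat list of joint angles, that intermediate poses are obtained by plain linear interpolation with both endpoints included, and that the three networks are combined by averaging their class probabilities; the abstract specifies none of these details. The names `interpolate_poses` and `ensemble_predict` are hypothetical. Only the counts (41 signs, 12 realizations, 10 images) come from the text.

```python
# Sketch only: linear interpolation of joint angles and probability averaging
# are assumptions, not necessarily the schemes used in the paper.

NUM_SIGNS = 41      # signs (from the abstract)
REALIZATIONS = 12   # gesture realizations per sign (from the abstract)
FRAMES = 10         # rendered images per realization (from the abstract)


def interpolate_poses(start, end, frames=FRAMES):
    """Return `frames` poses evenly spaced from `start` to `end`, inclusive."""
    if len(start) != len(end):
        raise ValueError("start and end poses must have the same length")
    if frames < 2:
        raise ValueError("need at least two frames to span start and end")
    poses = []
    for i in range(frames):
        t = i / (frames - 1)  # runs from 0.0 (start pose) to 1.0 (end pose)
        poses.append([(1 - t) * s + t * e for s, e in zip(start, end)])
    return poses


def ensemble_predict(prob_vectors):
    """Average per-model class probabilities and return the best class index."""
    n_models = len(prob_vectors)
    n_classes = len(prob_vectors[0])
    mean = [sum(p[c] for p in prob_vectors) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__)


# Rendered images implied by the abstract, if every realization yields ten.
synthetic_total = NUM_SIGNS * REALIZATIONS * FRAMES  # 41 * 12 * 10 = 4920
```

Under these counts, the synthetic part of the dataset would contain 41 × 12 × 10 = 4920 rendered images. In the ensemble function, each entry of `prob_vectors` stands for the softmax output of one of the three networks for a single image.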
This work was supported by the Polish National Science Center (NCN) under research grant 2017/27/B/ST6/01743 and by JSPS KAKENHI under grant 17H06114.