Abstract
In this paper, we investigate state-of-the-art deep learning methods for sign language recognition. Sign language is an important means of communication, and recognizing it from digital videos in real time has become a new challenge in this research field. To address this challenge, we propose a Capsule Network (CapsNet), which shows positive results. We also propose a Selective Kernel Network (SKNet) with an attention mechanism to extract spatial information. The contributions of this paper are: (1) CapsNet attains an overall recognition accuracy of up to 98.72% on our own dataset. (2) SKNet with the attention mechanism achieves the best recognition accuracy, 98.88%.
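For readers unfamiliar with capsule networks, the sketch below illustrates the squashing nonlinearity that capsule layers commonly apply to their output vectors. It is a minimal illustration in plain Python, not the implementation used in this paper; the function names `squash` and `length`, the `eps` parameter, and the sample vector are assumptions introduced here for illustration only.

```python
import math


def squash(s, eps=1e-9):
    """Squash a capsule's raw output vector so its length lies in [0, 1).

    Illustrative sketch only; `eps` is an assumed small constant that
    guards against division by zero for an all-zero input.
    """
    squared_norm = sum(x * x for x in s)
    norm = math.sqrt(squared_norm)
    # The factor ||s||^2 / (1 + ||s||^2) shrinks short vectors toward zero
    # and pushes long vectors toward (but never reaching) unit length.
    scale = squared_norm / (1.0 + squared_norm) / (norm + eps)
    return [scale * x for x in s]


def length(v):
    """Euclidean length of a vector, read as the capsule's activation."""
    return math.sqrt(sum(x * x for x in v))


if __name__ == "__main__":
    raw = [3.0, 4.0]  # hypothetical raw capsule output with length 5
    v = squash(raw)
    print(length(v))  # close to 25/26, i.e. just below 1
```

The direction of the vector is preserved while its length is compressed, so the length can be read as the probability that the entity represented by the capsule (here, a sign) is present.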
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Lu, J., Nguyen, M., Yan, W.Q. (2021). Sign Language Recognition from Digital Videos Using Deep Learning Methods. In: Nguyen, M., Yan, W.Q., Ho, H. (eds) Geometry and Vision. ISGV 2021. Communications in Computer and Information Science, vol 1386. Springer, Cham. https://doi.org/10.1007/978-3-030-72073-5_9
DOI: https://doi.org/10.1007/978-3-030-72073-5_9
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-72072-8
Online ISBN: 978-3-030-72073-5
eBook Packages: Computer Science (R0)