Abstract
This chapter reviews several research works on sign language recognition (SLR), covering both isolated word recognition and continuous sentence translation. For isolated SLR, an Adaptive-HMM (hidden Markov model) framework (Guo et al., TOMCCAP 14(1):1–18, 2017) is proposed, which explores the intrinsic properties of, and the complementary relationships among, different modalities. Continuous sign language translation (SLT) suffers from sequential variation of the visual representations without any word-alignment clue. To exploit spatiotemporal clues for identifying signs, a hierarchical recurrent neural network (RNN) is adopted to encode visual content at different granularities (Guo et al., AAAI, pp 6845–6852, 2018; Guo et al., IEEE TIP 29:1575–1590, 2020); in the encoding stage, key segments in the temporal stream are adaptively captured. RNNs are not the only option for sequential learning: convolutional neural networks (CNNs) can also be used (Wang et al., ACM MM, pp 1483–1491, 2018), and the proposed DenseTCN model encodes the temporal cues of continuous gestures with purely convolutional operations (Guo et al., IJCAI, pp 744–750, 2019). Finally, because SLT is a weakly supervised task (gestures vary and no word-alignment annotation is available), a pseudo-supervised learning mechanism contributes to solving the word-alignment issue (Guo et al., IJCAI, pp 751–757, 2019).
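The alignment-free supervision mentioned above is commonly handled with connectionist temporal classification (CTC; Graves et al.): the network emits one gloss label or a blank per frame, and a many-to-one collapse rule maps the frame sequence to the sentence, so no per-word alignment needs to be annotated. A minimal sketch of that collapse rule follows; the gloss names in the example are hypothetical, not taken from any of the cited datasets.

```python
def ctc_collapse(frame_labels, blank="-"):
    """Collapse a frame-level label sequence into a sentence-level
    gloss sequence: merge consecutive repeats, then drop blanks.
    This is the decoding-side mapping used by CTC (Graves et al.)."""
    out = []
    prev = None
    for lab in frame_labels:
        if lab != prev:          # merge repeated frame labels
            if lab != blank:     # drop the blank symbol
                out.append(lab)
        prev = lab
    return out

# Frame-wise predictions over a 10-frame clip (hypothetical glosses):
frames = ["-", "I", "I", "-", "LOVE", "LOVE", "LOVE", "-", "YOU", "YOU"]
print(ctc_collapse(frames))  # -> ['I', 'LOVE', 'YOU']
```

Because many different frame sequences collapse to the same sentence, the CTC loss sums over all of them, which is what lets a weakly supervised SLT model learn from sentence labels alone.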
Notes
1. HMM package: http://www.cs.ubc.ca/~murphyk/Software/HMM/hmm/html. Parameters Q and M are discussed, whereas A, B, and π can be handled by this code package.
2. As in [6], here we set the similarity preference to the median similarity in AP.
3. In this work, HOG features were extracted through OpenCV with basic parameters [30] and further optimized; e.g., some invalid frames were deleted.
4.
5.
6. https://www-i6.informatik.rwth-aachen.de/~koller/RWTH-PHOENIX/.
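Note 2 refers to the preference parameter of affinity propagation (AP) [6], which controls how many exemplars (clusters) emerge. Assuming "media similarity" means the median of the pairwise similarities, which is the default recommended by Frey and Dueck, the preference can be computed as sketched below; the similarity values are toy numbers, not from the chapter's experiments.

```python
from statistics import median

def ap_preference(similarities):
    """Return the AP 'preference' as the median of the off-diagonal
    pairwise similarities, the default suggested in Frey & Dueck [6].
    `similarities` is a square matrix (list of lists)."""
    n = len(similarities)
    off_diag = [similarities[i][j]
                for i in range(n)
                for j in range(n)
                if i != j]
    return median(off_diag)

# Toy 3x3 negative-squared-distance similarity matrix (hypothetical):
S = [[0, -4, -9],
     [-4, 0, -1],
     [-9, -1, 0]]
print(ap_preference(S))  # -> -4.0
```

A larger (less negative) preference makes every point a better exemplar candidate and yields more clusters; the median is a neutral default when the number of modality clusters is unknown.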
References
Camgoz, N.C., Hadfield, S., Koller, O., Bowden, R.: Subunets: End-to-end hand shape and continuous sign language recognition. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp. 3075–3084 (2017)
Celebi, S., Aydin, A.S., Temiz, T.T., Arici, T.: Gesture recognition using skeleton data with weighted dynamic time warping. In: VISAPP, pp. 620–625 (2013)
Chatfield, K., Simonyan, K., Vedaldi, A., Zisserman, A.: Return of the devil in the details: delving deep into convolutional nets (2014). Preprint. arXiv:1405.3531
Cui, R., Liu, H., Zhang, C.: Recurrent convolutional neural networks for continuous sign language recognition by staged optimization. In: CVPR, pp. 7361–7369 (2017)
Frey, B.J., Dueck, D.: Clustering by passing messages between data points. Science 315(5814), 972–976 (2007)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: International Conference on Machine Learning (ICML), pp. 369–376 (2006)
Guo, D., Zhou, W., Li, H., Wang, M.: Online early-late fusion based on adaptive HMM for sign language recognition. TOMCCAP 14(1), 1–18 (2017)
Guo, D., Zhou, W., Li, H., Wang, M.: Hierarchical LSTM for sign language translation. In: AAAI, pp. 6845–6852 (2018)
Guo, D., Tang, S., Wang, M.: Connectionist temporal modeling of video and language: a joint model for translation and sign labeling. In: IJCAI, pp. 751–757 (2019)
Guo, D., Wang, S., Tian, Q., Wang, M.: Dense temporal convolution network for sign language translation. In: IJCAI, pp. 744–750 (2019)
Guo, D., Zhou, W., Li, A., Li, H., Wang, M.: Hierarchical recurrent deep fusion using adaptive clip summarization for sign language translation. IEEE Trans. Image Process. 29, 1575–1590 (2020)
Hara, K., Kataoka, H., Satoh, Y.: Learning spatio-temporal features with 3d residual networks for action recognition. In: ICCV Workshop on Action, Gesture, and Emotion Recognition, vol. 2, p. 4 (2017)
Huang, J., Zhou, W., Zhang, Q., Li, H., Li, W.: Video-based sign language recognition without temporal segmentation (2018). Preprint. arXiv:1801.10111
Kittler, J., Hatef, M., Duin, R.P., Matas, J.: On combining classifiers. IEEE Trans. Pattern Anal. Mach. Intell. 20(3), 226–239 (1998)
Koller, O., Forster, J., Ney, H.: Continuous sign language recognition: towards large vocabulary statistical recognition systems handling multiple signers. Comput. Vision Image Understanding 141, 108–125 (2015)
Koller, O., Ney, H., Bowden, R.: Deep hand: how to train a CNN on 1 million hand images when your data is continuous and weakly labelled. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3793–3802 (2016)
Koller, O., Zargaran, O., Ney, H., Bowden, R.: Deep sign: hybrid CNN-HMM for continuous sign language recognition. In: British Machine Vision Conference (BMVC), p. 12 (2016)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems (NIPS), pp. 1097–1105 (2012)
Luong, M.T., Pham, H., Manning, C.D.: Effective approaches to attention-based neural machine translation. In: EMNLP, pp. 1412–1421 (2015)
Pan, Y., Mei, T., Yao, T., Li, H., Rui, Y.: Jointly modeling embedding and translation to bridge video and language. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4594–4602 (2016)
Pei, X., Guo, D., Zhao, Y.: Continuous sign language recognition based on pseudo-supervised learning. In: MAHCI, pp. 33–39 (2019)
Pu, J., Zhou, W., Li, H.: Dilated convolutional network with iterative optimization for continuous sign language recognition. In: International Joint Conference on Artificial Intelligence (IJCAI), pp. 885–891 (2018)
Salvador, S., Chan, P.: Toward accurate dynamic time warping in linear time and space. Intell. Data Anal. 11(5), 561–580 (2007)
Song, P., Guo, D., Zhou, W., Wang, M., Li, H.: Parallel temporal encoder for sign language translation. In: ICIP, pp. 1915–1919 (2019)
Tran, D., Bourdev, L., Fergus, R., Torresani, L., Paluri, M.: Learning spatiotemporal features with 3d convolutional networks. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp. 4489–4497 (2015)
Venugopalan, S., Rohrbach, M., Donahue, J., Mooney, R., Darrell, T., Saenko, K.: Sequence to sequence-video to text. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 4534–4542 (2015)
Wang, J., Liu, Z., Wu, Y., Yuan, J.: Mining actionlet ensemble for action recognition with depth cameras. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1290–1297 (2012)
Wang, H., Chai, X., Zhou, Y., Chen, X.: Fast sign language recognition benefited from low rank approximation. In: Automatic Face and Gesture Recognition (FG), vol. 1, pp. 1–6 (2015)
Wang, L., Xiong, Y., Wang, Z., Qiao, Y., Lin, D., Tang, X., Van Gool, L.: Temporal segment networks: towards good practices for deep action recognition. In: ECCV, pp. 20–36 (2016)
Wang, S., Guo, D., Zhou, W., Zha, Z., Wang, M.: Connectionist temporal fusion for sign language translation. In: ACM MM, pp. 1483–1491 (2018)
Yao, L., Torabi, A., Cho, K., Ballas, N., Pal, C., Larochelle, H., Courville, A.: Describing videos by exploiting temporal structure. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 4507–4515 (2015)
Zhang, J., Zhou, W., Xie, C., Pu, J., Li, H.: Chinese sign language recognition with adaptive HMM. In: International Conference on Multimedia and Expo (ICME), pp. 1–6 (2016)
Zheng, L., Wang, S., Tian, L., He, F., Liu, Z., Tian, Q.: Query-adaptive late fusion for image search and person re-identification. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1741–1750 (2015)
Zhu, W., Hu, J., Sun, G., Cao, X., Qiao, Y.: A key volume mining deep framework for action recognition. In: CVPR, pp. 1991–1999 (2016)
Acknowledgements
This work is supported in part by the State Key Development Program under Grant 2018YFC0830103, in part by the National Natural Science Foundation of China (NSFC) under Grant 61876058, and in part by the Fundamental Research Funds for the Central Universities.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Guo, D., Tang, S., Hong, R., Wang, M. (2021). Sign Language Recognition. In: McDaniel, T., Liu, X. (eds) Multimedia for Accessible Human Computer Interfaces. Springer, Cham. https://doi.org/10.1007/978-3-030-70716-3_2
DOI: https://doi.org/10.1007/978-3-030-70716-3_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-70715-6
Online ISBN: 978-3-030-70716-3
eBook Packages: Computer Science (R0)