Abstract
This chapter reviews several research works on sign language recognition (SLR), covering both isolated word recognition and continuous sentence translation. For isolated SLR, an Adaptive-HMM (hidden Markov model) framework (Guo et al., TOMCCAP 14(1):1–18, 2017) is proposed, which explores the intrinsic properties of, and the complementary relationships among, different modalities. Continuous sign language translation (SLT) suffers from sequential variation of the visual representations without any word-alignment clue. To exploit spatiotemporal clues for identifying signs, a hierarchical recurrent neural network (RNN) is adopted to encode visual content at different granularities (Guo et al., AAAI, pp 6845–6852, 2018; Guo et al., IEEE TIP 29:1575–1590, 2020); in the encoding stage, key segments in the temporal stream are adaptively captured. RNNs are not the only option for sequential learning: convolutional neural networks (CNNs) can also be used (Wang et al., ACM MM, pp 1483–1491, 2018), and the proposed DenseTCN model encodes the temporal cues of continuous gestures with purely convolutional operations (Guo et al., IJCAI, pp 744–750, 2019). Finally, because SLT is a weakly supervised task (gestures vary and no word-alignment annotation is available), a pseudo-supervised learning mechanism contributes to solving the word-alignment issue (Guo et al., IJCAI, pp 751–757, 2019).
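The alignment-free supervision mentioned above is commonly handled with connectionist temporal classification (CTC; Graves et al.): the network emits one gloss label or a blank per frame, and a many-to-one collapse rule maps the frame sequence to the sentence, so no per-word alignment needs to be annotated. A minimal sketch of that collapse rule follows; the gloss names in the example are hypothetical, not taken from any of the cited datasets.

```python
def ctc_collapse(frame_labels, blank="-"):
    """Collapse a frame-level label sequence into a sentence-level
    gloss sequence: merge consecutive repeats, then drop blanks.
    This is the decoding-side mapping used by CTC (Graves et al.)."""
    out = []
    prev = None
    for lab in frame_labels:
        if lab != prev:          # merge repeated frame labels
            if lab != blank:     # drop the blank symbol
                out.append(lab)
        prev = lab
    return out

# Frame-wise predictions over a 10-frame clip (hypothetical glosses):
frames = ["-", "I", "I", "-", "LOVE", "LOVE", "LOVE", "-", "YOU", "YOU"]
print(ctc_collapse(frames))  # -> ['I', 'LOVE', 'YOU']
```

Because many different frame sequences collapse to the same sentence, the CTC loss sums over all of them, which is what lets a weakly supervised SLT model learn from sentence labels alone.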
Notes
1. HMM package: http://www.cs.ubc.ca/~murphyk/Software/HMM/hmm/html. Parameters Q and M are discussed, whereas A, B, and π can be handled by this code package.
2. As in [6], here we set the similarity preference to the median similarity in AP.
3. In this work, HOG features were extracted through OpenCV with basic parameters [30] and further optimized; e.g., some invalid frames were deleted.
4.
5.
6. https://www-i6.informatik.rwth-aachen.de/~koller/RWTH-PHOENIX/.
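Note 2 refers to the preference parameter of affinity propagation (AP) [6], which controls how many exemplars (clusters) emerge. Assuming "media similarity" means the median of the pairwise similarities, which is the default recommended by Frey and Dueck, the preference can be computed as sketched below; the similarity values are toy numbers, not from the chapter's experiments.

```python
from statistics import median

def ap_preference(similarities):
    """Return the AP 'preference' as the median of the off-diagonal
    pairwise similarities, the default suggested in Frey & Dueck [6].
    `similarities` is a square matrix (list of lists)."""
    n = len(similarities)
    off_diag = [similarities[i][j]
                for i in range(n)
                for j in range(n)
                if i != j]
    return median(off_diag)

# Toy 3x3 negative-squared-distance similarity matrix (hypothetical):
S = [[0, -4, -9],
     [-4, 0, -1],
     [-9, -1, 0]]
print(ap_preference(S))  # -> -4.0
```

A larger (less negative) preference makes every point a better exemplar candidate and yields more clusters; the median is a neutral default when the number of modality clusters is unknown.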
References
Camgoz, N.C., Hadfield, S., Koller, O., Bowden, R.: Subunets: End-to-end hand shape and continuous sign language recognition. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp. 3075–3084 (2017)
Celebi, S., Aydin, A.S., Temiz, T.T., Arici, T.: Gesture recognition using skeleton data with weighted dynamic time warping. In: VISAPP, pp. 620–625 (2013)
Chatfield, K., Simonyan, K., Vedaldi, A., Zisserman, A.: Return of the devil in the details: delving deep into convolutional nets (2014). Preprint. arXiv:1405.3531
Cui, R., Liu, H., Zhang, C.: Recurrent convolutional neural networks for continuous sign language recognition by staged optimization. In: CVPR, pp. 7361–7369 (2017)
Frey, B.J., Dueck, D.: Clustering by passing messages between data points. Science 315(5814), 972–976 (2007)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: International Conference on Machine Learning (ICML), pp. 369–376 (2006)
Guo, D., Zhou, W., Li, H., Wang, M.: Online early-late fusion based on adaptive HMM for sign language recognition. TOMCCAP 14(1), 1–18 (2017)
Guo, D., Zhou, W., Li, H., Wang, M.: Hierarchical LSTM for sign language translation. In: AAAI, pp. 6845–6852 (2018)
Guo, D., Tang, S., Wang, M.: Connectionist temporal modeling of video and language: a joint model for translation and sign labeling. In: IJCAI, pp. 751–757 (2019)
Guo, D., Wang, S., Tian, Q., Wang, M.: Dense temporal convolution network for sign language translation. In: IJCAI, pp. 744–750 (2019)
Guo, D., Zhou, W., Li, A., Li, H., Wang, M.: Hierarchical recurrent deep fusion using adaptive clip summarization for sign language translation. IEEE Trans. Image Process. 29, 1575–1590 (2020)
Hara, K., Kataoka, H., Satoh, Y.: Learning spatio-temporal features with 3d residual networks for action recognition. In: ICCV Workshop on Action, Gesture, and Emotion Recognition, vol. 2, p. 4 (2017)
Huang, J., Zhou, W., Zhang, Q., Li, H., Li, W.: Video-based sign language recognition without temporal segmentation (2018). Preprint. arXiv:1801.10111
Kittler, J., Hatef, M., Duin, R.P., Matas, J.: On combining classifiers. IEEE Trans. Pattern Anal. Mach. Intell. 20(3), 226–239 (1998)
Koller, O., Forster, J., Ney, H.: Continuous sign language recognition: towards large vocabulary statistical recognition systems handling multiple signers. Comput. Vision Image Understanding 141, 108–125 (2015)
Koller, O., Ney, H., Bowden, R.: Deep hand: how to train a CNN on 1 million hand images when your data is continuous and weakly labelled. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3793–3802 (2016)
Koller, O., Zargaran, O., Ney, H., Bowden, R.: Deep sign: hybrid CNN-HMM for continuous sign language recognition. In: British Machine Vision Conference (BMVC), p. 12 (2016)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems (NIPS), pp. 1097–1105 (2012)
Luong, M.T., Pham, H., Manning, C.D.: Effective approaches to attention-based neural machine translation. In: EMNLP, pp. 1412–1421 (2015)
Pan, Y., Mei, T., Yao, T., Li, H., Rui, Y.: Jointly modeling embedding and translation to bridge video and language. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4594–4602 (2016)
Pei, X., Guo, D., Zhao, Y.: Continuous sign language recognition based on pseudo-supervised learning. In: MAHCI, pp. 33–39 (2019)
Pu, J., Zhou, W., Li, H.: Dilated convolutional network with iterative optimization for continuous sign language recognition. In: International Joint Conference on Artificial Intelligence (IJCAI), pp. 885–891 (2018)
Salvador, S., Chan, P.: Toward accurate dynamic time warping in linear time and space. Intell. Data Anal. 11(5), 561–580 (2007)
Song, P., Guo, D., Zhou, W., Wang, M., Li, H.: Parallel temporal encoder for sign language translation. In: ICIP, pp. 1915–1919 (2019)
Tran, D., Bourdev, L., Fergus, R., Torresani, L., Paluri, M.: Learning spatiotemporal features with 3d convolutional networks. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp. 4489–4497 (2015)
Venugopalan, S., Rohrbach, M., Donahue, J., Mooney, R., Darrell, T., Saenko, K.: Sequence to sequence-video to text. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 4534–4542 (2015)
Wang, J., Liu, Z., Wu, Y., Yuan, J.: Mining actionlet ensemble for action recognition with depth cameras. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1290–1297 (2012)
Wang, H., Chai, X., Zhou, Y., Chen, X.: Fast sign language recognition benefited from low rank approximation. In: Automatic Face and Gesture Recognition (FG), vol. 1, pp. 1–6 (2015)
Wang, L., Xiong, Y., Wang, Z., Qiao, Y., Lin, D., Tang, X., Van Gool, L.: Temporal segment networks: towards good practices for deep action recognition. In: ECCV, pp. 20–36 (2016)
Wang, S., Guo, D., Zhou, W., Zha, Z., Wang, M.: Connectionist temporal fusion for sign language translation. In: ACM MM, pp. 1483–1491 (2018)
Yao, L., Torabi, A., Cho, K., Ballas, N., Pal, C., Larochelle, H., Courville, A.: Describing videos by exploiting temporal structure. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 4507–4515 (2015)
Zhang, J., Zhou, W., Xie, C., Pu, J., Li, H.: Chinese sign language recognition with adaptive HMM. In: International Conference on Multimedia and Expo (ICME), pp. 1–6 (2016)
Zheng, L., Wang, S., Tian, L., He, F., Liu, Z., Tian, Q.: Query-adaptive late fusion for image search and person re-identification. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1741–1750 (2015)
Zhu, W., Hu, J., Sun, G., Cao, X., Qiao, Y.: A key volume mining deep framework for action recognition. In: CVPR, pp. 1991–1999 (2016)
Acknowledgements
This work is supported in part by the State Key Development Program under Grant 2018YFC0830103, in part by the National Natural Science Foundation of China (NSFC) under Grant 61876058, and in part by the Fundamental Research Funds for the Central Universities.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Guo, D., Tang, S., Hong, R., Wang, M. (2021). Sign Language Recognition. In: McDaniel, T., Liu, X. (eds) Multimedia for Accessible Human Computer Interfaces. Springer, Cham. https://doi.org/10.1007/978-3-030-70716-3_2
DOI: https://doi.org/10.1007/978-3-030-70716-3_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-70715-6
Online ISBN: 978-3-030-70716-3
eBook Packages: Computer Science (R0)