Abstract
Continuous sign language recognition is challenging: unsegmented glosses must be identified from long videos in a weakly supervised manner. Some previous methods extract information from multiple modalities to enrich feature representations, but this often complicates the network and places too much emphasis on visual features. Because sign language data takes the form of long videos, a model may forget information from early time steps as the temporal span grows; the ability to model long-range temporal dependencies therefore directly affects recognition performance. A Global-Temporal Enhancement (GTE) module is proposed to strengthen this temporal learning ability. Most current continuous sign language recognition networks adopt a three-stage architecture consisting of visual, sequence, and alignment modules; however, such an architecture is difficult to train sufficiently under the Connectionist Temporal Classification (CTC) loss alone. Two auxiliary supervision methods are therefore proposed: Temporal-Consistency Self-Distillation (TCSD) and the GTE loss. TCSD uses two global temporal outputs from different depths to supervise local temporal information, while the GTE loss provides moderate supervision that balances the features extracted by deep and shallow layers. The proposed model achieves state-of-the-art or competitive performance on the PHOENIX14 and PHOENIX14-T datasets.
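The abstract describes TCSD only at a high level (global temporal outputs supervising local temporal information). The sketch below illustrates the generic self-distillation mechanics such an objective typically relies on: a frame-wise KL divergence between temperature-softened gloss distributions from a teacher (global) and a student (local) branch. The function names, temperature value, and tensor shapes are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_kl(student_logits, teacher_logits, temperature=8.0):
    """Frame-wise KL(teacher || student) between temperature-softened
    gloss distributions, averaged over time. In a self-distillation
    setup the teacher branch would be detached so that gradients flow
    only into the student (local) branch."""
    t = softmax(teacher_logits / temperature)
    s = softmax(student_logits / temperature)
    kl = np.sum(t * (np.log(t + 1e-12) - np.log(s + 1e-12)), axis=-1)
    return float(np.mean(kl))

# Toy example: 16 frames, 40 gloss classes per frame (shapes are illustrative).
rng = np.random.default_rng(0)
local_logits = rng.normal(size=(16, 40))   # student: local temporal branch
global_logits = rng.normal(size=(16, 40))  # teacher: deeper global branch
loss = distillation_kl(local_logits, global_logits)
```

This auxiliary term would be added to the main CTC loss during training; the temperature softens both distributions so the student also learns from the teacher's relative class similarities rather than only its top prediction.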
Acknowledgements
This work was funded by the National Natural Science Foundation of China (NSFC), Grant No. 92048205, and by the China Scholarship Council (CSC), Grant No. 202008310014.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Qin, X., Wang, H., He, C., Zhang, X. (2023). Global-Temporal Enhancement for Sign Language Recognition. In: Iliadis, L., Papaleonidas, A., Angelov, P., Jayne, C. (eds) Artificial Neural Networks and Machine Learning – ICANN 2023. ICANN 2023. Lecture Notes in Computer Science, vol 14261. Springer, Cham. https://doi.org/10.1007/978-3-031-44198-1_23
Print ISBN: 978-3-031-44197-4
Online ISBN: 978-3-031-44198-1