
Global-Temporal Enhancement for Sign Language Recognition

  • Conference paper
  • First Online:
Artificial Neural Networks and Machine Learning – ICANN 2023 (ICANN 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14261)


Abstract

Continuous sign language recognition is a challenging task: it requires identifying unsegmented glosses from long videos in a weakly supervised manner. Some previous methods extract information from multiple modalities to enrich feature representations, which often complicates the network and places too much emphasis on visual features. Because sign language data consists of long videos, the model may forget information from early time steps as the time span grows, so long-distance temporal modeling ability directly affects recognition performance. A Global-Temporal Enhancement (GTE) module is therefore proposed to strengthen temporal learning. Most current continuous sign language recognition networks follow a three-stage architecture comprising visual, sequence, and alignment modules. However, such architectures are difficult to train sufficiently under the Connectionist Temporal Classification (CTC) loss alone, so two auxiliary supervision methods are proposed: Temporal-Consistency Self-Distillation (TCSD) and the GTE loss. TCSD uses two global temporal outputs from different depths to supervise local temporal information, while the GTE loss provides moderate supervision that balances the features extracted by deep and shallow layers. The proposed model achieves state-of-the-art or competitive performance on the PHOENIX14 and PHOENIX14-T datasets.
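The abstract does not specify TCSD's exact formulation; a minimal pure-Python sketch of frame-wise self-distillation, assuming a KL divergence between a teacher distribution (a global temporal output from a deeper layer) and a student distribution (local temporal information) at each time step, might look like the following. The function names and the per-frame KL objective are illustrative assumptions, not the paper's actual implementation:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def tcsd_loss(local_logits, global_logits):
    """Frame-wise distillation: at each time step the global (teacher)
    gloss distribution supervises the local (student) distribution.
    Both inputs are lists of per-frame logit vectors of equal shape."""
    total = 0.0
    for student_t, teacher_t in zip(local_logits, global_logits):
        student = softmax(student_t)
        teacher = softmax(teacher_t)
        total += kl_divergence(teacher, student)
    return total / len(local_logits)
```

The loss is zero when student and teacher agree exactly and grows as the local predictions drift from the global ones, which matches the stated goal of using deeper global outputs to supervise local temporal features.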



Acknowledgements

This work was funded by the National Natural Science Foundation of China (NSFC) under Grant 92048205 and by the China Scholarship Council (CSC) under Grant 202008310014.

Corresponding author

Correspondence to Xuedian Zhang.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Qin, X., Wang, H., He, C., Zhang, X. (2023). Global-Temporal Enhancement for Sign Language Recognition. In: Iliadis, L., Papaleonidas, A., Angelov, P., Jayne, C. (eds) Artificial Neural Networks and Machine Learning – ICANN 2023. ICANN 2023. Lecture Notes in Computer Science, vol 14261. Springer, Cham. https://doi.org/10.1007/978-3-031-44198-1_23


  • DOI: https://doi.org/10.1007/978-3-031-44198-1_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44197-4

  • Online ISBN: 978-3-031-44198-1

  • eBook Packages: Computer Science (R0)
