Sign language recognition and translation network based on multi-view data

Applied Intelligence

Abstract

Sign language recognition and translation can address the communication barrier between hearing-impaired people and the general population, and can bridge the sign language boundaries between different countries and languages. Traditional sign language recognition and translation algorithms use Convolutional Neural Networks (CNNs) to extract spatial features and Recurrent Neural Networks (RNNs) to extract temporal features. However, these methods cannot model the complex spatiotemporal features of sign language. Moreover, RNNs and their variants have difficulty learning long-term dependencies. This paper proposes a novel and effective network based on the Transformer and the Graph Convolutional Network (GCN), which can be divided into three parts: a multi-view spatiotemporal embedding network (MSTEN), a continuous sign language recognition network (CSLRN), and a sign language translation network (SLTN). MSTEN extracts the spatiotemporal features of RGB data and skeleton data. CSLRN recognizes sign language glosses and obtains intermediate features from the multi-view input sign data. SLTN translates the intermediate features into spoken sentences. The entire network is designed as an end-to-end model. Our method was tested on three public sign language datasets (SLR-100, RWTH, and CSL-daily), and the results demonstrated that it achieved excellent performance on these datasets.
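To make the role of the GCN on skeleton data more concrete, the following is a minimal sketch of a single graph convolution over skeleton joints. It assumes the widely used symmetric normalization ReLU(D^{-1/2}(A + I)D^{-1/2} X W); the abstract does not state which GCN formulation the network uses, so this is an illustration of the general technique rather than the authors' layer. The three-joint skeleton, the helper names, and the identity weight matrix are all hypothetical.

```python
import math


def normalized_adjacency(num_joints, bones):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected skeleton graph."""
    a = [[0.0] * num_joints for _ in range(num_joints)]
    for i in range(num_joints):
        a[i][i] = 1.0  # self-loop so each joint keeps its own features
    for i, j in bones:
        a[i][j] = 1.0
        a[j][i] = 1.0
    deg = [sum(row) for row in a]  # degree including the self-loop
    return [
        [a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
        for i in range(num_joints)
    ]


def matmul(x, y):
    """Plain dense matrix product of two lists of lists."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def gcn_layer(a_hat, features, weight):
    """One graph convolution: ReLU(A_hat @ X @ W)."""
    out = matmul(matmul(a_hat, features), weight)
    return [[max(0.0, v) for v in row] for row in out]


# Hypothetical 3-joint chain: shoulder(0) - elbow(1) - wrist(2).
BONES = [(0, 1), (1, 2)]
A_HAT = normalized_adjacency(3, BONES)

# Hypothetical 2D joint coordinates (x, y) for a single frame.
X = [[0.0, 0.0], [1.0, 0.0], [2.0, 1.0]]

# Identity weights, chosen only so the neighbour averaging is easy to inspect.
W = [[1.0, 0.0], [0.0, 1.0]]

H = gcn_layer(A_HAT, X, W)  # one output feature row per joint
```

Each output row mixes a joint's own coordinates with those of the joints it is connected to by a bone, which is how a GCN injects skeleton topology into the features. In a full model the weights would be learned and the layer would be applied to every frame of the sequence.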


Acknowledgements

This work was funded by the National Natural Science Foundation of China (62073061) and the Fundamental Research Funds for the Central Universities (N2204009).

Author information

Corresponding author

Correspondence to Lu Meng.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Special Issue on Multi-view Learning. Guest Editors: Guoqing Chao, Xingquan Zhu, Weiping Ding, Jinbo Bi and Shiliang Sun.

About this article

Cite this article

Li, R., Meng, L. Sign language recognition and translation network based on multi-view data. Appl Intell 52, 14624–14638 (2022). https://doi.org/10.1007/s10489-022-03407-5

  • DOI: https://doi.org/10.1007/s10489-022-03407-5
