A CNN-Based Algorithm with an Optimized Attention Mechanism for Sign Language Gesture Recognition

Yang, Kai; Yang, Zhiwei; Liu, Li; Liu, Yuqi; Zhang, Xinyu; Wang, Naihe; Zhang, Shengwei

doi:10.1007/978-3-031-50580-5_8

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 535)

Included in the following conference series:

International Conference on Multimedia Technology and Enhanced Learning

Abstract

Sign language is the main means by which people with hearing impairments communicate with others and obtain information from the outside world, and it is an important tool for helping them integrate into society. Continuous sign language recognition is a challenging task. Most current models pay too little attention to modeling long sequences as a whole, which leads to low accuracy when recognizing and translating longer sign language videos. This paper proposes a sign language recognition network based on an object detection model. First, an optimized attention module is introduced into the backbone network of YOLOv4-tiny. The module refines both channel attention and spatial attention and replaces the original feature vectors with attention-weighted feature vectors for residual fusion, which enhances feature representation and reduces the influence of background interference. In addition, to reduce the time cost of object detection, three identical MobileNet modules replace the three CSPBlock modules of YOLOv4-tiny, simplifying the network structure. Experimental results show that the improved network increases mean average precision (mAP), precision, and recall, effectively improving the detection accuracy of the sign language recognition network.
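The abstract does not give the attention module's exact formulation. The sketch below assumes a CBAM-style design (Woo et al., ECCV 2018): channel weights come from average- and max-pooled descriptors passed through a shared MLP, spatial weights come from a convolution over channel-wise average and max maps, and the reweighted features are added back to the input as a residual. The function names, the reduction ratio r, the 7x7 kernel, the channel-then-spatial order, and the random weights are illustrative assumptions, not the authors' code; plain NumPy on a single (C, H, W) feature map is used so the data flow is easy to follow.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(x, w1, w2):
    """Channel weights of shape (C, 1, 1) for a feature map x of shape (C, H, W).

    Assumption: w1 (C//r, C) and w2 (C, C//r) form a shared two-layer MLP, as in CBAM.
    """
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # ReLU between the two layers

    avg = x.mean(axis=(1, 2))  # global average pooling -> (C,)
    mx = x.max(axis=(1, 2))    # global max pooling -> (C,)
    return sigmoid(mlp(avg) + mlp(mx))[:, None, None]


def conv2d_same(maps, kernel):
    """Zero-padded 'same' cross-correlation of maps (K, H, W) with kernel (K, k, k), summed over K."""
    k = kernel.shape[-1]
    p = k // 2
    padded = np.pad(maps, ((0, 0), (p, p), (p, p)))  # constant (zero) padding by default
    _, h, w = maps.shape
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return out


def spatial_attention(x, kernel):
    """Spatial weights of shape (1, H, W) from channel-wise mean and max maps."""
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    return sigmoid(conv2d_same(pooled, kernel))[None, :, :]


def attention_block(x, w1, w2, kernel):
    """Hypothetical module: channel then spatial reweighting, fused with the input by a residual add."""
    refined = x * channel_attention(x, w1, w2)
    refined = refined * spatial_attention(refined, kernel)
    return x + refined


rng = np.random.default_rng(0)
C, H, W, r = 16, 8, 8, 4
x = rng.standard_normal((C, H, W))
y = attention_block(
    x,
    w1=rng.standard_normal((C // r, C)) * 0.1,
    w2=rng.standard_normal((C, C // r)) * 0.1,
    kernel=rng.standard_normal((2, 7, 7)) * 0.1,
)
print(y.shape)  # (16, 8, 8)
```

The residual add keeps the original features available even where the attention weights are small; a trainable version would use a deep learning framework, a batch dimension, and learned rather than random weights.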
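The efficiency argument for replacing the CSPBlock modules with MobileNet modules rests largely on depthwise separable convolutions, the basic building block of MobileNet. The arithmetic below compares the weight count of a standard k x k convolution with that of a depthwise k x k convolution followed by a pointwise 1 x 1 convolution; the ratio works out to 1/c_out + 1/k^2. With stride 1 and same padding, multiply-adds scale both counts by H x W, so the same ratio applies to compute. The channel sizes are hypothetical, biases and batch-normalization parameters are ignored, and the abstract does not specify the exact MobileNet module the authors used.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k conv followed by a pointwise 1 x 1 conv (biases ignored)."""
    depthwise = k * k * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1 x 1 convolution mixes channels
    return depthwise + pointwise


def cost_ratio(c_in, c_out, k):
    """Separable / standard cost; algebraically equal to 1/c_out + 1/k**2."""
    return depthwise_separable_params(c_in, c_out, k) / standard_conv_params(c_in, c_out, k)


# Hypothetical layer: 256 -> 256 channels with a 3 x 3 kernel.
std = standard_conv_params(256, 256, 3)
sep = depthwise_separable_params(256, 256, 3)
print(std, sep)                               # 589824 67840
print(f"{cost_ratio(256, 256, 3):.3f}")       # 0.115
print(f"{1 / cost_ratio(256, 256, 3):.1f}x")  # 8.7x
```

Parameter and multiply-add counts are only a proxy; measured detection time also depends on hardware and on how well depthwise convolutions are supported by the inference runtime.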
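The abstract reports gains in mAP, precision, and recall without stating the evaluation protocol. The sketch below shows one common way these detection metrics are computed, assuming PASCAL VOC-style conventions: predictions are sorted by confidence, each is matched to its highest-IoU ground-truth box, a match counts as a true positive if IoU is at least 0.5 and that box has not already been matched, and AP is the all-point interpolated area under the precision-recall curve. The IoU threshold, box format, and function names are assumptions for illustration, not the paper's settings.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def evaluate_class(preds, gts, iou_thr=0.5):
    """Final precision, final recall, and all-point AP for a single class.

    preds: list of (image_id, score, box); gts: dict image_id -> list of boxes.
    """
    n_gt = sum(len(boxes) for boxes in gts.values())
    matched = {img: [False] * len(boxes) for img, boxes in gts.items()}
    tp, precisions, recalls = 0, [], []
    # Visit predictions from most to least confident.
    for k, (img, _score, box) in enumerate(sorted(preds, key=lambda p: -p[1]), start=1):
        candidates = gts.get(img, [])
        best_j = max(range(len(candidates)), key=lambda j: iou(box, candidates[j]), default=-1)
        if best_j >= 0 and iou(box, candidates[best_j]) >= iou_thr and not matched[img][best_j]:
            matched[img][best_j] = True  # true positive: first match of this ground-truth box
            tp += 1
        # Otherwise a false positive (no overlap, low IoU, or duplicate detection).
        precisions.append(tp / k)
        recalls.append(tp / n_gt if n_gt else 0.0)
    # All-point interpolation: make precision non-increasing, then integrate over recall.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    ap = sum((mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1))
    final_p = precisions[-1] if precisions else 0.0
    final_r = recalls[-1] if recalls else 0.0
    return final_p, final_r, ap


def mean_average_precision(ap_per_class):
    """mAP is the unweighted mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


gts = {"img1": [(0, 0, 10, 10), (20, 20, 30, 30)]}
preds = [
    ("img1", 0.9, (0, 0, 10, 10)),
    ("img1", 0.8, (50, 50, 60, 60)),
    ("img1", 0.7, (20, 20, 30, 30)),
]
p, r, ap = evaluate_class(preds, gts)
print(round(p, 3), round(r, 3), round(ap, 3))  # 0.667 1.0 0.833
```

In a sign language setting each gesture class would be evaluated separately and the per-class AP values averaged with `mean_average_precision`.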

Acknowledgements

This work was supported by the Universities' Philosophy and Social Science Research Project in Jiangsu Province (No. 2020SJA0631 and No. 2019SJA0544) and the Educational Reform Research Project (No. 2018XJJG28) of Nanjing Normal University of Special Education.

Author information

Authors and Affiliations

Nanjing Normal University of Special Education, Nanjing, 210038, China
Kai Yang, Zhiwei Yang, Li Liu, Yuqi Liu, Xinyu Zhang, Naihe Wang & Shengwei Zhang

Corresponding author

Correspondence to Li Liu.

Editor information

Editors and Affiliations

Nanjing Normal University of Special Education, Nanjing, China
Bing Wang, Zuojin Hu & Xianwei Jiang
University of Leicester, Leicester, UK
Yu-Dong Zhang

About this paper

Cite this paper

Yang, K. et al. (2024). A CNN-Based Algorithm with an Optimized Attention Mechanism for Sign Language Gesture Recognition. In: Wang, B., Hu, Z., Jiang, X., Zhang, YD. (eds) Multimedia Technology and Enhanced Learning. ICMTEL 2023. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 535. Springer, Cham. https://doi.org/10.1007/978-3-031-50580-5_8

DOI: https://doi.org/10.1007/978-3-031-50580-5_8
Published: 21 February 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-50579-9
Online ISBN: 978-3-031-50580-5
eBook Packages: Computer Science, Computer Science (R0)
