ShuffleNetv2-YOLOv3: a real-time recognition method of static sign language based on a lightweight network

  • Original Paper
  • Published in: Signal, Image and Video Processing

Abstract

To better meet the communication needs between hearing-impaired people and the public, it is of great significance to recognize sign language quickly and accurately on embedded platforms and mobile terminals. YOLOv3, proposed by Joseph Redmon and Ali Farhadi in 2018, greatly improved detection speed while keeping considerable accuracy by optimizing the original YOLO. However, YOLOv3 is still too heavy to deploy on mobile terminals. A static sign language recognition method based on the lightweight ShuffleNetv2-YOLOv3 model is therefore proposed. The model is made lightweight by using ShuffleNetv2 as the backbone network of YOLOv3, which sharply increases recognition speed; combined with the CIoU loss function, ShuffleNetv2-YOLOv3 maintains recognition accuracy while gaining that speed. Recognition effectiveness on self-made sign language images and a public database was evaluated by the F1 score and mAP value, and the performance of the ShuffleNetv2-YOLOv3 model was compared with that of the YOLOv3-tiny, SSD, Faster-RCNN, and YOLOv4-tiny models. The experimental results show that the proposed model achieves a good balance between accuracy and speed of gesture detection under the premise of a lightweight model: its F1 score and mAP value were 99.1% and 98.4%, respectively, and its gesture detection speed on the GPU reached 54 frames per second, better than the other models. The mobile-terminal application of the proposed lightweight model was also evaluated: the minimal inference time for a single frame is 0.14 s on the CPU and 0.025 s on the GPU, only 1/6.5 and 1/8.5 of the running time of the original YOLOv3 model, respectively. The ShuffleNetv2-YOLOv3 lightweight model thus enables quick, real-time recognition of similar static sign language gestures, laying a good foundation for real-time gesture recognition on embedded platforms and mobile terminals.
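
Since the reported accuracy relies on pairing the lightweight backbone with the CIoU loss of Zheng et al. [28], a minimal sketch of that loss may help make the abstract concrete. The plain-Python function below is an illustrative implementation of the published CIoU formula, not the authors' code; the (x1, y1, x2, y2) box format, the eps constant, and all identifiers are our assumptions.

    import math

    def ciou_loss(box_a, box_b, eps=1e-9):
        # Illustrative Complete-IoU loss between two axis-aligned boxes given
        # as (x1, y1, x2, y2); hypothetical helper, not the paper's code.
        ax1, ay1, ax2, ay2 = box_a
        bx1, by1, bx2, by2 = box_b

        # Intersection over union.
        iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
        ih = max(0.0, min(ay2, by2) - max(ay1, by1))
        inter = iw * ih
        union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
        iou = inter / (union + eps)

        # Squared centre distance rho^2, normalised by the squared diagonal
        # c^2 of the smallest box enclosing both boxes.
        rho2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
             + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
        c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 \
           + (max(ay2, by2) - min(ay1, by1)) ** 2 + eps

        # Aspect-ratio consistency term v and its trade-off weight alpha.
        v = (4 / math.pi ** 2) * (
            math.atan((bx2 - bx1) / (by2 - by1 + eps))
            - math.atan((ax2 - ax1) / (ay2 - ay1 + eps))
        ) ** 2
        alpha = v / (1 - iou + v + eps)

        # L_CIoU = 1 - IoU + rho^2 / c^2 + alpha * v
        return 1 - iou + rho2 / c2 + alpha * v

For example, ciou_loss((0, 0, 2, 2), (1, 1, 3, 3)) penalises the partial overlap and the centre offset in a single term (the aspect-ratio term is zero here, since both boxes are square), which is why CIoU can preserve localisation accuracy after the backbone is slimmed down.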

Data availability

We used the Microsoft Kinect and Leap Motion dataset and the Creative Senz3D dataset, referenced as [29, 30] and [31, 32] in the text.

References

  1. Tran, D., Bourdev, L., Fergus, R., Torresani, L., Paluri, M.: Learning spatiotemporal features with 3D convolutional networks. In: IEEE International Conference on Computer Vision (ICCV) (2015)

  2. Carreira, J., Zisserman, A.: Quo vadis, action recognition? A new model and the kinetics dataset. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017). https://doi.org/10.1109/CVPR.2017.502

  3. Liao, Y., Xiong, P., Min, W., Min, W., Lu, J.: Dynamic sign language recognition based on video sequence with BLSTM-3D residual networks. IEEE Access 7, 38044–38054 (2019)

  4. Basak, H., Kundu, R., Singh, P.K., Ijaz, M.F., Woźniak, M., Sarkar, R.: A union of deep learning and swarm-based optimization for 3D human action recognition. Sci. Rep. 12(1), 1–17 (2022)

  5. Takahashi, T., Kishino, F.: Hand gesture coding based on experiments using a hand gesture interface device. ACM SIGCHI Bull. 23(2), 67–74 (1991)

  6. Munib, Q., Habeeb, M., Takruri, B., Al-Malik, H.A.: American sign language (ASL) recognition based on Hough transform and neural networks. Expert Syst. Appl. 32(1), 24–37 (2007)

  7. Pigou, L., Dieleman, S., Kindermans, P.J., Schrauwen, B.: Sign language recognition using convolutional neural networks. In: European Conference on Computer Vision. Springer, Cham (2014)

  8. Ghosh, D.K., Ari, S.: On an algorithm for vision-based hand gesture recognition. SIViP 10(4), 655–662 (2016)

  9. Xu, L.K., Zhang, K.Q., Xu, Z.H., Y, G.K.: Convolutional neural network hand gesture recognition algorithm based on surface EMG energy kernel phase diagram. J. Biomed. Eng. 38(4), 9 (2021)

  10. Huang, Y., Yang, J.: A multi-scale descriptor for real time RGB-D hand gesture recognition. Pattern Recognit. Lett. 144(10), 4365 (2020)

  11. Tan, Y.S., Lim, K.M., Tee, C., Lee, C.P., Low, C.Y.: Convolutional neural network with spatial pyramid pooling for hand gesture recognition. Neural Comput. Appl. 33(10), 5339–5351 (2021)

  12. Cao, Z., Hidalgo, G., Simon, T., Wei, S.E., Sheikh, Y.: OpenPose: realtime multi-person 2D pose estimation using part affinity fields. IEEE Trans. Pattern Anal. Mach. Intell. (2019). https://doi.org/10.1109/TPAMI.2019.2929257

  13. Zhang, F., Bazarevsky, V., Vakunov, A., Tkachenka, A., Sung, G., Chang, C.L., et al.: MediaPipe Hands: on-device real-time hand tracking. arXiv preprint (2020). https://doi.org/10.48550/arXiv.2006.10214

  14. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2014). https://doi.org/10.1109/CVPR.2014.81

  15. Girshick, R.: Fast R-CNN. In: IEEE International Conference on Computer Vision (ICCV), pp. 1440–1448 (2015)

  16. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems 28 (NIPS) (2015)

  17. Jiang, Y., Zhao, M., Wang, C., Wei, F., Wang, K., Qi, H.: Diver’s hand gesture recognition and segmentation for human–robot interaction on AUV. Signal Image Video Process. 15(8), 1899–1906 (2021)

  18. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779–788 (2016)

  19. Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6517–6525 (2017)

  20. Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767 (2018)

  21. Fan, J.J., X, H.W., W, X.H., W, M.L.: Hand gesture recognition algorithm with ghost feature mapping and channel attention mechanism. J. Comput. Aided Des. Graph. 34(3), 12 (2022)

  22. Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., Berg, A.C.: SSD: Single Shot MultiBox Detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) Computer Vision—ECCV 2016, pp. 21–37. Springer International Publishing (2016)

  23. Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)

  24. Zhang, X., Zhou, X., Lin, M., Sun, J.: ShuffleNet: an extremely efficient convolutional neural network for mobile devices. arXiv preprint (2017). https://doi.org/10.48550/arXiv.1707.01083

  25. Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision – ECCV 2018, pp. 122–138. Springer International Publishing (2018)

  26. Jiang, B., Luo, R., Mao, J., Xiao, T., Jiang, Y.: Acquisition of localization confidence for accurate object detection. In: European Conference on Computer Vision (ECCV) (2018)

  27. Rezatofighi, H., Tsoi, N., Gwak, J.Y., Sadeghian, A., Savarese, S.: Generalized intersection over union: a metric and a loss for bounding box regression. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)

  28. Zheng, Z., Wang, P., Liu, W., Li, J., Ye, R., Ren, D.: Distance-IoU loss: faster and better learning for bounding box regression. In: Proceedings of the AAAI Conference on Artificial Intelligence (2020). https://doi.org/10.1609/aaai.v34i07.6999

  29. Marin, G., Dominio, F., Zanuttigh, P.: Hand gesture recognition with leap motion and kinect devices. IEEE (2015)

  30. Marin, G., Dominio, F., Zanuttigh, P.: Hand gesture recognition with jointly calibrated leap motion and depth sensor. Multimed. Tools Appl. 75(22), 1–25 (2016)

  31. Memo, A., Minto, L., Zanuttigh, P.: Exploiting silhouette descriptors and synthetic data for hand gesture recognition. (2015)

  32. Memo, A., Zanuttigh, P.: Head-mounted gesture controlled interface for human-computer interaction. Multimed. Tools Appl. 77(1), 27–53 (2018)

  33. Ultralytics: YOLOv3. https://github.com/ultralytics/yolov3. Accessed 15 Nov 2021

  34. Jin, F.R., Wang, Y.P., Yong, J.: Real time hand detection method based on lightweight network. Comput. Eng. Appl. 21, 495 (2022)

Funding

This study was supported by the Research and Development Project of Key Core Technology and Common Technology in Shanxi Province (Grant Nos. 2020XXX001 and 2020XXX009) and the Major Science and Technology Project of Shanxi Tiandi Coal Machine Equipment Co., Ltd. (Grant No. M2020-ZD03).

Author information

Contributions

SS proposed the method and trained the model; HL and WJ set up the database; HH and HJ directed the writing of the thesis and helped revise and polish it; XW made suggestions on the revision of the model in the paper; KP and ZX were responsible for data collection.

Corresponding author

Correspondence to Huimin Hao.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest.

Ethical approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sun, S., Han, L., Wei, J. et al. ShuffleNetv2-YOLOv3: a real-time recognition method of static sign language based on a lightweight network. SIViP 17, 2721–2729 (2023). https://doi.org/10.1007/s11760-023-02489-z
