Abstract
To better meet the communication needs between hearing-impaired people and the public, it is of great significance to recognize sign language more quickly and accurately on embedded platforms and mobile terminals. YOLOv3, proposed by Joseph Redmon and Ali Farhadi in 2018, greatly improved detection speed over the original YOLO while maintaining considerable accuracy. However, YOLOv3 is still too heavy to deploy on mobile terminals. A static sign language recognition method based on a ShuffleNetv2-YOLOv3 lightweight model is therefore proposed. The model is made lightweight by using ShuffleNetv2 as the backbone network of YOLOv3, which sharply improves recognition speed. Combined with the CIoU loss function, ShuffleNetv2-YOLOv3 maintains recognition accuracy while improving recognition speed. Recognition effectiveness on self-made sign language images and a public database was evaluated by the F1 score and mAP value, and the performance of the ShuffleNetv2-YOLOv3 model was compared with that of the YOLOv3-tiny, SSD, Faster-RCNN, and YOLOv4-tiny models. The experimental results show that the proposed ShuffleNetv2-YOLOv3 model achieves a good balance between accuracy and speed of gesture detection while remaining lightweight. The F1 score and mAP value of the ShuffleNetv2-YOLOv3 model were 99.1% and 98.4%, respectively, and its gesture detection speed on the GPU reaches 54 frames per second, outperforming the other models. The mobile terminal application of the proposed lightweight model was also evaluated: the minimum single-frame inference time on the CPU and GPU is 0.14 and 0.025 s per image, respectively, only 1/6.5 and 1/8.5 of the inference time of the original YOLOv3 model.
The ShuffleNetv2-YOLOv3 lightweight model is conducive to quick, real-time recognition of similar static sign language gestures, laying a good foundation for real-time gesture recognition on embedded platforms and mobile terminals.
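As a concrete illustration of the CIoU loss mentioned in the abstract, the following is a minimal sketch, not the authors' implementation; the function name `ciou_loss`, the corner-coordinate box format, and the epsilon values are assumptions. CIoU extends IoU with a center-distance penalty and an aspect-ratio consistency term:

```python
import math

def ciou_loss(box_a, box_b):
    """CIoU loss between two boxes given as (x1, y1, x2, y2).

    loss = 1 - CIoU, where CIoU = IoU - rho^2/c^2 - alpha*v:
    rho is the distance between box centers, c the diagonal of the
    smallest enclosing box, and v penalises aspect-ratio mismatch.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection over union
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    iou = inter / (area_a + area_b - inter + 1e-9)

    # Squared center distance over squared enclosing-box diagonal
    rho2 = ((ax1 + ax2) - (bx1 + bx2)) ** 2 / 4 + \
           ((ay1 + ay2) - (by1 + by2)) ** 2 / 4
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2 + 1e-9

    # Aspect-ratio consistency term and its trade-off weight alpha
    wa, ha = ax2 - ax1, ay2 - ay1
    wb, hb = bx2 - bx1, by2 - by1
    v = (4 / math.pi ** 2) * (math.atan(wb / hb) - math.atan(wa / ha)) ** 2
    alpha = v / (1 - iou + v + 1e-9)

    return 1 - (iou - rho2 / c2 - alpha * v)
```

Unlike a plain IoU loss, this formulation still produces a useful gradient when predicted and ground-truth boxes do not overlap, since the center-distance term remains nonzero.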
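The F1 score and mAP evaluation used in the abstract can be sketched as follows; this is illustrative only, with assumed function names and the standard all-point AP interpolation (mAP is then the mean of AP over gesture classes):

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall, from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def average_precision(recalls, precisions):
    """All-point interpolated AP: area under the precision-recall curve,
    after making precision monotonically non-increasing in recall."""
    mrec = [0.0] + list(recalls) + [1.0]
    mpre = [0.0] + list(precisions) + [0.0]
    # Sweep right to left so each precision is the max of what follows
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Integrate the step function over recall
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))
```

For example, 99 true positives with 1 false positive and 1 false negative give precision = recall = 0.99 and hence F1 = 0.99, matching the scale of the scores reported above.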
References
Tran, D., Bourdev, L., Fergus, R., Torresani, L., Paluri, M.: Learning spatiotemporal features with 3D convolutional networks. IEEE (2015)
Carreira, J., Zisserman, A.: Quo vadis, action recognition? A new model and the Kinetics dataset. IEEE (2017). https://doi.org/10.1109/CVPR.2017.502
Liao, Y., Xiong, P., Min, W., Min, W., Lu, J.: Dynamic sign language recognition based on video sequence with BLSTM-3D residual networks. IEEE Access 2, 1–1 (2019)
Basak, H., Kundu, R., Singh, P.K., Ijaz, M.F., Woźniak, M., Sarkar, R.: A union of deep learning and swarm-based optimization for 3D human action recognition. Sci. Rep. 12(1), 1–17 (2022)
Takahashi, T., Kishino, F.: Hand gesture coding based on experiments using a hand gesture interface device. ACM SIGCHI Bull. 23(2), 67–74 (1991)
Munib, Q., Habeeb, M., Takruri, B., Al-Malik, H.A.: American sign language (ASL) recognition based on Hough transform and neural networks. Expert Syst. Appl. 32(1), 24–37 (2007)
Pigou, L., Dieleman, S., Kindermans, P.J., Schrauwen, B.: Sign language recognition using convolutional neural networks. In: European Conference on Computer Vision. Springer, Cham (2014)
Ghosh, D.K., Ari, S.: On an algorithm for vision-based hand gesture recognition. SIViP 10(4), 655–662 (2016)
Xu, L.K., Zhang, K.Q., Xu, Z.H., Y, G.K.: Convolutional neural network hand gesture recognition algorithm based on surface EMG energy kernel phase diagram. J. Biomed. Eng. 38(4), 9 (2021)
Huang, Y., Yang, J.: A multi-scale descriptor for real time RGB-D hand gesture recognition. Pattern Recognit. Lett. 144(10), 4365 (2020)
Tan, Y.S., Lim, K.M., Tee, C., Lee, C.P., Low, C.Y.: Convolutional neural network with spatial pyramid pooling for hand gesture recognition. Neural Comput. Appl. 33(10), 5339–5351 (2021)
Cao, Z., Hidalgo, G., Simon, T., Wei, S.E., Sheikh, Y.: OpenPose: realtime multi-person 2D pose estimation using part affinity fields. IEEE (2018). https://doi.org/10.1109/TPAMI.2019.2929257
Zhang, F., Bazarevsky, V., Vakunov, A., Tkachenka, A., Sung, G., Chang, C.L., et al.: MediaPipe Hands: on-device real-time hand tracking. (2020). https://doi.org/10.48550/arXiv.2006.10214
Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. IEEE Comput. Soc. (2013). https://doi.org/10.1109/CVPR.2014.81
Girshick, R.: Fast R-CNN. Comput. Sci. 28, 1440 (2015)
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. NIPS 28, 148 (2016)
Jiang, Y., Zhao, M., Wang, C., Wei, F., Wang, K., Qi, H.: Diver’s hand gesture recognition and segmentation for human–robot interaction on AUV. Signal Image Video Process. 15(8), 1899–1906 (2021)
Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. IEEE Conf. Comput. Vision Pattern Recognit. (CVPR) 2016, 779–788 (2016)
Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525. IEEE (2017)
Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767 (2018)
Fan, J.J., X, H.W., W, X.H., W, M.L.: Hand gesture recognition algorithm with ghost feature mapping and channel attention mechanism. J. Comput. Aided Des. Graph. 34(3), 12 (2022)
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., Berg, A.C.: SSD: Single Shot MultiBox Detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) Computer Vision—ECCV 2016, pp. 21–37. Springer International Publishing (2016)
Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
Zhang, X., Zhou, X., Lin, M., Sun, J.: ShuffleNet: an extremely efficient convolutional neural network for mobile devices. (2017). https://doi.org/10.48550/arXiv.1707.01083
Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: ShuffleNet V2: practical guidelines for efficient CNN architecture design. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision—ECCV 2018, pp. 122–138. Springer International Publishing (2018)
Jiang, B., Luo, R., Mao, J., Xiao, T., Jiang, Y.: Acquisition of localization confidence for accurate object detection. (2018)
Rezatofighi, H., Tsoi, N., Gwak, J.Y., Sadeghian, A., Savarese, S.: Generalized intersection over union: a metric and a loss for bounding box regression. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE (2019)
Zheng, Z., Wang, P., Liu, W., Li, J., Ye, R., Ren, D.: Distance-IoU loss: faster and better learning for bounding box regression. (2019). https://doi.org/10.1609/aaai.v34i07.6999
Marin, G., Dominio, F., Zanuttigh, P.: Hand gesture recognition with leap motion and kinect devices. IEEE (2015)
Marin, G., Dominio, F., Zanuttigh, P.: Hand gesture recognition with jointly calibrated leap motion and depth sensor. Multimed. Tools Appl. 75(22), 1–25 (2016)
Memo, A., Minto, L., Zanuttigh, P.: Exploiting silhouette descriptors and synthetic data for hand gesture recognition. (2015)
Memo, A., Zanuttigh, P.: Head-mounted gesture controlled interface for human-computer interaction. Multimed. Tools Appl. 77(1), 27–53 (2018)
Ultralytics: YOLOv3. https://github.com/ultralytics/yolov3 (2021). Accessed 15 Nov 2021
Jin, F.R., Wang, Y.P., Yong, J.: Real time hand detection method based on lightweight network. Comput. Eng. Appl. 21, 495 (2022)
Funding
This study was supported by the Research and Development Project of Key Core Technology and Common Technology in Shanxi Province (Grant Nos. 2020XXX001 and 2020XXX009) and the Major Science and Technology Project of Shanxi Tiandi Coal Machine Equipment Co., Ltd. (M2020-ZD03).
Author information
Contributions
SS proposed the method and trained the model; HL and WJ set up the database; HH and HJ directed the writing of the thesis and helped revise and polish it; XW made suggestions on the revision of the model in the paper; KP and ZX were responsible for data collection.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest.
Ethical approval
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sun, S., Han, L., Wei, J. et al. ShuffleNetv2-YOLOv3: a real-time recognition method of static sign language based on a lightweight network. SIViP 17, 2721–2729 (2023). https://doi.org/10.1007/s11760-023-02489-z