Abstract
To better meet the communication needs between hearing-impaired people and the public, it is of great significance to recognize sign language more quickly and accurately on embedded platforms and mobile terminals. YOLOv3, proposed by Joseph Redmon and Ali Farhadi in 2018, greatly improved detection speed over the original YOLO while maintaining considerable accuracy. However, YOLOv3 is still too heavy to deploy on mobile terminals. A static sign language recognition method based on a ShuffleNetv2-YOLOv3 lightweight model is therefore proposed. The model is made lightweight by using ShuffleNetv2 as the backbone network of YOLOv3, which sharply improves recognition speed. Combined with the CIoU loss function, ShuffleNetv2-YOLOv3 maintains recognition accuracy while improving recognition speed. Recognition effectiveness on self-made sign language images and a public database was evaluated by the F1 score and mAP value, and the performance of the ShuffleNetv2-YOLOv3 model was compared with that of the YOLOv3-tiny, SSD, Faster-RCNN, and YOLOv4-tiny models. The experimental results show that the proposed ShuffleNetv2-YOLOv3 model achieves a good balance between accuracy and speed of gesture detection while remaining lightweight. The F1 score and mAP value of the ShuffleNetv2-YOLOv3 model were 99.1% and 98.4%, respectively, and its gesture detection speed on the GPU reaches 54 frames per second, outperforming the other models. The mobile terminal application of the proposed lightweight model was also evaluated: the minimum single-frame inference time on the CPU and GPU is 0.14 and 0.025 s per image, respectively, only 1/6.5 and 1/8.5 of the inference time of the original YOLOv3 model.
The ShuffleNetv2-YOLOv3 lightweight model is conducive to quick, real-time recognition of similar static sign language gestures, laying a good foundation for real-time gesture recognition on embedded platforms and mobile terminals.
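As a concrete illustration of the CIoU loss mentioned in the abstract, the following is a minimal sketch, not the authors' implementation; the function name `ciou_loss`, the corner-coordinate box format, and the epsilon values are assumptions. CIoU extends IoU with a center-distance penalty and an aspect-ratio consistency term:

```python
import math

def ciou_loss(box_a, box_b):
    """CIoU loss between two boxes given as (x1, y1, x2, y2).

    loss = 1 - CIoU, where CIoU = IoU - rho^2/c^2 - alpha*v:
    rho is the distance between box centers, c the diagonal of the
    smallest enclosing box, and v penalises aspect-ratio mismatch.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection over union
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    iou = inter / (area_a + area_b - inter + 1e-9)

    # Squared center distance over squared enclosing-box diagonal
    rho2 = ((ax1 + ax2) - (bx1 + bx2)) ** 2 / 4 + \
           ((ay1 + ay2) - (by1 + by2)) ** 2 / 4
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2 + 1e-9

    # Aspect-ratio consistency term and its trade-off weight alpha
    wa, ha = ax2 - ax1, ay2 - ay1
    wb, hb = bx2 - bx1, by2 - by1
    v = (4 / math.pi ** 2) * (math.atan(wb / hb) - math.atan(wa / ha)) ** 2
    alpha = v / (1 - iou + v + 1e-9)

    return 1 - (iou - rho2 / c2 - alpha * v)
```

Unlike a plain IoU loss, this formulation still produces a useful gradient when predicted and ground-truth boxes do not overlap, since the center-distance term remains nonzero.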
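The F1 score and mAP evaluation used in the abstract can be sketched as follows; this is illustrative only, with assumed function names and the standard all-point AP interpolation (mAP is then the mean of AP over gesture classes):

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall, from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def average_precision(recalls, precisions):
    """All-point interpolated AP: area under the precision-recall curve,
    after making precision monotonically non-increasing in recall."""
    mrec = [0.0] + list(recalls) + [1.0]
    mpre = [0.0] + list(precisions) + [0.0]
    # Sweep right to left so each precision is the max of what follows
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Integrate the step function over recall
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))
```

For example, 99 true positives with 1 false positive and 1 false negative give precision = recall = 0.99 and hence F1 = 0.99, matching the scale of the scores reported above.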
References
Tran, D., Bourdev, L., Fergus, R., Torresani, L., Paluri, M.: Learning spatiotemporal features with 3D convolutional networks. IEEE (2015)
Carreira, J., Zisserman, A.: Quo vadis, action recognition? A new model and the Kinetics dataset. IEEE (2017). https://doi.org/10.1109/CVPR.2017.502
Liao, Y., Xiong, P., Min, W., Min, W., Lu, J.: Dynamic sign language recognition based on video sequence with BLSTM-3D residual networks. IEEE Access 2, 1–1 (2019)
Basak, H., Kundu, R., Singh, P.K., Ijaz, M.F., Woźniak, M., Sarkar, R.: A union of deep learning and swarm-based optimization for 3D human action recognition. Sci. Rep. 12(1), 1–17 (2022)
Takahashi, T., Kishino, F.: Hand gesture coding based on experiments using a hand gesture interface device. ACM SIGCHI Bull. 23(2), 67–74 (1991)
Munib, Q., Habeeb, M., Takruri, B., Al-Malik, H.A.: American sign language (ASL) recognition based on Hough transform and neural networks. Expert Syst. Appl. 32(1), 24–37 (2007)
Pigou, L., Dieleman, S., Kindermans, P.J., Schrauwen, B.: Sign language recognition using convolutional neural networks. In: European Conference on Computer Vision. Springer, Cham (2014)
Ghosh, D.K., Ari, S.: On an algorithm for vision-based hand gesture recognition. SIViP 10(4), 655–662 (2016)
Xu, L.K., Zhang, K.Q., Xu, Z.H., Y, G.K.: Convolutional neural network hand gesture recognition algorithm based on surface EMG energy kernel phase diagram. J. Biomed. Eng. 38(4), 9 (2021)
Huang, Y., Yang, J.: A multi-scale descriptor for real time RGB-D hand gesture recognition. Pattern Recognit. Lett. 144(10), 4365 (2020)
Tan, Y.S., Lim, K.M., Tee, C., Lee, C.P., Low, C.Y.: Convolutional neural network with spatial pyramid pooling for hand gesture recognition. Neural Comput. Appl. 33(10), 5339–5351 (2021)
Cao, Z., Hidalgo, G., Simon, T., Wei, S.E., Sheikh, Y.: OpenPose: realtime multi-person 2D pose estimation using part affinity fields. IEEE (2018). https://doi.org/10.1109/TPAMI.2019.2929257
Zhang, F., Bazarevsky, V., Vakunov, A., Tkachenka, A., Sung, G., Chang, C.L., et al.: MediaPipe Hands: on-device real-time hand tracking. (2020). https://doi.org/10.48550/arXiv.2006.10214
Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. IEEE Comput. Soc. (2013). https://doi.org/10.1109/CVPR.2014.81
Girshick, R.: Fast R-CNN. Comput. Sci. 28, 1440 (2015)
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. NIPS 28, 148 (2016)
Jiang, Y., Zhao, M., Wang, C., Wei, F., Wang, K., Qi, H.: Diver’s hand gesture recognition and segmentation for human–robot interaction on AUV. Signal Image Video Process. 15(8), 1899–1906 (2021)
Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. IEEE Conf. Comput. Vision Pattern Recognit. (CVPR) 2016, 779–788 (2016)
Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525. IEEE (2017)
Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767 (2018)
Fan, J.J., X, H.W., W, X.H., W, M.L.: Hand gesture recognition algorithm with ghost feature mapping and channel attention mechanism. J. Comput. Aided Des. Graph. 34(3), 12 (2022)
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., Berg, A.C.: SSD: Single Shot MultiBox Detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) Computer Vision—ECCV 2016, pp. 21–37. Springer International Publishing (2016)
Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
Zhang, X., Zhou, X., Lin, M., Sun, J.: ShuffleNet: an extremely efficient convolutional neural network for mobile devices. (2017). https://doi.org/10.48550/arXiv.1707.01083
Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: ShuffleNet V2: practical guidelines for efficient CNN architecture design. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision—ECCV 2018, pp. 122–138. Springer International Publishing (2018)
Jiang, B., Luo, R., Mao, J., Xiao, T., Jiang, Y.: Acquisition of localization confidence for accurate object detection. (2018)
Rezatofighi, H., Tsoi, N., Gwak, J.Y., Sadeghian, A., Savarese, S.: Generalized intersection over union: a metric and a loss for bounding box regression. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE (2019)
Zheng, Z., Wang, P., Liu, W., Li, J., Ye, R., Ren, D.: Distance-IoU loss: faster and better learning for bounding box regression. (2019). https://doi.org/10.1609/aaai.v34i07.6999
Marin, G., Dominio, F., Zanuttigh, P.: Hand gesture recognition with leap motion and kinect devices. IEEE (2015)
Marin, G., Dominio, F., Zanuttigh, P.: Hand gesture recognition with jointly calibrated leap motion and depth sensor. Multimed. Tools Appl. 75(22), 1–25 (2016)
Memo, A., Minto, L., Zanuttigh, P.: Exploiting silhouette descriptors and synthetic data for hand gesture recognition. (2015)
Memo, A., Zanuttigh, P.: Head-mounted gesture controlled interface for human-computer interaction. Multimed. Tools Appl. 77(1), 27–53 (2018)
Ultralytics: YOLOv3. https://github.com/ultralytics/yolov3 (2021). Accessed 15 Nov 2021
Jin, F.R., Wang, Y.P., Yong, J.: Real time hand detection method based on lightweight network. Comput. Eng. Appl. 21, 495 (2022)
Funding
This study was supported by the Research and Development Project of Key Core Technology and Common Technology in Shanxi Province (Grant Nos. 2020XXX001 and 2020XXX009) and the Major Science and Technology Project of Shanxi Tiandi Coal Machine Equipment Co., Ltd. (M2020-ZD03).
Author information
Contributions
SS proposed the method and trained the model; HL and WJ set up the database; HH and HJ directed the writing of the thesis and helped revise and polish it; XW made suggestions on the revision of the model in the paper; KP and ZX were responsible for data collection.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest.
Ethical approval
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sun, S., Han, L., Wei, J. et al. ShuffleNetv2-YOLOv3: a real-time recognition method of static sign language based on a lightweight network. SIViP 17, 2721–2729 (2023). https://doi.org/10.1007/s11760-023-02489-z