Abstract
Nonstandard miner behavior can adversely affect safe production in coal mines, so accurately capturing miner behavior in complex environments is particularly important. In intelligent mine monitoring systems, detecting miner behavior through visual perception is challenging because behaviors are highly similar and their temporal relationships are difficult to model. This paper proposes a new deep learning framework that constructs a coal miner behavior recognition model from a spatio-temporal dual-branch structure and a transposed attention representation mechanism. The dual-branch structure extracts rich spatial semantic information from video sequences captured by intrinsically safe video sensors while effectively capturing rapidly changing human motion. To discriminate between highly similar miner behaviors, a transposed weighted representation (TWR) mechanism is then introduced to guide the model toward feature information more strongly related to the classification target, which effectively improves the model's ability to classify highly similar behaviors. Experiments on UCF101, HMDB51, and a self-built miner behavior dataset show significant improvements over other state-of-the-art methods. This collaborative structure yields a more discriminative behavior detection model and contributes to the reliability of miner behavior detection.
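The abstract does not give the TWR mechanism's exact formulation, but "transposed" attention is commonly understood as computing the attention map across feature channels rather than spatial positions, so the C × C map reweights channels correlated with the classification target. The sketch below illustrates that general idea in NumPy; the shapes, projection matrices, and scaling are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def transposed_attention(x, w_q, w_k, w_v):
    """Channel-wise ("transposed") attention sketch.

    x: (C, N) feature map, C channels flattened over N spatio-temporal
    positions. The attention map is (C, C): it weights channels, not
    positions, emphasising channels tied to the classification target.
    """
    q, k, v = w_q @ x, w_k @ x, w_v @ x                  # each (C, N)
    attn = softmax((q @ k.T) / np.sqrt(k.shape[1]), -1)  # (C, C)
    return attn @ v                                      # (C, N)

rng = np.random.default_rng(0)
C, N = 8, 16
x = rng.standard_normal((C, N))
w_q, w_k, w_v = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))
y = transposed_attention(x, w_q, w_k, w_v)
print(y.shape)
```

Because the attention map is C × C rather than N × N, its cost grows with the channel count instead of the (much larger) number of video positions, which is one reason channel-transposed attention is attractive for long video sequences.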
Data availability
Data sharing not applicable to this article as no datasets were generated during the current study.
Acknowledgements
This study was supported by the National Natural Science Foundation of China (51804249), the Shaanxi Province Qin Chuang yuan "Scientists + Engineers" Team Construction project (2022KXJ-38), and the Natural Science Basic Research Program of Shaanxi (Grant No. 2021JQ-574).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, Z., Liu, Y., Yang, Y. et al. Dual-branch deep learning architecture enabling miner behavior recognition. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-19164-1