Abstract
Nonstandard miner behavior can adversely affect safe production in coal mines, so accurately capturing miner behavior in complex environments is particularly important. In intelligent mine monitoring systems, detecting miner behavior through visual perception is challenging because behaviors are highly similar and their temporal relationships are difficult to model. This paper proposes a new deep learning framework that constructs a coal miner behavior recognition model from a spatio-temporal dual-branch structure and a transposed attention representation mechanism. The dual-branch structure extracts rich spatial semantic information from video sequences captured by intrinsically safe video sensors while effectively capturing rapidly changing human motion. To discriminate between highly similar miner behaviors, a transposed weighted representation (TWR) mechanism is then introduced to guide the model toward feature information more strongly related to the classification target, which effectively improves the model's ability to classify highly similar behaviors. Experiments on UCF101, HMDB51, and a self-built miner behavior dataset show significant improvements over other state-of-the-art methods. This collaborative structure yields a more discriminative behavior detection model and contributes to the reliability of miner behavior detection.
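The abstract does not give the TWR mechanism's exact formulation, but "transposed" attention is commonly understood as computing the attention map across feature channels rather than spatial positions, so the C × C map reweights channels correlated with the classification target. The sketch below illustrates that general idea in NumPy; the shapes, projection matrices, and scaling are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def transposed_attention(x, w_q, w_k, w_v):
    """Channel-wise ("transposed") attention sketch.

    x: (C, N) feature map, C channels flattened over N spatio-temporal
    positions. The attention map is (C, C): it weights channels, not
    positions, emphasising channels tied to the classification target.
    """
    q, k, v = w_q @ x, w_k @ x, w_v @ x                  # each (C, N)
    attn = softmax((q @ k.T) / np.sqrt(k.shape[1]), -1)  # (C, C)
    return attn @ v                                      # (C, N)

rng = np.random.default_rng(0)
C, N = 8, 16
x = rng.standard_normal((C, N))
w_q, w_k, w_v = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))
y = transposed_attention(x, w_q, w_k, w_v)
print(y.shape)
```

Because the attention map is C × C rather than N × N, its cost grows with the channel count instead of the (much larger) number of video positions, which is one reason channel-transposed attention is attractive for long video sequences.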
Data availability
Data sharing not applicable to this article as no datasets were generated during the current study.
Acknowledgements
This study was supported by the National Natural Science Foundation of China (51804249), the Shaanxi Province Qin Chuang yuan "Scientists + Engineers" Team Construction project (2022KXJ-38), and the Natural Science Basic Research Program of Shaanxi (Grant No. 2021JQ-574).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, Z., Liu, Y., Yang, Y. et al. Dual-branch deep learning architecture enabling miner behavior recognition. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-19164-1