Abstract
Traditional behavior recognition models cannot capture the internal relationship between similar behaviors, such as smoking, holding a pen, or resting the chin on a hand, and the objects being held. This limits the practical deployment of fine-grained, complex behavior recognition tasks such as smoking recognition. To address this problem, this paper proposes the heterogeneous algorithm HMMA-NET (Heterogeneous multi-task smoking behavior recognition model combined with Attention), which consists of two modules, behavior prior and local detection, and aims to establish the relationship between a behavior and its associated objects. Both modules use a CNN combined with a channel attention mechanism. The behavior prior module uses sign language semantic features to produce a preliminary behavior prior from the obtained behavior affinity vector field. The local detection module applies network optimizations such as fast Edgebox to obtain candidate regions, so as to transfer component information and achieve fast fine-grained detection. Finally, the two modules use SaaS mode to complete association recognition. Experiments show that the algorithm recognizes complex actions effectively and that its accuracy matches or exceeds that of a single model: the accuracy of detecting smoking behavior scenes is 96.10%, and the false detection rate is 3.6%. The algorithm has been commercialized and applied to the monitoring of petrochemical scenes, where the running results show that it maintains good real-time performance and generalization ability.
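The abstract states that both modules combine a CNN with channel attention but does not specify the form of the attention. As an illustration only, the sketch below assumes a squeeze-and-excitation style gate (global average pooling, a small bottleneck layer, and a sigmoid) with biases omitted. The function name `channel_attention`, the weight shapes, and the toy values are hypothetical and are not taken from the paper.

```python
import math


def channel_attention(features, w1, w2):
    """Squeeze-and-excitation style channel attention (assumed form, not the paper's).

    features: list of C channels, each an H x W list of lists of floats.
    w1: bottleneck weights, shape (C_reduced, C).
    w2: expansion weights, shape (C, C_reduced).
    Returns the reweighted feature map and the per-channel gates.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in features
    ]
    # Excitation: linear bottleneck followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed)))
        for row in w1
    ]
    # Expansion back to C channels, then a sigmoid gate in (0, 1).
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[gate * value for value in row] for row in channel]
        for channel, gate in zip(features, gates)
    ]
    return scaled, gates


# Toy input: 2 channels of size 2 x 2, reduced to 1 hidden unit.
toy_features = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
toy_w1 = [[0.5, -0.25]]
toy_w2 = [[1.0], [-1.0]]
toy_scaled, toy_gates = channel_attention(toy_features, toy_w1, toy_w2)
```

The gates rescale whole channels rather than individual pixels, which is what lets a network emphasize feature maps that respond to a small held object while suppressing less relevant ones.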
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-023-08616-8/MediaObjects/521_2023_8616_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-023-08616-8/MediaObjects/521_2023_8616_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-023-08616-8/MediaObjects/521_2023_8616_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-023-08616-8/MediaObjects/521_2023_8616_Fig4_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-023-08616-8/MediaObjects/521_2023_8616_Fig5_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-023-08616-8/MediaObjects/521_2023_8616_Fig6_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-023-08616-8/MediaObjects/521_2023_8616_Fig7_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-023-08616-8/MediaObjects/521_2023_8616_Fig8_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-023-08616-8/MediaObjects/521_2023_8616_Fig9_HTML.png)
Data availability statement
The data used in this paper are available from the authors on request by email.
Funding
This research was supported by the Beijing Municipal Natural Science Foundation [4202028]; General Project of the National Language Committee [YB145-25]; National Natural Science Foundation of China [62036001]; Support Plan for Beijing Municipal University Faculty Construction—High-Level Scientific Research and Innovation Team Project [BPHR20220121]; Premium Funding Project for Academic Human Resources Development in Beijing Union University [BPHR2019CZ05]; Jiangsu Province Key R&D Program (Industry Prospects and Key Core Technologies) [BE2020047]; and the characteristic-disciplines oriented research project in Beijing Union University [KYDE40201702].
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest regarding the publication of this work.
About this article
Cite this article
Qiu, X., Kang, X., Zhang, Y. et al. Heterogeneous multi-task smoking behavior recognition model combined with attention. Neural Comput & Applic 35, 25175–25187 (2023). https://doi.org/10.1007/s00521-023-08616-8
DOI: https://doi.org/10.1007/s00521-023-08616-8