Abstract
Industrial smoke emissions pose a serious threat to natural ecosystems and human health. Prior work has shown that computer vision offers a low-cost and convenient way to identify smoke. However, detecting translucent smoke is challenging because of its irregular contours and complex motion. To address these problems, we propose a novel spatiotemporal cross network (STCNet) to recognize industrial smoke emissions. STCNet comprises a spatial pathway that extracts appearance features and a temporal pathway that captures smoke motion information, a design targeted at translucent, nonrigid smoke objects. The spatial pathway can readily recognize obvious non-smoke objects such as trees and buildings, and the temporal pathway can highlight the faint motion traces of smoke. STCNet achieves mutual guidance between spatial and temporal information through bidirectional feature fusion on multilevel feature maps. Extensive experiments on public datasets show that STCNet outperforms the best competing method by 6.2%. We also perform in-depth ablation studies to explore how different feature fusion methods affect the entire model. The code will be available at https://github.com/Caoyichao/STCNet.
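The two-pathway idea with bidirectional fusion at several feature levels can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the fusion operator, so the sketch assumes element-wise addition with a fixed weight `alpha`, and it replaces learned convolutional features with hand-written stand-ins (a temporal mean for appearance, frame differences for motion). All function names are hypothetical, and frames are flattened to lists of pixel values to keep the example dependency-free.

```python
from typing import List, Tuple

Vector = List[float]


def spatial_features(frames: List[Vector]) -> Vector:
    """Toy appearance feature (assumption): per-pixel mean over the clip."""
    n = len(frames)
    return [sum(px) / n for px in zip(*frames)]


def temporal_features(frames: List[Vector]) -> Vector:
    """Toy motion feature (assumption): mean absolute frame-to-frame difference."""
    diffs = [
        [abs(b - a) for a, b in zip(f0, f1)]
        for f0, f1 in zip(frames, frames[1:])
    ]
    return [sum(px) / len(diffs) for px in zip(*diffs)]


def downsample(x: Vector) -> Vector:
    """Halve the resolution by averaging adjacent pairs (stand-in for a strided layer)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]


def cross_fuse(s: Vector, t: Vector, alpha: float = 0.5) -> Tuple[Vector, Vector]:
    """Bidirectional fusion (assumed additive): each path receives a scaled copy of the other."""
    fused_s = [a + alpha * b for a, b in zip(s, t)]
    fused_t = [b + alpha * a for a, b in zip(s, t)]
    return fused_s, fused_t


def two_path_features(
    frames: List[Vector], levels: int = 2, alpha: float = 0.5
) -> List[Tuple[Vector, Vector]]:
    """Run both paths, fusing in both directions at every level before downsampling."""
    s = spatial_features(frames)
    t = temporal_features(frames)
    outputs = []
    for _ in range(levels):
        s, t = cross_fuse(s, t, alpha)
        outputs.append((s, t))
        s, t = downsample(s), downsample(t)
    return outputs


# A 3-frame clip of 4 pixels in which only the second pixel changes over time.
clip = [[1.0, 0.0, 1.0, 1.0], [1.0, 2.0, 1.0, 1.0], [1.0, 4.0, 1.0, 1.0]]
pyramid = two_path_features(clip)
```

In this toy clip the static pixels contribute only to the spatial path before fusion, while the changing pixel dominates the temporal path; after fusion each path carries information from the other at every level, which is the general behavior the abstract describes.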
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 61871123), the Key Research and Development Program in Jiangsu Province (No. BE2016739), and the Priority Academic Program Development of Jiangsu Higher Education Institutions. We thank the Big Data Center of Southeast University for providing facility support for the numerical calculations in this paper.
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Cao, Y., Tang, Q. & Lu, X. STCNet: spatiotemporal cross network for industrial smoke detection. Multimed Tools Appl 81, 10261–10277 (2022). https://doi.org/10.1007/s11042-021-11766-3
DOI: https://doi.org/10.1007/s11042-021-11766-3