Abstract
In many smoke detection settings, such as fire detection in the confined space of an aircraft cargo hold, the false alarm rate is still high. In the closed, dark environment of a confined cabin, traditional video smoke detection struggles to find a fire early because of poor lighting. Fire detection based on infrared video does not depend on lighting and performs better in dark environments. In addition, the rapid temperature rise in a confined space at the beginning of a fire is readily captured by infrared cameras. However, there is little research on detection methods for infrared frames in confined spaces. Therefore, motivated by the confined-space environments of the aviation industry, this paper studies smoke detection in infrared frames and proposes a high-precision fire and smoke image detection algorithm based on an infrared dual convolutional neural network. The network models the texture features of infrared smoke frames, and visible-light video frames are used as an auxiliary source to increase the number of available training images, which addresses the shortage of infrared video data sets. The experimental results show that the detection performance of this method is better than that of the comparison algorithms in confined spaces and that the false alarm rate is effectively reduced.
1 Introduction
Early detection of fire or smoke is critical for timely intervention and for avoiding large-scale damage. Several methods and tools are used to identify fire or smoke in confined spaces. In aircraft cargo holds, for example, traditional fire and smoke detection mostly relies on sensor-based tools, whose main disadvantages are a limited detection range and a high false alarm rate, according to a report from the U.S. Federal Aviation Administration (FAA) Technology Center [1]. The basic framework of the traditional method is shown in Fig. 1. In addition, such sensors cannot provide sufficient information about the smoke and fire. Usually, when a fire occurs, infrastructure or materials block people’s view, while the initial smoke appears in surveillance cameras. Therefore, detecting smoke from video surveillance can provide early warning of fire hazards. In early smoke detection, many features are extracted from the motion areas of images. At present, video-based fire smoke detection algorithms are usually based on one or more characteristics of smoke and make decisions through direct or indirect methods. Toreyin et al. [2] mainly use motion and edge features for smoke detection. Because the background of the whole scene must be analyzed, the scope of application of this algorithm is limited. Chenebert et al. [3] proposed a non-temporal image fire detection method, whose advantage is that it does not require any temporal information. Fujiwara et al. [4] proposed a smoke extraction technique that applies fractal coding to image regions. This method is not suitable for low-contrast or blurred smoke images. Yuan [5] proposed a video smoke detection method that uses a cumulative motion model and a block-based algorithm to improve detection efficiency. Toreyin [6] proposed a one-dimensional and two-dimensional wavelet transform method based on the typical characteristics of smoke.
It also uses a support vector machine as a smoke classifier. This technique can only distinguish between smoke and non-smoke and cannot locate the smoke. Tung et al. [7] proposed performing smoke detection by segmenting and classifying the motion area. Xu [8] proposed a BP neural network training method based on the static and dynamic characteristics of smoke, which gives the detection system better accuracy and anti-interference ability. Frizzi et al. [9] proposed a fire video smoke detection model based on a convolutional neural network. In this method, the trained model is obtained by operating directly on the original frames instead of extracting features separately. Wang et al. [10] improved the photoelectric smoke detection algorithm and used a dual-wavelength mechanism to reduce the false alarm rate of fire detection in confined spaces. Cheoi [11] also used convolutional neural networks to process fire images and extract feature data of suspected fire areas. Maksymiv [12] first used AdaBoost and the LBP algorithm to make a preliminary judgment on the image and then carried out feature extraction, which improved the recognition accuracy. Feiniu Yuan [13] used convolutional neural networks to classify smoke images and compared them with other neural networks. Chang J et al. [14] proposed a new type of infrared detection system based on the inclined porthole and the elliptical bow under the nose of the aircraft. This system meets the requirements of fire warning detection, but it cannot detect smoke and is not suitable for the cargo hold of an aircraft. Hackner A et al. [15] realized fire detection based on flame image information and smoke gas concentration. This method is more effective than differential image (DI) mode fire warning technology. Emily J et al. [16] simulated indoor building fires and used image processing and smoke characteristic detection techniques to determine whether burning was occurring in the target environment.
In summary, when fire detection is carried out in a confined space, the accuracy of the result depends on the detection algorithm and on many characteristic parameters of the fire. However, the above algorithms do not take into account that video signals are weak and difficult to process when visible light is insufficient. In confined spaces such as aircraft cargo holds, warehouses and hangars, the more complicated changes in smoke color, texture, height and other characteristics, together with the possibility of insufficient or no light, make traditional video smoke detection more difficult to use. Therefore, we consider using infrared frames to detect possible fires in such spaces. To detect smoke in infrared images, the key problem a video detection algorithm needs to solve is the extraction and recognition of effective smoke feature parameters in infrared images.
2 Source of the Data
This section describes the construction of the data source in detail and compares the video and infrared sources of the data.
2.1 Image Data Collection
Image recognition with deep learning methods usually involves a large number of parameters and therefore requires a large amount of training data. As far as we know, there is no publicly available infrared data set containing fire smoke information. Therefore, we created a data set of 2000 high-resolution infrared images of indoor confined spaces, called Train2. Figure 2 shows some sample images from Train2. Train2 includes infrared images of the confined space of the fire detection laboratory warehouse and of the simulated aircraft cargo hold. So far, our data set covers infrared smoke frames in different scenes, including confined space fires and moving objects. The images were captured with a professional uncooled infrared camera.
2.2 Visual and Infrared Source Comparison
Since the infrared image and the visible image of an object are similar to some extent in shape, size, edge and motion characteristics, we used an infrared camera and a visible-light camera to shoot the same smoke scene for comparison, as shown in Fig. 3. We also selected fire smoke images of indoor and outdoor spaces from online databases as comparative experimental data.
In terms of similarity, the trucks in the infrared image and the visual image in Fig. 3 are very similar in shape, size and structural features, and these features are handled consistently in information processing. However, the smoke characteristics in the infrared image are not obvious, and it is difficult to obtain smoke information directly from a single infrared image. We found that when the picture changes continuously, the position of the smoke can be identified from its movement characteristics, so motion characteristics can be learned effectively from consecutive frames.
3 Deep Convolution Dual-Network for Smoke Detection
Although deep learning methods, especially convolutional neural networks (CNNs), have achieved good results in visual recognition, few studies have applied them to smoke recognition in infrared images. To detect smoke in infrared images, this paper proposes a virtual fire detection system built on a dual convolutional network with motion and texture feature extraction mechanisms and verifies it in a confined space environment. We found that, unlike in visible-light video, smoke and flame in infrared images have very similar characteristics, such as movement and diffusion. In addition, both flame and smoke generate a lot of heat, although flame is clearly hotter than smoke. Because of these characteristics, smoke in infrared images is easily filtered out as noise by a recognition algorithm, and flame and smoke are difficult to distinguish. In this paper, the proposed dual deep CNN model is applied to the classification of fire and smoke images in confined spaces to improve detection efficiency. The method can classify smoke and fire images at the same time and offers advantages over existing CNN models based on visible-light images for smoke and fire identification in confined spaces. In our model, texture and motion features are learned from source infrared frames containing smoke by a dual network consisting of a texture network and a motion network. The CNN1 network, used for texture extraction, is composed of 8 layers, and its depth differs from that of the original VGG network [17]. The CNN2 network has a different structure from CNN1, with three convolution layers, a pooling layer for motion extraction and a fully connected layer for classification [18].
3.1 CNN for Texture Features
The dual CNN proposed in this paper consists of a motion network and a texture network, and the fire detection network additionally learns the spatial representation of fire from source video frames as an auxiliary input. Each network is a multi-layer CNN composed of convolutional layers and pooling layers. The basic structure of the texture network of the proposed smoke detection algorithm is shown in Fig. 4.
As depicted in [19], the texture detection network is composed of multiple layers with different functions, which are used for deep learning of smoke features. The network contains five convolutional layers, marked convX, and two fully connected layers. The first two convolutional layers use 5 × 5 filters with 32 feature maps. The number of filters is doubled in the third convolutional layer, so the following convolutional layers have 2 and 4 times as many feature maps. The output layer has 2 nodes and is fully connected to the 2176 neurons of the fully connected layer. For the classifier in the output layer, the result of the fully connected layer can be regarded as the high-level feature information of the smoke image, extracted layer by layer by the preceding convolution and down-sampling layers and mapped to a 2176-dimensional feature vector. A binary classification is then performed on this feature vector.
To reduce the amount of calculation and speed up the training of the entire network, this paper adopts the rectified linear unit (ReLU) function [20] as the activation function of the smoke detection deep convolutional neural network model, as shown in formula 1. This function is well suited to deep convolutional network models trained on large volumes of data.
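For reference, the rectified linear unit is conventionally defined as below. Formula 1 is assumed here to take this standard form, with a denoting the input to the activation.

```latex
f(a) = \max(0, a) =
\begin{cases}
a, & a > 0 \\
0, & a \le 0
\end{cases}
```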
When the parameter a is a positive number, the equation is computed as 1. Thus, the computation of one neuron in the convolution layer can be expressed by formula 2, where the parameter M denotes the depth of the filter, and v and n are the weight vector and the bias term, respectively.
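The computation of a single convolutional neuron can be illustrated with a small sketch. It assumes the usual form of such a neuron, a weighted sum of the inputs under the filter plus the bias, passed through the ReLU; the names `relu` and `neuron_response` and the sample values are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def relu(a: float) -> float:
    """Rectified linear unit: keep positive inputs, clamp the rest to 0."""
    return a if a > 0.0 else 0.0


def neuron_response(x: Sequence[float], v: Sequence[float], n: float) -> float:
    """Response of one convolutional neuron over its receptive field.

    x: input values covered by the filter, flattened across its depth M
    v: weight vector of the filter, same length as x
    n: bias term
    """
    if len(x) != len(v):
        raise ValueError("input patch and weight vector must have equal length")
    pre_activation = sum(vi * xi for vi, xi in zip(v, x)) + n
    return relu(pre_activation)


# Hypothetical receptive-field values and filter weights, for illustration only.
patch = [0.2, -0.5, 1.0, 0.3]
weights = [0.5, 0.1, -0.2, 1.0]
print(neuron_response(patch, weights, n=0.05))
```

Because the ReLU only compares and passes values through, it adds very little cost on top of the weighted sum, which matches the motivation given above for choosing it.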
3.2 CNN for Motion Features
The motion feature network is similar in structure to the previous network, but differs in its input data set and pooling layers. The movement characteristics of smoke are closely related to the continuity of the smoke video. It is necessary to compare the previous frame with the next frame and obtain the motion area of the image. In most fires, the smoke produced in the initial stage of combustion is heated and floats upward, so the most obvious feature of smoke is upward movement. In the network initialization stage, early-stage smoke video of the fire is selected and fed to the motion feature network in chronological order. Since video smoke images and infrared smoke images are similar in the motion characteristics of the smoke, video images and infrared images are combined to improve processing efficiency and, at the same time, to address the shortage of infrared video images. Figure 5 shows some of the smoke images from the fire video used for learning enhancement.
When the amount of smoke video data is large enough, a pooling layer is added to the second CNN to avoid overfitting. Figure 6 shows the details of the CNN used for motion learning from video and infrared frames. It has 3 convolutional layers and 2 pooling layers, and each convolutional layer is followed by a nonlinear ReLU layer. The middle part consists of a max-pooling layer and a convolutional layer, repeated twice, followed by the fully connected layer. The stride in all middle layers is set to 2. The original input frames of the network are infrared thermal images converted to grayscale.
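The idea of comparing consecutive frames to find the motion area, and of checking for upward movement, can be sketched with plain frame differencing. This is only an illustration under simple assumptions (grayscale frames as nested lists, a fixed threshold of 15, hypothetical function names); the paper itself learns motion features with a CNN rather than with this hand-written rule.

```python
from typing import List, Optional

Frame = List[List[int]]  # grayscale image stored as rows of 0-255 intensities


def motion_mask(prev: Frame, curr: Frame, threshold: int = 15) -> List[List[bool]]:
    """Flag pixels whose intensity changed by more than `threshold` between frames."""
    return [
        [abs(c - p) > threshold for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def mean_motion_row(mask: List[List[bool]]) -> Optional[float]:
    """Average row index of moving pixels (row 0 is the top of the image)."""
    rows = [r for r, row in enumerate(mask) for moved in row if moved]
    return sum(rows) / len(rows) if rows else None


def moves_upward(frames: List[Frame], threshold: int = 15) -> bool:
    """True if the motion region climbs (its row index shrinks) at every step."""
    centers = [
        mean_motion_row(motion_mask(a, b, threshold))
        for a, b in zip(frames, frames[1:])
    ]
    if len(centers) < 2 or any(c is None for c in centers):
        return False
    return all(later < earlier for earlier, later in zip(centers, centers[1:]))


def blob_frame(row: int, size: int = 4) -> Frame:
    """Synthetic test frame: a bright two-pixel blob on the given row."""
    frame = [[0] * size for _ in range(size)]
    frame[row][1] = frame[row][2] = 200
    return frame


# A blob that rises from the bottom row toward the top, in chronological order.
frames = [blob_frame(3), blob_frame(2), blob_frame(1)]
print(moves_upward(frames))
```

Feeding the frames in chronological order matters here for the same reason as in the text: reversing the sequence turns a rising region into a falling one.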
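The effect of the stride-2 layers on the feature-map size can be traced with the standard output-size formula. The sketch below is an assumption-laden illustration: the 5 × 5 convolution kernels, 2 × 2 pooling windows, zero padding and the stride of 1 in the first convolution are hypothetical choices, since the text specifies only the layer order, the stride of 2 in the middle layers, and (in the experiments) a 238 × 238 input.

```python
from typing import List, Tuple


def out_size(size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride): conv1, then [max-pool, conv] repeated twice.
# Kernel sizes and the first stride are assumptions, not values from the paper.
LAYERS: List[Tuple[str, int, int]] = [
    ("conv1", 5, 1),
    ("pool1", 2, 2),
    ("conv2", 5, 2),
    ("pool2", 2, 2),
    ("conv3", 5, 2),
]


def trace_sizes(input_size: int, layers=LAYERS) -> List[Tuple[str, int]]:
    """Return the feature-map side length after each layer."""
    sizes = []
    size = input_size
    for name, kernel, stride in layers:
        size = out_size(size, kernel, stride)
        sizes.append((name, size))
    return sizes


for name, size in trace_sizes(238):
    print(f"{name}: {size} x {size}")
```

Under these assumed settings the map shrinks quickly, which keeps the fully connected layer that follows small.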
3.3 Dual CNN Model for Smoke Detection
The block diagram of the dual CNN model for smoke detection is shown in Fig. 7. First, the video and infrared smoke images are labeled manually, and then the first CNN is trained separately for motion feature extraction. The second CNN is trained on the infrared frames alone and outputs the texture features of the smoke. After that, the outputs of CNN1 and CNN2 are concatenated to form the joint features, using different combination methods.
When the output feature maps have the same resolution, the combination of the two can be achieved through simple superposition operations. In order to better preserve the two smoke features, we propose a superposition method to describe the joint features, as shown below.
The stack operation is applied after the fully connected layer to limit the complexity of the network structure. The effect of the concatenation is demonstrated in our experiments. Training seeks the parameter θ that minimizes the loss function [21]. The loss function L(θ) is computed at the end of the network from the estimated outputs and the ground-truth labels.
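The two ways of combining the network outputs mentioned above can be sketched as follows: stacking (concatenation) of the two feature vectors after the fully connected layers, and element-wise superposition, which is only possible when both outputs have the same size. The function names and the toy vectors are hypothetical, and the sketch does not claim to reproduce the exact joint-feature formula of the paper.

```python
from typing import List, Sequence


def stack_features(texture: Sequence[float], motion: Sequence[float]) -> List[float]:
    """Concatenate the texture and motion feature vectors into one joint vector."""
    return list(texture) + list(motion)


def superpose_features(texture: Sequence[float], motion: Sequence[float]) -> List[float]:
    """Element-wise sum; requires both feature vectors to have the same length."""
    if len(texture) != len(motion):
        raise ValueError("superposition needs feature vectors of equal length")
    return [t + m for t, m in zip(texture, motion)]


# Toy outputs standing in for the fully connected layers of the two networks.
texture_out = [0.25, 0.5, 0.75]
motion_out = [1.0, 0.0, 0.5]

joint = stack_features(texture_out, motion_out)
print(len(joint), joint)
print(superpose_features(texture_out, motion_out))
```

Stacking keeps every texture and motion component separate in the joint vector, whereas summation mixes them, which is one reason to prefer stacking when both kinds of feature should be preserved.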
where θ denotes the weight vector of the current network. The aim of training is to find the θ that minimizes L(θ). The minimization is carried out by stochastic gradient descent with backpropagation [22]. At the beginning of training, the network model is loaded first, and then the model parameters are initialized. The momentum factor is set to 0.935, the weight decay coefficient is set to 0.0005, the initial learning rate is set to 0.01, the number of training epochs is set to 300, and the batch size is set to 32 to fit the CPU and video memory.
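To show how the stated hyperparameters enter the parameter update, here is a minimal sketch of one common form of stochastic gradient descent with momentum and L2 weight decay. The update rule, the function names and the toy quadratic loss are assumptions made for illustration; the paper does not spell out its exact update or loss.

```python
from typing import List, Sequence, Tuple

LEARNING_RATE = 0.01   # initial learning rate stated in the text
MOMENTUM = 0.935       # momentum factor stated in the text
WEIGHT_DECAY = 0.0005  # weight decay coefficient stated in the text


def sgd_momentum_step(
    theta: Sequence[float],
    grad: Sequence[float],
    velocity: Sequence[float],
    lr: float = LEARNING_RATE,
    momentum: float = MOMENTUM,
    weight_decay: float = WEIGHT_DECAY,
) -> Tuple[List[float], List[float]]:
    """One update: v <- momentum*v - lr*(grad + weight_decay*theta); theta <- theta + v."""
    new_theta, new_velocity = [], []
    for t, g, v in zip(theta, grad, velocity):
        g_total = g + weight_decay * t          # weight decay folded into the gradient
        v_next = momentum * v - lr * g_total    # momentum accumulates past gradients
        new_velocity.append(v_next)
        new_theta.append(t + v_next)
    return new_theta, new_velocity


def toy_gradient(theta: Sequence[float]) -> List[float]:
    """Gradient of the stand-in loss L(theta) = 0.5 * sum(theta_i ** 2)."""
    return list(theta)


theta = [1.0, -2.0]
velocity = [0.0, 0.0]
for _ in range(500):
    theta, velocity = sgd_momentum_step(theta, toy_gradient(theta), velocity)
print(theta)  # both components end up close to the minimizer at 0
```

In a real training run the gradient would come from backpropagation over a mini-batch of 32 frames instead of `toy_gradient`.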
4 Experiments
4.1 Experiment Data
The data set used to train CNN2 for motion feature learning comes partly from Toreyin et al. [6], and the rest from our experiments and online collection. The data set for CNN1 comes entirely from our own experiments. The input infrared and video frames are resized to 238 × 238. Considering the different learning targets of CNN1 and CNN2 in the proposed algorithm, we set their learning rates to 0.01 and 0.02, respectively. Train1 contains 6000 images taken from the smoke videos on the website of the Key Laboratory of Fire Science, University of Science and Technology of China. The data sets are divided into training and testing frames, named TrainX and TestX, for training CNN1 and CNN2. The details of the data sets used in the paper are shown in Table 1.
4.2 Evaluation Protocol
The algorithms in this paper are evaluated with standard image-level criteria. First, we use the true positive ratio (TPR) and the true negative ratio (TNR) to evaluate our method, defined as follows.
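In their standard form, which Eq. (5) and Eq. (6) are assumed to follow, the two ratios are written in terms of the image counts TP, FN, TN and FP:

```latex
\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad
\mathrm{TNR} = \frac{TN}{TN + FP}
```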
where TP is the number of images that contain smoke and are correctly identified, TN is the number of images that do not contain smoke and are correctly identified, FN is the number of images that contain smoke but are misidentified, and FP is the number of images that do not contain smoke but are misidentified.
The false alarm rate can be computed as follows.
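As a worked illustration, the sketch below computes TPR, TNR and a false alarm rate from the four image counts. The false alarm rate is assumed here to be the common definition FP / (FP + TN), that is 1 − TNR; the function name and the example counts are hypothetical and are not results from the paper.

```python
from typing import Dict


def detection_rates(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Image-level rates from the counts of correct and incorrect decisions."""
    smoke_images = tp + fn   # all images that really contain smoke
    clear_images = tn + fp   # all images that really contain no smoke
    if smoke_images == 0 or clear_images == 0:
        raise ValueError("need at least one smoke image and one non-smoke image")
    return {
        "TPR": tp / smoke_images,
        "TNR": tn / clear_images,
        # Assumed definition: share of non-smoke images that raise an alarm.
        "FAR": fp / clear_images,
    }


# Made-up counts for a test set of 100 smoke and 100 non-smoke images.
rates = detection_rates(tp=90, tn=80, fp=20, fn=10)
for name, value in rates.items():
    print(f"{name}: {value:.2f}")
```

With this definition a lower false alarm rate is equivalent to a higher TNR, so the two can be reported interchangeably.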
In addition, as in [23], we use the receiver operating characteristic (ROC) to evaluate the performance of smoke detection. The ROC curves are obtained from experiments on our data sets.
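An ROC curve is obtained by sweeping the decision threshold over the classifier scores and recording the false positive rate and true positive rate at each step. The sketch below shows this under simple assumptions: each image has one smoke score and a 0/1 label, and the function names and sample data are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) pairs obtained by lowering the decision threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both smoke and non-smoke samples are required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points: Sequence[Tuple[float, float]]) -> float:
    """Trapezoidal area under the ROC curve."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Made-up smoke scores and ground-truth labels (1 = smoke, 0 = no smoke).
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
curve = roc_points(scores, labels)
print(curve)
print(area_under_curve(curve))
```

A curve that climbs toward the top-left corner, and hence a larger area, corresponds to a detector that reaches a high TPR at a low false alarm rate.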
4.3 Results and Analysis
The dual CNN model is evaluated under two conditions: with both video and infrared frames, and with infrared frames only. The TPR and TNR are computed according to Eq. (5) and (6), as shown in Fig. 8. It can be observed that the accuracy is higher when video frames are used as an aid for training.
The relationship between the number of iterations and the loss value during network training, and the ROC curves of the final classification output under these conditions, are shown in Fig. 9. CNN1 is trained under two conditions: with infrared images only, and with both video and infrared images. We can see that it is useful to train the CNN1 network for motion characteristic extraction. The detection results obtained by our algorithm are shown in Fig. 10. They show that the proposed method works well for smoke detection and performs well in various real scenes.
We use ablation experiments to verify the role of the characteristic parameters selected in this paper. Detection accuracy and detection speed are used as the performance indexes of the detection model. The accuracy indexes include precision, recall, the F1 curve, the PR curve and mAP (mean average precision); the speed indexes include frames per second (FPS) and the number of floating-point operations of the model (FLOPs). A higher mAP and a larger FPS indicate stronger detection performance. In this paper, the network model using smoke motion and texture features is trained and compared with the network model using motion, texture, diffusion and edge features. The results are shown in Table 2.
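The accuracy indexes named above are related in a simple way, which the following sketch makes explicit using the standard definitions of precision, recall and the F1 score. The function name and the counts are hypothetical and do not correspond to any entry in Table 2.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Standard precision, recall and F1 score from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0   # how many alarms were real
    recall = tp / (tp + fn) if tp + fn else 0.0      # how many fires were found
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up counts: 8 correct detections, 2 false alarms, 8 missed detections.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

The F1 score is the harmonic mean of precision and recall, so it stays low whenever either of the two is low.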
In Table 2, Ex1 represents the model without the edge feature, Ex2 the model without the diffusion feature, Ex3 the model without the texture feature and Ex4 the model without the motion feature. From Table 2, we can see that the results of the three weight files on precision, recall and mAP_0.5 are similar, but there is a big gap between the three on mAP_0.5:0.95. mAP_0.5:0.95 means calculating the mAP under 10 IoU thresholds and then averaging the values; this index better reflects the accuracy of the model. Through comparative analysis, we know that Ex3 and Ex4 have better effects, because the motion and texture features of infrared images are easier to capture than edge and diffusion features and better indicate whether a fire occurs.
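The averaging behind mAP_0.5:0.95 can be sketched as follows, assuming the usual convention of ten IoU thresholds from 0.5 to 0.95 in steps of 0.05 and axis-aligned boxes given as (x_min, y_min, x_max, y_max). The function names and the per-threshold values are hypothetical, not results from Table 2.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

# Ten IoU thresholds: 0.5, 0.55, ..., 0.95 (assumed convention).
IOU_THRESHOLDS: List[float] = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_over_thresholds(map_per_threshold: Sequence[float]) -> float:
    """Mean of the mAP values computed at each of the ten IoU thresholds."""
    if len(map_per_threshold) != len(IOU_THRESHOLDS):
        raise ValueError("expected one mAP value per IoU threshold")
    return sum(map_per_threshold) / len(map_per_threshold)


# A predicted smoke box overlapping a ground-truth box (made-up coordinates).
overlap = iou((0.0, 0.0, 2.0, 2.0), (1.0, 0.0, 3.0, 2.0))
print(overlap)

# A detection with IoU 0.72 counts as correct only at the looser thresholds.
print([t for t in IOU_THRESHOLDS if 0.72 >= t])

# Made-up per-threshold mAP values that fall as the threshold tightens.
print(average_over_thresholds([0.9, 0.88, 0.85, 0.8, 0.74, 0.65, 0.52, 0.38, 0.2, 0.05]))
```

Because the stricter thresholds demand tighter localization, two models with similar mAP_0.5 can differ widely on this averaged index, which is the gap the text describes.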
We compare the algorithm proposed in this paper with other advanced deep learning algorithms on two performance parameters of fire image detection, recall and precision, as shown in Fig. 11. As can be seen from the figure, the FCN-8s algorithm has the highest accuracy and is better than the other three algorithms. The performance of the algorithm proposed in this paper is second only to FCN-8s and better than GoogLeNet and VGG-16. However, the computation of FCN-8s is much higher than that of the algorithm used in this paper, and its training time is the longest. The algorithm in this paper gradually converges after the number of iterations exceeds 8000, while FCN-8s needs more than 10,000. Taking convergence speed and accuracy together, the comprehensive performance of this algorithm is the best. The computation of this algorithm is larger than that of VGG-16 and lower than that of GoogLeNet. This is because the proposed algorithm has two convolutional network channels, which are trained with different feature maps, so as to improve the calculation speed and reduce the network complexity. At the same time, this dual-channel structure separates the training of video images and infrared images and avoids mutual interference between their features, which ensures the accuracy and reliability of the network.
5 Conclusion
Smoke varies greatly in many features and is vulnerable to interference, so accurately identifying smoke at an early stage in real scenes remains a challenging task. In this paper, a dual convolutional neural network model is proposed and successfully applied to fire smoke detection with infrared frames. Video frames are added to expand the data sets. Learning is conducted with both visual and infrared frames, which overcomes the limitation of infrared frames for extracting smoke motion features and improves the timeliness of detection. Experimental results show that the dual CNN is effective for smoke detection where data is scarce and that the classification accuracy is maintained. Our algorithm has good application prospects where video detection is limited, such as fire detection in dark, closed environments. In future research, the model can be further adjusted and improved according to the specific application environment.
Data availability
The public part of the image data used in this paper can be obtained from the URL: http://smoke.ustc.edu.cn/datasets.htm.
References
Blake D (2000) Aircraft Cargo Compartment Smoke Detector Alarm Incidents on U.S-Registered Aircraft, 1974–1999; FAA Report, DOT/FAA/AR-TN00/29; United States Federal Aviation Administration: Washington, DC, USA.
Toreyin BU (2018) Smoke detection in compressed video. Applications of Digital Image Processing XLI 10752:32
Chenebert A, Breckon TP, Gaszczak A (2011) A non-temporal texture driven approach to real-time fire detection, in Proceedings of 2011 18th IEEE International Conference on Image Processing, vol. 263, pp. 1741–1744.
Fujiwara N, Terada K (2004) Extraction of a smoke region using fractal coding. Proceedings of IEEE International Symposium on Communications and Information Technology 2:659–662
Yuan FN (2008) A fast accumulative motion orientation model based on integral image for video smoke detection. Pattern Recogn Lett 29:925–932
Toreyin BU, Dedeoglu Y, Cetin AE (2015) Wavelet based real-time smoke detection in video, in Proceedings of 13th European Signal Processing Conference, Antalya, pp. 1–4.
Tung TX, Kim JM (2011) An effective four-stage smoke-detection algorithm using video images for early fire-alarm system. Fire Saf J 46:276–282
Xu Z, Xu J (2008) Automatic Fire Smoke Detection Based on Image Visual Features, in Proceedings of International Conference on Computational Intelligence and Security Workshops, pp. 316–319.
Frizzi S, Kaabi R, Bouchouicha M (2016) Convolutional neural network for video fire and smoke detection, in Proceedings of Conference of the IEEE industrial Electronics Society, pp. 877–882.
Wang H, Ge H, Zhang Z, Bu Z (2023) Research on fire-detection algorithm for airplane cargo compartment based on typical characteristic parameters. Sensors 23(21):8797
Cheoi K (2015) An intelligent fire learning and detection system, Journal of Korea Multimedia Society, pp. 359–367.
Maksymiv O, Rak T, Peleshko D (2017) Real-time fire detection method combining adaboost, lbp and convolutional neural network in video sequence, Experience of Designing and Application of CAD Systems in Microelectronics, pp. 351–353.
Yin Z, Wan B, Yuan F (2017) A deep normalization and convolutional neural network for image smoke detection. IEEE Access 99:1–1
Chang J, Song DL, Wang RR et al (2012) Airborne infrared fire detection optical system with tilted porthole. Opt Commun 285(6):937–940
Hackner A, Oberpriller H, Ohnesorge A et al (2016) Heterogeneous sensor arrays: merging cameras and gas sensors into innovative fire detection systems. Sens Actuators B Chem 231:497–505
Emily J, Fusco John T, Finn John T et al (2019) Detection rates and biases of fire detection of fire observations from MODIS and agency reports in the conterminous United States. Remote Sens Environ 220:30–40
Eckle K, Schmidt-Hieber J (2019) A comparison of deep networks with ReLU activation function and linear spline-type methods. Neural Netw 110:232–242
Gomez-Rios A, Tabik S, Luengo J, Shihavuddin ASM, Krawczyk B, Herrera F (2019) Towards highly accurate coral texture images classification using deep convolutional neural networks and data augmentation. Expert Syst Appl 118:315–328
Mopuri KR, Garg U, Babu RV (2019) CNN fixations: an unraveling approach to visualize the discriminative image regions. IEEE Trans Image Process 28:2116–2125
Bawa VS, Kumar V (2019) Linearized sigmoidal activation: A novel activation function with tractable non-linear characteristics to boost representation capability. Expert Syst Appl 120:346–356
Le BT, Xiao D, Mao YC, Song L, He DK, Liu SJ (2018) Coal classification based on visible, near-infrared spectroscopy and CNN-ELM algorithm. Spectrosc Spect Anal 38:2107–2112
Thillaikkarasi R, Saravanan S (2019) An enhancement of deep learning algorithm for brain tumor segmentation using kernel based CNN with M-SVM. J Med Syst 43:78–84
Tian HD, Li WQ, Ogunbona PO, Wang L (2018) Detection and separation of smoke from single image frames. IEEE Trans Image Process 27:1164–1177
Funding
This work was financially supported by National Natural Science Foundation of China (NO: U2033206,U1933105), the funding of Civil Aircraft Fire Science and Safety Engineering Key Laboratory of Sichuan Province (NO: MZ2022JB01) and Deyang City Science and Technology Bureau key research and development project (2021SZ001).
Ethics declarations
Competing interests
The authors have no relevant financial or non-financial interests to disclose. The authors have no competing interests to declare that are relevant to the content of this article. All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript. The authors have no financial or proprietary interests in any material discussed in this article.
Consent to Participate
Consent to participate was obtained from all participants.
Consent for Publication
Consent for publication was obtained from all participants.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Deng, L., Chen, Q., Sui, X. et al. Smoke Detection with Dual Convolutional Networks From Infrared Frames. Int J Netw Distrib Comput 12, 153–163 (2024). https://doi.org/10.1007/s44227-024-00026-z