1 Introduction

In this paper, we propose an infrared camera system that analyzes image sequences from a camera to detect invisible leakage of gas. Methane gas, the main ingredient of natural gas, is combustible and explosive, so safety management is essential for chemical installations and gas facilities. Gas leakage cannot be detected visually but can be verified by other means (Vollmer and Möllmann 2018), such as detection with soapy water, flame-ionization detection, and laser detection. These conventional methods, however, do not allow rapid response to leakage or accurate localization of leaks. The proposed infrared camera equipment makes gas detection possible by means of digital image processing: one can monitor and prevent gas leakage by using a camera instead of inspecting the site directly.

For decades, infrared cameras have revolutionized maintenance in many industries, and have proved to be an effective technology for detecting flaws before and after electrical and mechanical faults arise. Infrared cameras can play an important role in reducing environmental impact, as well as improving workers’ safety and the quality of processes. They can also be used successfully to detect environmentally hazardous gas leakages. Not only does gas leakage adversely affect the environment, but there is also considerable economic loss associated with leakages. In light of this, we have developed a gas detection camera that can detect many kinds of gases based on their particular center wavelengths, including methane gas. The developed product can be made relatively small and inexpensive by combining the advantages of laser and infrared sensors. It is also well suited to field use because of its increased portability and reduced failure rate. The developed device is an infrared camera capable of detecting methane gas in all areas of production, transportation, and distribution of oil and natural gas. The camera can scan a wide area and visualize potential gas leaks in real time, allowing multiple components to be checked with one scan. Instead of the laser used in conventional methane gas detectors, a medium wavelength LED with a center wavelength of approximately 3300 nm is used as the light source of the gas detector. Furthermore, a compact cooling unit with low power consumption is achieved by applying a small electronic part using a Peltier element instead of a larger cooling unit. Currently, a popular gas detection device with a camera image is produced by the FLIR company, but it is expensive and its cooling gas needs to be replenished. In response, we have developed a durable, low-cost gas detection imaging device. A comparison of preliminary tests under various conditions shows that the proposed camera system affords fast and efficient gas detection.

There have been many studies on how to obtain clear infrared images from sensors. Zuo et al. presented a technique based on separating the acquired image into base and detail parts using a bilateral filter, then processing each part independently (Zuo et al. 2011). The technique effectively maps the raw acquired infrared image to an 8-bit domain based on a bilateral filter and a dynamic range partitioning method. It has prospects for application, but may not work under some particular conditions. Peng et al. suggested detail enhancement for the infrared image (Peng et al. 2016). The raw infrared image is decomposed into base and detail layers by using a propagated filter, which produces results similar to those of the preceding method. Yoo et al. proposed a novel digital binning algorithm for enhancing low-light scenes in environments with insufficient illumination (Yoo et al. 2015). It can easily be built into an existing image signal processor since it does not use iterative computation; however, its computational complexity depends on the sorting algorithm. Here, we propose a log function to obtain 8-bit infrared images similar to those produced with the bilateral filter. Recently, a logarithmic histogram modification has been proposed to improve image contrast with naturalness preservation (Bhandari 2019). That method can uniformly enhance image contrast with fewer parameters without losing the basic features. In addition, safety-related services based on a single depth camera installed in the user’s home were designed and evaluated for safety management using a camera (Mettel et al. 2019). The goal of the ARKTOS project is to build an intelligent knowledge-based system to classify satellite sea ice images (Soh and Tsatsoulis 2002). We aim to eventually develop an image analysis system that combines such image processing and knowledge engineering techniques.

This paper is presented in the following order: Sect. 2 describes the overall developed camera system. Section 3 explains the non-uniformity correction with automatic gain control. Section 4 presents the algorithm of the contrast limited adaptive histogram equalization. Section 5 presents the adaptive algorithm for background extraction, and compares it with other methods. Section 6 explains comparative experimental results. Finally, Sect. 7 presents the conclusions of this research.

2 The infrared camera system

The proposed infrared camera is a preventive maintenance solution for detecting leaks of petrochemicals from piping, flanges, and connections. It enables faster recovery and restoration of operations. Infrared energy from an object is concentrated onto an infrared detector via an optical lens. The detector sends the information to the electronic sensor element for image processing, and the electronics convert the data from the detector into images that can be viewed on a standard liquid crystal display (LCD) screen. Instead of the laser used in conventional methane gas detectors, a medium wavelength LED with a center wavelength of approximately 3300 nm is used as the light source of the gas detector. Furthermore, a compact cooling unit with low power consumption is achieved by utilizing a small electronic part with a Peltier element instead of a larger cooling unit. The cooling system must be considered together with the degradation of the camera’s lifetime: in some systems, cooling gas needs to be replenished, which is a very expensive and cumbersome process. Most importantly, with the proposed system we can monitor potentially dangerous leaks from a few meters away. This camera will greatly improve work safety and environmental and regulatory compliance by locating leakage. Figure 1 shows the prototype camera. The developed camera is relatively large, measuring 15 cm in height and width and 20 cm in length and weighing 1.5 kg, but it will be reduced to a smaller size with further development.

Fig. 1

Image of the prototype camera specially developed for monitoring methane gas leakage

Figure 2 shows the overall flow chart of the proposed real-time gas monitoring system based on image processing techniques. From the input video sequence, we obtain still images of 640 pixels by 480 pixels. The input image data is streamed directly from the sensor as 14-bit values for each pixel in the array. Since it is difficult to display 14-bit images directly, the video is converted to a 256-level image rather than the sensor’s full 16,384 levels, using a non-uniformity correction with automatic gain control. Note that in Fig. 2, the blocks inside the dotted line process 14-bit data and the rest of the system operates on 8-bit data.

Fig. 2

Flow chart of an infrared camera system for gas detection

The input images are of poor quality, so it is almost impossible to identify gas leakage in an unenhanced image. We apply the contrast limited adaptive histogram equalization (CLAHE) algorithm to improve the quality of the image (Zuiderveld 1994). We then apply an adaptive background extraction method to extract the background from the image. Because the camera for gas detection is stationary, we use the adaptive average method to estimate the background image. Once the background of the image is estimated, the gas image can be extracted by subtracting the background from the input image. Because of the nature of gas images, the resolution of the output images is typically poor, thus requiring additional image processing to enhance the quality of the image. The following sections provide detailed descriptions of the algorithms mentioned here; a minimal end-to-end sketch is given below.
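
As a preview of the following sections, the chain of operations in Fig. 2 can be summarized in a minimal Python sketch. The parameter values and the helper structure are illustrative assumptions, not the deployed implementation.

```python
import numpy as np
import cv2

def process_frame(raw14, background, eps=1.0):
    """One pass of the Fig. 2 pipeline (illustrative parameter values)."""
    # Log-based automatic gain control: compress 14-bit data into 8 bits (Sect. 3).
    img8 = np.clip((255.0 / 14.0) * np.log2(raw14.astype(np.float32) + 1.0),
                   0, 255).astype(np.uint8)
    # Contrast limited adaptive histogram equalization (Sect. 4).
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(img8)
    # Adaptive background update (Sect. 5, Eq. (5)).
    background = np.where(enhanced >= background, background + eps, background - eps)
    # Candidate gas regions: difference between the frame and the background.
    foreground = cv2.absdiff(enhanced, np.clip(background, 0, 255).astype(np.uint8))
    return enhanced, background, foreground

# Usage with a synthetic 640 x 480, 14-bit frame.
frame14 = np.random.randint(0, 2**14, (480, 640), dtype=np.uint16)
bg = np.full((480, 640), 128.0, dtype=np.float32)
enhanced, bg, fg = process_frame(frame14, bg)
```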

3 Non-uniformity correction

Modern infrared imaging systems are used in many spheres, including military, civilian, fire surveillance, security, and aerospace applications. The core of this technology is the focal plane array (FPA). An FPA consists of temperature detectors, which sense heat, arrayed as a mosaic at the focal plane of the imaging lens. The heat sensed by each detector is expressed as one pixel (a single temperature value). The response of each detector to the incident infrared radiation is expressed as a linear equation as follows:

$$O_{\text{r}} (i,j) = \lambda \; \times \;I_{\text{r}} (i,j)\; \pm \;\delta ,$$
(1)

where Or(i, j) is the output value (the measured temperature) represented in one pixel, λ is the gain of each detector for the incident infrared energy, Ir(i, j) is the input radiation, and δ is the offset determined by the sensitivity of each detector. Pixels have different sensitivities because the gain and the offset of each pixel differ. This phenomenon causes spatial non-uniformity in the thermal image. The non-uniformity is a time-dependent noise which occurs due to the lack of sensor equalization (Tendero et al. 2012). To obtain a meaningful infrared image, each gain and offset needs to be calibrated to a normalized value. Through this calibration, the relative temperature of the target or of the entire scene can be recovered from the otherwise degraded images; only with accurate non-uniformity correction can accurate temperature measurements be obtained.

We perform a uniformity test to ensure that all elements within the sensor produce the same results. A problem associated with this test is that there is no standardized procedure for measuring uniformity, which leads to differences in reported values. Here we deal with two common ways of specifying uniformity: 2-point non-uniformity calibration (NUC) and 1-point offset calibration. The offset uniformity is calculated by capturing a single image and determining the values reported for all columns and rows within the sensor or sensor area; it is based entirely on measurements of all pixels in the sensor or region. The response uniformity method uses images taken at two different temperatures to analyze the response of every pixel, and then calculates the temperature change for all pixels in the sensor (Ratliff et al. 2002). The 2-point non-uniformity calibration, also known as calibration-based correction, uses a correction table computed over all pixels in order to compensate for the non-uniformity between them. This method is commonly referred to as 2-point NUC because it uses tables calculated from two different temperature points (hence 2-point) of a blackbody source, measured under optimal working conditions. When a thermal image is acquired, the correction table is applied and the image becomes calibrated. Although pixels differ in sensitivity and offset, once they are calibrated against a blackbody source they produce the same output under the same conditions (Tendero et al. 2010).
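
A minimal sketch of how such a 2-point correction can be computed is given below: per-pixel gain and offset are fitted from two blackbody frames so that every detector is mapped onto the mean response. Variable names and the synthetic calibration data are assumptions for illustration, not the camera's calibration procedure.

```python
import numpy as np

def two_point_nuc_tables(cold_frame, hot_frame):
    """Per-pixel gain/offset mapping each detector onto the mean response."""
    cold = cold_frame.astype(np.float64)
    hot = hot_frame.astype(np.float64)
    # Desired (uniform) responses: the spatial means of the two calibration frames.
    target_cold, target_hot = cold.mean(), hot.mean()
    # Per-pixel linear fit: corrected = gain * raw + offset.
    gain = (target_hot - target_cold) / (hot - cold)
    offset = target_cold - gain * cold
    return gain, offset

def apply_nuc(raw_frame, gain, offset):
    return gain * raw_frame.astype(np.float64) + offset

# Usage with synthetic 14-bit blackbody frames at two temperatures.
rng = np.random.default_rng(0)
cold = 2000 + rng.normal(0, 30, (480, 640))   # blackbody at the lower temperature
hot = 9000 + rng.normal(0, 30, (480, 640))    # blackbody at the higher temperature
gain, offset = two_point_nuc_tables(cold, hot)
corrected = apply_nuc(6000 + rng.normal(0, 30, (480, 640)), gain, offset)
```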

The concept of a 1-point offset correction, also known as scene-based correction, is not fundamentally different from a 2-point NUC for calibration purposes. Because only one absolute reference value is used, the gain values are left unchanged and only the offset values are adjusted. Scene-based correction techniques adjust images using numerical algorithms instead of a blackbody or heat source under controlled conditions. This technique is applied when thermal images are recorded continuously or when motion is tracked across thermal images. In this case, the scene-based correction applies reference values for the range of temperatures that each detector can report, and these are then normalized so that optimal thermal data are expressed in the thermal image.
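
A sketch of a corresponding 1-point offset update follows, under the assumption that a roughly uniform reference frame (for example, a closed shutter) is available; the gain table is kept and only the offsets are refreshed. This is an illustrative formulation, not the authors' exact procedure.

```python
import numpy as np

def one_point_offset(reference_frame, gain):
    """Offsets chosen so that the corrected reference frame is spatially flat."""
    ref = reference_frame.astype(np.float64)
    target = (gain * ref).mean()            # one absolute reference value
    return target - gain * ref              # leave the gain alone, adjust offsets only

# Usage: refresh offsets with a frame of an (assumed) uniform scene.
rng = np.random.default_rng(1)
gain = np.ones((480, 640))
uniform_frame = 5000 + rng.normal(0, 40, (480, 640))
offset = one_point_offset(uniform_frame, gain)
corrected = gain * uniform_frame + offset   # now spatially uniform up to noise
```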

The detector data is streamed directly from the sensor as 14-bit values for each pixel in the array. Almost all commercially available devices display images using 8-bit values. In other words, the video is displayed at 256 levels rather than the sensor’s full 16,384 levels. Therefore, the data must be converted into a format that allows it to be displayed. Vickers used the plateau histogram equalization for this transformation (Vickers 1996). The plateau histogram equalization algorithm maximizes the dynamic range available for the scene content. It uses a transfer function based on the number of pixels in each histogram bin and allocates more of the 8-bit range to well-populated bins. When the plateau value is small, the automatic gain control approaches a linear algorithm that maintains a linear mapping between the 14-bit and 8-bit data. The objective of the algorithm is to assign the same number of pixels to each of the 256 levels, which provides the best contrast for the given scene. The higher the plateau value, the more the algorithm can redistribute the data to achieve this goal. As a result, no output range is wasted on ranges where the scene has no content, the histogram peaks are much smoother, and the data are much more uniformly distributed than the original data. Note that a higher plateau value distorts the relationship between the physical temperature of the scene and the gray levels of the image, which is well preserved in the linear case (Raju et al. 2013). The gain and level values are adjusted properly in order to bring out low-contrast gas.
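
A minimal sketch of such a plateau equalization for 14-bit data is shown below; the plateau value and the bin layout are illustrative assumptions rather than the values used in the camera.

```python
import numpy as np

def plateau_equalize(raw14, plateau=200):
    """Clip the 14-bit histogram at the plateau value, then map to 8 bits."""
    hist, _ = np.histogram(raw14, bins=2**14, range=(0, 2**14))
    clipped = np.minimum(hist, plateau)              # limit the contribution of any bin
    cdf = np.cumsum(clipped).astype(np.float64)
    cdf = (cdf - cdf.min()) / max(cdf.max() - cdf.min(), 1.0)
    lut = (255.0 * cdf).astype(np.uint8)             # 14-bit value -> 8-bit level
    return lut[raw14]

# A small plateau behaves almost linearly; a large plateau approaches ordinary
# histogram equalization, giving more 8-bit range to well-populated bins.
frame14 = np.random.randint(0, 2**14, (480, 640), dtype=np.uint16)
frame8 = plateau_equalize(frame14, plateau=200)
```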

The plateau algorithm is an effective solution, but it requires selecting suitable gain and level values, and failure to select appropriate values may produce undesirable results. Therefore, we instead use a log function with base two that performs similarly to the plateau algorithm. A base-ten or natural logarithm could also be used, since they give similar results. The algorithm performs a log transform from 14-bit to 8-bit data of the form:

$$8{\text{-bit}} = \alpha \; \times \;\log_{2} \left( {14{\text{-bit}}} \right)\; \pm \;\beta ,$$
(2)

where the gain α and the offset β are chosen appropriately for the application. With this conversion, the 14-bit sensor data is easily converted into 8-bit visual image data. The log function enhances the low gray values, as shown in Fig. 3. The proposed log-based automatic gain control process is applied after the histogram equalization to improve the performance of the images. It is also important to note that humans are sensitive to changes in gray values at low levels of illumination. We clip the histogram and limit the maximum slope of the mapping function so that the gray values of the scene are not dragged toward a specific portion of the range (Toet and Wu 2015).
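
A minimal sketch of the conversion of Eq. (2) follows; the values of α and β are illustrative (α is chosen so that the full 14-bit range maps onto 0–255), not the values tuned for the deployed camera.

```python
import numpy as np

def log_agc(raw14, alpha=255.0 / 14.0, beta=0.0):
    """Eq. (2): 8-bit = alpha * log2(14-bit) + beta, clipped to the display range."""
    out = alpha * np.log2(raw14.astype(np.float64) + 1.0) + beta
    return np.clip(out, 0, 255).astype(np.uint8)

frame14 = np.random.randint(0, 2**14, (480, 640), dtype=np.uint16)
frame8 = log_agc(frame14)   # low gray levels are expanded, as in Fig. 3
```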

Fig. 3

Log functions for automatic gain control

4 Contrast limited adaptive histogram equalization

To create an image that is bright and interpretable, the gray levels need to be distributed properly. The most common way to achieve this is histogram equalization (Raju et al. 2013). Ordinary histogram equalization uses the same transformation for all pixels in an image. This method is effective if the distribution of pixel values is similar across the image. However, for images that contain areas significantly brighter or darker than their surroundings, the contrast in these areas is not improved sufficiently. Adaptive histogram equalization (AHE) solves this problem by transforming each pixel with a function derived from its neighborhood (Pizer et al. 1987). It is a useful image processing technique for improving the local contrast of images rather than the contrast of the entire image. It differs from ordinary histogram equalization in that the adaptive method computes multiple histograms, each corresponding to a specific portion of the image, and uses them to redistribute the brightness of the image. This makes it suitable for improving local contrast and clarifying edges in each local area of the image. However, the adaptive histogram equalization method tends to amplify noise in relatively homogeneous areas of an image. The contrast limited adaptive histogram equalization (CLAHE) method, a variant of adaptive histogram equalization, improves the contrast enhancement performance by inhibiting this amplification (Zuiderveld 1994). It divides the image into small areas (known as tiles) and applies local histogram equalization to each tile to achieve uniformity across the entire image.

To prevent the excessive amplification of noise, a contrast limiting procedure is applied to each area for which a conversion function is created. The contrast amplification near a given pixel value is given by the slope of the cumulative distribution function (CDF). CLAHE limits this amplification by clipping the histogram at a predefined value (the so-called clip limit) before calculating the CDF. As a result, the slope of the CDF, and therefore of the conversion function, is limited. The numerical value of the clip limit depends on the normalization of the histogram and therefore on the size of the neighborhood. The neighborhood size is selected to ensure that the background is suitably extracted. The tile size and function parameters must be adjusted to maximize contrast and structural visibility whenever images are acquired. Adjacent tiles are merged using bilinear interpolation to eliminate artificially induced boundaries (Hong et al. 2017).
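
For reference, this enhancement step can be reproduced with OpenCV's CLAHE implementation, as sketched below; the clip limit and tile grid size are illustrative and must be tuned for each acquisition.

```python
import cv2
import numpy as np

frame8 = np.random.randint(0, 256, (480, 640), dtype=np.uint8)   # stand-in input image
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(frame8)   # tiles equalized locally, merged by bilinear interpolation
```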

Figure 4 shows the 100th frames of input image sequences in various circumstances as examples. The images are so dark that it is difficult to tell the difference between objects. Figure 5 also shows the 100th frames of two input image sequences and their histograms. Since the gray values are mostly clustered at the lower end of the histogram, we can expect that the quality of the image will be poor, and indeed it is so.

Fig. 4

Examples of input images in various circumstances

Fig. 5

Examples of the 100th frames of two input image sequences and their histograms

Figure 6 shows examples of images and histograms extracted from the input video sequence after applying a CLAHE algorithm. For all these images, the algorithm works fine and in almost all cases produces a much better image than the simple application of histogram equalization.

Fig. 6

Examples of the 100th frames of two input image sequences and their histograms after applying a CLAHE algorithm

5 Adaptive background extraction

We have compared methods ranging from simple techniques, such as the conventional average method, to more complex probabilistic modeling techniques, such as the mixture of Gaussians (MoG) (Piccardi 2004; Sobral and Vacavant 2014). The MoG is one of the most popular recursive modelling techniques when there is no prior information about the circumstances (Bouwmans et al. 2008; Stauffer and Grimson 2000). The average method is one of the most frequently used background modeling techniques (Cucchiara et al. 2003). Multiple images of the same scene can be obtained when the camera stays at a fixed position. In this case, the simplest approach to finding a background image is to take the average of all images. The background estimate is defined as the average value at each pixel position over every frame in the buffer, under the supposition that the pixel shows the background in more than half of the frames in the buffer. Obtaining background values as the average or median is simple and fast, but it requires a lot of memory, and increasing the number of frames increases the amount of memory required.

Assume that there are n images of the same background containing different amounts of moving content at different speeds. Then, the image of the ith frame becomes B + Ni, which contains the ideal background image B and the noise Ni. The average B′ of these images can be obtained by the usual averaging of Eq. (3)

$$B^{\prime} = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left( {B + N_{\text{i}} } \right)} = B + \frac{1}{n}\sum\limits_{i = 1}^{n} {N_{\text{i}} } ,$$
(3)

where Ni can be interpreted as normally distributed with mean 0. It is readily shown that the average of all Ni is close to zero; the greater the number n, the closer it is to zero. We set n = 500, i.e., the first 500 input image frames. Thus, the average image is very close to the ideal background image, and it approaches it even more closely if more input images are used. However, storing and handling many frames of a video sequence requires enormous amounts of memory. Computing the average or median of the previous frames is simple, but it is a memory-hungry process: when multiple frames are processed to obtain a single background image, the memory required grows in proportion to the number of frames.
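
A minimal sketch of this buffered averaging is given below; the frame source is a synthetic placeholder, and the explicit buffer makes the memory cost proportional to n visible.

```python
import numpy as np

def average_background(frames, n=500):
    """Average the first n frames (Eq. (3)); the buffer grows linearly with n."""
    buffer = []
    for i, frame in enumerate(frames):
        if i >= n:
            break
        buffer.append(frame)                            # raw frames are kept in memory
    return np.mean(buffer, axis=0).astype(np.uint8)     # B' = (1/n) * sum(B + N_i)

# Usage with synthetic frames standing in for the first 500 images of a sequence.
frames = (np.random.randint(0, 256, (480, 640), dtype=np.uint8) for _ in range(500))
reference_background = average_background(frames)       # also serves as the PSNR reference
```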

The MoG tracks multiple Gaussian distributions simultaneously. The probability of observing a pixel value xK at a particular time K is modelled as the sum of M weighted Gaussians:

$$p(x_{\text{K}} ) = \sum\limits_{i = 1}^{M} {w_{\text{i}} \; \times \;\lambda ( \cdot )} ,$$
(4)

where wi is the weight of the ith Gaussian and \(\lambda \left( \cdot \right)\) is the normal distribution. We can define a variety of Gaussian models that capture different attributes of each pixel, and each Gaussian is given a different weight depending on how often the corresponding appearance occurs. One of the main advantages of this method is that it does not destroy existing background models. For a given image sequence, the gray scale of each pixel is modelled as a mixture of M Gaussian distributions over time, with each Gaussian weighted by the frequency with which the pixel shows that appearance. The model parameters of the MoG can be adjusted without having to keep a large video sequence. However, the MoG is not well suited to real-time surveillance because it requires significantly higher computational complexity than other methods.
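
For the comparison reported later, the MoG model can be run with OpenCV's MOG2 implementation, as sketched below; the history and variance-threshold values are illustrative assumptions, and the frame source is a placeholder.

```python
import cv2
import numpy as np

mog = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16, detectShadows=False)
for _ in range(500):                                   # placeholder frame source
    frame8 = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
    fg_mask = mog.apply(frame8)                        # per-pixel mixture update
background = mog.getBackgroundImage()                  # current background estimate
```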

We present an adaptive algorithm to overcome the complicated computation process of the MoG method. If a pixel in the current frame It has a value larger than the corresponding pixel in the background image Bt, the pixel in the next background image Bt+1 is incremented by a small value. Likewise, if a pixel in the current frame has a value smaller than the corresponding pixel in the background image, the pixel in the next background image is decremented by a small value as follows:

$$\left\{ {\begin{array}{*{20}l} {B_{\text{t}+1} (i,j) = B_{\text{t}} (i,j) + \varepsilon ,\quad {\text{if}}\;\;I_{\text{t}} (i,j) \ge B_{\text{t}} (i,j)} \\ {B_{\text{t}+1} (i,j) = B_{\text{t}} (i,j) - \varepsilon ,\quad {\text{if}}\;\;I_{\text{t}} (i,j) < B_{\text{t}} (i,j)} \\ \end{array} } \right.,$$
(5)

where ε is an appropriately chosen update rate. The method can therefore be applied to real-time processing. Because the update is performed pixel by pixel, errors are prevented from propagating continuously to subsequent frames. Because of this simple process of converging toward the background image, it is easy to build into an embedded infrared imaging device.
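
A minimal sketch of this update rule follows; the step size ε and the initialization are illustrative assumptions.

```python
import numpy as np

def update_background(frame, background, eps=1.0):
    """Eq. (5): nudge the background toward the current frame by +/- eps per pixel."""
    frame = frame.astype(np.float32)
    step = np.where(frame >= background, eps, -eps)
    return np.clip(background + step, 0, 255)

# Usage: start from the first frame (here a synthetic one) and update once per incoming frame.
background = np.random.randint(0, 256, (480, 640)).astype(np.float32)
frame8 = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
background = update_background(frame8, background, eps=1.0)
```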

6 Experiments and discussions

Figure 7 shows examples of the 300th input image and its background images using the average, MoG, and proposed methods, respectively. It is difficult to differentiate between the background images because of the low quality of the infrared images; however, numerical measurements of the performance show a difference. To establish a background modeling comparison, we need a stationary background image to use as a baseline. In this particular test, we use the average image as the reference because it yields a solid background image: we acquire the first 500 video frames, calculate the average image, and use it as the reference image for the comparisons.

Fig. 7

Examples of the 300th input image and its background images using the average, mixture of Gaussians, and the proposed method, respectively

The distortion measure used is the peak signal-to-noise ratio (PSNR) in decibels, computed from the root-mean-square error between each background estimate and the reference image. Figure 8 illustrates the PSNR values of the first 400 images from the four video sequences in various circumstances, for the ordinary average, MoG, and proposed methods, respectively. Obviously, the larger the number of input frames, the higher the PSNR value; this is expected because the larger the amount of data, the closer the estimate is to the ideal mean value. The average method increases gradually throughout the whole sequence. In contrast, the MoG method increases very slowly despite its heavy computation. The proposed method, however, approaches the average values very quickly. Of course, some fluctuation is inherent in the adaptive average method. In the first scene, the PSNR stops improving once it approaches a certain value, but this does not pose a serious problem given the characteristics of infrared images. In the second scene, the values increase very slowly because of substantial noise in the background. Most PSNR values increase gradually as the number of frames increases, but in the second case they drop significantly as a result of sudden contrast changes and then increase again. In all cases, the results of the ordinary average method are the best, but it is difficult to use on-site because it requires a lot of memory. In the case of the MoG method, the excessive computation renders it unacceptable. The proposed method offers a good compromise and has great utility in the field, since it requires little memory and computation.
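
For reference, the PSNR used for this comparison can be computed as in the following sketch; the synthetic inputs stand in for a background estimate and the averaged reference image.

```python
import numpy as np

def psnr(estimate, reference, peak=255.0):
    """PSNR in decibels from the mean squared error between two 8-bit images."""
    mse = np.mean((estimate.astype(np.float64) - reference.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak**2 / mse)

reference = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
estimate = np.clip(reference + np.random.normal(0, 5, reference.shape), 0, 255)
print(f"PSNR: {psnr(estimate, reference):.2f} dB")
```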

Fig. 8

PSNR values of the first 400 images from the four video sequences in various circumstances, for ordinary average, mixture of Gaussians, and the proposed method respectively

The background images have become much more readable than before. The foreground image, i.e., the gas leakage image (within the red boxes), can be obtained by properly subtracting the background image from the input image sequence, as shown in Fig. 9. Although the output images are still noisy, the gas leakage region can be extracted from the otherwise invisible gas image.
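
A minimal sketch of this subtraction step is given below; the threshold value and the light morphological cleaning are illustrative assumptions rather than part of the described system.

```python
import cv2
import numpy as np

def extract_gas_regions(frame8, background8, threshold=25):
    """Candidate gas regions from the difference between the frame and the background."""
    diff = cv2.absdiff(frame8, background8)
    _, mask = cv2.threshold(diff, threshold, 255, cv2.THRESH_BINARY)
    # Morphological opening to suppress isolated noisy pixels (an added assumption).
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)

frame8 = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
background8 = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
gas_mask = extract_gas_regions(frame8, background8)
```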

Fig. 9

Two sets of enlarged result images: without gas leakage, the detected area, and detected gas leakage, respectively

7 Conclusions

In this paper, we proposed an infrared camera system that analyzes image sequences to detect invisible gas leaks. To increase the mobility of infrared cameras, instead of the laser used in conventional methane gas detectors, a medium wavelength LED with a center wavelength of approximately 3300 nm is used as the light source of the gas detector. In addition, a compact cooling unit with low power consumption is made possible by applying a small electronic part using a Peltier element instead of a larger cooling unit; in some existing systems, cooling gas needs to be replenished, which is a very expensive and cumbersome process. Rather than using a complex plateau algorithm, we have proposed a log-based automatic gain control process applied after histogram equalization to improve the performance of the images. The developed prototype has not yet reached a satisfactory size and weight; however, progress is underway to make it more compact and lighter, and thus easier to carry. Instead of monitoring directly on-site, the camera can monitor gas leakage extensively from a distance.

The leakage of invisible gas can be detected by the proposed infrared camera system, and the results show that the proposed system is a simple but efficient system for monitoring gas leaks. To overcome the complicated computational process of the mixture of Gaussians (MoG) method, we have presented an algorithm based on the adaptive average method to process the images efficiently and detect the background in real time. Comparison of preliminary tests using various methods demonstrates that the proposed system can provide rapid and efficient background extraction. This simplified algorithm exhibits good performance with lower memory requirements compared to established techniques. Preliminary experimental results confirm the potential and effectiveness of the developed equipment to be commercialized for use in real-time gas monitoring systems. Further research is expected to lead to more precise background extraction processes and a lightweight camera for portable gas detection devices.

Expressing knowledge within a computer, or making deductions with a computer using knowledge, belongs to the field of knowledge engineering. In computer vision, deep learning techniques are becoming very important and highly developed tools for understanding images, and they outperform other techniques in terms of results. Extracting the necessary information by image processing is essential for developing applications of artificial intelligence based on computer science. In this paper, we have presented video processing techniques for background extraction and gas detection, which can later lead to the study of automatic classification using machine learning. Training a computer with the background and foreground images extracted by these algorithms is expected to produce better recognition results. We continue to experiment with deep convolutional neural networks for background image extraction and the detection of gas leakages. Since a large image database is required, we have performed only preliminary experiments for this paper and will report results based on convolutional neural networks after conducting more experiments. The goal of future research is to develop knowledge-learning algorithms that allow computers to increase their recognition rate based on these images. In an ambient service environment, information on the status of all things is shared in real time, and since all places and objects have sensors, intelligent activities such as autonomous decision-making become possible. When the proposed camera system has a built-in detection sensor and is connected to a high-performance computer network, people can be provided with gas leakage information in real time.