Infrared LSS-Target Detection Via Adaptive TCAIE-LGM Smoothing and Pixel-Based Background Subtraction

Infrared small target detection is a significant and challenging topic for security applications. This paper proposes a novel model to detect the LSS-target (low-altitude, slow-speed, and small target) against complicated backgrounds. Firstly, fundamental descriptors of an infrared image, namely its texture complexity and information entropy, are calculated and invoked as adaptive control parameters of smoothness. Secondly, adaptive L0 gradient minimization smoothing based on texture complexity and information entropy (TCAIE-LGM) is proposed in order to remove noise and suppress low-amplitude details during infrared image abstraction. Finally, a difference-of-Gaussian (DoG) map is incorporated into the pixel-based adaptive segmentation (PBAS) background modeling algorithm, which can distinguish the LSS-target from a sophisticated background. Experimental results demonstrate that the proposed model achieves a high detection rate and produces fewer false alarms, outperforming most state-of-the-art methods.


Introduction
Owing to the ability of thermal infrared imaging systems to operate in dark and low-light environments, infrared cameras have gained popularity for missile guidance, military night vision, airborne early warning, etc. Accordingly, LSS-targets (low-altitude, slow-speed, and small targets) can be captured thanks to their thermal signatures. Nevertheless, infrared images are frequently of poor quality as a result of salt-and-pepper noise, sparse texture, and non-uniformity noise, which makes it rather difficult to detect LSS-targets. To make things more intricate, LSS-targets without fixed moving trajectories are often submerged in heavy noise or complex backgrounds. Thus, LSS-target detection is a pressing and challenging topic in the infrared detection field.
In order to detect such LSS-targets accurately, scholars in various countries have conducted extensive research and put forward diverse algorithms. Hu et al. [1] used a non-local mean filter based on a circular mask to establish a background estimation model; by linking the gray-scale distribution of the image to temporal information, infrared dim targets can be extracted successfully. Yang et al. [2] simplified a two-dimensional median filter to a one-dimensional median filter, which could eliminate the background and false targets. He et al. [3] realized small target detection by local entropy (LE) and one-dimensional empirical mode decomposition (EMD). Abdelkawy et al. [4] used a two-dimensional Gaussian function to construct a dictionary and proposed a two-dimensional fast orthogonal search (2D-FOS) algorithm; its time complexity mainly depends on the number of candidates in the dictionary and the size of the image, and compared with other orthogonal methods, the computation time is significantly reduced. An infrared small target detection algorithm based on the peer group filter (PGF), two-dimensional empirical mode decomposition (TDEMD), and local inverse entropy (LE) was proposed by Xie et al. [5]. Huang et al. [6] improved a dynamic programming track-before-detect algorithm (DP-TBD), which can simultaneously detect and track maneuvering dim targets. Gao et al. [7] built a co-detection model based on nonlinear weights and entry-wise weighted robust principal component analysis (RPCA), which can extract real targets accurately and suppress background clutter efficiently. Li et al. [8] proposed an infrared dim target detection approach based on sparse representation over a discriminative over-complete dictionary, which can not only capture significant features of background clutter and dim targets, but also strengthen the sparse feature difference between background and target. Kim et al. 
[9] analyzed the characteristics of regional clusters and removed false detections by means of spatial attribute-based classification, a heterogeneous-background removal filter, and a temporal consistency filter. Motivated by background classification and coastal region detection, Kim et al. [10] proposed a novel scene-dependent small target detection strategy involving the relationship between the geometric horizon and the image horizon. By classifying the infrared background types and detecting the littoral regions in omni-directional images, coastal regions can be detected by fusing the region map and the curve map.
As discussed above, these methods can be classified into two categories. The first covers single-frame target detection, which uses a filtering algorithm to eliminate complex backgrounds and estimate the foreground target. The morphological top-hat transform [11], high-pass filtering [12], and two-dimensional entropy [13] can achieve real-time detection of targets, but their accuracy is not high under low signal-to-noise ratio (SNR) conditions. The detection accuracies of the matched filter [14], the wavelet transform [15], partial differential equations (PDE) [16], and probabilistic principal component analysis (PCA) [17] are high, but it is almost impossible to achieve real-time detection with them. Other methods, such as particle filtering [18], mobile weighted pipeline filtering [19,20], and the likelihood ratio test [21], are based on multi-frame images and achieve target detection through inter-frame context information. These algorithms have superior accuracy, but targets cannot be found effectively if they are submerged in the background or noise [22].
Statistical background modeling is a fundamental part of many visual search systems and other computer vision applications. This article focuses on how to remove noise and smooth the background of infrared images, and then detect the real target according to the inter-frame difference from a static complex background. Firstly, this paper analyzes the characteristics of infrared images. Secondly, a novel adaptive texture-complexity and information-entropy (TCAIE-LGM) smoothing filter is proposed, which can remove the stripe noise and salt-and-pepper noise of infrared images. Finally, a pixel-based background subtraction is introduced to remove the complex background and detect the real target.

IR noise suppression
Because of abnormalities in the material's internal structure and crystal defects, non-uniformity is a natural defect of the infrared imaging system. Moreover, LSS-target detection is ordinarily applied against the background of the sky or the sky-earth junction. The infrared image exhibits obvious inhomogeneity due to low-temperature radiation from the sky. Therefore, noise suppression must be performed before using infrared cameras. This section reviews the related work on noise suppression and proposes the TCAIE-LGM algorithm, which can adaptively suppress the low-frequency details and noise of the IR image.

Related work in terms of noise suppression
Scribner et al. [23,24] proposed two adaptive non-uniformity correction algorithms based on the human visual system. One is the temporal high-pass filter, which imitates the low-pass filtering behavior of human horizontal cells for optical signals and constructs a temporal high-pass filter to correct the offset coefficient. This method is simple and easy to implement, but leads to target degradation and ghosting in stationary scenes. The other algorithm operates in the spatial domain and takes advantage of a neural network structure and the steepest descent method to adaptively suppress stripe noise. The algorithm suppresses spatial high-frequency noise well, but its rate of convergence is slow. Harris and Chiang studied a non-uniformity correction algorithm with constant-statistics parameters, in which thousands of images were needed for training [25]. Qin et al. [26] utilized the wavelet transform for clutter rejection, but its speed of convergence was slow as well.

L0 gradient minimization
Image smoothing is an important instrument of computational photography. Its function is to eliminate unimportant details and retain larger image edges. Xu et al. [27] proposed L0 gradient minimization (LGM), a global smoothing filter based on a sparsity strategy.
LGM suppresses low-amplitude details and sharpens salient edges, which can remove noise, unimportant details, and make the results immediately usable in background subtraction. Distinct from other filters, this method can faithfully maintain small-resolution objects and thin edges.
The LGM enhances the highest-contrast edges and removes small-amplitude gradients globally by confining the number of non-zero gradients. For a 2D image, the gradients along the horizontal and vertical directions are constrained jointly. The objective function is

    min_S { Σ_p (S_p − I_p)² + λ·C(S) },  C(S) = #{ p : |∂x S_p| + |∂y S_p| ≠ 0 },    (1)

where I is the input image, S is the computed result, (∂x S_p, ∂y S_p) is the gradient of the image at pixel p, calculated between neighboring pixels along the horizontal and vertical directions, C(S) counts the pixels whose gradient magnitude is not zero, and λ is the regularization parameter controlling the degree of smoothness.
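To make the scheme concrete, below is a minimal numpy sketch of an LGM solver using the standard half-quadratic splitting of Xu et al. [27]; the circular boundary handling, the starting value of β, and the default parameter values are our illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def l0_smooth(I, lam=0.02, beta_max=1e5, kappa=2.0):
    """L0 gradient minimisation via half-quadratic splitting (sketch)."""
    I = I.astype(np.float64)
    S = I.copy()
    H, W = I.shape
    # Fourier spectra of the circular forward-difference operators.
    fx = np.zeros((H, W)); fx[0, 0] = -1.0; fx[0, -1] = 1.0
    fy = np.zeros((H, W)); fy[0, 0] = -1.0; fy[-1, 0] = 1.0
    otf_x, otf_y = np.fft.fft2(fx), np.fft.fft2(fy)
    denom_grad = np.abs(otf_x) ** 2 + np.abs(otf_y) ** 2
    F_I = np.fft.fft2(I)
    beta = 2.0 * lam
    while beta < beta_max:
        # (h, v) sub-problem: per-pixel hard threshold on the gradients.
        gx = np.roll(S, -1, axis=1) - S
        gy = np.roll(S, -1, axis=0) - S
        keep = (gx ** 2 + gy ** 2) > lam / beta
        h, v = np.where(keep, gx, 0.0), np.where(keep, gy, 0.0)
        # S sub-problem: quadratic, solved exactly in the Fourier domain.
        num = F_I + beta * (np.conj(otf_x) * np.fft.fft2(h) +
                            np.conj(otf_y) * np.fft.fft2(v))
        S = np.real(np.fft.ifft2(num / (1.0 + beta * denom_grad)))
        beta *= kappa
    return S
```

Increasing λ forces more gradients to zero, so the result approaches a piecewise-constant abstraction of the input.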

Adaptive TCAIE-LGM smoothing
Although LGM can remove noise and eliminate details, its degree of smoothness can't be controlled effectively. Adaptive L0 gradient minimization smoothing based on texture complexity and information entropy (TCAIE-LGM) is therefore proposed in this paper in order to remove noise and suppress low-amplitude details during infrared image abstraction.

Information entropy of image
The concept of entropy originated in thermodynamics, where it refers to the degree of chaos in a system; Shannon [28] later introduced it into information theory. It has significant applications in cybernetics, probability theory, number theory, astrophysics, life science, and so on. Information entropy is a statistical measure of the information contained in an image, reflecting the average amount of information it carries.
The one-dimensional information entropy of an image represents the information contained in its gray-scale distribution, i.e., the aggregate characteristics of the image. The entropy of a gray image is defined as

    H = − Σ_{i=0}^{255} P_i · log2 P_i,

where P_i indicates the proportion of pixels in the image with gray value i, which can be obtained from the histogram.

The one-dimensional information entropy can represent the aggregate characteristics of the gray-scale distribution, but it cannot reflect the spatial characteristics of the image. In order to capture the spatial features, a two-dimensional entropy is constructed on the basis of the one-dimensional entropy, reflecting the spatial distribution of the gray levels. The two-dimensional information entropy can address the complexity and the inhomogeneity of the image. When the image is a pure color graph, there is only one gray value; the two-dimensional information entropy is then minimal, and the amount of information in the image is zero. If the gray value of each pixel is different, the two-dimensional information entropy is maximal, and the information content of the image is the largest. For two-dimensional digital images in discrete form, the two-dimensional information entropy is

    H2 = − Σ_i Σ_j P(i, j) · log2 P(i, j),  P(i, j) = f(i, j) / M²,

where i represents the gray value of the current pixel, and j represents the mean gray value of its neighborhood. The mean value of the neighborhood is chosen as the spatial characteristic of the gray distribution and forms a two-element feature group with the pixel's gray level, whose frequency of appearance is recorded as f(i, j); M is the scale of the image, and P(i, j) is the relative frequency of the two-element feature group (i, j).
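As an illustration, both entropies can be computed as below; the 3×3 neighbourhood window, the edge padding, and the integer neighbourhood mean are our assumptions, not specified by the paper:

```python
import numpy as np

def entropy_1d(img):
    """One-dimensional information entropy of a gray image (bits)."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    p = p[p > 0]                                # drop empty bins (0 * log 0 = 0)
    return float(-np.sum(p * np.log2(p)))

def entropy_2d(img):
    """Two-dimensional entropy over (gray level, 3x3 neighbourhood mean) pairs."""
    img = img.astype(int)
    pad = np.pad(img, 1, mode='edge')
    h, w = img.shape
    # integer mean of the 3x3 neighbourhood (centre pixel included)
    nb = sum(pad[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)) // 9
    pairs = img * 256 + nb                      # encode the feature pair (i, j)
    counts = np.bincount(pairs.ravel(), minlength=256 * 256).astype(float)
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))
```

A constant image yields zero for both measures, while an image in which every gray level appears equally often maximises the one-dimensional entropy (8 bits for 256 levels).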
Provided that the amount of image information is not zero, the two-dimensional entropy reflects both the position information of the gray pixels and the comprehensive characteristics of the gray-value distribution in the neighborhood. If the probability distribution of the pixel values is balanced, the outline of the object is clear and the entropy value is large; otherwise, if the elements of the co-occurrence matrix differ widely, the entropy value is small.

Texture complexity
Texture is a visual feature reflecting homogeneous phenomena in an image, embodying the structure and arrangement of surface structures with gradual or periodic changes. Texture is a global feature that describes the surface properties of the scene corresponding to the image or image region. As a statistical feature, texture has rotation invariance and strong resistance to noise. The angular second moment is a measure of the gray-scale variation of the image texture, reflecting the uniformity of the gray-scale distribution and the coarseness of the texture. Therefore, this paper uses the angular second moment of the image histogram to calculate the texture complexity of the image.
Firstly, a normalized one-dimensional histogram of the infrared image is required. The histogram of an image whose gray levels lie in [0, L−1] is a discrete function

    h(z_i) = n_i,  i = 0, 1, …, L−1,

where z_i is the gray value of level i, and n_i is the number of pixels with value z_i in the image. The normalized histogram is calculated as

    p(z_i) = n_i / n,

where n is the total number of pixels in the image, and p(z_i) is the probability of the gray level z_i. The mean gray level is

    m = Σ_{i=0}^{L−1} z_i · p(z_i),

and the angular second moment of the histogram is

    ASM = Σ_{i=0}^{L−1} p(z_i)².
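A minimal numpy sketch of the histogram statistics above, assuming 256 gray levels:

```python
import numpy as np

def histogram_stats(img):
    """Normalised histogram, mean gray level, and angular second moment."""
    n_i = np.bincount(img.ravel(), minlength=256).astype(float)  # h(z_i) = n_i
    p = n_i / n_i.sum()                                          # p(z_i) = n_i / n
    mean = np.sum(np.arange(256) * p)                            # m = sum z_i p(z_i)
    asm = np.sum(p ** 2)                                         # angular second moment
    return p, mean, asm
```

The ASM equals 1 for a single-valued image and reaches its minimum 1/256 for a perfectly uniform histogram, matching its role as a uniformity measure.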

Improved methods
In our proposed method, the information entropy is used to evaluate the energy distribution of the image, and the angular second moment is calculated to evaluate its texture complexity. These two parameters are introduced into (1), replacing the regularization parameter λ in the original equation so that the IR image is smoothed adaptively according to its complexity.
where κ1 and κ2 are adaptive variables depending on the information entropy and the angular second moment, and φ1 and φ2 are empirical parameters.

As the L0-norm is non-convex and non-differentiable, the global optimization is a non-deterministic polynomial (NP) hard problem. A variable-splitting scheme is therefore applied, with auxiliary variables (h, v) standing in for the gradients (∂x S, ∂y S), turning the problem into a sequence of quadratic sub-problems, each of which has a closed-form solution. Fixing (h, v), S is computed in the Fourier domain as

    S = F⁻¹( (F(I) + β(F(∂x)* F(h) + F(∂y)* F(v))) / (F(1) + β(F(∂x)* F(∂x) + F(∂y)* F(∂y))) ),

where F is the FFT operator, and * denotes the complex conjugate. Fixing S, (h, v) is calculated as

    (h_p, v_p) = (0, 0), if (∂x S_p)² + (∂y S_p)² ≤ λ/β;  (∂x S_p, ∂y S_p), otherwise.    (18)

This preprocessing removes small non-zero gradients, smooths unimportant details, and enhances the salient edges of the image, which plays an important role in the subsequent pixel-based background subtraction (PBAS). It also effectively eliminates texture information that may interfere with background modeling and highlights the boundary features of the target, which makes the detection result clearer and reduces noise and background interference effectively.

LSS-target detection using PBAS
PBAS automatically generates a binary mask that divides the pixels of each frame into a set of foreground pixels and a set of background pixels [29]. This paper presents an improved nonparametric method aimed at IR images, which can overcome the influence of background changes and the smearing phenomenon.
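As noted in the abstract, a difference-of-Gaussian (DoG) map is fed into the background modeling stage. A DoG map is obtained by subtracting two Gaussian-blurred copies of the frame; the following sketch uses separable convolution with edge padding, and the σ values and kernel radius are illustrative:

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """Normalised 1-D Gaussian kernel, truncated at ~3 sigma by default."""
    radius = radius if radius is not None else int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def blur(img, sigma):
    """Separable Gaussian blur with replicated (edge) borders."""
    k = gaussian_kernel(sigma)
    pad = len(k) // 2
    tmp = np.pad(img.astype(float), ((0, 0), (pad, pad)), mode='edge')
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, 'valid'), 1, tmp)
    tmp = np.pad(tmp, ((pad, pad), (0, 0)), mode='edge')
    return np.apply_along_axis(lambda c: np.convolve(c, k, 'valid'), 0, tmp)

def dog(img, sigma1=1.0, sigma2=2.0):
    """Difference-of-Gaussian map: band-pass response highlighting blobs."""
    return blur(img, sigma1) - blur(img, sigma2)
```

The DoG response is near zero on flat regions and peaks on small bright blobs, which is what makes it a useful feature for small-target background subtraction.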
In our case, the pixel values of the first N frames at location x_i are collected as the background model B(x_i) = {B_1(x_i), …, B_N(x_i)}.
To calculate the foreground segmentation mask, the discriminant is expressed as follows:

    F(x_i) = 1, if #{ k : dist(I(x_i), B_k(x_i)) < R(x_i) } < #min;  F(x_i) = 0, otherwise,    (23)

i.e., if the number of background samples whose distance to the current pixel value is below the threshold R(x_i) is less than #min, x_i is decided to be foreground; otherwise, x_i is defined as background. R(x_i) is the per-pixel decision threshold.
Following [29], each pixel is described by its intensity I_v(x_i) and its gradient magnitude I_m(x_i), and the distance between the current pixel and a background sample B_k(x_i) is calculated as

    dist(I(x_i), B_k(x_i)) = (α / Ī_m)·|I_m(x_i) − B_k,m(x_i)| + |I_v(x_i) − B_k,v(x_i)|,

where B_k,v(x_i) and B_k,m(x_i) are the intensity and gradient magnitude of the background sample, and Ī_m is the average gradient magnitude over the last observed frame. The currently observed minimal distance is

    d_min(x_i) = min_k dist(I(x_i), B_k(x_i)),

and an array of minimal decision distances D(x_i) = {D_1(x_i), …, D_N(x_i)} is maintained by recording d_min(x_i) whenever the background model is updated.

When the minimal distance between the new pixel value and the sample set is small, the pixel is likely to be background. At the same time, the minimal distance describes the complexity of the background: the larger the distance, the more complex the background. Therefore, the average d̄_min(x_i) of the minimal distances in D(x_i) is invoked as a measure of background complexity, and the decision threshold is dynamically adapted as

    R(x_i) = R(x_i)·(1 − R_inc/dec), if R(x_i) > d̄_min(x_i)·R_scale;  R(x_i)·(1 + R_inc/dec), otherwise.

In order to respond to changes in the background, the background template B(x_i) must also be updated. Updating means that, for a randomly chosen index, the corresponding background model value is replaced by the current pixel value, which happens with a probability controlled by the update rate T(x_i). We define the update rate as

    T(x_i) = T(x_i) + T_inc / d̄_min(x_i), if F(x_i) = 1;  T(x_i) − T_dec / d̄_min(x_i), if F(x_i) = 0,

where T_inc and T_dec are fixed parameters. F(x_i) = 1 implies foreground, and F(x_i) = 0 implies background. The higher T(x_i) is, the less likely a pixel will be updated, since the update probability is 1/T(x_i). Because the foreground area can't be used to update the model, only pixels that currently belong to the background update the background model. The updating process is comparable to that of the visual background extractor (ViBe) algorithm [30]: when a pixel is considered a background point, it is updated. Different from ViBe, the update rate T(x_i) is adaptive, and a neighboring pixel can also be updated randomly by the current pixel value.
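A simplified per-frame sketch of the PBAS decision and update loop is given below; for brevity it uses intensity only (omitting the gradient-magnitude term of the distance), and all parameter names and values are illustrative:

```python
import numpy as np

def pbas_step(frame, B, D, R, T, rng,
              n_min=2, r_inc_dec=0.05, r_scale=5.0,
              t_inc=1.0, t_dec=0.05, t_lower=2.0, t_upper=200.0):
    """One simplified PBAS iteration. B: (N,H,W) background samples,
    D: (N,H,W) minimal-distance history, R: (H,W) decision thresholds,
    T: (H,W) update rates. Returns the foreground mask and updated state."""
    dist = np.abs(B - frame[None])                  # distance to each sample
    matches = np.sum(dist < R[None], axis=0)        # samples closer than R(x)
    fg = matches < n_min                            # F(x) = 1: foreground
    d_min_mean = D.mean(axis=0)                     # background complexity
    # adapt the decision threshold R(x)
    shrink = R > d_min_mean * r_scale
    R = np.where(shrink, R * (1 - r_inc_dec), R * (1 + r_inc_dec))
    # adapt the update rate T(x): raise for foreground, lower for background
    T = np.where(fg, T + t_inc / (d_min_mean + 1e-9),
                     T - t_dec / (d_min_mean + 1e-9))
    T = np.clip(T, t_lower, t_upper)
    # stochastic update of background pixels with probability 1 / T(x)
    update = (~fg) & (rng.random(frame.shape) < 1.0 / T)
    idx = rng.integers(0, B.shape[0])               # random sample to replace
    B[idx][update] = frame[update]
    D[idx][update] = dist.min(axis=0)[update]
    return fg, B, D, R, T
```

Pixels whose value matches fewer than n_min background samples are flagged as foreground, while matching pixels slowly refresh the sample set, so a static target is eventually absorbed, which is the limitation noted in the conclusions.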

Experiments and results
In order to verify the effectiveness of our algorithm, 4 groups of videos are tested, and the experimental results are compared with the three-frame difference algorithm, the mixture of Gaussians [31], and the kernel density estimation (KDE) algorithm [32], as shown in Table 1. Our method has eleven tunable parameters, which need to be well tuned for specific test videos under specific scenarios. This paper gives a set of optimized parameters that performs well on the test dataset, as shown in Table 2 (parameter settings in our experiments).

[Table 2, fragment: parameter T_upper = 50.]

The local signal-to-clutter ratio (SCR) is calculated as follows:

    SCR = |μ_t − μ_tb| / σ_b,

where μ_t is the average value of the pixels in the local area of the target, and μ_tb and σ_b are the average value and standard deviation of the pixels in the neighborhood of the target, respectively. The average signal-to-clutter ratio SCR̄ [33] is

    SCR̄ = (1/J_t) Σ_{i=1}^{J_t} SCR_i,

where J_t is the number of targets, and SCR_i is the SCR of target i.
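A minimal sketch of the SCR measurement, with illustrative window radii for the target area and its neighbourhood:

```python
import numpy as np

def local_scr(img, center, target_r=2, bg_r=6):
    """Local signal-to-clutter ratio around a target centred at `center`."""
    y, x = center
    tgt = img[y - target_r:y + target_r + 1, x - target_r:x + target_r + 1]
    nb = img[y - bg_r:y + bg_r + 1, x - bg_r:x + bg_r + 1].astype(float)
    mu_t = tgt.mean()                       # mean over the target area
    mu_b, sigma_b = nb.mean(), nb.std()     # mean / std over the neighbourhood
    return abs(mu_t - mu_b) / (sigma_b + 1e-9)

def average_scr(scrs):
    """Mean SCR over all J_t targets."""
    return sum(scrs) / len(scrs)
```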
In order to test the degree of noise suppression, we use Video 1, which includes obvious stripe noise. The result is shown in Fig. 1: Fig. 1(a) is the original image, and Fig. 1(b) is the result of our method. Through our method, the SNR of the image is obviously improved.

By calculating the correlation between two points at the same distance and in the same direction in an image, the gray-level co-occurrence matrix (GLCM) can reflect comprehensive information including the direction, interval, and amplitude of the image. The GLCM is a symmetric matrix, and the more the matrix concentrates around its diagonal, the better the correlation between pixels. Figure 2(a) is the GLCM of the original image, and Fig. 2(b) is the GLCM of the result of our method. The contrast shows that noise has been greatly suppressed.

In order to quantitatively analyze the performance of the detection algorithm, the detection probability (P_d), the false alarm rate (F_a), and the time complexity of the algorithm are used to evaluate the detection effect. The detection probability and the false alarm rate are defined as

    P_d = number of true detections / number of actual targets,
    F_a = number of false detections / number of frames in the sequence.

The detection probability and false alarm rate are shown in Tables 3 and 4, and the time complexity of the algorithm in Table 5. The average time per frame is

    t_avg = total time consumption / frame number of the video.
In order to illustrate the comparison results more explicitly, the experimental results of our method and the comparison algorithms are given in Fig. 2. Among them, Fig. 2(a) is the original test video sequence (Video 1 - Video 4), and Fig. 2(b) is the experimental result of the frame difference

Conclusions
In order to detect the low-altitude, slow-speed, and small target against the infrared complex background, this paper proposes a novel detection method via TCAIE-LGM smoothing and improved pixel-based background subtraction. Texture complexity and information entropy are calculated and introduced into L0 gradient minimization smoothing, which removes noise and suppresses low-amplitude details during infrared image abstraction. In addition, a difference-of-Gaussian map is integrated into the pixel-based adaptive segmentation background modeling algorithm, which can differentiate the LSS-target from the sophisticated background. Experimental results show that the proposed method significantly outperforms the existing methods in detection accuracy and can detect infrared LSS-targets against a static complex background.
However, our method also has some limitations. If the target remains stationary for a long time, it will gradually be absorbed into the background. Moreover, updating our model occupies a great deal of memory and resources, which lowers the computational efficiency of the algorithm. Our method needs further optimization to solve these problems in the future.