Focus-pixel estimation and optimization for multi-focus image fusion

To integrate effective information and improve the quality of multi-source images, many spatial- or transform-domain image fusion methods have been proposed in the field of information fusion. The key purpose of multi-focus image fusion is to integrate the focused pixels and remove the redundant information of each source image. Theoretically, if the focused pixels and complementary information of the different images are detected completely, a fused image of the best quality can be obtained. To this end, we propose a focus-pixel estimation and optimization based multi-focus image fusion framework in this paper. Because the focused pixels of an image lie within the same depth of field (DOF), we first propose a multi-scale focus-measure algorithm for focused-pixel matting to integrate the focused region. Then, the boundaries of the focused and defocused regions are obtained accurately by the proposed optimization strategy, and the boundaries are also fused to reduce the influence of insufficient boundary precision. The experimental results demonstrate that the proposed method outperforms some previous typical methods in both objective evaluation and visual perception.


Introduction
The development of sensor technology has promoted the expansion of image processing into many applications [49,53]. Because of the limited depth of field (DOF) in sensor imaging, optical elements can only capture focused images of part of the DOF scene [2]. The defocused region of a multi-focus image contains a lot of redundant information, which prevents the direct application of multi-focus images to other computer vision tasks. To solve these problems, multi-focus image fusion technology has emerged and attracted many researchers' attention. For example, in video surveillance, the quality of the images captured by a camera can be improved by image fusion, which contributes to the realization of computer-vision based target detection, recognition and other tasks. In recent years, many multi-focus image fusion methods have been proposed [14,35,43,51], among which the algorithms based on the spatial and transform domains have been widely studied [1,20,22].
Most transform-domain methods use different decomposition tools to decompose the source multi-focus images into different scales. After the Laplacian pyramid (LP) based method came into wide use, many decomposition tools were proposed, such as the wavelet transform (WT), discrete wavelet transform (DWT), nonsubsampled shearlet transform (NSST) and nonsubsampled contourlet transform (NSCT) [7,8,25,27,31,39]. Most multi-scale transform (MST) domain-based methods decompose the source image into different detail layers and fuse the pixels by different fusion strategies. In these transform-domain methods, the pixel details can be analyzed well in each layer of the transform domain, but the interference caused by noise pixels and image misregistration cannot be effectively eliminated.
The spatial-domain methods usually include pixel-based and block-based image fusion methods [11-13,19,28,30,36,40,42]. The latter use different measurement strategies to analyze and fuse each block to produce the fused image; their advantage is that all pixels in a fused block are preserved directly without loss of information. In spatial-domain methods, spatial frequency (SF) [29], mutual information (MI) [45] and other indicators are often used to evaluate image quality. For block-based methods, the block size and the selected evaluation index directly affect the final fusion quality, so the focus measure cannot always determine the focused region completely.
Recently, some deep learning-based approaches have been proposed and widely used in many applications, such as intrusion detection [24] and 3D object recognition [5]. Liu et al. proposed a convolutional neural network (CNN) for the multi-focus image fusion task [37], after which many neural network-based models have been applied to image fusion, such as the U-shaped network-based method and the FuseGAN-based method [15,16,34]. The advantage of these network-based methods is that the decision map of the focused region can be found effectively. However, some problems still need to be considered, such as the requirement for huge training sets and large amounts of computing and memory resources.
To overcome the shortage of training data for deep learning-based methods, and the lack of focused-region detection accuracy of pixel-level fusion schemes, we propose a novel focus-pixel estimation and optimization based multi-focus image fusion framework. First, we propose a multi-scale focus-measure based method to mat the focused pixels and obtain the focused map. Then, the focused regions are optimized by the proposed quality evaluation-based optimization method, and the boundaries of the focused and defocused regions are also fused to reduce the influence of insufficient boundary precision. The effectiveness of the proposed method is validated by comparing it with several existing methods. The experimental results show that the proposed method is superior to the compared methods in both objective evaluation and visual perception.
The rest of this paper is organized as follows: The imaging models and characteristics of multi-focus images and the evaluation methods are briefly reviewed in Section 2. The proposed fusion algorithm is presented in Section 3. Section 4 presents several groups of experiments and the corresponding analysis results. Finally, discussions and conclusions are summarized in Section 5.

Multi-focus imaging model
A multi-focus image can be divided into a focused region and a defocused region, and it can be modeled as:

I(x, y) = δ(x, y) f(x, y) + (1 − δ(x, y)) f_ξ(x, y),    (1)

where f(x, y) denotes the focused pixels and f_ξ(x, y) denotes the defocused pixels; δ(x, y) represents the property of each pixel and is defined as:

δ(x, y) = 1 if (x, y) ∈ S_F, and δ(x, y) = 0 if (x, y) ∈ S_D,    (2)

where S_F denotes the focused region and S_D denotes the union of defocused pixels. The defocused image f_ξ(x, y) can be modeled approximately as:

f_ξ(x, y) = f(x, y) ⊗ h(x, y) + ξ(x, y),    (3)

where ξ(x, y) denotes noise, ⊗ denotes the convolution operation, and h(x, y) denotes the 2D Gaussian filter given by:

h(x, y) = (1 / (2πσ²)) exp(−(x² + y²) / (2σ²)),    (4)

where σ is the parameter that controls the blurriness [4,18]. As shown in Fig. 1, the multi-focus image can be decomposed into the focused region S_F, the defocused region S_D and the boundary region S_B. The source image can be represented as:

I = S_F ∪ S_D ∪ S_B.    (5)

In Fig. 1, if the segmentation of S_F and S_D is accurate enough, that is, S_B = ∅ and the white region in the decision map is as small as possible, the ideal fused all-in-focus image I_F can be generated as:

I_F(x, y) = Σ_{i=1}^{n} δ_i(x, y) f_i(x, y),    (6)

where n denotes the number of multi-focus images. Since it is difficult to obtain the boundary accurately, our proposed fusion scheme uses an optimization-based boundary feathering method to make the obtained boundary region as accurate as possible. For a better fusion result, we also fuse the boundary regions by guided filtering to reduce the influence of boundary segmentation imprecision.
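As an illustration of the imaging model above, the sketch below simulates a defocused image by blurring a focused image with a 2-D Gaussian, then composes a multi-focus image with a binary decision map. The helper names `simulate_defocus` and `compose_multifocus` are hypothetical, and `scipy.ndimage.gaussian_filter` stands in for the convolution in (3); this is a minimal sketch, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def simulate_defocus(f, sigma, noise_std=0.0, rng=None):
    """Defocus model (3)-(4): blur f with a 2-D Gaussian h (sigma controls
    blurriness) and add optional noise xi."""
    rng = rng if rng is not None else np.random.default_rng(0)
    blurred = gaussian_filter(f.astype(float), sigma=sigma)
    return blurred + rng.normal(0.0, noise_std, f.shape)

def compose_multifocus(f, delta, sigma=2.0):
    """Imaging model (1): focused pixels where delta == 1, defocused elsewhere."""
    return delta * f + (1 - delta) * simulate_defocus(f, sigma)
```

Composing with two complementary decision maps yields the two source images of a synthetic multi-focus pair, which is how blurred test sequences can be generated for evaluation.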

Quality evaluation and analysis
There are many no-reference quality evaluation methods for a single image, such as the sum-modified-Laplacian (SML), information entropy (IE), mean gradient (G), and spatial frequency (SF) [6,21,23,48,52]. The mean gradient reflects the detail differentiation of the image and is defined as:

G = (1 / ((M − 1)(N − 1))) Σ_{i=1}^{M−1} Σ_{j=1}^{N−1} sqrt( ((f(i+1, j) − f(i, j))² + (f(i, j+1) − f(i, j))²) / 2 ),    (7)

where (i, j) denotes the position of a pixel in an M × N image f. The SF is described as:

SF = sqrt(RF² + CF²),    (8)

where RF and CF are the spatial row and column frequencies, computed by:

RF = sqrt( (1 / (MN)) Σ_i Σ_j (f(i, j) − f(i, j−1))² ),  CF = sqrt( (1 / (MN)) Σ_i Σ_j (f(i, j) − f(i−1, j))² ).    (9)

The sum-modified-Laplacian (SML) [3,9,46] reflects the detailed texture contrast of the image. The modified Laplacian is:

ML(i, j) = |2f(i, j) − f(i − step, j) − f(i + step, j)| + |2f(i, j) − f(i, j − step) − f(i, j + step)|,    (10)

and the SML is described by:

SML(i, j) = Σ_{(p, q) ∈ Ω_n(i, j)} ML(p, q),    (11)

where n represents the local window size and Ω_n(i, j) is the n × n local window centered on pixel (i, j), so SML(i, j) is the value obtained within that window.
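The three focus measures above can be sketched compactly with NumPy/SciPy. The function names are assumptions for illustration, and `uniform_filter` implements the box sum of (11); this is a sketch of the standard definitions, not the paper's exact code.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def mean_gradient(img):
    """Mean gradient G: average magnitude of first differences (7)."""
    img = img.astype(float)
    dx = np.diff(img, axis=1)[:-1, :]   # f(i, j+1) - f(i, j)
    dy = np.diff(img, axis=0)[:, :-1]   # f(i+1, j) - f(i, j)
    return np.mean(np.sqrt((dx ** 2 + dy ** 2) / 2.0))

def spatial_frequency(img):
    """SF = sqrt(RF^2 + CF^2) from row/column first differences (8)-(9)."""
    img = img.astype(float)
    rf2 = np.mean(np.diff(img, axis=1) ** 2)
    cf2 = np.mean(np.diff(img, axis=0) ** 2)
    return np.sqrt(rf2 + cf2)

def sml(img, step=1, n=3):
    """Sum-modified-Laplacian: ML of (10) summed over an n x n window (11)."""
    f = img.astype(float)
    s = step
    p = np.pad(f, s, mode='edge')
    ml = (np.abs(2 * f - p[:-2 * s, s:-s] - p[2 * s:, s:-s]) +
          np.abs(2 * f - p[s:-s, :-2 * s] - p[s:-s, 2 * s:]))
    return uniform_filter(ml, size=n) * n * n   # box sum over the window
```

All three scores drop as an image is blurred more heavily, which is the behavior the comparison in Fig. 3 examines.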
To verify the performance of the above evaluation methods, we selected four images for testing; they are shown in Fig. 2. Figure 2a, c are multi-focus images, and Fig. 2b, d are fully focused images. By (3), we generated 50 images for each test image with the degree of blur deepening in turn. Then we calculated the normalized IE, G, SF and SML values respectively; the results are shown in Fig. 3. In Fig. 3, the horizontal axis represents the image indices, and the vertical axis represents the normalized evaluation values of each image. The normalized value of each evaluation metric can be calculated by:

η_norm = (η − min(η)) / (max(η) − min(η)),    (12)

where η denotes the results obtained by IE, G, SF and SML.
As seen from Fig. 3a, b, IE and G cannot effectively predict the clarity of multi-focus images. However, the SF and SML values accurately reflect the blurring degree. The red lines in Fig. 3c, d show that SML discerns variations in blurring degree more subtly. Therefore, based on multi-scale SML, we propose a focus-pixel matting algorithm to determine the focused regions, which is introduced in detail in Section 3.
Different from no-reference image evaluation methods, reference quality evaluation-based methods are usually used to measure fusion quality when comparing fusion algorithms. Mutual information (MI) [45], visual information fidelity (VIF) [47] and the gradient-based metric Q^{AB/F} [50] are the commonly used reference quality evaluation metrics. In the optimization part, we use these three metrics to optimize our fusion results for higher evaluation scores.

Proposed fusion method
The schematic diagram of the proposed multi-focus image fusion method is shown in Fig. 4. First, the source multi-focus images are processed by multi-scale SML to produce the focus-pixel matting maps. Next, the focused region is estimated based on the focus-pixel matting result. Then we refine the decision maps according to the evaluation function. Our goal is to obtain the fused image by (6). But given that the boundary region we obtain is larger than the real boundary, that is, S_B ≠ ∅, the obtained boundary regions of the source images are also fused to reduce the influence of insufficient boundary precision. The fused all-in-focus image I_F is then generated as:

I_F(x, y) = Σ_{i=1}^{n} δ_i(x, y) f_i(x, y) + F_B(x, y),    (13)

where F_B(x, y) denotes the fused boundary region. The final fusion result is obtained by integrating the fused boundary region with the focused regions of the source images.

Focused region estimation
It was introduced in Section 2 that the clarity of an image can be measured by quality evaluation metrics. Here we propose a focus-pixel matting method based on multi-scale SML and mathematical morphological processing. Fig. 3d showed that the SML values can measure the clarity of an image well, but the validity of the measurement is related to the size of the source image. The normalized SML values for test images of different sizes are given in Fig. 5, showing that the scores can effectively reflect the clarity when the size is large enough. When the size is small, for example 2 × 3, the scores cannot be used for clarity evaluation. For the above reasons, we propose a multi-scale focus-pixel matting method. We use different window sizes n in (11) to obtain the information in local windows at different scales:

SML_i(x, y) = Σ_{(p, q) ∈ Ω_{n_i}(x, y)} ML(p, q),    (14)

where n_i is the local window size at scale i. Then we integrate the results and process them further by mathematical morphology. Figure 6 shows the multi-scale SML results, and the third row of Fig. 6b-e is the focus-pixel matting result F_{M_i}, obtained by comparing the SML values of the source images A and B at scale i:

F_{M_i}(x, y) = 1 if SML_i^A(x, y) ≥ SML_i^B(x, y), and 0 otherwise.    (15)

As shown in Fig. 6, a more accurate boundary can be obtained with a small local window size, but the accuracy of the focused-pixel matting is then insufficient. As the local window size increases, the matting accuracy improves; however, boundary drift occurs.
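The per-scale comparison of (15) can be sketched as below. The function names and the sample scale set `(3, 9, 15)` are illustrative assumptions, and `uniform_filter` plays the role of the windowed SML sum; a sketch of the idea, not the authors' exact procedure.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def modified_laplacian(img, step=1):
    """ML of (10) via shifted views of an edge-padded copy."""
    f = img.astype(float)
    s = step
    p = np.pad(f, s, mode='edge')
    return (np.abs(2 * f - p[:-2 * s, s:-s] - p[2 * s:, s:-s]) +
            np.abs(2 * f - p[s:-s, :-2 * s] - p[s:-s, 2 * s:]))

def multi_scale_matting(a, b, scales=(3, 9, 15)):
    """Per-scale decision maps F_Mi: 1 where source A has the larger SML."""
    maps = []
    for n in scales:
        sml_a = uniform_filter(modified_laplacian(a), size=n)
        sml_b = uniform_filter(modified_laplacian(b), size=n)
        maps.append((sml_a >= sml_b).astype(np.uint8))
    return maps
```

On a synthetic pair in which each image is sharp on the opposite half, the large-scale map labels the sharp half of A as focused, mirroring the behavior shown in Fig. 6.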
The roughly focused region matting map F_M is produced by combining the per-scale matting results:

F_M = F_{M_1} ⊕ F_{M_2} ⊕ ⋯ ⊕ F_{M_s},    (16)

where ⊕ denotes the exclusive-OR operation and s is the number of scales. The roughly focused region map F_M of Fig. 6 is given in Fig. 7a. Because focused and defocused pixels aggregate regionally, we can improve the final matting accuracy by processing the pixel results in local regions. Figure 7b shows the partitioning results of Fig. 7a. Figure 7c shows the mathematical morphology operation, which mainly contains two steps: filtering out the independent focused pixels and filling the independent defocused pixels in the local region. Here, we set pixels q and p to the same property if q ∈ N_8(p) and sum(N_8(p)) ≥ 5, where p is a focused pixel and N_8(p) is its 8-neighborhood. Then the partitioning results {γ_1, γ_2, ⋯, γ_i} with different properties can be obtained, where γ_i ∈ F_M. The discrete small regions in the red rectangle are removed first, which can be denoted as:

γ_δ = ⋃ { γ_i : area(γ_i) ≥ T },    (17)

where γ_δ is the retained focused region and T is an area threshold. Next, as shown in the blue rectangle in Fig. 8b, the small regions enclosed by the biggest connected region are classified into this region, that is:

γ_δ = γ_δ ∪ γ_i, if γ_i is enclosed by γ_δ.    (18)

The final refinement result of the focused region is shown in Fig. 7d.

Fig. 7 Focused region estimation result. a Focused region matting map obtained by (16). b The partitioning results of (a). c Focused region refining process. d The refinement result of the focused region
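The two refinement steps, removing small isolated focused regions and filling defocused holes enclosed by the focused region, can be sketched with connected-component labeling. `refine_focus_map` and the `min_area` threshold are hypothetical names standing in for the area criterion; a sketch, not the paper's exact morphology pipeline.

```python
import numpy as np
from scipy.ndimage import label, binary_fill_holes

def refine_focus_map(fm, min_area=25):
    """Refine a rough focus map: drop focused regions smaller than min_area,
    then fill defocused holes enclosed by the remaining focused regions."""
    fm = fm.astype(bool)
    lab, num = label(fm)
    if num == 0:
        return fm.astype(np.uint8)
    sizes = np.bincount(lab.ravel())
    sizes[0] = 0                      # label 0 is background
    keep = sizes[lab] >= min_area     # remove small isolated regions
    return binary_fill_holes(keep).astype(np.uint8)
```

A single stray focused pixel is removed while a defocused hole inside a large focused block is filled, matching the two operations illustrated in Fig. 7c.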

Decision optimizing
In general, if we can obtain a sufficiently accurate estimate of the focused region, that is, if the boundary region S_B is very close to the real boundary, the fused result can be computed directly by (6). However, the boundary region obtained by our method does not always coincide exactly with the real boundary, so we propose a fusion quality evaluation-based optimization method. Figure 8 shows how to generate the final fusion image using the matting result. Figure 8c is the focused region obtained by our method; it is not the same as the real boundary. We feather Fig. 8c so that the boundary region completely covers the real boundary. Figure 8d is the feathering result. Figure 8e, f are the focused regions. Figure 8g is the fused boundary region; the fusion method is introduced in the next section. The fusion result in Fig. 8h is obtained by integrating Fig. 8e-g. Since the final fusion effect depends on the accuracy of the boundary region, we propose a fusion quality evaluation-based method to optimize the feathering size. As shown in Fig. 9a, the red line represents the real boundary, and the yellow point represents a boundary pixel obtained by our matting algorithm. The boundary region obtained by feathering the yellow point with size 2 contains the real boundary. The pixel values within the (2n + 1) × (2n + 1) window around the yellow point are computed by:

D(x, y) = (1 / (2n + 1)²) Σ_{p=−n}^{n} Σ_{q=−n}^{n} F_M(x + p, y + q),    (19)

and the pixels whose values lie strictly between 0 and 1 are regarded as boundary pixels. Figure 9b shows the fusing scores obtained with different feathering sizes, and Fig. 9c is the fusion result with the highest score. The feathering size is optimized according to the fusing scores.
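The window average of (19) and the score-driven choice of feathering size can be sketched as below. `boundary_pixels`, `optimize_feather` and the `score_fn` callback are hypothetical names; in the paper the score would be the fusion quality function, so this is a structural sketch only.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def boundary_pixels(decision, n):
    """Average the binary decision map over a (2n+1) x (2n+1) window;
    pixels whose mean lies strictly between 0 and 1 straddle the boundary."""
    m = uniform_filter(decision.astype(float), size=2 * n + 1)
    return (m > 1e-6) & (m < 1.0 - 1e-6)

def optimize_feather(decision, score_fn, max_n=10):
    """Pick the feathering size n whose boundary band maximizes score_fn."""
    return max(range(1, max_n + 1),
               key=lambda n: score_fn(boundary_pixels(decision, n)))
```

For a vertical half-and-half decision map, the band widens by two columns per unit of n, so a score that prefers a 4-column band selects n = 2.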
To demonstrate the effectiveness of decision optimizing strategy in the proposed fusion scheme, Table 1 gives the fusing scores of optimized fusion results with different feathering sizes. It can be seen that the fusion performance can be improved with a suitable feathering size. For Fig. 8, the best fusion result is obtained with n = 21.
Three popular fusion quality metrics are used as the optimization functions in our proposed fusion scheme: MI, VIF and Q^{AB/F}. The MI metric between the fused image F and the source images A and B is calculated as:

MI_F^{AB} = MI(A, F) + MI(B, F),    (20)

where the MI of two images α and β is computed by:

MI(α, β) = H(α) + H(β) − H(α, β),    (21)

where H(⋅) denotes the entropy of an image, defined over the normalized grey-level histogram p_X as:

H(X) = −Σ_i p_X(i) log₂ p_X(i).    (22)

The VIF measure can be expressed as:

VIF = Σ_s I(C^s; F^s | z^s) / Σ_s I(C^s; E^s | z^s),    (23)

where the sub-band information terms can be computed by:

I(C; E | z) = (1/2) Σ_k log₂(1 + z² λ_k / σ_n²),  I(C; F | z) = (1/2) Σ_k log₂(1 + g² z² λ_k / (σ_v² + σ_n²)).    (24)

The detailed explanations of these parameters can be found in Ref. [47]. The gradient-based metric Q^{AB/F} is calculated as:

Q^{AB/F} = Σ_{i,j} ( Q^{AF}(i, j) w^A(i, j) + Q^{BF}(i, j) w^B(i, j) ) / Σ_{i,j} ( w^A(i, j) + w^B(i, j) ),    (25)

where the edge preservation values are:

Q^{AF}(i, j) = Q_g^{AF}(i, j) Q_α^{AF}(i, j),    (26)

with Q_g^{AF} and Q_α^{AF} the edge strength and orientation preservation values, and w^A, w^B the perceptual weights [50]. Then the optimizing function f(Q) is defined as:

f(Q) = Nor(MI_F^{AB}) + Nor(VIF) + Nor(Q^{AB/F}),    (27)

where Nor(·) denotes the normalization operation.

Fig. 9 Optimized fusion result. a Illustration of feathering. b Evaluation scores of the decision map with different feathering sizes; note that the feathering size never exceeds the image size. The real boundary can be covered with a smaller feathering size to obtain the highest evaluation score. c Fusion result with the highest score
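The MI part of the score, (20)-(22), can be sketched with grey-level histograms. The function names and the 64-bin setting are illustrative choices, not the metric implementation used in the paper's experiments.

```python
import numpy as np

def entropy(img, bins=64):
    """Shannon entropy (22) of the grey-level histogram, in bits."""
    p = np.histogram(img, bins=bins, range=(0, 256))[0].astype(float)
    p = p / p.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(a, b, bins=64):
    """MI(a, b) = H(a) + H(b) - H(a, b) via the joint histogram (21)."""
    joint = np.histogram2d(a.ravel(), b.ravel(), bins=bins,
                           range=[[0, 256], [0, 256]])[0].astype(float)
    pj = joint / joint.sum()
    pj = pj[pj > 0]
    h_ab = -np.sum(pj * np.log2(pj))
    return entropy(a, bins) + entropy(b, bins) - h_ab

def fusion_mi(a, b, f):
    """MI_F^AB = MI(A, F) + MI(B, F), as in (20)."""
    return mutual_information(a, f) + mutual_information(b, f)
```

A fused image identical to a source attains MI equal to that source's entropy, the upper bound, while an unrelated image scores close to zero, which is why larger MI indicates that more source information survives fusion.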

Boundary fusing
Because the boundary region we obtain is larger than the real boundary, we fuse the obtained boundary region with the guided filter (GF) [10,17,32] to improve the final fusion quality. The fused output at pixel i is obtained by:

O_i = ā_i I_i + b̄_i,  ā_i = (1/|ω|) Σ_{k: i ∈ ω_k} a_k,  b̄_i = (1/|ω|) Σ_{k: i ∈ ω_k} b_k,    (28)

where ω_k is the local window centered at pixel k, and a_k and b_k are constant coefficients in ω_k computed by:

a_k = ( (1/|ω|) Σ_{i ∈ ω_k} I_i P_i − μ_k P̄_k ) / (σ_k² + ε),  b_k = P̄_k − a_k μ_k,    (29)

where ε is the regularization parameter, μ_k and σ_k² are the mean and variance of the guidance image I in the local window ω_k, and P̄_k is the mean value of the input image P in ω_k. Figure 10 shows how to obtain the final fusion result, where the white region is the boundary region that needs to be fused by (28). The best case is that the white region is empty, while the worst case is that the decision map we obtained is completely inaccurate, with the white region extending to the whole image. Hence, regardless of the accuracy of our focused region matting, our fusion result will not be worse than that of the GF algorithm.
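The guided filter of (28)-(29) can be sketched with box filters, following the standard formulation of He et al.; the function name and parameter defaults are illustrative, and this is a plain grey-scale sketch rather than the boundary-fusion code of the proposed scheme.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(I, P, r, eps):
    """Guided filter: fit O = a_k * I + b_k in each (2r+1) x (2r+1) window of
    the guidance image I, then average a_k, b_k over overlapping windows."""
    I = I.astype(float)
    P = P.astype(float)
    size = 2 * r + 1
    mean_I = uniform_filter(I, size)
    mean_P = uniform_filter(P, size)
    cov_IP = uniform_filter(I * P, size) - mean_I * mean_P   # covariance of I, P
    var_I = uniform_filter(I * I, size) - mean_I ** 2        # variance of I
    a = cov_IP / (var_I + eps)      # (29), first coefficient
    b = mean_P - a * mean_I         # (29), second coefficient
    return uniform_filter(a, size) * I + uniform_filter(b, size)   # (28)
```

With the input as its own guide and a tiny ε the filter is near-identity, while a large ε smooths, which is the edge-preserving behavior exploited when fusing the boundary region.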

Experiments and analysis
To illustrate the performance of the proposed fusion scheme, some typical and efficient image fusion algorithms are compared in this section. All the experimental simulations are performed on a PC with a 1.8 GHz Intel Core i7 CPU and 8 GB RAM.
The proposed fusion method is compared with eight multi-focus image fusion algorithms: the non-subsampled contourlet transform-based method (NSCT) [51], the multi-scale weighted gradient-based method (MWGF) [53], the image matting-based method (IFM) [33], the cross bilateral filter-based method (CBF) [26], the guided filter-based fusion method (GFF) [32], the convolutional neural network-based method (CNN) [37], the boosted random walks-based method (BRW) [38], and the guided filter and focus-region detection based method (GFDF) [44]. MI, Q^{AB/F} and VIF are used as the objective image fusion quality metrics in the experiments. MI measures the dependence between the fused image and the source images. Q^{AB/F} is a gradient-based metric used to measure the preservation of detail and gradient information. VIF measures the visual information fidelity between the fused image and the source images. The larger these metrics are, the better the quality of the fused image.

Experimental results and analysis
To illustrate the visual effectiveness of the fusion results obtained by our algorithm, fusion results of the different methods are presented in Figs. 12, 13, 14, 15, 16, 17 and 18.

Figure 12 shows the effect of the proposed fusion algorithm when dealing with unregistered images. It can be seen from Fig. 12a, b that the source images are not fully registered, which leads to artifacts in the fused results obtained by the pixel-based fusion methods. The magnified regions of the different results, shown in red rectangles, indicate that the visual effects of NSCT, CBF and GFF are worse than those of the other algorithms. The evaluation metrics are given in Table 2, which shows that the quality of the fused image obtained by our algorithm is better than that of the other algorithms.

Figure 13 shows the fusion results of the different methods on pair 2. To better illustrate the performance of these algorithms, the normalized difference images between the fused result and the source images are given in Fig. 14. A white point indicates that a region differs from the corresponding region of the source image, i.e., that information has been lost in this region. The MWGF algorithm cannot fuse the edge information effectively. Compared with GFF, IFM and CNN, the proposed method integrates the information of the source images more completely.

Figure 15 presents the third set of experimental results. To better compare the fusion performance of the different algorithms, the normalized difference images between the fused result and the source images of pair 3 are given in Fig. 16, which shows that the proposed method can fully detect the focused regions. Figure 17 shows the optimization process of the boundary region by our proposed algorithm. According to the evaluation score computed by (27), the optimal boundary region is selected and the fusion result with the higher score is obtained.
In general, when the value of the evaluation score decreases continuously, the optimization iteration can be stopped and the highest computed value can be regarded as the best fusion result. Figure 18 shows the fusion results on the other image sets, and the evaluation metrics are given in Table 3. The average objective assessments of each method are given in Table 4, and the bar graph of the average objective assessments is given in Fig. 19. The maximum value in each row of Table 4 is marked in bold. It can be seen that our method obtains the largest average values of MI, Q^{AB/F} and VIF, which means that the fusion results of our method contain more information and more detailed texture, and have better visual fidelity. The above subjective and objective evaluations show that the proposed fusion algorithm outperforms the compared methods.

Fig. 19 Bar graph of the average objective assessments

Conclusion
A novel focus-pixel matting and optimization based multi-focus image fusion method is proposed in this paper. The proposed algorithm aims to integrate the focused pixels to generate an all-in-focus image. We propose a multi-scale focus-measure based algorithm for matting the focused pixels and estimating the focused regions. Because the ideal fused image can be produced by integrating the focused regions of the source images, a quality evaluation-based optimization strategy is employed to optimize the boundary regions and reduce the influence of insufficient focused-boundary precision. Experimental results demonstrate that the proposed method outperforms some state-of-the-art fusion methods in both objective evaluation and visual perception.