
1 Introduction

Image processing algorithms are commonly used nowadays in fields ranging from medicine to sports (Aggarwal and Ryoo [1]; Ekin et al. [6]; Li et al. [8]). For example, Sáenz et al. [12] present an application to the behavior analysis of cells in cancerous tissue, where an automatic cell segmentation and tracking system was proposed to help researchers reach a faster and more accurate understanding of the behavior of a group of cells. The algorithms such systems implement for image enhancement, object segmentation and tracking usually carry a high computational cost, which frequently makes their use impractical. Optimizing the execution time of cell tracking solutions is therefore an important issue, since researchers in the field often need to process thousands of images of cell activity. The first stage of the cell tracking pipeline proposed in Sáenz et al. [12] is the preprocessing of the input image, designed to enhance edges and contrast and to remove noise, improving the performance of the subsequent segmentation and tracking stages.

This work presents a computational optimization of the Deceived Non Local Means (DNLM) filter, proposed within the Deceived Weighted Averaging Filter Framework (DeWAFF) (Calderón et al. [3]) for image enhancement and noise removal. The proposed optimization, named DNLM-IFFT, uses integral images and the Fast Fourier Transform (FFT), as proposed for the original Non Local Means (NLM) filter in Wang et al. [16], and we evaluate the speedup it achieves. As demonstrated in Calderón et al. [3], the DNLM yields the best results among the deceived bilateral and deceived scaled bilateral filters (also part of the DeWAFF), since it is based on the NLM noise removal approach (Buades et al. [2]). However, the computational complexity of the brute force DNLM implementation often makes its use impractical, as researchers frequently need to analyze thousands of images. For this reason, this paper focuses on the optimization of the DNLM algorithm.

In Sect. 2, we describe important aspects of the DNLM filter. Sections 3 and 4 present the proposed computational optimization of the DNLM. Section 5 presents the experiments and results, comparing our strategy to the previous brute force implementation in order to demonstrate the viability of the proposed modification. Finally, Sect. 6 presents the conclusions and related future work.

2 Background

As presented in Calderón et al. [3], the DeWAFF is an image abstraction framework. Image abstraction is defined there as the enhancement of perceptually important characteristics of the image, such as contrast and edges, together with the simplification of noise and of details that are less relevant to the application domain. As demonstrated in Calderón and Siles [4]; Calderón et al. [3]; Sáenz et al. [12], image abstraction based preprocessing techniques improve the performance of object segmentation, tracking and classification tasks.

The DeWAFF combines an Adaptive Unsharp Masking (AUSM) approach, such as the one proposed in Ramponi and Polesel [11], with a weighted averaging filter, like the bilateral filter or the NLM filter. This combination aims to achieve contrast and edge enhancement as well as noise suppression. It uses the AUSM output image \(F_{\text {AUSM}}\) as the input of the weighted averaging filter, but uses the original input image U for the weighting, thereby deceiving the weighted averaging filter. This diminishes the ringing effect often present in USM filtered images. The following equation presents the general DeWAFF formulation for an input image U of \(a\times b=c\) pixels, with \(U^a = U\) and \(U^b = F_{\text {AUSM}}\) for ringing suppression purposes:

$$\begin{aligned} U'(p)=\left( \sum _{i\in \varOmega _p}w_k\left( U^a, p, i\right) \right) ^{-1} \left( \sum _{i\in \varOmega _p}w_k\left( U^a, p, i\right) U^b(i)\right) . \end{aligned}$$
(1)

For notation simplicity, an image pixel is represented as \(i = (x_i, y_i)\), and \(\varOmega _p\) stands for the \(n \times n\) window centered in pixel p.
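For illustration, the deceived average of Eq. 1 can be sketched for a single window \(\varOmega _p\) in a few lines of Python (an illustrative helper, not the authors' released MATLAB implementation): the weights are computed from the original image \(U^a = U\), while the intensities being averaged come from the AUSM output \(U^b = F_{\text {AUSM}}\).

```python
# Minimal sketch of Eq. 1 for one pixel p: `weights` are the kernel values
# w_k(U^a, p, i) computed from the original image, while `values` are the
# AUSM-filtered intensities U^b(i) inside the window Omega_p.
def deceived_average(weights, values):
    """Normalized weighted average of `values` with externally supplied
    `weights`, as in Eq. 1 for a single window."""
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total

# Weights derived from U favor the first two (similar) pixels, so the
# dissimilar third AUSM value contributes little to the output.
print(deceived_average([0.9, 0.8, 0.1], [10.0, 12.0, 40.0]))
```

The key design point is the decoupling: the same weights can be applied to any second image, which is what allows the DeWAFF to average AUSM intensities with weights computed from the unprocessed input.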

The weighting functions studied in Calderón et al. [3] were the Bilateral Filter (BF), the Scaled Bilateral Filter (SBF) and NLM filter:

$$\begin{aligned} w_{1} =w_{\text {BF}}, w_{2} =w_{\text {SBF}}, w_{3} = w_{\text {NLM}}. \end{aligned}$$
(2)

When used in Eq. 1, the NLM kernel provides the best results under heavy and moderate noise conditions, since it weights pixel i by the similarity of its \(m \times m\) neighborhood \(\varPhi _i\), instead of using pixel intensity similarity as in the bilateral filter and SBF approaches. The following equation presents the original non local means weighting function:

$$\begin{aligned} w_{\text {NLM}}(p,i) = \text {exp}\left( -\frac{ D(p,i)}{h^2}\right) , \end{aligned}$$
(3)

with the Euclidean distance between the neighborhoods given by:

$$\begin{aligned} D(i,j) = \left\| \displaystyle \ \varPhi _i - \varPhi _j \right\| ^2 , \end{aligned}$$
(4)

where h is the kernel parameter which defines neighborhood weighting discrimination.
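The weighting of Eqs. 3 and 4 can be sketched as follows (an illustrative Python helper, not the authors' MATLAB code): the two neighborhoods are compared with the squared Euclidean distance and the result is mapped through the exponential kernel controlled by h.

```python
import math

# Sketch of the NLM weight of Eq. 3 for two equally sized neighborhoods,
# given as nested lists. The distance is the squared Euclidean distance
# of Eq. 4; h controls how sharply dissimilar patches are down-weighted.
def nlm_weight(patch_p, patch_i, h):
    d = sum((a - b) ** 2
            for row_a, row_b in zip(patch_p, patch_i)
            for a, b in zip(row_a, row_b))
    return math.exp(-d / h ** 2)

# Identical neighborhoods receive the maximum weight of 1.
print(nlm_weight([[1, 2], [3, 4]], [[1, 2], [3, 4]], h=10.0))  # 1.0
```

A small h strongly penalizes patch differences, while a large h flattens the weights toward a plain average.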

In the worst case scenario of a window of \(a\times b =c\) pixels and a neighborhood of \(a\times b =c\) pixels, the computational complexity of the NLM is \(O(c^3)\), where c is the total number of pixels in the input image U.

Let us further examine the Euclidean distance between neighborhoods:

$$\begin{aligned} D\left( i,j\right) =\sum _{u}\left( \varPhi _{i}\left( u\right) -\varPhi _{j}\left( u\right) \right) ^{2}, \end{aligned}$$
(5)

where \(\sum _{u}\) stands for the summation over the pixels of the difference of the two neighborhoods \(\varPhi _i\) and \(\varPhi _j\).

Developing the squared difference and rewriting the quadratic terms as \(\varPhi _{i}^{2}=\sum _{u}\varPhi _{i}\left( u\right) ^{2}\), we get:

$$\begin{aligned} D\left( i,j\right) =\varPhi _{i}^{2}-2\sum _{u}\varPhi _{j}\left( u\right) \varPhi _{i}\left( u\right) +\varPhi _{j}^{2}=\varPhi _{i}^{2}-2\varPhi _{j}\cdot \varPhi _{i}+\varPhi _{j}^{2}, \end{aligned}$$
(6)

where \(\varPhi _{j}\cdot \varPhi _{i}\) denotes the matrix dot product, i.e. the sum of the element wise products. The following example illustrates the calculation of the Euclidean distance between neighborhoods for an input image \(U\in \mathbb {R}^{5\times 5}\) defined in Table 1.

Table 1. Image example U.

For such illustration purposes, we take a window \(\varOmega _{\left( 3,3\right) }\in \mathbb {R}^{3\times 3}\) and perform the calculation of the Euclidean distance between the neighborhoods \(\varPhi _{\left( 3,3\right) }\) and \(\varPhi _{\left( 2,2\right) }\):

$$\begin{aligned} \left\| \varPhi _{\left( 3,3\right) }-\varPhi _{\left( 2,2\right) }\right\| ^{2}=\left\| \begin{array}{ccc} 2 &{} 3 &{} 1\\ 1 &{} 2 &{} 3\\ 3 &{} 2 &{} 1 \end{array}-\begin{array}{ccc} 5 &{} 12 &{} 1\\ 5 &{} 2 &{} 3\\ 3 &{} 1 &{} 2 \end{array}\right\| ^{2}=10.3923^{2}=108. \end{aligned}$$
(7)

Given the development of the quadratic difference between neighborhoods in the right part of Eq. 6, we can perform this calculation based on the element wise squared matrix of \(U\in \mathbb {R}^{5\times 5}\). The Euclidean distance computed previously is thus equivalent to:

$$\begin{aligned} \left\| \varPhi _{\left( 3,3\right) }-\varPhi _{\left( 2,2\right) }\right\| ^{2}=\varPhi _{\left( 3,3\right) }^{2}-2\varPhi _{\left( 2,2\right) }\cdot \varPhi _{\left( 3,3\right) }+\varPhi _{\left( 2,2\right) }^{2} \end{aligned}$$
(8)
$$\begin{aligned} =\sum \left( \begin{array}{ccc} 4 &{} 9 &{} 1\\ 1 &{} 4 &{} 9\\ 9 &{} 4 &{} 1 \end{array}\right) -2\left( \begin{array}{ccc} 2 &{} 3 &{} 1\\ 1 &{} 2 &{} 3\\ 3 &{} 2 &{} 1 \end{array}\cdot \begin{array}{ccc} 5 &{} 12 &{} 1\\ 5 &{} 2 &{} 3\\ 3 &{} 1 &{} 2 \end{array}\right) +\sum \left( \begin{array}{ccc} 25 &{} 144 &{} 1\\ 25 &{} 4 &{} 9\\ 9 &{} 1 &{} 4 \end{array}\right) \end{aligned}$$
(9)
$$\begin{aligned} =42-2\cdot 78+222=42-156+222=108, \end{aligned}$$
(10)

consistent with the result obtained in Eq. 7.
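The equivalence above is easy to verify numerically. The following plain Python check (illustrative only) computes the distance between the two example patches both directly (Eq. 5) and through the expansion (Eq. 6):

```python
# Numerical check that the expansion of Eq. 6 reproduces the direct
# squared Euclidean distance of Eq. 5 for the two example neighborhoods.
phi_i = [[2, 3, 1], [1, 2, 3], [3, 2, 1]]    # Φ_(3,3)
phi_j = [[5, 12, 1], [5, 2, 3], [3, 1, 2]]   # Φ_(2,2)

flat_i = [v for row in phi_i for v in row]
flat_j = [v for row in phi_j for v in row]

direct = sum((a - b) ** 2 for a, b in zip(flat_i, flat_j))  # Eq. 5
sq_i = sum(a * a for a in flat_i)                           # Φ_i²  = 42
sq_j = sum(b * b for b in flat_j)                           # Φ_j²  = 222
dot = sum(a * b for a, b in zip(flat_i, flat_j))            # Φ_j·Φ_i = 78
expanded = sq_i - 2 * dot + sq_j                            # Eq. 6

print(direct, expanded)  # both 108
```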

3 Proposed Method: DNLM-IFFT

As seen in the previous section, the computational cost of the brute force implementation of the DNLM is very high. Different approaches have been designed to address this problem for the original NLM algorithm and could be ported to the DNLM to lower its computational complexity, as seen in Karnati et al. [7]; Liu et al. [9]; Wang et al. [16]. However, for a DNLM implementation, any approximation or modification of the NLM kernel computation must allow decoupling the weighting image \(U^a\) from the filtered image \(U^b\), as stated in Eq. 1. We also prefer an exact optimization of the NLM over approximations such as those in Dauwe et al. [5]; Vignesh et al. [14]; Xue et al. [17].

In Wang et al. [16], an efficient computation of the weighting function \(w_{\text {NLM}}(p,i)\) is proposed using integral images and the FFT. This approach achieves the same numerical results as the brute force implementation. The optimization modifies the computation of the weighting kernel by performing the calculation of the Euclidean distance defined in Eq. 6 as follows:

  • Use integral images to calculate the terms \(\varPhi _{i}^{2}\) and \(\varPhi _{j}^{2}\).

  • Use the FFT to obtain the term \(-2\varPhi _{j}\cdot \varPhi _{i}\), which corresponds to a sample of the correlation of the image with the neighborhood \(\varPhi _{j}\).

3.1 Integral Images

The integral image of an image I, as defined in Viola and Jones [15], with the pairwise pixel notation \(i=(x,y)\), is given by:

$$\begin{aligned} I_{\Sigma }\left( x,y\right) =\sum _{u\le x,v\le y}I\left( u,v\right) , \end{aligned}$$
(11)

which is the summation of the pixels above and to the left of the parameter pixel \(i=(x,y)\). Following the previous example, the integral image of the pixel wise squared matrix of U is calculated as shown in Table 2. For a more efficient calculation of an integral image, it is worth noting that:

Table 2. Integral image for \(U^2\).
$$\begin{aligned} I_{\Sigma }\left( x,y\right) =I\left( x,y\right) -I_{\Sigma }\left( x-1,y-1\right) +I_{\Sigma }\left( x,y-1\right) +I_{\Sigma }\left( x-1,y\right) , \end{aligned}$$
(12)

which for instance, for the previous example means that:

$$\begin{aligned} I_{\Sigma }\left( 3,3\right) =4-198+208+208=222. \end{aligned}$$
(13)

To use the integral image to calculate the summation over a window limited by pixels \(A=\left( x_{0},y_{0}\right) \), \(B=\left( x_{1},y_{0}\right) \), \(C=\left( x_{0},y_{1}\right) \) and \(D=\left( x_{1},y_{1}\right) \), as illustrated in Table 2, we compute the following in general:

$$\begin{aligned} \sum _{x_{0}<x\le x_{1},y_{0}<y\le y_{1}}I\left( x,y\right) =I\left( D\right) +I\left( A\right) -I\left( B\right) -I\left( C\right) , \end{aligned}$$
(14)

and for the example developed in this paper, for a window of dimensions \(3\times 3\) around the pixel (3, 3), the corner pixels are \(A=\left( 1,1\right) \), \(B=\left( 4,1\right) \), \(C=\left( 1,4\right) \) and \(D=\left( 4,4\right) \), as specified in Table 2. The summation of the pixels over this window is then calculated as follows:

$$\begin{aligned} \sum _{1<x\le 4,\,1<y\le 4}I\left( x,y\right) =271+25-179-75=42, \end{aligned}$$
(15)

which is consistent with the first term in Eq. 10. Constructing the integral image of an input matrix with c pixels costs O(c) and must be done only once for the NLM filtering; afterwards, the summation over any window is obtained in constant time.
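The recurrence of Eq. 12 and the window summation of Eq. 14 can be sketched in a few lines of Python (illustrative only, not the released MATLAB implementation), using the element wise squared neighborhood \(\varPhi _{(3,3)}\) of the running example:

```python
# Integral image (summed-area table) built with the recurrence of Eq. 12.
# A one-pixel zero border avoids special cases at the first row/column.
def integral_image(img):
    h, w = len(img), len(img[0])
    S = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(1, h + 1):
        for x in range(1, w + 1):
            S[y][x] = (img[y - 1][x - 1] + S[y][x - 1]
                       + S[y - 1][x] - S[y - 1][x - 1])
    return S

# Window sum over (x0, x1] x (y0, y1] via the four corner lookups of Eq. 14.
def window_sum(S, x0, y0, x1, y1):
    return S[y1][x1] + S[y0][x0] - S[y0][x1] - S[y1][x0]

phi = [[2, 3, 1], [1, 2, 3], [3, 2, 1]]            # Φ_(3,3)
phi_sq = [[v * v for v in row] for row in phi]     # element-wise square
S = integral_image(phi_sq)
print(window_sum(S, 0, 0, 3, 3))  # 42, the first term of Eq. 10
```

After the single O(c) construction, each of the four-lookup window sums costs O(1), which is what removes the per-pixel window summation from the inner loop of the filter.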

4 Correlation and the Fast Fourier Transform

As stated previously, the cross term in the quadratic expansion of the Euclidean distance between the neighborhoods of pixels i and j:

$$\begin{aligned} D\left( i,j\right) =\varPhi _{i}^{2}-2\sum _{a=0}^{m-1}\sum _{b=0}^{m-1}\varPhi _{j}\left( a,b\right) \varPhi _{i}\left( a,b\right) +\varPhi _{j}^{2} =\varPhi _{i}^{2}-2\varPhi _{j}\cdot \varPhi _{i}+\varPhi _{j}^{2}, \end{aligned}$$
(16)

corresponding to the matrix dot product \(-2\varPhi _{j}\cdot \varPhi _{i}\), can be extracted from the correlation of the image with the neighborhood \(\varPhi _{i}\). This correlation can be computed using the FFT with \(O(c\log c)\) computational complexity, lowering the overall DNLM computational cost.
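The following NumPy sketch (illustrative; the paper's released code is MATLAB) shows how the dot product term of Eq. 6 appears as a single sample of an FFT-based cross-correlation, using the two patches of the running example:

```python
import numpy as np

phi_i = np.array([[2., 3., 1.],
                  [1., 2., 3.],
                  [3., 2., 1.]])   # Φ_(3,3)
phi_j = np.array([[5., 12., 1.],
                  [5., 2., 3.],
                  [3., 1., 2.]])   # Φ_(2,2)

# Zero-pad both arrays to the full correlation size to avoid circular
# wrap-around, then apply the correlation theorem:
#   corr = IFFT( FFT(a) * conj(FFT(b)) ).
shape = (phi_i.shape[0] + phi_j.shape[0] - 1,
         phi_i.shape[1] + phi_j.shape[1] - 1)
corr = np.real(np.fft.ifft2(np.fft.fft2(phi_i, shape)
                            * np.conj(np.fft.fft2(phi_j, shape))))

# The zero-lag sample of the correlation is the matrix dot product Φ_j·Φ_i.
dot_fft = corr[0, 0]
dot_direct = float(np.sum(phi_i * phi_j))
print(round(dot_fft), round(dot_direct))  # 78 78
```

In the full filter the correlation is computed once between the image and each neighborhood, so every dot product \(\varPhi _{j}\cdot \varPhi _{i}\) is read off the correlation surface instead of being recomputed per pixel pair.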

5 Experiments and Results

We measured the execution time of the algorithm using images with resolutions of 480p, 720p and 1080p, executing each 10 times, with a typical window size of \(21\times 21\) and a neighborhood size of \(7\times 7\), as recommended in Buades et al. [2]. The results are shown in Table 4. The second experiment evaluates the speedup obtained with different window and neighborhood sizes on an image of \(1200\times 900\) pixels; its results are shown in Table 3.

Table 3. Speedup achieved with different window and neighborhood sizes for the DNLM with a brute force implementation and the DNLM-IFFT.

All tests were executed on a desktop computer with an AMD Phenom FX-6300 processor at 4.6 GHz. Average values for these executions are displayed in Table 4.

Table 4. Time in seconds achieved with different image resolutions for the DNLM with a brute force implementation and the DNLM-IFFT.

6 Conclusions and Future Work

The proposed DNLM-IFFT approach achieves an effective speedup of up to an average factor of 10, as seen in Table 4. The results in Table 3 suggest that the DNLM-IFFT becomes more attractive with smaller neighborhood sizes, given the smaller speedup obtained with bigger neighborhoods. For preprocessing low resolution video footage the execution time is reasonable; however, biomedical image and video analysis often requires processing thousands of images. As already mentioned, microbiology researchers from the University of Costa Rica need to analyze 170 000 images of glioblastoma tissue, which means that the preprocessing stage with the proposed DNLM-IFFT could still take months. This remains impractical.

As future work, we expect to explore alternative parallelization technologies, such as MPI and OpenMP on a cluster based on the Intel KNL architecture. In Cuomo [13], a GPU based parallelization of the Non Local Means filter was proposed, which suggests that a GPU based implementation of the DNLM might achieve a greater performance gain than a CPU based parallelization. We can also consider combining the proposed DNLM-IFFT approach with other approximation schemes for the NLM, for instance look-up tables, as seen in Mahmoudi and Sapiro [10].

The MATLAB code implemented for this paper can be downloaded from https://github.com/manu3193/DNLM-IFFTT.