1 Introduction

Nowadays, images are captured using multichannel imaging sensors, producing color, multispectral and hyperspectral images. Such images are composed of individual channels; the familiar color image, for example an RGB image, consists of three channels. Everyday examples are satellite images, color TV systems and images taken with ordinary cameras. Better image quality leads to higher performance in many image processing and post-processing algorithms, such as detection, labeling, segmentation, feature extraction, texture analysis and image interpretation. Under ideal conditions an image is noise free; in practice, however, noise is unavoidable in any image acquisition and processing system. Image denoising is therefore an important preprocessing task before downstream image processing steps, such as facial emotion recognition (Lajevardi and Wu 2012; Arora and Kumar 2021; Bidani et al. 2020), ancient character recognition, text image processing (Narang et al. 2021; Thanh et al. 2019) and object recognition (Gevers and Smeulders 1999; Bansal et al. 2021; Kumar et al. 2021; Chhabra et al. 2020; Gupta et al. 2021), to name a few. Filters are ranked similarly to the work by Ghosh et al. (2021).

Initially, image denoising methods were developed for gray images, and each focused on a particular noise type, such as Gaussian noise, additive noise, additive white Gaussian noise (AWGN), impulse noise and speckle noise. These methods have been extended to multichannel images in several ways, for example cross-channel denoising, channel coupling and channel-by-channel denoising, and they have contributed substantially to the reduction of specific noise types in color images. In the prior literature, a survey (Lukac et al. 2005a) covers the applications of vectorial filtering methods for color images. A few other surveys address specific areas of color image denoising, such as variational methods (Moreno et al. 2015), partial differential equations (PDEs) (Tschumperlé and Deriche 2007; Pal et al. 2015; Guichard et al. 2002; Surya Prasath and Singh 2010; Surya Prasath et al. 2013), filtering methods (Venetsanopoulos and Plataniotis 2000; Buades et al. 2005; Fan et al. 2019; Goyal et al. 2020) and sparse representation (Dabov et al. 2007). The majority of these works cover diffusion PDE-based filtering for image denoising and restoration. Neural network and deep learning-based models are presented in Shaheed et al. (2022), medical image denoising algorithms are reviewed in Sagheer and George (2020), and recent surveys that include the emerging area of artificial neural networks (ANN) and deep learning (DL) models can be found in Tian et al. (2020) and Salamat et al. (2021). In this review, we consider important filtering approaches applied to color image denoising, and we analyze and present important models from the last two decades. The main objectives of the paper are:

  • This paper presents a review of filtering methods in color image denoising, especially those published over the last two decades.

  • We analyze the computational denoising algorithms that mitigate noise in multichannel images.

  • The main objective of this review is to provide a comprehensive survey and to highlight the pros and cons of up-to-date techniques.

  • This paper also classifies the existing methods according to the filters used, the domain and the strategy used in filter design.

The methods are compared in terms of the main methodology utilized, the image noise models and the noise levels considered. All the methods are compared using different image quality assessment measures, which we briefly explain for completeness. Our review is structured as follows. Section 2 introduces common image noise models and image quality metrics, and provides a main classification of the filtering methods for color image denoising. These classes are further divided into subclasses in the subsequent sections. Filtering methods related to spatial or neighborhood approaches are grouped in Sect. 3, switching filters are discussed in Sect. 4, and Sect. 5 consists of wavelet filtering methods. Section 6 concludes the review.

2 Filtering-based methods for color image denoising

The latent images are corrupted, and image denoising filters are required to achieve better image quality for middle- and high-level computer vision tasks. Denoising algorithms are divided into classes such as filtering-based methods, partial differential equation (PDE) or diffusion equation-based methods, and artificial neural network (ANN) and deep learning-based methods (Salamat et al. 2021). In this review, we analyze filter-based image denoising methods applied to color images. These methods are further divided into subclasses based on the mathematical tool, the roles of the respective filters and their mathematical models for pixel representation. We next briefly review the common image noise models and image quality metrics utilized in the reviewed works.

2.1 Image noise models

Image acquisition faces two problems that limit its accuracy, noise and blurring. The two are interrelated, and image denoising algorithms remove the noise from images. Noise in images is of many types; the noise models are discussed in detail in Boncelet (2009). Let \(y(\cdot )\) denote the output image, \(x(\cdot )\) the original image and \(q(\cdot )\) the noise. The common decompositions are additive and multiplicative:

$$\begin{aligned} y(t)= x(t) + q(t) \end{aligned}$$

and

$$\begin{aligned} y(t) = x(t) \cdot q(t). \end{aligned}$$

The multiplicative model can be transformed into the additive one, and vice versa, by applying the logarithm and exponential operators. The main noise models are:

  • Additive noise model: In this model, the noise is independent of the image signal. Examples are thermal noise (additive Gaussian noise), quantization noise and photographic noise. Quantization noise is produced when continuous random variables are quantized to discrete values.

  • Multiplicative noise: This model represents noise that depends on the image signal.

  • Mixed noise (neither independent nor dependent noise): Some types of noise do not fall into either category. Examples are salt and pepper noise, which is produced during image transmission, and Poisson noise. These are further explained as:

  • Salt and pepper noise (Mukhopadhyay and Mandal 2014): The noise is called salt and pepper noise (SPN) when dark regions contain bright pixels and bright regions contain dark pixels. This type of noise may be caused by analog-to-digital converter errors and bit errors in transmission.

  • Random-valued noise model: Noise in this model takes values in two fixed ranges of length m at the two ends of the intensity scale. For example, if \(m= 30,\) the noise is equally likely to take any value in either [0, 29] or [226, 255]. Similarly, if \(m = 256,\) the noise is equally likely to take any value in [0, 255]. Mathematically, it is defined as

    $$\begin{aligned} f(x) = {\left\{ \begin{array}{ll} \frac{p}{2m} &{} \text {for } 0 \le x < m,\\ 1-p &{} \text {for } x= y, \\ \frac{p}{2m} &{} \text {for } 255 - m< x \le 255, \end{array}\right. } \end{aligned}$$
    (1)

    where p is the density of noise and the middle case corresponds to an uncorrupted pixel.
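
The noise models above can be simulated in a few lines. The following is a minimal sketch using only the Python standard library; the function names, the 8-bit range [0, 255] and the log-normal choice for the multiplicative factor are illustrative assumptions rather than part of any reviewed method.

```python
import math
import random


def add_gaussian_noise(pixels, sigma, rng):
    """Additive model y = x + q with q ~ N(0, sigma^2), clipped to [0, 255]."""
    return [min(255.0, max(0.0, x + rng.gauss(0.0, sigma))) for x in pixels]


def add_multiplicative_noise(pixels, sigma, rng):
    """Multiplicative model y = x * q. Assumption: q = exp(n), n ~ N(0, sigma^2),
    so that q > 0 and log(y) = log(x) + n is an additive model."""
    return [x * math.exp(rng.gauss(0.0, sigma)) for x in pixels]


def add_random_valued_impulse(pixels, p, m, rng):
    """With probability p, replace a pixel by a value drawn uniformly from
    [0, m-1] or [256-m, 255] (each range chosen with probability 1/2);
    otherwise keep the pixel unchanged. m = 1 gives salt and pepper noise."""
    out = []
    for x in pixels:
        if rng.random() < p:
            low = rng.randrange(0, m)            # value in [0, m-1]
            high = rng.randrange(256 - m, 256)   # value in [256-m, 255]
            out.append(low if rng.random() < 0.5 else high)
        else:
            out.append(x)
    return out
```

For instance, `add_random_valued_impulse(row, 0.3, 30, random.Random(0))` corrupts roughly 30% of the pixels of `row` with values in [0, 29] or [226, 255].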

As noted in Sect. 1, the majority of image filtering methods were initially developed for gray images and focus on a particular noise type, such as Gaussian noise, additive white Gaussian noise (AWGN), impulse noise or speckle noise. They have since been extended to multichannel images through cross-channel denoising, channel coupling and channel-by-channel denoising.

2.2 Image quality assessment metrics

Image quality is an image characteristic through which degradation, distortion or artifacts are measured with reference to the original image. Statistical measures are used to assess objective image quality. Commonly used image quality measures in the literature are (X and Y are the original and processed images of size \(M \times N,\) respectively):

  • Mean absolute error: The mean absolute error (MAE) between two images X and Y is defined as

    $$\begin{aligned} \text {MAE}(X,Y)= \frac{1}{MN}\sum _{i=1}^M \sum _{j=1}^N | x_{ij}- y_{ij} |. \end{aligned}$$
    (2)

    This measure computes the mean difference at pixel level, and similar images have lower mean absolute error.

  • Mean square error (MSE): This statistical measure computes the mean square error between the original and distorted image. Mathematically:

    $$\begin{aligned} \text {MSE}(X, Y)= \frac{1}{M N}\sum _{i=1}^M \sum _{j=1}^N (x_{ij}- y_{ij})^2. \end{aligned}$$
    (3)

    For color images, the MSE is computed for each channel and the mean value is taken; similar images have a lower mean square error. The corresponding error measure in the \(\text {Lab}\) color space is called the normalized color difference (NCD) (more details in Melange et al. (2011a)), defined as:

    $$\begin{aligned} \text {NCD} = \frac{\sum _{j=1}^N \sqrt{ ( L_j - \hat{L}_j)^2 + (a_j - \hat{ a}_j)^2 + ( b_j -\hat{ b}_j)^2 } }{\sum _{j=1}^N \sqrt{ L_j^2 + a^2_j + b^2_j } }, \end{aligned}$$
    (4)

    where \(L_j, a_j, b_j\) and \(\hat{L}_j, \hat{a}_j, \hat{b}_j \) are the \( \text {Lab}\) coordinates of the original and restored image pixels, respectively. The similarity between images decreases as the NCD increases, i.e., higher NCD values indicate that the images are more dissimilar and vice versa.

  • Peak signal to noise ratio: The peak signal to noise ratio (PSNR) is a classical measure, defined as the ratio between the maximum possible power of the signal and the power of the noise (the MSE), expressed in decibels. For 8-bit images,

    $$\begin{aligned} \text {PSNR}(X,Y) = 10\log _{10} \frac{255^2}{\text {MSE}(X,Y)}. \end{aligned}$$

  • Structural similarity index measure: The structural similarity index measure (SSIM) is expressed as

    $$\begin{aligned} \text {SSIM}(X,Y)= \frac{(2 \mu _X \mu _Y + C_1 )(2 \sigma _{XY} +C_2 )}{(\mu _X^2 + \mu _Y^2 + C_1)(\sigma _X^2+ \sigma _Y^2+ C_2)}, \end{aligned}$$
    (5)

    where \(\mu _X, \; \mu _Y\) and \(\sigma _X^2 , \; \sigma _Y^2\) are the means and variances of X and Y,  respectively, \(\sigma _{XY}\) is the covariance of X and Y, and \(C_1= (K_1L)^2 , \; C_2 = (K_2L)^2,\) where L is the dynamic range of the pixel values, with \(K_1= 0.01\) and \(K_2=0.03\) by default. This measure is extended to the multiscale structural similarity index (MSSIM) in Wang et al. (2003) and Hassan and Bhagvati (2012).

  • Fuzzy color structural similarity: The fuzzy color structural similarity (FCSS) measure (Grecova and Morillas 2016) combines patchwise similarities in contrast, structure and luminance. Let \(x_i\) and \(y_i\) be pixels and \(\overline{X}_w\) and \(\overline{Y}_w\) be the mean color vectors of the windows \(X_W\) and \(Y_W.\)

    The contrast is defined as \( C_{X_W} = \max (M_{x_i} ) - \min (M_{x_i});\) \( i = 1; \ldots ; n,\) similarly for \( C_{Y_W}.\) The fuzzy contrast similarity is defined as

    $$\begin{aligned} \text {SC}(X_W; Y_W) = 1 - | C_{X_W} - C_{Y_W} |, \end{aligned}$$
    (6)

    structure similarity is defined as

    $$\begin{aligned} \text {SS}(X_W; Y_W) = \frac{ \sum _{i=1}^n (1- | M_{x_i} - M_{y_i} |) }{n}, \end{aligned}$$
    (7)

    luminance in spherical coordinates is expressed as

    $$\begin{aligned} L(x_i) = \sqrt{ (x_{i1})^2 + (x_{i2})^2 + (x_{i3})^2 }, \end{aligned}$$

    and luminance similarity is defined as

    $$\begin{aligned} \text {SL}(X_W ; Y_W ) = \frac{2\, \overline{L}_{XW}\, \overline{L}_{YW}}{\overline{L}_{XW}^ 2 + \overline{L}_{YW}^2}. \end{aligned}$$
    (8)

    Finally, FCSS is defined by combining the above three measures as

    $$\begin{aligned} S(X_W ; Y_W )= & {} \text {SC}(X_W ; Y_W )^{\alpha } \text {SS}(X_W ; Y_W )^{\beta }\nonumber \\{} & {} \times \text {SL}(X_W ; Y_W )^{\gamma }, \end{aligned}$$
    (9)

    where \(\alpha , \beta , \gamma > 0\) are parameters used to adjust the relative importance of the three components. The measure is high only if all three similarities are high, and it increases with the similarity between the images.

  • Feature similarity index: The feature similarity index measure (FSIM) (Zhang et al. 2011) is computed in two stages. Phase congruency (PC) is computed for maximum Fourier coefficients. The local amplitude is defined as \(A_{n,\theta _j}(x)=\sqrt{ e_{n, \theta _j} (x)^2 + O_{n, \theta _j} (x)^2}\) and the local energy along orientation \(\theta _j\) is \(E_{\theta _j}(x)= \sqrt{ F_{\theta _j}(x)^2 + H_{\theta _j}(x)^2 },\) where \(F_{\theta _j}(x)= \sum _n e_{n, \theta _j} (x)\) and \(H_{\theta _j}(x)= \sum _n O_{n, \theta _j}(x).\) Then, the 2D PC is computed as

    $$\begin{aligned} \text {PC}(x)= \frac{\sum _j E_{\theta _j}(x)}{\sum _n\sum _j A_{n,\theta _j}(x)}, \end{aligned}$$

    where \(\text {PC} \in [0,1].\) The similarity measure of \(\text {PC}\) of two images is defined as

    $$\begin{aligned} S_{\textrm{PC}}(x)= \frac{2\, \text {PC}_1(x) \cdot \text {PC}_2(x) +T_1}{\text {PC}_1^2(x)+ \text {PC}_2^2(x) +T_1}, \end{aligned}$$

    where \(T_1\) is a positive constant and its value depends upon the dynamic range of \(\text {PC}.\) Similarly, gradient feature of an image \(G= \sqrt{G_x^2+ G_y^2}.\) The gradient features of two images are represented by \(G_1(x)\) and \(G_2(x),\) then the gradient similarity measure is computed as

    $$\begin{aligned} S_{G}(x)= \frac{2\, G_1(x) \cdot G_2(x) +T_2}{G_1^2(x)+ G_2^2(x) +T_2}, \end{aligned}$$

    where \(T_2\) is a positive constant and its value depends upon the dynamic range of the gradient magnitude (GM). These two values are then combined as \(S_l(x)=[S_{\textrm{PC}}(x)]^{\alpha }[S_G(x)]^{\beta },\) where \(\alpha \) and \(\beta \) are adjustment parameters, and the index is defined as

    $$\begin{aligned} \text {FSIM}(X,Y)= \frac{\sum _{x \in \Omega } S_l(x) \cdot \text {PC}_m(x)}{\sum _{x \in \Omega } \text {PC}_m(x)}, \end{aligned}$$
    (10)

    where \(\text {PC}_m(x)= \max (\text {PC}_1, \text {PC}_2)\) and \(\Omega \) represents the whole image domain. This measure works for the grayscale or luminance image. This measure is also extended for color images.

  • Independent feature similarity: The independent feature similarity (IFS) index consists of two parts. The features are computed by matrix multiplication \(F^{\textrm{fea}}= WA, F^{\textrm{dis}}= W B,\) where F describes the two independent feature sets extracted from W. The feature similarity is then:

    $$\begin{aligned} \text {IFS}_{\textrm{fea}}(X,Y)= \frac{1}{MN} \sum _{i=1}^M \sum _{j=1}^N \left( \frac{ 2 X_{ij} Y_{ij} +c }{( X_{ij})^2 +(Y_{ij})^2 +c } \right) . \end{aligned}$$

    Its luminance part is defined as:

    $$\begin{aligned} \text {IFS}_{\textrm{lum}} = \frac{\sum _i \left( m_i^{\textrm{ref}} - \mu (m^{\textrm{ref}}) \right) \left( m_i^{\textrm{dis}} - \mu (m^{\textrm{dis}}) \right) + c_m }{\sqrt{\sum _i \left( m_i^{\textrm{ref}} - \mu (m^{\textrm{ref}}) \right) ^2 \sum _i \left( m_i^{\textrm{dis}}- \mu (m^{\textrm{dis}}) \right) ^2 } + c_m}. \end{aligned}$$

    The \(\text {IFS}\) is a combination of \(\text {IFS}_{\textrm{lum}}\) and \(\text {IFS}_{\textrm{fea}}\) into a quality score:

    $$\begin{aligned} \text {IFS} = \left( \text {IFS}_{\textrm{lum}}^{\alpha } \text {IFS}_{\textrm{fea}}^{\beta } \right) ^{\frac{1}{\alpha + \beta }}, \end{aligned}$$

    where \(\alpha \) and \(\beta \) are positive values used to balance the luminance and feature components.
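
The simplest of the measures above can be computed directly from their definitions. The sketch below is a minimal illustration for a single-channel image stored as a list of rows; the function names are our own, and `ssim_global` is a simplification that treats the whole image as one window, whereas SSIM is usually computed over local windows and averaged.

```python
import math


def _pairs(x, y):
    """Yield corresponding pixel pairs of two equally sized images."""
    for row_x, row_y in zip(x, y):
        for a, b in zip(row_x, row_y):
            yield a, b


def mae(x, y):
    """Mean absolute error, Eq. (2)."""
    pairs = list(_pairs(x, y))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def mse(x, y):
    """Mean square error, Eq. (3)."""
    pairs = list(_pairs(x, y))
    return sum((a - b) ** 2 for a, b in pairs) / len(pairs)


def psnr(x, y, peak=255.0):
    """Peak signal to noise ratio in dB; infinite for identical images."""
    err = mse(x, y)
    return math.inf if err == 0 else 10.0 * math.log10(peak ** 2 / err)


def ssim_global(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """SSIM of Eq. (5) with the whole image taken as a single window."""
    pairs = list(_pairs(x, y))
    n = len(pairs)
    mu_x = sum(a for a, _ in pairs) / n
    mu_y = sum(b for _, b in pairs) / n
    var_x = sum((a - mu_x) ** 2 for a, _ in pairs) / n
    var_y = sum((b - mu_y) ** 2 for _, b in pairs) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in pairs) / n
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )
```

For a color image, one option consistent with the text is to evaluate each channel separately and average the results.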

2.3 Filter classification

Many sources can produce noise, and its presence is detected in almost every image processing system. The noise is removed from images by denoising algorithms, called filters. The image denoising models are divided into classes for detailed study. This classification depends upon factors such as (i) the noise model and the representation of the color image; (ii) the application of the filter, i.e., whether the filter is applied directly or the pixels are first transformed into the wavelet domain; (iii) the method for aggregating the input pixel values, which may be a weighted mean or a median; (iv) the way the filter treats the pixels, as single pixels or in blocks; (v) the number of stages of the filter, i.e., whether it is a single- or two-stage filter. Based on these criteria, the image denoising filters are divided into three important classes:

  • spatial domain filtering methods,

  • switching filter methods,

  • wavelet filtering methods.

We describe all these three classes of filters in detail in terms of the contribution summaries, comparison of experimental results based on noise type, levels, standard images, and image quality metrics. We note that we cover the state-of-the-art filters in all the subdivided categories including the performance comparisons.

3 Spatial domain filtering methods

Spatial filtering algorithms are applied directly at the pixel level for image denoising, which makes them easy to manipulate. They comprise a large number of filtering methodologies and approaches and are further subdivided into the following classes:

  • nonlocal mean filters,

  • median filters,

  • bilateral filters,

  • guided filters,

  • block matching filters.

3.1 Nonlocal mean filters

Mean filters are effective in eliminating Gaussian noise from images, but they oversmooth the image. The nonlocal mean filters are extensively used for image denoising. They replace the central pixel by a weighted mean over nonlocal windows from different regions of the image that have the same structure and texture, and thereby preserve edges effectively. The nonlocal mean filter for Gaussian noise is defined in the continuous and discrete domains by Buades et al. (2005) as:

$$\begin{aligned} I(x) = \frac{1}{C(x)}\int _{\Omega } f(x,y) u(y) \,{\textrm{d}}y = \frac{1}{C(x)}\sum _{y\in \Omega } f(x,y) u(y), \end{aligned}$$
(11)

where C(x) is the normalization factor of the weight function f(x, y). This filter is extended for denoising color images in Said et al. (2016) and Wang et al. (2018). Some modifications are proposed to enhance the effectiveness of the filter and its adaptability to color images. It is represented as

$$\begin{aligned} g_k(x,y)= \frac{\sum _{u,v \in N} f_k (x+u, y+v) \exp \left( -\frac{\sum _{p\in P} \Vert f_k (x+p, y+p )- f_k (x_0+p, y_0+p) \Vert _2^2 }{h^2}\right) }{\sum _{u,v \in N} \exp \left( -\frac{\sum _{p\in P} \Vert f_k (x+p, y+p )- f_k (x_0+p, y_0+p) \Vert _2^2 }{h^2}\right) }. \end{aligned}$$
(12)

Here, N and P are the search and patch windows, respectively. Wang et al. (2018) implemented the filter for \(k=3\) channels in RGB space and modified the weight function as

$$\begin{aligned} S= \frac{1}{\sqrt{3}p}\sum _{p=0}^{p-1} \Vert f_k(x,y)- g_k(x,y)\Vert , \end{aligned}$$
(13)

where \(f_k(x,y) \) belongs to patch window and weights are defined as

$$\begin{aligned} w_1(x,y,x_0,y_0) = \exp {\left( -\frac{(s_{x,y} - s_{x_0,y_0})^2}{h_1}\right) } \end{aligned}$$
(14)

and

$$\begin{aligned} w_2(x,y,x_0,y_0) =\exp {\left( -\frac{(x-x_0)^2+(y-y_0)^2}{h_2}\right) }. \end{aligned}$$
(15)

Equation (14) is used to discriminate the similarity of regional structures between the neighbor and central image patches, and Eq. (15) is used to measure the similarity of pixels between the neighbor and central image patches. Then,

$$\begin{aligned} w = \exp \left( -\frac{ \sum _{p\in P} \Vert f_k(x+p,y+p)- f_k(x_0+p, y_0+p) \Vert _2^2 }{h} \right) \times w_1 \cdot w_2. \end{aligned}$$
(16)

This filter is also extended as optimized vector nonlocal mean filter (OVNLMF) by adding some additional information. Wang et al. (2016) proposed a modified nonlocal means filter by adding standard deviation into weight. This modification in weight function improves the filter efficiency.
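
The discrete form of Eq. (11) can be sketched as follows for a single channel. This is a minimal illustration under our own assumptions (clamped borders, a square search window, and the parameter names `patch_radius`, `search_radius` and `h`); it does not reproduce the modified weights of Wang et al. (2018).

```python
import math


def nlm_denoise(img, patch_radius=1, search_radius=2, h=10.0):
    """Nonlocal means for a 2D single-channel image given as a list of rows."""
    rows, cols = len(img), len(img[0])

    def pixel(r, c):
        # Clamp coordinates so that patches near the border stay inside the image.
        return img[min(max(r, 0), rows - 1)][min(max(c, 0), cols - 1)]

    def patch_distance(r0, c0, r1, c1):
        # Squared L2 distance between the patches centered at the two pixels.
        total = 0.0
        for dr in range(-patch_radius, patch_radius + 1):
            for dc in range(-patch_radius, patch_radius + 1):
                diff = pixel(r0 + dr, c0 + dc) - pixel(r1 + dr, c1 + dc)
                total += diff * diff
        return total

    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            weight_sum = 0.0   # normalization factor C(x)
            value_sum = 0.0
            for rr in range(max(0, r - search_radius), min(rows, r + search_radius + 1)):
                for cc in range(max(0, c - search_radius), min(cols, c + search_radius + 1)):
                    w = math.exp(-patch_distance(r, c, rr, cc) / (h * h))
                    weight_sum += w
                    value_sum += w * img[rr][cc]
            out[r][c] = value_sum / weight_sum
    return out
```

A color image could be handled by applying the function to each channel, or by summing the patch distance over the channels so that all channels share one weight.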

3.2 Median filters

The median filters replace the central pixel by the median pixel of a square window. The impulse response of the median filter is zero; this property is helpful for removing impulse noise. These filters are robust and well suited to image smoothing. Astola et al. (1990) developed the concept of the vector median filter (VMF), which is widely implemented in the field of signal and image processing. For a set of sample vectors \(\{X_i\},\) Astola’s vector median (VM) is defined as:

$$\begin{aligned} Y=\text {argmin}_{X \in \{X_i\}}\sum _{i=1}^N \Vert X - X_i \Vert , \end{aligned}$$
(17)

where \(\Vert \cdot \Vert \) denotes the \(L^p\) norm. The output of the VM filter is the sample pixel whose total distance from the other sample pixels is minimum. This filter is extended to the weighted vector median (WVM) by Viero et al. (1994) for color image denoising. The minimization is written as:

$$\begin{aligned} Y=\text {argmin}_{X \in \{X_i\}}\sum _{i=1}^N W_i \Vert X - X_i \Vert . \end{aligned}$$
(18)

Here, each distance is scaled by a weight \(W_i\) assigned to the corresponding sample. Another extension is proposed by Li et al. (2006). In this filter, interchannel and intrachannel weights are computed to handle the cross-channel dependency. Further extensions of median filters are proposed in Khryashchev et al. (2011) and Muthukumar et al. (2010), where adaptive weights are implemented. This change in weights reduces the computational cost, and the filter proposed in Muthukumar et al. (2010) separates the noisy pixels using a band pass filter. Another extension is proposed in Yadav (2015), called the modified adaptive threshold median filter (MATMF). In this algorithm, the window size is changed adaptively according to the noise level. Zhong et al. (2010) extended the median filter for color image denoising. This algorithm uses ensemble learning to decide between noisy and noise-free images, and finally the image is reconstructed using the Minkowski norm.
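
Equations (17) and (18) translate almost directly into code. The sketch below assumes the Euclidean (\(L^2\)) norm and a window given as a list of color tuples; the function name and the optional `weights` argument are illustrative.

```python
import math


def vector_median(vectors, weights=None):
    """Return the sample vector minimizing the (weighted) sum of Euclidean
    distances to all samples, following Eqs. (17) and (18)."""
    if weights is None:
        weights = [1.0] * len(vectors)   # unweighted case, Eq. (17)

    def cost(candidate):
        return sum(w * math.dist(candidate, v) for w, v in zip(weights, vectors))

    return min(vectors, key=cost)
```

Because the output is always one of the input vectors, the filter does not create new colors; an impulse such as `(255, 0, 0)` in an otherwise gray window has a large aggregate distance and is not selected.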

3.3 Bilateral filters

The bilateral filters work in both the domain and the range of an image. These are spatially invariant filters, and the output of the filter is a weighted sum of the pixel values. The weights are computed using the geometric distance to, and the intensity deviation from, the central pixel. These filters preserve high-frequency image content and are most effective at filtering low-frequency noise. They sometimes suffer from the gradient reversal problem, in which the gradient becomes unstable where the edge pixels resemble the surrounding pixels. The filter was developed by Tomasi and Manduchi (1998) and removes noise while preserving edges. The efficiency of the filter depends on the filter parameters. The output of the bilateral filter g(x) is expressed as:

$$\begin{aligned} g(x)= \frac{ \int _{-\infty }^{\infty } \int _{-\infty }^{\infty } c(y,x)\, s(f(y), f(x) )\, f(y) \,{\textrm{d}}y}{\int _{-\infty }^{\infty } \int _{-\infty }^{\infty } c(y,x)\, s(f(y), f(x) ) \,{\textrm{d}}y}, \end{aligned}$$
(19)

where c(y, x) represents the geometric closeness and s(f(y), f(x)) the photometric similarity. In the discrete domain,

$$\begin{aligned} g(x)= \frac{ \sum _{y\in N(x)} \text {e}^{-\frac{ \Vert y-x \Vert ^2}{2\sigma _d^2} } \text {e}^{-\frac{ \Vert I( y)-I(x) \Vert ^2}{2\sigma _r^2} } I(y)}{ \sum _{y\in N(x)} \text {e}^{-\frac{ \Vert y-x\Vert ^2}{2\sigma _d^2}} \text {e}^{-\frac{ \Vert I( y)-I(x)\Vert ^2}{2\sigma _r^2}}}, \end{aligned}$$
(20)

where \(\sigma _d\) and \(\sigma _r\) are the controlling parameters in the spatial and intensity domains, respectively, and N(x) is the spatial neighborhood of x. This filter accounts for the photometric similarity within an image and replaces a pixel value with a combination of similar nearby pixels. These filters have a high computational cost in high dimensions and for multichannel images.
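
A direct, unoptimized evaluation of Eq. (20) for a single channel is sketched below. The square window of the given `radius` and the default parameter values are assumptions for illustration.

```python
import math


def bilateral_filter(img, radius=2, sigma_d=2.0, sigma_r=20.0):
    """Bilateral filter of Eq. (20) for a 2D single-channel image (list of rows)."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            num = 0.0
            den = 0.0
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    # Spatial (domain) weight from the geometric distance.
                    spatial = math.exp(-((rr - r) ** 2 + (cc - c) ** 2) / (2 * sigma_d ** 2))
                    # Range weight from the intensity difference.
                    photometric = math.exp(-((img[rr][cc] - img[r][c]) ** 2) / (2 * sigma_r ** 2))
                    weight = spatial * photometric
                    num += weight * img[rr][cc]
                    den += weight
            out[r][c] = num / den
    return out
```

With a small `sigma_r`, pixels across a strong edge receive a negligible weight, which is why the edge survives the smoothing. For a color pixel, the intensity difference would be replaced by a vector norm, as in Eq. (20).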

Peng et al. (2014) and Peng and Rao (2009) extended the above method to multispectral and hyperspectral images. For this purpose, they extended the similarity measure to the spectral distance between pixels I(x) and I(y), defined as \( D (x,y) = I(x) - I (y), \) and Eq. (20) is rewritten as

$$\begin{aligned} g(x)= \frac{ \sum _{y\in N(x)} g_d (| y-x |, \sigma _d)\, g_s(D( y,x ), \sigma _s)\, I(y)}{ \sum _{y\in N(x)} g_d(| y-x |, \sigma _d)\, g_s(D( y,x ), \sigma _s) }, \end{aligned}$$
(21)

where \(g_d\) and \(g_s\) are Gaussian functions of the respective differences.

Another extension was proposed by Meher (2010), who utilized a circular window for computing the gray value differences. This is implemented in the \(YC_b C_r\) image channels separately for color image denoising. Zhang and Gunturk (2008) implemented bilateral filtering at multiple resolutions via the wavelet frequency domain. The image is decomposed into frequency subbands using wavelet decomposition filters, separate bilateral filters are then applied to the subbands, and finally the inverse wavelet transform is used for image reconstruction. Wavelet thresholding can also be combined with the bilateral filter for better performance. For color images, the filtering is applied in the \(\text {Lab}\) color space.

3.4 Block matching filters

The collaborative filters are a special type of BM3D filters (Lebrun 2012), which are extensively used for motion detection in videos. These filters treat the image as a 3D group of pixels, and three successive steps are involved in image denoising, namely the 3D transformation of the image patches, shrinkage and the inverse 3D transformation. The grouping is of a special type, in which the filter searches for similar patches in the image. Because the pixel value is estimated from regions similar to the region inside the window, these filters are called block matching filters. The similarity among patches is used to improve the degraded pixel, and these filters preserve fine details over the blocks, although they can oversmooth image details. The block matching algorithm is introduced by Dabov et al. (2007), and the block distance measure is defined as

$$\begin{aligned} d(Z_{x_1},Z_{x_2}) = N^{-1} \left\Vert \Upsilon \left( \Gamma _{2{\textrm{D}}} (Z_{x_1}), \beta \right) -\Upsilon \left( \Gamma _{2{\textrm{D}}} (Z_{x_2}), \beta \right) \right\Vert _2, \end{aligned}$$
(22)

where \(Z_{x_1}\) and \(Z_{x_2}\) are the blocks located at \(x_1, x_2 \in X,\) \( \Gamma _{2{\textrm{D}}}\) is a 2D linear unitary transformation, and \(\Upsilon \) is the hard-thresholding operator with threshold \(\beta =\lambda _{{\textrm{thr}}2{\textrm{D}}} \sigma \sqrt{2\log (N_1^2) }. \) Some extensions have been proposed to improve the efficiency, such as the one by Hasan and El-Sakka (2018), who implemented the structural similarity (SSIM) image quality measure in the optimization stage of the Block Matching 3D (BM3D) algorithm. In this approach, the expectation function of the Wiener filter is replaced with the SSIM measure, i.e., \(e = E\{\text {ssim}(f ,g)\}.\) Shen et al. (2016) introduced a frame-based BM algorithm for color image restoration based on the paradigm of sparse representations.
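
The block distance of Eq. (22) can be illustrated with a small, self-contained sketch. The orthonormal DCT-II used here is only one possible choice for the unitary transform \(\Gamma _{2{\textrm{D}}}\); N is taken to be the side length of a square block, and all function names are our own.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (an assumed choice of transform)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n)) for j in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transform_2d(block):
    """Separable 2D transform D * block * D^T of a square block."""
    d = dct_matrix(len(block))
    d_t = [list(col) for col in zip(*d)]
    return matmul(matmul(d, block), d_t)


def hard_threshold(coeffs, beta):
    """Set coefficients with magnitude not exceeding beta to zero."""
    return [[c if abs(c) > beta else 0.0 for c in row] for row in coeffs]


def block_distance(block_a, block_b, beta):
    """Distance of Eq. (22) between two square blocks of side N."""
    n = len(block_a)
    ta = hard_threshold(transform_2d(block_a), beta)
    tb = hard_threshold(transform_2d(block_b), beta)
    squared = sum((p - q) ** 2 for ra, rb in zip(ta, tb) for p, q in zip(ra, rb))
    return math.sqrt(squared) / n
```

With `beta = 0` no coefficient is discarded, and because the transform is unitary the result reduces to the Euclidean distance between the blocks divided by N. A positive threshold suppresses small, noise-dominated coefficients before the blocks are compared.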

3.5 Guided filters

The guided filter takes its name from the fact that this type of algorithm is steered by additional information, called the guidance image. The guidance can be the input image itself; for example, in anisotropic diffusion (Perona and Malik 1990), the gradient of the filtered image is used as guidance. In color images, local color line analysis shows that pixels form a straight line or cluster around a single point. This property of the color line is used as guidance for the algorithm, and the guided image filter (He et al. 2013) with color guidance is defined as

$$\begin{aligned} q_i = \frac{1}{| w| } \sum _{k: i \in w_k} (a_k^{\textrm{T}} I_i +b_k), \end{aligned}$$
(23)

$$\begin{aligned} a_k&= \left( \Sigma _k +\epsilon U \right) ^{-1} \left( \frac{1}{| w|} \sum _{i \in w_k} (I_iP_i -\mu _k \overline{P_k})\right) ,\nonumber \\ b_k&= \overline{P_k} - a_k^{\textrm{T}} \mu _k, \end{aligned}$$
(24)

where \(\Sigma _k\) is the \( 3 \times 3 \) covariance matrix of the guidance image I in the window \(w_k,\) U is the \( 3 \times 3 \) identity matrix and \(\epsilon \) is the smoothing control parameter. Tsai et al. (2015) implement guided filters for color image denoising with a modification of Eq. (23). Dinh et al. (2016) used texture information as guidance for image denoising. This model is implemented in a luma-chroma color space, and the texture information obtained from the luma channel is transferred to the chroma channels by LP. These filters transform the pixels linearly and preserve the edges and image details. They can also be used for other purposes, such as image enhancement, smoothing and inpainting.
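
To make the roles of \(a_k\) and \(b_k\) concrete, the sketch below implements the scalar (single-channel) analogue of Eqs. (23) and (24) on a 1D signal, where the covariance matrix reduces to the variance of the guidance and the matrix inverse to a division. This is a simplification for illustration only; the border handling and the names `guide`, `src`, `radius` and `eps` are our assumptions.

```python
def box_mean(values, radius):
    """Mean over a sliding window, truncated at the signal borders."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def guided_filter_1d(guide, src, radius=2, eps=1e-2):
    """Scalar guided filter: q_i = mean(a)_i * I_i + mean(b)_i."""
    mean_i = box_mean(guide, radius)
    mean_p = box_mean(src, radius)
    corr_ii = box_mean([g * g for g in guide], radius)
    corr_ip = box_mean([g * s for g, s in zip(guide, src)], radius)

    # a_k = cov(I, p) / (var(I) + eps),  b_k = mean(p) - a_k * mean(I)
    a = [(cip - mi * mp) / ((cii - mi * mi) + eps)
         for cip, cii, mi, mp in zip(corr_ip, corr_ii, mean_i, mean_p)]
    b = [mp - ak * mi for mp, ak, mi in zip(mean_p, a, mean_i)]

    # Average the coefficients of all windows that contain pixel i.
    mean_a = box_mean(a, radius)
    mean_b = box_mean(b, radius)
    return [ma * g + mb for ma, g, mb in zip(mean_a, guide, mean_b)]
```

In flat regions the variance is small compared with `eps`, so `a` is close to zero and the output approaches the local mean; near strong edges of the guidance `a` approaches one and the edge is retained.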

3.6 Results comparison for spatial filters

We provide a comparison of experimental results for various spatial domain filters based on some common noise types, noise levels, and test images in terms of image quality metrics. The spatial filtering methods are compared in Table 1. We note that different approaches consider different benchmark images and we have identified the standard test images as commonly available in the literature for uniform comparison of image quality measures.

Table 1 Results comparison of spatial domain based filtering algorithms for color image denoising

4 Switching type filters

Switching type filters are a special case of spatial filters in which a denoising filter and an identity filter act together, and the algorithm switches between the two based on certain criteria. These filters typically involve two stages. In the first stage, image denoising is treated as a classification problem and the pixels are divided into two classes, noisy and noise free. The second stage consists of any denoising algorithm, which is applied to the noisy class only. These filters are mainly used to reduce the time complexity of existing algorithms. The classification can be performed in many ways, and the possible subclasses of switching filters are based on:

  • Order statistics

  • Partitioning

  • Fuzzy techniques

  • Quaternions

  • Peer group

  • Superpixel segmentation

  • Mathematical morphology.

All these switching filters are used in combination with other denoising filters, and the contribution summaries are given below.
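
The two-stage structure can be sketched with a deliberately simple detector. In the example below, a pixel is classified as noisy when it deviates from the median of its window by more than a threshold; this detector, the `threshold` value and the function name are illustrative assumptions and do not correspond to any specific subclass listed above.

```python
import statistics


def switching_median_filter(img, radius=1, threshold=40):
    """Two-stage switching filter for a 2D single-channel image (list of rows)."""
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]   # identity filter by default
    for r in range(rows):
        for c in range(cols):
            window = [img[rr][cc]
                      for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                      for cc in range(max(0, c - radius), min(cols, c + radius + 1))]
            med = statistics.median(window)
            # Stage 1: classify the pixel as noisy or noise free.
            if abs(img[r][c] - med) > threshold:
                # Stage 2: apply the denoising filter to noisy pixels only.
                out[r][c] = med
    return out
```

Pixels classified as noise free pass through unchanged, which avoids the blurring that a plain median filter would introduce at uncorrupted pixels.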

4.1 Order statistics

The order statistics operations (David 1970) are used to produce a rank-ordered vector from a sample vector. If a sample vector is represented as

$$\begin{aligned} S=[ x_1, x_2,\ldots , x_N ], \end{aligned}$$
(25)

then order statistics produce an ordered vector from the above sample as

$$\begin{aligned} S_r=[ x_{(1)},x_{(2)},x_{(3)},\ldots , x_{(N)}]^{\textrm{T}}. \end{aligned}$$
(26)

The elements of this new vector satisfy \(x_{(1)} \le x_{(2)} \le x_{(3)} \le \cdots \le x_{(N)}.\) Peer group and switching filters are developed using order statistics, and these filters classify the image data into noisy and noise-free classes. Ponomaryov et al. (2006) implement order statistics vector directional filters to denoise multichannel images. The angles between the RGB image components are calculated, a descending order vector of directions is obtained by order statistics, and the peer group method is used to separate the noisy and noise-free pixels of an image. The noisy pixels are treated using a median filter, and the identity filter is used for the noise-free class. This filter is also extended to denoise color image sequences by considering three frames and arranging these frames as the RGB channels of an image.
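
Ordering color vectors by direction can be illustrated as follows. The sketch ranks each vector by the sum of its angles to all other vectors in the window, a common aggregate used in vector directional filtering; it is a generic illustration under that assumption, sorted in ascending order, and not the exact procedure of Ponomaryov et al. (2006).

```python
import math


def angle(u, v):
    """Angle in radians between two color vectors (0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    # Clamp to [-1, 1] to guard against rounding before acos.
    return math.acos(max(-1.0, min(1.0, dot / (norm_u * norm_v))))


def directional_order(vectors):
    """Sort vectors by their aggregated angle to all vectors in the window."""
    def aggregated_angle(v):
        return sum(angle(v, u) for u in vectors)

    return sorted(vectors, key=aggregated_angle)
```

Vectors at the end of the ordering point in directions (chromaticities) that differ most from the rest of the window, which makes them candidates for the noisy class.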

Pei et al. (2018) used a component signal noise model, represented as

$$\begin{aligned} f_i(x) = n_i^D(x) + n_i^s(x) + f_i^s(x), \end{aligned}$$
(27)

for \(i=1,2,\ldots ,m,\) where \(n_i^D(x), n_i^s(x), f_i^s(x)\) and \(f_i(x)\) are the ith components. In vector form,

$$\begin{aligned} F(x) = N^D(x) + N^s(x) + F^s(x), \end{aligned}$$
(28)

where \(F^s(x)\) represents the noise-free features of the object, \( N^s(x)\) is the color-noise image of the source noise and \(N^D(x)\) is the noisy image created at the component devices with independent components. The components of f(x) are positive, and impulse noise is detected using a fuzzy inference rule. The center-excluded window at center x is defined as \(w(x)= \{ x_k| k =0,1,2 \ldots , N \}- \{ x\},\) and the distance from the center x is defined as \(d_k= | f_i(x) - f_i(x_k)|,\; x_k \in w(x).\) These distances are arranged in ascending order, \(D_{r1} \le D _{r2} \le \cdots \le D_{rN}, \) following Eq. (26), and the half-neighborhood distance is defined as

$$\begin{aligned} d_A(x)= \left| f_i(x) - \frac{2}{N} \sum _{k=1}^{ \frac{N}{2}} f_i(x_{rk}) \right| . \end{aligned}$$
(29)

To differentiate edge and noisy pixels, a distance based fuzzy membership function is defined as,

$$\begin{aligned} u_i^A(d_A(x)) = {\left\{ \begin{array}{ll} 1 &{} \text {if } d_A(x) \ge T_2, \\ \frac{ d_A(x) - T_1}{T_2- T_1} &{} \text {if } T_1< d_A(x) < T_2, \\ 0 &{} \text {if } d_A(x) \le T_1. \end{array}\right. } \end{aligned}$$
(30)

Here, \(0 \le T_1 < T_2\) are two thresholds. Let \( Q_1 = | f_i(x) - Q_{\textrm{min}}|\) and \(Q_2 = | Q_{\textrm{max}} - f_i(x)| ,\) where \(Q_{\textrm{min}}\) and \(Q_{\textrm{max}}\) are the minimum and maximum outputs of the component imaging device quantizer, and let \( 0 \le T_3 < T_4.\) The second fuzzy membership function is then defined as,

$$\begin{aligned} u_i^B(f_i(x)) = {\left\{ \begin{array}{ll} 1 &{} \text {if } \min {(Q_1, Q_2)} < T_3, \\ \frac{ T_4 - \min {(Q_1, Q_2)} }{T_4-T_3} &{} \text {if } T_3 \le \min {(Q_1, Q_2)} < T_4, \\ 0 &{} \text {if } \min {(Q_1, Q_2)} \ge T_4. \end{array}\right. } \end{aligned}$$
(31)

The membership functions are combined using the AND operator, expressed mathematically as the product \(u_i(x)= u_i^A(d_A(x)) \cdot u_i^B(f_i(x)),\) and the fuzzy inference rule is stated as,

  1. If \(u_i(x) \ge 0.5 \) then the pixel is affected by impulse noise and it is defuzzified as

    $$\begin{aligned} U_i(x) = {\left\{ \begin{array}{ll} 1 &{} \text {if } u_i(x) \ge 0.5,\\ 0 &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$
    (32)
  2. If \(\sum _{i=1}^3 U_i(x) \ne 0\) then the pixel is declared noisy (RGB pixel).
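The detection rule of Eqs. (29) to (32) can be sketched for a single channel as follows. This is only an illustration: the threshold values, the 8-bit quantizer range and the function names are assumptions chosen for the example, not values reported by Pei et al. (2018).

```python
def ramp_up(d, t1, t2):
    """Eq. (30): 0 up to t1, 1 from t2 on, linear in between."""
    if d >= t2:
        return 1.0
    if d <= t1:
        return 0.0
    return (d - t1) / (t2 - t1)

def extreme_membership(value, q_min, q_max, t3, t4):
    """Eq. (31): high when the value is close to a quantizer extreme."""
    q = min(abs(value - q_min), abs(q_max - value))
    if q < t3:
        return 1.0
    if q >= t4:
        return 0.0
    return (t4 - q) / (t4 - t3)

def half_neighbourhood_distance(center, neighbours):
    """Eq. (29): distance to the mean of the N/2 closest neighbours."""
    ordered = sorted(neighbours, key=lambda v: abs(v - center))
    half = ordered[: max(1, len(ordered) // 2)]
    return abs(center - sum(half) / len(half))

def is_impulse(center, neighbours, t1=20, t2=60, t3=5, t4=30,
               q_min=0, q_max=255):
    """Eq. (32): product (AND) of both memberships, defuzzified at 0.5."""
    u_a = ramp_up(half_neighbourhood_distance(center, neighbours), t1, t2)
    u_b = extreme_membership(center, q_min, q_max, t3, t4)
    return u_a * u_b >= 0.5

# Hypothetical channel values of the eight neighbours in a 3x3 window.
neighbours = [100, 102, 98, 101, 99, 103, 97, 100]
print(is_impulse(255, neighbours))  # saturated center far from its neighbours
print(is_impulse(100, neighbours))  # center consistent with its neighbours
```

A color pixel would then be declared noisy when at least one of its three channels is flagged.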

The noise-free set of neighborhood vectors is defined as \(w_p(x)= \{ x_{pk}| x_{pk} \in W(x), \sum _{i=1}^3 U_i(x_{pk}) =0 \}- \{ x\}.\) The distance between pixels f(x) and \(f(x_{pk}) \) is defined as

$$\begin{aligned} d_u(x, x_{pk}) = \sum _{i=1}^3 (1-u_i(x) ) (f_i(x) - f_i(x_{pk}))^2, \end{aligned}$$
(33)

where \(x_{pk} \in w_p(x).\) The Noise Free Component Similarity Weighted Filter (NFCSWF) is then defined as,

$$\begin{aligned} g_i(x) = {\left\{ \begin{array}{ll} (1-U_i(x) )f_i(x) + U_i(x) f_i(x_{pk}) &{} \text {if } \exists \, x_{pk} \in W_p(x) \ \text {s.t.} \ d_u(x, x_{pk})= 0, \\ (1-U_i(x) )f_i(x) + U_i(x) \cdot \alpha _i &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$
(34)

Here, \( \alpha _i = \sum _{x_{pk} \in W_p(x)} s(x, x_{pk}) f_i(x_{pk})\) and the partial component similarity index \(s(x,x_{pk})\) between \(f(x)\) and \(f(x_{pk})\) is expressed as

$$\begin{aligned} s(x, x_{pk}) = \frac{\frac{1}{d_u(x,x_{pk})}}{\sum _{x_{pj} \in w_p(x)} \left[ \frac{1}{d_u(x,x_{pj})}\right] }. \end{aligned}$$
(35)
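A minimal sketch of the similarity weighting in Eqs. (33) to (35) is given below. It assumes that the memberships \(u_i(x)\) of the center pixel and the set of noise-free neighbors are already available; all names and sample values are hypothetical.

```python
def component_distance(u_center, f_center, f_peer):
    """Eq. (33): squared differences, down-weighted on suspicious channels."""
    return sum((1.0 - u) * (a - b) ** 2
               for u, a, b in zip(u_center, f_center, f_peer))

def similarity_weights(u_center, f_center, peers):
    """Eq. (35): normalised inverse distances (all distances must be > 0)."""
    inverse = [1.0 / component_distance(u_center, f_center, p) for p in peers]
    total = sum(inverse)
    return [w / total for w in inverse]

def replacement_value(i, u_center, f_center, peers):
    """Value that replaces channel i when it is flagged as noisy (Eq. (34))."""
    for p in peers:
        if component_distance(u_center, f_center, p) == 0:
            return p[i]  # an exact match on the trusted channels exists
    weights = similarity_weights(u_center, f_center, peers)
    return sum(w * p[i] for w, p in zip(weights, peers))

# Hypothetical example: only the red channel of the center is corrupted.
u_center = (1.0, 0.0, 0.0)
f_center = (255, 100, 50)
peers = [(90, 100, 52), (95, 104, 50), (200, 140, 90)]
weights = similarity_weights(u_center, f_center, peers)
alpha_red = replacement_value(0, u_center, f_center, peers)
```

Because the corrupted channel has membership one, it does not contribute to the distance, so neighbors are compared with the center only through the channels that are still trusted.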

The component-wise output of the Noise Free-Components Distance Weighted Filter (NFCDWF) is defined as,

$$\begin{aligned} g_i(x)&= (1-U_i(x)) f_i(x) + U_i(x)\nonumber \\&\quad \times \left( \frac{\sum _{x_k \in W(x)}[w_k(x) (1-u_i(x_k)) f_i(x_k)]}{\sum _{x_k \in W(x)}[w_k(x) (1-u_i(x_k))] }\right) . \end{aligned}$$
(36)

The set of noise-free neighborhoods of f(x) is defined as \(w_p(x)=\{ x_{pk}| x_{pk} \in W(x), \sum _{i=1}^3 U_i(x_{pk}) = 0\}.\) Applying the classical vector median filter to this class of pixels gives the Noise-Free Neighborhood Vector Filter (NFNVF). The Neighborhood Analysis Hybrid Vector Filter (NAHVF) has three component filters, all of which are applied to noise-free pixels and their neighborhoods to remove noise. This technique exploits the correlation among color channels.

The Partition Based Trimmed Vector Median (PBTVM) filter is introduced by Ma et al. (2006). The window pixels are represented as in Eq. (25) and the output of the CWVM filter is represented as,

$$\begin{aligned} y_k(c)=x_m(c) \in W(c) | m=\text {arg}\min _{1\le i\le N} \{R_i^k \}. \end{aligned}$$
(37)

All these pixels are ranked according to Eq. (26) using their distances, and the integer central weighted distance is,

$$\begin{aligned} R_i^k&= (2k-1)\Vert x_i(c) -x(c)\Vert \nonumber \\&\quad + \sum _{j=1, j \ne \frac{N+1}{2}}^{N} \Vert x_j(c)- x_i(c)\Vert , \end{aligned}$$
(38)

where \(k \in [1,\frac{N-1}{2}];\) for \(k=1\) the filter becomes the classical VMF. Trimming is applied to the window pixels according to the central weight. The pixels outside the rank are discarded and the distance is computed using the function,

$$\begin{aligned} d_j= \Vert x_j(c) -x(c)\Vert , \quad 1\le j \le N, \end{aligned}$$
(39)

where x(c) represents the central pixel. The distances are ranked in ascending order and the output of the CWTVM filter is,

$$\begin{aligned} \hat{y}_k(c)= \left\{ x_{(m)}(c) \in W^{\prime }(c) | m =\text {arg}\min _{1\le i\le N} \{\hat{R}_i^k \} \right\} , \end{aligned}$$
(40)

where \(\hat{R}_i^k \) is the modified central weighted distance. The error between the reference and actual pixels is expressed as

$$\begin{aligned} e_k(c)= \Vert y_k(c) - x(c)\Vert . \end{aligned}$$
(41)

The threshold points are defined as \(\{\text {Tol}_k | k =1,2,\ldots , \frac{N-1}{2} \}\) and the output of the partition based vector median filter is defined as,

$$\begin{aligned} \hat{x}(c) ={\left\{ \begin{array}{ll} x(c) &{} \text {if all } e_k \le \text {Tol}_k, 1 \le k \le \frac{N-1}{2},\\ y_m(c) &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$
(42)

The PBTVMF combines the CWTVMF and PBWF. In the CWTVMF, the reference estimate \(\hat{y}_k(c)\) with weight k is generated by a classification based on the central pixel, and the error \(e_k(c)\) is computed for every pixel. The information inside the distance vector is classified by a scalar partition scheme, and the local data is mapped to one of the M representative image structures as follows,

$$\begin{aligned} \Phi _i= \{ e(c) \in R^k : \rho (e(c))=i\}, \end{aligned}$$
(43)

where \(k=\frac{N-1}{2}\) and \(\rho (\cdot )\) stand for the dimension of the partition space and the partition function, respectively, and \(\{W_1,W_2, \ldots ,W_M\}\) are the representative image structures such that,

$$\begin{aligned} w_{i,0} = 1- \tilde{W}_i^{\textrm{T}}(c) I, \end{aligned}$$
(44)

where I is the unit column vector. The vector that collects all inputs of the weighted filtering operation is,

$$\begin{aligned} \hat{y}(c) = [ \hat{x}^{\textrm{T}}(c),\hat{y}_1^{\textrm{T}}(c), \hat{y}_2^{\textrm{T}}(c) , \ldots , \hat{y}_k^{\textrm{T}}(c)]. \end{aligned}$$
(45)

The output of PBTVM filter is expressed as,

$$\begin{aligned} \hat{X}(c)= w_{i,0} x(c) + \tilde{W_i}(c) \circ \tilde{y}(c). \end{aligned}$$
(46)

A fuzzy extension of this filter is proposed by Ma et al. (2007), in which the order statistics represented in Eqs. (25) and (26) are extended to the 2D case as,

$$\begin{aligned} R= \left( \begin{array}{cccc} R_{(1),1} &{} R_{(1),2} &{} \dots &{} R_{(1),N}\\ R_{(2),1} &{} R_{(2),2} &{} \dots &{} R_{(2),N}\\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ R_{(N),1} &{} R_{(N),2} &{} \dots &{} R_{(N),N}\\ \end{array}\right) . \end{aligned}$$
(47)

This can be written as \(S_r= R S_x,\) where R is the spatial ranked order matrix and every component is a binary function, defined as,

$$\begin{aligned} R_{(i),j} ={\left\{ \begin{array}{ll} 1 &{} \text {only if } x_{(i)} = x_j,\\ 0 &{} \text {otherwise}, \end{array}\right. } \end{aligned}$$
(48)

where \(x_{(i)} = x_j \) means that sample \(x_j \) has the rank order i. The matrix in Eq. (47) can be fuzzified by replacing the crisp relations in Eq. (48) with,

$$\begin{aligned} \tilde{R}_{(i),j} =\mu (x_{(i)}, x_j), \end{aligned}$$
(49)

where \(\mu (\cdot ) \in [0,1]\) describes the degree of correlation. The membership function is modeled as \(\mu (x_{(i)}, x_j) =\text {e}^{-\frac{(x_{(i)}-x_j)^2 }{2\sigma ^2}},\) where \(\sigma \) denotes the sample spread.

After the fuzzification of \(\tilde{R},\) a fuzzy sample matrix is developed

$$\begin{aligned} \tilde{S}_r= [\tilde{x}_{(1)},\tilde{x}_{(2)}, \tilde{x}_{(3)},\ldots , \tilde{x}_{(N)}]^{\textrm{T}} \end{aligned}$$

using the normalized transformation \(\tilde{S}_r= \frac{\tilde{R}S_x }{\Vert \tilde{R }\Vert },\) and the fuzzy version of Eq. (42) becomes,

$$\begin{aligned} \tilde{x}_{(i)}= \frac{ \sum _{j=1}^n \mu [x_{(i)}, x_j ] x_j }{\sum _{j=1}^n \mu [x_{(i)}, x_j ] }. \end{aligned}$$
(50)
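The fuzzy ordered samples of Eq. (50) can be illustrated for scalar data with the short sketch below. It assumes a Gaussian membership in the difference between a crisp ordered sample and each raw sample; the function names and the test values are illustrative only.

```python
import math

def gaussian_membership(a, b, sigma):
    """Membership that decays with the squared difference of two samples."""
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))

def fuzzy_ordered_samples(samples, sigma):
    """Eq. (50): each crisp ordered sample becomes a weighted mean of all samples."""
    fuzzy = []
    for x_ranked in sorted(samples):
        weights = [gaussian_membership(x_ranked, x, sigma) for x in samples]
        fuzzy.append(sum(w * x for w, x in zip(weights, samples)) / sum(weights))
    return fuzzy

samples = [5.0, 1.0, 9.0]
crisp_like = fuzzy_ordered_samples(samples, sigma=0.1)   # close to [1, 5, 9]
mean_like = fuzzy_ordered_samples(samples, sigma=1e6)    # close to the mean
```

The spread \(\sigma \) therefore interpolates between crisp rank ordering (small spread) and plain averaging (large spread).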

The fuzzy-rank vector partition (FRVP) filter consists of three main modules: fuzzy-ranked reference filtering, structure partitioning, and weighted filtering. The steps consist of the computation of Eqs. (37), (41) and (43). Let the N color pixels of a square sliding window be arranged as in Eq. (25), and let the output of the CWVM filter be computed by Eqs. (37) and (38). As a result of fuzzification, Eq. (45) becomes,

$$\begin{aligned} \tilde{y}_k(c)= \frac{ \sum _{i=1}^N \mu [y_k(c),x_i(c) ] \, x_i(c) }{\sum _{i=1}^N \mu [y_k(c),x_i(c) ] }, \end{aligned}$$
(51)

where \(\mu \) represents the pixel similarity of windowed pixels and \(x_i\) are the pixels of window.

4.2 Partition filters

This is a multivariate order statistics based filtering method. The center weighted vector median (CWVM) filter is proposed by Viero et al. (1994). In this filter, the window pixels are represented as \( W(c) = \{x_1(c),x_2(c), \ldots ,x_N(c)\}^2,\) where W(c) is a window centered at coordinate c. The output of this filter is computed by Eqs. (37), (38), (39), and (51). The color similarity is estimated using the Gaussian membership function \( \mu (x,y)= \text {e}^{-\frac{| x-y |}{2 \sigma ^2 }}.\) This similarity is defined channel-wise for the RGB color space.
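The center weighted vector median of Eqs. (37) and (38) can be sketched as follows. The sketch assumes a window stored in raster order with an odd number of pixels, so that the middle element is the central pixel, and uses the Euclidean norm; the names and sample colors are hypothetical.

```python
import math

def cwvm(window, k):
    """Return the pixel minimising the central weighted distance of Eq. (38)."""
    c = len(window) // 2          # index of the central pixel
    center = window[c]

    def cost(x_i):
        # Distances to all non-central pixels plus the weighted central term.
        others = sum(math.dist(x_i, x_j)
                     for j, x_j in enumerate(window) if j != c)
        return (2 * k - 1) * math.dist(x_i, center) + others

    return min(window, key=cost)

# Hypothetical 3x3 window: five pixels of color A, four of color B (incl. center).
A, B = (0, 0, 0), (10, 0, 0)
window = [A, A, A, B, B, B, B, A, A]
print(cwvm(window, 1))  # k = 1 reduces to the classical vector median
print(cwvm(window, 2))  # a larger central weight favours the central pixel
```

With \(k=1\) the weighted central term simply completes the sum over all window pixels, which is why the classical VMF is recovered.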

4.3 Fuzzy based approaches

These filters are based on fuzzy set theory. In this class of filters, fuzzy sets are used for modeling noise, developing rules for pixel classification, and so on. Fuzzy methods are implemented for designing switching filters in multiple ways in Wang et al. (2015), Gregori et al. (2018), Pei et al. (2018), Ananthi and Balasubramaniam (2016), Schulte et al. (2007a), Ma et al. (2007), Morillas et al. (2009) and Surya Prasath and Delhibabu (2015). The details of these algorithms are discussed below.

Wang et al. (2015) added switching to the median filter and introduced the Modified Switching Median Filter (MSMF) (Wang et al. 2010). A noisy pixel of a color image is denoted by f as a 3D vector. The pixels of a square window are represented as raw data, following Eq. (26), as

$$\begin{aligned} w&=\{f (i-l,j-l), f(i-l,j+1-l), \ldots , \nonumber \\&\quad f(i,j),\ldots , f(i+l,j+l) \}, \end{aligned}$$
(52)

where f(i, j) is the central pixel, and these pixels are rearranged using rank order statistics as in Eq. (26),

$$\begin{aligned} w^{\prime }=\left\{ \chi _{(1)}, \chi _{(2)}, \ldots , \chi _{\left( \frac{n+1}{2}\right) },\ldots , \chi _{(n^2)}\right\} . \end{aligned}$$
(53)

In the first stage, the AVMF is used to determine the noise-contaminated pixels by,

$$\begin{aligned} \left\| \frac{1}{r} \sum _{m=1}^r \chi _{(m)} - f_{(i,j)}\right\| _2 \ge \text {Tol}. \end{aligned}$$
(54)

Then directional Laplacian operators are used and \(z_{ij}\) is defined as the minimum of these edge values. With \(d_{\textrm{min}} \) and \(d_{\textrm{max}}\) denoting the minimum and maximum pixel value differences, the membership function is represented as,

$$\begin{aligned} \mu (i,j) =\frac{d_{\textrm{max}}- z_{ij}}{d_{\textrm{max}} - d_{\textrm{min}}}. \end{aligned}$$
(55)

This defines noisy and noise-free pixels, and the Fuzzy Decision Filter (FDF) is used to separate the noisy pixels, defined as

$$\begin{aligned} y^{\textrm{FDF}} ={\left\{ \begin{array}{ll} f(i,j) &{} \text {if } \mu (i,j) \ge 0.9, \\ \frac{\sum _{u,v \in N} f(i+u,j+v) \mu (i+u,j+v)}{\sum _{u,v \in N} \mu (i+u,j+v)} &{} \text {if } \mu (i,j) \le 0.9 \text { and } \mu (i+u, j+v) \ge 0.8,\\ \frac{\sum _{u,v \in N} f(i+u,j+v) \mu (i+u,j+v)}{\sum _{u,v \in N} \mu (i+u,j+v)} &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$
(56)

At the next stage, NLM is applied to the noisy pixels only, and the identity filter is applied to the noise-free pixels.

A fuzzy inference rule is developed by Ananthi and Balasubramaniam (2016) to separate noisy and noise-free pixels. The mean grey levels of the background and foreground of the image are defined as,

$$\begin{aligned} \text {mb}(l)&= \frac{ \sum _{I(i,j)=0}^l I(i,j) \, h(I(i,j))}{\sum _{I(i,j)=0}^l h(I(i,j))}, \\ \text {mo}(l)&= \frac{\sum _{I(i,j)=l+1}^{L-1} I(i,j) \, h(I(i,j))}{\sum _{I(i,j)=l+1}^{L-1 } h(I(i,j))}, \end{aligned}$$

where \(h(I(i,j))\) is the number of pixels in the image with grey level I(i, j) and L is the maximum intensity level. The fuzzy membership function \(\mu _{A1}(I(i,j)) \) is defined as,

$$\begin{aligned} \mu _{A1}(I(i,j))= {\left\{ \begin{array}{ll} f(\text {REF}(I(i,j), \text {mb}(l))) &{} \text {if } I(i,j) \ge l,\\ (\text {REF}(I(i,j), \text {mo}(l))) &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$
(57)

The non-belongingness is expressed as a membership function \(\nu _{A1}(I(i,j)) = (1-(\mu _{A1}(I(i,j)))^{\alpha })^{1-\alpha } ,\) and the hesitation degree is expressed as \(\Pi _{A1}(I(i,j))= 1- \mu _{A1}(I(i,j))-\nu _{A1}(I(i,j)).\) Finally, the lower and upper limits of the membership function are defined as,

$$\begin{aligned} \text {ML}_{\tilde{F}}(I(i,j))&= \mu _{A1}(I(i,j))- \Pi _{A1}(I(i,j))/ n,\\ \text {MU}_{\tilde{F}}(I(i,j))&= \mu _{A1}(I(i,j))+ \Pi _{A1}(I(i,j))/n, \end{aligned}$$

respectively. Entropy is computed as,

$$\begin{aligned} E(f_l) = \frac{1}{PQ} \sum _{i=1}^P \sum _{j=1}^Q h(I(i,j)). \text {WM}_{\tilde{F}}(I(i,j)) \end{aligned}$$
(58)

where \( \text {WM}_{\tilde{F}}(I(i,j))= \text {MU}_{\tilde{F}}(I(i,j)) -\text {ML}_{\tilde{F}}(I(i,j)).\) The noise is modeled by this entropy measure, which is then minimized. The noise modeling algorithm was initially designed for grey images; for color images it is applied to each channel separately. The noise is then reduced with a fuzzy filtering technique, and in practice a fuzzy median filter is used.

Gregori et al. (2018) introduced the Corrected Fuzzy Averaging Filter (CFAF). The color image pixels are represented as f, and the pixels inside the window are represented according to Eq. (52) and rearranged as in Eq. (53). The similarity measure \(\rho \) is defined through the \(L_{\infty }\) norm,

$$\begin{aligned} L_{\infty }(x_i, x_j)= \max {\{ | x_i^R- x_j^R |, | x_i^G- x_j^G|, | x_i^B- x_j^B| \}}, \end{aligned}$$

and image pixels are rearranged accordingly

$$\begin{aligned} \rho (f(i-l,j-l),\chi (1))&\le \rho (f(i-l,j-l), \chi (2)) \\&\le \cdots \le \rho (f(i-l,j-l),\chi (n)), \end{aligned}$$

then the rank-ordered differences (ROD) statistic is computed from the first \((s+1)\) pixels as \(\text {ROD}_s(f(i,j))= \sum _{n=0}^s L_{\infty } (f(i,j), \chi _{(n)}).\) The \(\text {ROD} \) statistic takes integer values from zero to 255 and the certainty degree for \(x= \text {ROD}_s\) is defined as,

$$\begin{aligned} \delta (f(i,j) ) ={\left\{ \begin{array}{ll} 0 &{} \text {if } x< k_1, \\ \frac{x-k_1 }{k_2- k_1} &{} \text {if } k_1< x< k_2,\\ 1 &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$
(59)

The Robust Vector Median Filter (RVMF) (Morillas and Gregori 2011) is implemented to remove the noise from noisy pixels, and fuzzy averaging is defined as \(\overline{f(i,j)}= (1- \delta (f(i,j)))f(i,j) + \delta (f(i,j)) F_{\text {RVMF}},\) where \(F_{\text {RVMF}}\) is the output of the RVMF.
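The fuzzy averaging step can be sketched as below. The sketch assumes that the statistic sums the \(L_{\infty }\) distances from the central pixel to its s closest neighbors (the central pixel itself contributes zero), and that the output of the robust filter is supplied by the caller; the thresholds \(k_1, k_2\) and all names are hypothetical.

```python
def linf(p, q):
    """L-infinity distance between two color vectors."""
    return max(abs(a - b) for a, b in zip(p, q))

def rod(center, neighbours, s):
    """Sum of the s smallest L-infinity distances to the central pixel."""
    distances = sorted(linf(center, x) for x in neighbours)
    return sum(distances[:s])

def certainty(x, k1, k2):
    """Eq. (59): degree to which the pixel is considered noisy."""
    if x < k1:
        return 0.0
    if x < k2:
        return (x - k1) / (k2 - k1)
    return 1.0

def fuzzy_average(center, filtered, delta):
    """Blend of the original pixel and the robust filter output."""
    return tuple((1.0 - delta) * c + delta * f
                 for c, f in zip(center, filtered))

# Hypothetical flat neighbourhood and an impulse-like central pixel.
neighbours = [(100, 100, 100)] * 8
delta_noisy = certainty(rod((255, 0, 0), neighbours, s=3), k1=50, k2=200)
delta_clean = certainty(rod((100, 100, 100), neighbours, s=3), k1=50, k2=200)
restored = fuzzy_average((255, 0, 0), (100, 100, 100), delta_noisy)
```

A certainty of zero leaves the pixel untouched, a certainty of one replaces it by the robust filter output, and intermediate values give a soft switch between the two.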

Schulte et al. (2007a) define the couples red-green, green-blue and blue-red as \(\text {rg}(i,j)= (f(i,j,1), f(i,j,2)), \; \text {gb}(i,j)= (f(i,j,2), f(i,j,3))\) and \(\text {br}(i,j)= (f(i,j,3), f(i,j,1)),\) where each color pixel has three components \( f(i,j, k), k=1,2,3.\) The distances between these couples are computed using Minkowski distances and three fuzzy rules are developed using these distances. These distances are used to compute the weights in a Takagi–Sugeno fuzzy model (Takagi and Sugeno 1985), and a closeness degree is associated with a fuzzy membership function and coupling via correlation. The fuzzy weighted filter is defined as,

$$\begin{aligned}&F(i,j,m) \nonumber \\&\ \ = \frac{ \sum _{k=-K}^K \sum _{l=-K}^K w(i+k, j+l,m) \, f(i+k, j+l,m)}{\sum _{k=-K}^K \sum _{l=-K}^K w(i+k, j+l,m)}, \nonumber \\&\quad m=1,2,3. \end{aligned}$$
(60)

The second filter is designed to reduce the noise in the color components while preserving fine details. The local differences are defined as \(\text {LD}_R(k,l)= F(i+k,j+l,1)- F(i,j,1);\) \(\text {LD}_G\) and \(\text {LD}_B\) are computed similarly. The correction term is \( \epsilon (k,l) = \frac{1}{3} (\text {LD}_R(k,l)+\text {LD}_G(k,l)+\text {LD}_B(k,l) ) \) and the output of the second filter is defined as,

$$\begin{aligned} g(i,j,1)= \frac{ \sum _{k=-K}^K \sum _{l=-K}^K F(i+k, j+l,1). \epsilon (k, l) }{(2K+1)^2 }. \end{aligned}$$

The filters for the other components are defined similarly. These are the outputs for each color channel and \(\epsilon \) is the correction term.

4.4 Quaternion based filters

A quaternion is a four-dimensional complex number introduced by Hamilton (1866), expressed as \(q= a+ bi+cj+ dk, \) where a, b, c, d are real numbers and i, j, k are mutually orthogonal unit vectors. These numbers have the following properties:

  1. \(i^2= j^2=k^2= ijk=-1\)

  2. \(ij=k, jk=i, ki=j\)

  3. \(ji=-k, kj=-i,ik=-j\)

  4. \(\Vert q\Vert = \sqrt{a^2+b^2+c^2+d^2} \)

  5. \(\overline{q}= a-bi-cj-dk,\) where \(\overline{q} \) is called the conjugate of q.
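These properties can be checked numerically with a small sketch of the Hamilton product. The class below is an illustrative helper written for this purpose and is not part of any of the cited filters.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Quaternion:
    a: float  # real part
    b: float  # coefficient of i
    c: float  # coefficient of j
    d: float  # coefficient of k

    def __mul__(self, o):
        """Hamilton product (non-commutative)."""
        return Quaternion(
            self.a * o.a - self.b * o.b - self.c * o.c - self.d * o.d,
            self.a * o.b + self.b * o.a + self.c * o.d - self.d * o.c,
            self.a * o.c - self.b * o.d + self.c * o.a + self.d * o.b,
            self.a * o.d + self.b * o.c - self.c * o.b + self.d * o.a,
        )

    def conjugate(self):
        return Quaternion(self.a, -self.b, -self.c, -self.d)

    def norm(self):
        return math.sqrt(self.a**2 + self.b**2 + self.c**2 + self.d**2)

i = Quaternion(0, 1, 0, 0)
j = Quaternion(0, 0, 1, 0)
k = Quaternion(0, 0, 0, 1)
minus_one = Quaternion(-1, 0, 0, 0)

# A color pixel (r, g, b) stored as a pure quaternion (real part zero).
pixel = Quaternion(0, 3, 4, 0)
```

The product of a quaternion with its conjugate is real and equals the squared norm, which is the property used when color differences are measured through quaternion magnitudes.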

A quaternion is called a unit quaternion if \(\Vert q \Vert =1.\) Filters that use quaternions are called quaternion based filters. A quaternion is called pure if \(a=0,\) and pure quaternions are used to represent a color pixel by Sangwine and Ell (2000) as \(q(x,y)= r(x,y) i+ g(x,y) j + b(x,y) k,\) where q(x, y) is a color pixel, which is the linear combination of the red r(x, y), green g(x, y) and blue b(x, y) components. A pure quaternion with unit magnitude is defined as \(u= \frac{i+j+k}{\sqrt{3}},\) and in spherical coordinates \(u= | u| \text {e}^{\mu \theta }= \cos (\theta ) + \mu \sin (\theta ).\) The following are a few useful definitions of color differences:

  • Quaternion intensity difference between two pixels \(q_1\) and \(q_2\) is defined as,

    $$\begin{aligned} d_1(q_1, q_2)= \frac{| ( Tq_1 \overline{T} + \overline{T} q_1 T )- ( Tq_2 \overline{T} + \overline{T} q_2 T ) |}{2}. \end{aligned}$$
    (61)
  • Quaternion chromatic difference between two pixels \(q_1\) and \(q_2\) is defined as,

    $$\begin{aligned} d_2(q_1, q_2)= \frac{| ( Tq_1 \overline{T} - \overline{T} q_1 T )- ( Tq_2 \overline{T} - \overline{T} q_2 T ) |}{2}. \end{aligned}$$
    (62)
  • The color difference between two pixels \(q_1\) and \(q_2\) is defined as \(d(q_1, q_2)= d_1(q_1, q_2) + d_2(q_1, q_2).\)

  • Quaternion difference between two pixels \(q_1\) and \(q_2\) is defined as,

    $$\begin{aligned} d_3 (q_1, q_2)= (r_3- \overline{q_3})i + (g_3- \overline{q_3})j +(b_3- \overline{q_3})k, \end{aligned}$$
    (63)

    where \(q_3= r_3 i + g_3 j+ b_3k=q_1+ Rq_2 R ^{*}\) and \(\overline{q_3}= \frac{r_3+ g_3+ b_3 }{3}.\)

  • Quaternion unit transformation for a color pixel q is defined as,

    $$\begin{aligned} Y&= u q \overline{u} \\&= (ri+ gj+bk) \cos (2\theta ) +\frac{2}{\sqrt{3}}\mu (r+g+b)\sin ^2(\theta ) \\&\quad +\frac{1}{\sqrt{3}} [(b-g)i+(r-b)j+(g-r)k]\sin (2\theta ), \end{aligned}$$

    for \(\theta = \frac{\pi }{4}\) first part becomes zero.

These basic definitions are used in designing the quaternion based switching filters. Geng et al. (2012) developed the Quaternion Switching Filter (QSF) by computing the quaternion difference between the central pixel and the pixels at angles 0°, 45°, 90° and 135° in a square window of size 3. This defines a set of four distances \(v_i, i=1,2,3,4;\) at least one distance will be zero for an edge pixel, whereas noisy pixels have high values in all directions. These distances are used to distinguish edge pixels from noisy pixels. Let \(v=\min \{v_1, v_2,v_3, v_4 \}\) and let T be a fixed threshold. The pixels whose distance v exceeds the threshold are called noisy pixels and are passed through the VMF. This filter is defined as,

$$\begin{aligned} q^{\textrm{VMF}}= \text {argmin}_{q_t \in \{q_1,q_2,\ldots ,q_N\}} \sum _{s=1}^N \Vert q_s- q_t \Vert . \end{aligned}$$
(64)

Then QSF is defined as,

$$\begin{aligned} q^{\textrm{QSF}} ={\left\{ \begin{array}{ll} q^{\textrm{VMF}} &{} \text {if } v > T, \\ q &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$
(65)
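The switching logic of Eqs. (64) and (65) can be sketched as follows. Note the substitution: the sketch uses the plain Euclidean distance between RGB vectors as a stand-in for the quaternion difference used by Geng et al. (2012), and the direction layout, the threshold and the names are assumptions made for the illustration.

```python
import math

# Index pairs of a raster-ordered 3x3 window lying on the lines through the
# center (index 4) at 0, 45, 90 and 135 degrees.
DIRECTIONS = ((3, 5), (2, 6), (1, 7), (0, 8))

def vector_median(window):
    """Eq. (64): the pixel with the smallest aggregated distance."""
    return min(window, key=lambda q: sum(math.dist(q, s) for s in window))

def switching_filter(window, threshold):
    """Eq. (65): replace the center only if it differs in every direction."""
    center = window[4]
    v = min(math.dist(center, window[a]) + math.dist(center, window[b])
            for a, b in DIRECTIONS)
    return vector_median(window) if v > threshold else center

# Flat region with an impulse in the center.
flat = [(100, 100, 100)] * 4 + [(255, 0, 0)] + [(100, 100, 100)] * 4
# Vertical edge passing through the center column.
A, B = (0, 0, 0), (200, 200, 200)
edge = [A, B, B, A, B, B, A, B, B]

print(switching_filter(flat, threshold=60))  # impulse replaced by the VMF output
print(switching_filter(edge, threshold=60))  # edge pixel left unchanged
```

Taking the minimum over the four directions is what protects edges: an edge pixel agrees with its neighbors along at least one direction, whereas an impulse disagrees along all of them.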

Wang et al. (2014) rearranged the pixels inside window according to order statistics Eqs. (26), (52) and (53)

$$\begin{aligned} q(x-l,y-l), \ldots , q (x,y) ,\ldots , q(x+l, y+l), \end{aligned}$$
(66)

inside the square window of size \(2l+1\) as per rank

$$\begin{aligned} q_{1k}, \ldots , q_{- 1+ \frac{n+1}{2}k},q_{ \frac{n+1}{2}k}, q_{ 1+ \frac{n+1}{2}k}, \ldots , q_{Nk}, \end{aligned}$$
(67)

and computed the quaternion distance using Eq. (62). In the peer group filter (FPGF) (Smolka 2010), peers are formed to divide pixels into noisy and noise-free classes. The near pixels are counted and tested as \(\text {count}(q, d) < m,\) where d is the distance between the central and other window pixels and m is a fixed pixel count. The output of the peer group (for a 9-pixel window) is determined as,

$$\begin{aligned} \Omega ^{*} =\left\{ q_1^{\textrm{proposed}},q_2^{\textrm{proposed}}, q_3^{\textrm{proposed}}, q_4^{\textrm{proposed}} , q_5 ,\ldots , q_9 \right\} . \end{aligned}$$
(68)

Noise is further judged using the AVMF and a threshold is defined. All noisy pixels in Eq. (68) are replaced by the VMF output and noise-free pixels remain unchanged,

$$\begin{aligned} q^{\textrm{QSF}} ={\left\{ \begin{array}{ll} q^{\textrm{VMF}} &{} \text {if noise pixels},\\ q _{\frac{n+1}{2}} &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$
(69)

Jin et al. (2016) determine the sum of two quaternions \(q_1 \text { and } q_2\) as \(q_3= q_1 + u q_2 \overline{u} = r_3 i+ g_3 j+ b_3 k ,\) whose magnitude is computed by Eq. (63). The distance of \(q_3\) from the grey line \(u= i+j+k\) is computed using a new color distance formula, defined as

$$\begin{aligned} d_4(q_1, q_2) ={\left\{ \begin{array}{ll} | I(q_1, q_2)| &{} \text {if } d_3(q_1, q_2)=0,\\ | d_3(q_1, q_2)| &{} \text {if } I(q_1, q_2)=0,\\ | d_3(q_1, q_2)| ^{\alpha } \cdot | I(q_1, q_2) | ^{1-\alpha } &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$
(70)

Here, \(I(q_1, q_2) \) is the identity distance and \(\alpha \in (0,1)\) is a parameter controlling the balance between the brightness and chromaticity color differences. Impulse noise is detected using the pixels in four directions from the central pixel inside a window of size N as in Eq. (66); the central pixel is tested for noise and the filter output is rearranged as in Eq. (67). The mean distance is determined by,

$$\begin{aligned} \overline{d_h}=\frac{1}{N-1}\sum _{q \in \Omega ^{*}} d \left( q, q_{\frac{n+1}{2}}\right) , \end{aligned}$$

where distance d is computed using Eq. (70), cardinality of \(\Omega ^{*}\) is determined and filter output is defined as,

$$\begin{aligned} q^{\textrm{filter}}_{\frac{n+1}{2}} ={\left\{ \begin{array}{ll} q^{\textrm{proposed}}_{\frac{n+1}{2}} &{} \text {if } \min (\overline{d_1}, \overline{d_2},\overline{d_3},\overline{d_4}) > D, \\ q _{\frac{n+1}{2}} &{} \text {otherwise}, \end{array}\right. } \end{aligned}$$
(71)

where \(q^{\textrm{proposed}}_{\frac{n+1}{2}} \) is the output of the next filter stage and D is a predefined threshold. At the second stage, the VMF is applied to the noisy pixels only.

Jin et al. (2011) represent color pixels using quaternions and directional vector order statistics. The deviation from the median and the edge directions are used to decide whether a pixel is noisy or noise free. At the next step, noisy pixels are treated using the VMF and the identity filter is applied to noise-free pixels. The \(n^2\) pixels of a square window are represented as raster-scan data in quaternion form according to Eq. (66). Color distances in four directions are determined using a square window, and the five samples along each direction are represented as,

$$\begin{aligned} \Gamma _h= \{ q_{h,1}, q_{h,2}, q_{h,\frac{n+1}{2}},q_{h,4}, q_{h,5} \}, \end{aligned}$$

where \(h=1,2,3,4\) indexes the four directions and \(q_{h,l}\) represents a color sample.

Similarly,

$$\begin{aligned} \Gamma ^{*}_h= \left\{ q^{\textrm{proposed}}_{h,1}, q^{\textrm{proposed}}_{h,2}, q_{h,\frac{n+1}{2}},q_{h,4}, q_{h,5} \right\} , \end{aligned}$$

are determined and \(q^{\textrm{proposed}}_{h,k}, k=1,2\) represents the output of the filter. The vector median of the color pixels in \(\Gamma ^{*}_h\) is

$$\begin{aligned} q_h^{\textrm{VM}}= \arg \min _{q\in \Gamma ^{*}_h}\sum _{q_l \in \Gamma ^{*}_h} | q_l- q |, \end{aligned}$$
(72)

and pixel similarity is defined as \(\pi _h= \{ q \in \Gamma ^{*}_h: d_3(q, q^{\textrm{VM}}_h) \le \text {tol}\}\) where \(\text {tol}\) is the threshold and distance \(d_3\) is computed by Eq. (63). The measure \(\text {card}(\pi _h)\) represents the cardinality of the set \(\pi _h\) and k is the maximum number of similar pixels in the set. The classification filter for noisy pixels,

$$\begin{aligned} q^{\textrm{proposed}}={\left\{ \begin{array}{ll} q_{\frac{n+1}{2}} &{} \text {if } \text {card}(\pi _k) \ge m \; \wedge \; d_3 < \text {tol},\\ q^{\textrm{vmf}} &{} \text {otherwise}, \end{array}\right. } \end{aligned}$$
(73)

where \(q_{\frac{n+1}{2}} \) is a corrupted noisy pixel with impulse noise.

4.5 Peer group filters

The concept of peer groups is introduced in Kenney et al. (2001). A peer group in an image is defined as a group of similar pixels based on distance and similarity measures. The algorithms that use this concept are called peer group filters, and this family of switching filters uses a detection and replacement scheme to remove noise from digital images. The peer group \(P(x_0, m, d)\) is the collection of image pixels of an \(n \times n\) filtering window W that obey the condition,

$$\begin{aligned} \Vert x_0- x_j\Vert \le d \quad \text {such that } x_j \in W, \end{aligned}$$
(74)

where \(\Vert \cdot \Vert \) is the Euclidean norm and the peer group P is the set of pixels in the d-neighborhood of the central pixel \(x_0.\) Let the pixels of an RGB color image in the square window be \(\{ x_1, x_2, \ldots , x_n \};\) the covariance matrix \(\Sigma =E[(x-\overline{x})(x-\overline{x})^{\textrm{T}} ]\) is defined as,

$$\begin{aligned} \Sigma&= \left( \begin{array}{ccc} \text {Var}(R) &{} \text {Cov}(R,G) &{} \text {Cov}(R,B) \\ \text {Cov}(G,R) &{} \text {Var}(G) &{} \text {Cov}(G,B) \\ \text {Cov}(B,R) &{} \text {Cov}(B,G) &{} \text {Var}(B) \\ \end{array}\right) \nonumber \\&=\left( \begin{array}{ccc} \sigma _{RR} &{} \sigma _{RG} &{} \sigma _{RB} \\ \sigma _{GR} &{} \sigma _{GG}&{} \sigma _{GB} \\ \sigma _{BR} &{} \sigma _{BG} &{} \sigma _{BB} \\ \end{array}\right) . \end{aligned}$$
(75)

Note that here R, G, B represent the usual color image channels, E is the expectation operator and \(\overline{x}\) is the mean of the sample vectors. The \(\sigma _{kk}\) is the variance of image channel k and the dispersion matrix \(\Sigma \) is a square, symmetric and full rank matrix. The Sigma Vector Median Filter (SVMF), proposed by Lukac et al. (2006), can be constructed in multiple ways, for example based on the lowest ranked vector or on the mean. The variance of the vectors (pixels) in a square window W is defined as,

$$\begin{aligned} \Psi _{\gamma }= \frac{\sum _{j=1}^N \Vert x_{(1)}- x_j \Vert _{\gamma }}{ N-1}, \end{aligned}$$

and threshold Tol defined as follows,

$$\begin{aligned} \text {Tol}= L_{(1)} +\lambda \Psi _{\gamma } =\frac{N-1+\lambda }{N-1} L_{(1)}, \end{aligned}$$

where \(L_{(1) }\) is the smallest aggregated Minkowski metric and smoothness is adjusted by \(\lambda \) in SVMF. Then switching filter is defined as,

$$\begin{aligned} y_{\text {SVMF}} ={\left\{ \begin{array}{ll} x_{(1)} &{} \text {if } L_{\frac{N+1}{2}} \ge \text {Tol},\\ x_{\left( \frac{N+1}{2}\right) } &{} \text {otherwise}, \end{array}\right. } \end{aligned}$$
(76)

where \(L_{\frac{N+1}{2}}\) is the distance measure of the central pixel: a pixel is classified as noisy if the distance measure \(L_{\frac{N+1}{2}}\) of the central sample \( x_{\frac{N+1}{2}}\) is larger than or equal to the threshold \(\text {Tol}.\) The noisy pixels are replaced with the lowest ranked vector \(x_{(1)}.\) Two further filters, SDDF and \(\text {SBVDF},\) are proposed. For the SDDF, the approximated variance is given by \(\Psi _A= \frac{\Omega _{(1)} }{N-1},\) where \(\Omega _{(1)}\) is the smallest hybrid measure, and \(\text {Tol}= \Omega _{(1)} + \lambda \Psi _{A}.\) For the SBVDF, the threshold is defined as \( \text {Tol}= \alpha _{(1)} + \lambda \Psi _A,\) where \(\Psi _A= \frac{\alpha _{(1)}}{N-1}.\) The \(\text {SBVDF}\) is then defined as follows,

$$\begin{aligned} y_{\text {SBVDF}} ={\left\{ \begin{array}{ll} x_{(1)} &{} \text {if } \alpha _{\left( \frac{N+1}{2}\right) } \ge \text {Tol},\\ x_{\left( \frac{N+1}{2}\right) } &{} \text {otherwise}, \end{array}\right. } \end{aligned}$$
(77)

where \(\alpha _{(1)} \) is the smallest angular distance computed and \(y_{\text {SBVDF}}\) is the output of the SBVDF filter. The SDDF filter output is defined as,

$$\begin{aligned} y_{\text {SDDF}} ={\left\{ \begin{array}{ll} x_{(1)} &{} \text {if }\Omega _{\left( \frac{N+1}{2}\right) } \ge \text {Tol},\\ x_{\left( \frac{N+1}{2}\right) } &{} \text {otherwise}, \end{array}\right. } \end{aligned}$$
(78)

where \(x_{(1)} \) is the output of the VMF. The output of the adaptive sigma DDF (ASDDF) filter is,

$$\begin{aligned} y_{\text {ASDDF}} ={\left\{ \begin{array}{ll} x_{(1)} &{} \text {if }\Omega _{\left( \frac{N+1}{2}\right) } \ge \text {Tol},\\ x_{\left( \frac{N+1}{2}\right) } &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$
(79)
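The sigma-type switching rule shared by Eqs. (76) to (79) can be sketched for the SVMF case as follows. The sketch assumes Euclidean aggregated distances, a raster-ordered window whose middle element is the central pixel, and a threshold of the form \(L_{(1)} + \lambda \Psi \) with \(\Psi = L_{(1)}/(N-1)\); the function name and sample windows are hypothetical.

```python
import math

def svmf(window, lam):
    """Keep the center unless its aggregated distance reaches the threshold."""
    n = len(window)
    aggregated = [sum(math.dist(x, y) for y in window) for x in window]
    lowest = min(range(n), key=aggregated.__getitem__)  # index of x_(1)
    l1 = aggregated[lowest]
    psi = l1 / (n - 1)            # approximated variance
    tol = l1 + lam * psi          # adaptive threshold
    center = n // 2
    return window[lowest] if aggregated[center] >= tol else window[center]

# Impulse in a flat region: the center is replaced by the vector median.
flat = [(100, 100, 100)] * 4 + [(255, 0, 0)] + [(100, 100, 100)] * 4
# Smooth ramp whose center is not the vector median but is close to it.
ramp = [(100 + i, 100, 100) for i in (0, 1, 2, 3, 5, 4, 6, 7, 8)]

print(svmf(flat, lam=4))
print(svmf(ramp, lam=4))  # center kept
print(svmf(ramp, lam=0))  # with lam = 0 the filter reduces to the VMF
```

The parameter \(\lambda \) thus trades noise suppression against detail preservation: larger values leave more central pixels unchanged.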

Smolka and Chydzinski (2005) used the Euclidean norm as the similarity measure for deciding between noisy and noise-free pixels. The Fast Peer Group Filter (FPGF) uses a vector valued filter, \(\text {FPGF}_{\textrm{VMF}},\) for noise removal, and the switching filter is defined as,

$$\begin{aligned} y_1 ={\left\{ \begin{array}{ll} x_1 &{} \text {if } x_1 \in \rho (x_i, m,d),\\ x_{(1)} &{} \text {otherwise}, \end{array}\right. } \end{aligned}$$
(80)

where \(y_1\) is the output of the filtering operation, and \(x_1\) and \(x_{(1)}\) are the central pixel in W and the output of the VMF, respectively. Various filtering designs can be used instead of the VMF.
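A minimal sketch of this peer group switching rule is given below. It assumes that the central pixel is kept when at least m of its neighbors lie within distance d, and that the VMF is used as the replacement filter; the parameter values and names are illustrative.

```python
import math

def peer_group(window, center_index, d):
    """Neighbours whose Euclidean distance to the central pixel is at most d."""
    x0 = window[center_index]
    return [x for j, x in enumerate(window)
            if j != center_index and math.dist(x0, x) <= d]

def vector_median(window):
    """Pixel with the smallest sum of distances to all window pixels."""
    return min(window, key=lambda q: sum(math.dist(q, s) for s in window))

def fpgf(window, m, d):
    """Keep the center if it has at least m peers, otherwise use the VMF."""
    c = len(window) // 2
    if len(peer_group(window, c, d)) >= m:
        return window[c]
    return vector_median(window)

noisy = [(100, 100, 100)] * 4 + [(255, 0, 0)] + [(100, 100, 100)] * 4
clean = [(100, 100, 100)] * 4 + [(101, 100, 100)] + [(100, 100, 100)] * 4

print(fpgf(noisy, m=3, d=50))  # no peers: replaced by the vector median
print(fpgf(clean, m=3, d=50))  # eight peers: center kept
```

The speed of the approach comes from the fact that the peer search can stop as soon as m peers are found, so most uncorrupted pixels are accepted after a few distance evaluations.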

The above method is revisited by Camarena et al. (2010), who divide the noise detection step into two phases. During the first phase, image partition is performed and two disjoint sets are created: if the cardinality of the set \(\rho (x_i, d) \) is larger than a fixed value \(m+1,\) then pixel \(x_j\) is declared as noisy, and otherwise it is non-diagnosed. Then, taking \(m_1 \in \{1,2,\ldots , m\},\) for every non-diagnosed pixel with \(\rho (x_i, d) > m+1,\) all \(x_j \in \rho (x_i, d)\) are declared non-corrupted. The filtering process is then implemented as in the algorithm cited above.

Morillas et al. (2009) used the ordered color pixel representation as in Eq. (25) and define fuzzy similarity measure as,

$$\begin{aligned} \rho (x_i, x_j)= \text {e}^{-\frac{\Vert x_i- x_j \Vert }{x_{\sigma }}}, \quad i,j= 1,2 \ldots m-1. \end{aligned}$$

It is clear that \(\rho \in [0,1]\) and \(\rho = 1 \Leftrightarrow x_i = x_j.\) The set of color vectors in descending order with respect to similarity becomes \(W^{\prime }= \{x_{(0)}, x_{(1)},x_{(2)},\ldots , x_{(n^2-1)}\}\) such that \(\rho (x_0 , x_{(0)}) \ge \rho (x_0 , x_{(1)}) \ge \rho (x_0 , x_{(2)}) \ge \cdots \ge \rho (x_0 , x_{(n^2-1)}) \) with \(x_0 = x_{(0)}.\) This concept is fuzzified and a fuzzy membership function is introduced as \(C^{x_0}(x_{(i)})=\rho (x_0, x_{(i)}), i= 1,2,3 \ldots , n^2-1.\) It is a monotonically decreasing function that expresses the certainty that \(x_0\) is similar to \(x_{(i)},\) and the accumulated similarity for \(x_{(i)}\) is denoted as \(A^{x_0}(x_{(i)})= \sum _{k=0}^i \rho (x_0, x_{(k)}).\) This shows that \(A^{x_0}(x_{(0)})=1\) and that it takes the value \(i+1\) if \(x_{(0)}=x_{(1)} = \cdots = x_ {(i)} .\) The fuzzy membership function

$$\begin{aligned} \mu (x)=-\frac{(x-1)(x-2n^2+1)}{(n^2-1)^2}, \end{aligned}$$

is defined using \(A^{x_0}(x_{(i)})\) for the classification of pixels into noisy and noise-free classes. The membership function then becomes,

$$\begin{aligned} L^{x_0} (x_{(i)})&= \mu (A^{x_0}(x_{(i)}))\nonumber \\&=-\frac{(A^{x_0}(x_{(i)})-1)(A^{x_0}(x_{(i)})-2n^2+1)}{(n^2-1)^2}, \end{aligned}$$
(81)

for \(i= 0,1,2, \ldots , n^2-1.\) The cardinality of the set of member pixels of a peer group is determined using fuzzy logic. For this purpose, a fuzzy peer group \(FP^{x_0} _m\) is constructed on the set \( \{ x_{(0)},x_{(1)},x_{(2)},\ldots , x_{(m)}\}\) and two filters are used to remove impulse and Gaussian noise from the image.
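
As an illustration, the fuzzy similarity \(\rho \) and the accumulated similarity \(A^{x_0}\) described above can be sketched in a few lines of Python. This is a minimal sketch and not the implementation of Morillas et al. (2009): the function names, the representation of pixels as RGB tuples and the parameter `sigma` (standing for \(x_{\sigma }\)) are assumptions made here for illustration.

```python
import math

def fuzzy_similarity(xi, xj, sigma):
    """rho(x_i, x_j) = exp(-||x_i - x_j|| / sigma), a value in (0, 1]."""
    return math.exp(-math.dist(xi, xj) / sigma)

def accumulated_similarity(window, center_index, sigma):
    """Order the window pixels by decreasing similarity to the central pixel
    and return (ordered_pixels, A), where A[i] = sum_{k<=i} rho(x_0, x_(k))."""
    x0 = window[center_index]
    scored = sorted(((fuzzy_similarity(x0, x, sigma), x) for x in window),
                    key=lambda t: t[0], reverse=True)
    ordered = [x for _, x in scored]
    acc, total = [], 0.0
    for rho, _ in scored:
        total += rho
        acc.append(total)
    return ordered, acc

# Hypothetical 3x3 window flattened to 9 RGB tuples, with one impulse.
window = [(10, 10, 10)] * 8 + [(255, 0, 0)]
ordered, acc = accumulated_similarity(window, 0, sigma=20.0)
print(ordered[-1], acc[-1])
```

For a window of identical pixels every similarity equals 1, so the accumulated similarity of the \(i\)-th ordered pixel is \(i+1;\) an impulse contributes almost nothing to the sum, which is what the membership function exploits.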

Smolka (2010) used an aggregated distance \(r_k=\sum _{j=1}^n\Vert x_k-x_j\Vert _l\) instead of the Euclidean distance to remove impulse noise. The filtering window is no longer associated with the central pixel but with the median pixel. The Fisher criterion function \(F(k),\) where \(k\) denotes the rank of the sorted pixel, is then defined as,

$$\begin{aligned} F(k)= \frac{[m_1(k)-m_2(k)]^2}{ v_1(k)+ v_2(k)}, \quad k=1,2,\ldots , n-1, \end{aligned}$$

where \(m_1\) and \(m_2\) denote the mean values of the two classes of pixels and \(v_1\) and \(v_2\) stand for their variances. The output of the Peer Group Filter (PGF) is an estimate of the original image sample based on the pixels that belong to the detected peer group, and can be calculated as the mean of the peer group pixels or as their vector median. For the classification of pixels into noisy and noise-free classes, the distances \(\rho _k\) can be substituted by the sorted sequence of accumulated distances \(r_k.\) In this way, a powerful family of filters capable of removing impulsive noise in color images, such as vector median filters, can be constructed.
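
The Fisher criterion above can be evaluated for every split position \(k\) of the sorted distances, and the split with the largest \(F(k)\) gives the peer group size. The following is a minimal sketch under stated assumptions: the function names are hypothetical, population variances are used for \(v_1, v_2,\) and a zero total variance is treated as perfect separation.

```python
from statistics import fmean, pvariance

def fisher_criterion(sorted_r):
    """Return {k: F(k)} for k = 1..n-1, where the sorted distances are split
    into a first class of k values and a second class of n - k values."""
    n = len(sorted_r)
    scores = {}
    for k in range(1, n):
        c1, c2 = sorted_r[:k], sorted_r[k:]
        m1, m2 = fmean(c1), fmean(c2)
        denom = pvariance(c1) + pvariance(c2)
        # Assumption of this sketch: zero within-class variance means the
        # two classes are perfectly separated.
        scores[k] = (m1 - m2) ** 2 / denom if denom > 0 else float("inf")
    return scores

def peer_group_size(sorted_r):
    """Split position k that maximizes the Fisher criterion."""
    scores = fisher_criterion(sorted_r)
    return max(scores, key=scores.get)

# Four small distances (peers) followed by two large ones (outliers).
print(peer_group_size([0.9, 1.0, 1.1, 1.2, 9.0, 9.5]))
```

In this example the criterion peaks at the boundary between the four small distances and the two large ones, so the first four ranked pixels would form the peer group.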

A fast averaging filter for color image denoising was introduced by Malinski and Smolka (2016). This filter has some similarity to the Fast Peer Group Filter (Smolka and Chydzinski 2005) and the Sigma Vector Median Filter (Lukac 2002; Smolka 2010). First, the closeness and closed neighborhoods are constructed using the Euclidean distances between color pixels. The size of the neighborhood with radius \(d\) is defined as \(m_k= \# \{ x_j \in W, \Vert x_k - x_j \Vert \le d\},\) where \(\# \) and \(\Vert \cdot \Vert \) denote cardinality and the Euclidean norm in RGB space, respectively, and the pixels in this neighborhood are the peers of \(x_k.\) The size of the peer group is directly related to the distortion in pixel values: the lower the value of \(m_k,\) the higher the noise. This divides the pixels into noisy and noise-free classes; the Weighted Average Filter (WAF) is applied to the noisy pixels in the same window and noise-free pixels are skipped. The weights are computed as,

$$\begin{aligned} w_i= \frac{\mu _i}{\sum _{j=1}^n \mu _j}, \end{aligned}$$

with \(\mu _i= m_i^{\gamma },\) where \(n\) is the window size and \(\gamma ,\) which regulates the membership degree, is a second parameter. The WAF is then defined as,

$$\begin{aligned} y_1= \frac{1}{\sum _{i=1}^n w_i} \sum _{i=1}^n w_i x_i. \end{aligned}$$
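
The peer counts \(m_k\) and the weighted average can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, pixels are RGB tuples, and the decision rule that selects which pixels are filtered is omitted because its threshold is not given here.

```python
import math

def peer_counts(window, d):
    """m_k = number of window pixels within Euclidean distance d of x_k
    (the count includes x_k itself)."""
    return [sum(1 for xj in window if math.dist(xk, xj) <= d) for xk in window]

def weighted_average(window, d, gamma):
    """Weighted average of the window with weights proportional to m_i**gamma."""
    mu = [m ** gamma for m in peer_counts(window, d)]
    total = sum(mu)
    channels = len(window[0])
    return tuple(sum(w * x[c] for w, x in zip(mu, window)) / total
                 for c in range(channels))

# Eight similar pixels and one impulse: the impulse has a single peer (itself),
# so its weight is negligible compared with the others.
window = [(100, 100, 100)] * 8 + [(255, 0, 0)]
print(peer_counts(window, d=30.0))
print(weighted_average(window, d=30.0, gamma=2.0))
```

Because the weights are normalized, dividing by their sum in the filter output leaves the result unchanged; the sketch therefore works directly with \(\mu _i.\)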

Hara and Guan (2010) defined a neighborhood of RGB image pixel \(x\) for pixel \(i=1,2,3\) as,

$$\begin{aligned} E(i)= \{ j \mid j \in \Omega (i) \wedge \Delta E (x_i, x_j) \le \epsilon \}, \end{aligned}$$
(82)

where \(\Omega (i)\) and \(\Delta E\) are the spatial neighborhood and the Euclidean distance, respectively. The image prior probability is set as,

$$\begin{aligned} p(f_{E_i})=\prod _{j \in E(i)}\left[ \frac{ \exp \left( -0.5(f_j- \mu _i)^{\textrm{T}} F_i ^{-1}(f_j- \mu _i)\right) }{\sqrt{(2\pi )^3 | F_i |}}\right] , \end{aligned}$$
(83)

where the mean and covariance matrix are represented by \(\mu _i\) and \(F_i,\) and the noise-contamination process is modeled as,

$$\begin{aligned} p(y_{E_i}|f_{E_i})&=\prod _{j \in E(i)} p(y_j|f_j), \nonumber \\ p(y_j|f_j)&= \frac{ \exp \left( -0.5(y_j- f_j)^{\textrm{T}} N_i^{-1}(y_j- f_j)\right) }{\sqrt{ (2\pi )^3 | N_i |}}, \end{aligned}$$
(84)

where \(N_i\) is the covariance matrix of the noise. The original image is obtained by maximizing the a posteriori probability, the parameters are estimated using the probability of noise free image \(y_i,\) and the peer group is defined using the cardinality of \(E(i).\)

4.6 Superpixel segmentation based filters

Clustering techniques are implemented in Jin (2017) and Thamilselvan and Sathiaseelan (2018). In both algorithms, pixels are divided into noisy and noise-free classes using superpixel segmentation. In this segmentation method, similar pixels are grouped together; noisy pixels appear as rectangular or circular shapes, whereas noise-free pixels appear as narrow or sharp line objects. The detected noisy pixel blocks are further processed by Jin (2017), and the quaternion distance described in Eq. (70) is computed. The two-stage quaternion filter (Jin et al. 2016) is implemented to decide whether a pixel is noisy or noise free. The roundedness (circularity) measure, defined as \(R= \sum _{j=1}^{m_k} \sigma _{k,j}\) where,

$$\begin{aligned} \sigma _{k,j} ={\left\{ \begin{array}{ll} 1 &{} \text {if}\ d(x_{k,j}, \frac{1}{m_k} \sum _{p=1}^k X_{k,p} ),\\ 0 &{} \text {otherwise}, \end{array}\right. } \end{aligned}$$
(85)

is used as the selection criterion for noisy superpixels. This is followed by the VMF, and noise-free pixels remain unchanged. In Thamilselvan and Sathiaseelan (2018), a clustering technique is performed on superpixels. The similarity measure is defined using the K-means clustering algorithm and superpixels are grouped into eight groups. This is used to enhance the accuracy of the superpixels, and the segmentation is refined using an iterative linear classification methodology. Then noisy pixels are removed using the nonlocal means filter.

4.7 Mathematical morphology based filters

Mathematical morphology treats images with a set-theoretical approach. This allows the extraction of useful image components for analysis in imaging. For this purpose, morphological filters are developed, and these filters behave similarly to switching filters. In these methods, a remove operator is designed.

A morphological filter for color image denoising was introduced by Ruchay and Kober (2017). The morphological filters are constructed using set theory applied to low-level image processing tasks. The grey image is converted into a binary image by the threshold \(\text {BW}(A, \text {level}).\) The set operations intersection, union, complement and difference are defined, as are the morphological operations dilation, erosion, opening and closing. Let \(A\) be a set of pixels and let \(B\) be a structuring element; then \(\hat{B}_s\) is the reflection of \(B\) about its origin followed by a shift by \(s.\) The dilation \(A \oplus B = \{s \mid \hat{B}_s \cap A \ne \emptyset \}\) and the erosion \( A \ominus B = \{s \mid B_s \subset A\}\) are sets of shifts. The closing and opening operators are defined similarly. The object boundary is obtained by applying the morphological remove operation. For color images, each channel is processed individually and noise is detected using DetectionMethod5(X) as,

$$\begin{aligned} M_1&= \cup _{i=r,g,b} (\text {set}1(x_i, \text {mset}) *B) \cup _{i=r,g,b} (\text {set}2(x_i, \text {pset}) *B), \\ M_2&= \text {remove}( \cup _{i=r,g,b} \text {BW} (\text {set}2(x_i, \text {pset}), \text {level})), \\ M_3&= \text {remove}( \cup _{i=r,g,b} \text {BW} (\text {set}3(x_i, \text {mset}), \text {level})),\\ M_4&= (\text {BW}(\text {rgb}2\text {gray}(X), \text {level}) *B),\\ M_5&= (\text {BW}(\text {rgb}2\text {gray}(\text {set}2(X, \text {pset})), \text {level}) *B). \end{aligned}$$

Finally, the filter \(M\) is set as \(M = M_1 \cup M_2 \cup M_3 \cup M_4 \cup M_5,\) where \(B\) is the standard structuring element. The operators \(\text {set}1, \text {set}2\) and \(\text {set}3\) perform subtraction and addition, \(\text {pset}\) and \(\text {mset}\) are the sets of pixels defined as parameters, and \(\text {rgb}2\text {gray}\) converts the color image into a grey image.
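
The basic morphological operations used in this subsection can be illustrated on binary images stored as sets of pixel coordinates. The sketch below uses the standard set definitions of dilation (all shifts at which the reflected structuring element meets \(A\)) and erosion (all shifts at which the structuring element fits inside \(A\)); the function names are hypothetical, and the detection masks \(M_1, \ldots , M_5\) of Ruchay and Kober (2017) are not reproduced.

```python
def dilate(A, B):
    """A (+) B: every shift a + b with a in A and b in B."""
    return {(r + dr, c + dc) for (r, c) in A for (dr, dc) in B}

def erode(A, B):
    """A (-) B: every shift s such that s + b lies in A for all b in B."""
    candidates = {(r - dr, c - dc) for (r, c) in A for (dr, dc) in B}
    return {(r, c) for (r, c) in candidates
            if all((r + dr, c + dc) in A for (dr, dc) in B)}

def opening(A, B):
    """Erosion followed by dilation; removes details smaller than B."""
    return dilate(erode(A, B), B)

def closing(A, B):
    """Dilation followed by erosion; fills gaps smaller than B."""
    return erode(dilate(A, B), B)

# 3x3 square structuring element centred at the origin.
B = {(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)}
block = {(r, c) for r in range(3) for c in range(3)}
A = block | {(10, 10)}          # a 3x3 object plus one isolated pixel
print(opening(A, B) == block)   # the isolated pixel is removed
```

The example shows the switching-like behaviour mentioned above: the opening leaves the compact object untouched and removes only the isolated, impulse-like pixel.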

4.8 Graph based filters

The graph based switching filters are designed for denoising the flat and edge regions with separate and effective filters. These filters decide between flat and edge regions, and switching is performed between the Arithmetic Mean Filter (AMF) and the Fuzzy Noise Reduction Method (FNRM). Jordan et al. (2012) represented a color pixel as \(F_0\) and the connected components of its graph \(G_{F_0}\) by \(L(H_{F_0}).\) The parameter \(\text {card}(L(H_{F_0}))\) takes discrete values between 0 and \(\left( \begin{array}{c} N^2 \\ 2 \end{array}\right) .\) For RGB images, \( N = 3\) and \(\text {card}(L(H_{F_0})) \in \{0, \ldots , 36\}.\) The image pixels are classified into 37 classes according to the admissible values of \(\text {card}(L(H_{F_0})),\) and a base \(\beta = \{\beta _i, i= 1,2,\ldots , 37 \}, \beta _i \in [0,1] \) is built. These \(\beta _i\) are used for defining flat and edge regions and for switching between the two filters. The filter output \(\text {SSLGD}(F_0)\) is defined as

$$\begin{aligned} \text {SSLGD}(F_0) = (1 - \beta _i)\text {AMF}(F_0) +\beta _i\text {FNRM}(F_0). \end{aligned}$$
(86)

The value \(\beta _i\) is pixel dependent: it is large for homogeneous regions and lower for non-homogeneous regions. It is determined by maximizing the PSNR between the filter output and the original noise-free image. Pérez-Benito et al. (2018) proposed an improvement of this filter and used both filters for undecided regions.
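
The soft switching of Eq. (86) is a convex combination of the two filter outputs, and the number of classes follows from the binomial coefficient quoted above. A minimal sketch, in which the function names are hypothetical and the two filter outputs are simply passed in as RGB tuples rather than computed:

```python
import math

def soft_switch(amf_out, fnrm_out, beta):
    """Blend the two filter outputs as (1 - beta) * AMF + beta * FNRM."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return tuple((1.0 - beta) * a + beta * f
                 for a, f in zip(amf_out, fnrm_out))

def number_of_classes(N):
    """card(L(H_F0)) ranges over 0..C(N^2, 2), giving C(N^2, 2) + 1 classes."""
    return math.comb(N * N, 2) + 1

print(number_of_classes(3))
print(soft_switch((100.0, 100.0, 100.0), (120.0, 80.0, 100.0), beta=0.25))
```

With \(N = 3\) the count is \(\left( \begin{array}{c} 9 \\ 2 \end{array}\right) + 1 = 37,\) matching the 37 classes of the base \(\beta .\)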

4.9 Adaptive switching median filters

In this class of filters, median filters and switching statistics are merged, and a switching strategy is developed to classify the pixels into noisy and noise-free classes. The RGB pixel is represented as \(f_k(i,j), k=1,2,3,\) and the representation of window pixels and the computation of the tolerance follow Eqs. (52), (53) and (54).

Wang et al. (2010) introduced the Modified Switching Median Filter (MSMF). The input pixel is convolved with four convolution kernels \(w_p,\) and the minimum \(z_{ij}\) over the four convolution results, which is used for edge detection, can be represented as,

$$\begin{aligned} z_{ij}&= \min \{ f_k(i,j)\otimes w_p, \; p=1,2,3,4 \}, \nonumber \\ y^{\textrm{MSMF}}&={\left\{ \begin{array}{ll} y^{\textrm{VMF}} &{} \text {for } z_{ij} \ge T,\\ f_k(i,j) &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$
(87)

Lukac et al. (2005b) proposed a Statistically-Switched Adaptive Vector Median Filter (SSAVMF). The window pixels are represented as in Eqs. (52) and (53), and the inter-channel distance is determined by,

$$\begin{aligned} \Vert x_i - x_j \Vert _2 = \sqrt{\sum _{k=1}^m (x_{ik}- x_{jk})^2}, \end{aligned}$$

These distances are arranged in descending order following Eq. (54), and \(L_{\overline{x}}= \sum _{j=1}^N \Vert \overline{x} - x_j \Vert _2,\) where \(\overline{x}\) is the sample mean. The output of SSAVMF is defined as,

$$\begin{aligned} y^{\textrm{SSAVMF}} ={\left\{ \begin{array}{ll} x_{(1)} &{} \text {for } L_{\left( \frac{N+1}{2}\right) } \ge T_2, \\ x_{\left( \frac{N+1}{2}\right) } &{} \text {otherwise}, \end{array}\right. } \end{aligned}$$
(88)

where \(T_2 = L_{\overline{x}}+ \lambda _2 \Psi \) and \(\lambda _2\) is the adjusting parameter in the AVMF method.

The Noise Adaptive Soft-switching Median Filter (NASWMF) was developed by Eng and Ma (2001). For every image pixel, a three-level noise detection is performed. Each pixel is assigned to one of four categories, (i) uncorrupted pixel, (ii) isolated impulse noise, (iii) non-isolated impulse noise, and (iv) edge pixel, using a hierarchical soft-switching noise-detection procedure. The isolated noisy pixels are filtered using the Standard Median (SM) filter, and fuzzy logic is used to separate the non-isolated noisy pixels from edge pixels. These pixels are denoised using the Fuzzy Weighted Median (FWM) filter. The output of NASWMF is defined as,

$$\begin{aligned} g(i,j) ={\left\{ \begin{array}{ll} 1 &{} \text {No filtering},\\ 2 &{} \text {SM filtering}, \\ 3 &{} \text {FWM filtering}, \end{array}\right. } \end{aligned}$$
(89)

where \((i,j)\) are the pixel coordinates. The SM filter is defined as \(Y(i,j)= \text {Median}\{X(i-s,j-t) | (s,t) \in W \}.\) The associated weight factors are defined as

$$\begin{aligned} w(s,t) ={\left\{ \begin{array}{ll} \frac{\mu (s,t)}{X } &{} \text {for } (s,t)= (0,0),\\ \frac{\mu _c }{X} &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$
(90)

Here, \(X= \sum \mu (s,t) +\mu _c \) and \(\mu _c\) is derived using the parameter \(K,\) defined as

$$\begin{aligned} K= \sum _{s,t} \left( \frac{\mu (s,t)}{X }\right) ^2 +\left( \frac{\mu _c }{X} \right) ^2. \end{aligned}$$

Then FWM filter is defined as

$$\begin{aligned} Y(i,j)= \text {Median}\{W(i-s,j-t) \diamond X(i-s,j-t) \mid (s,t) \in W \}, \end{aligned}$$

where \(\diamond \) represents the duplicate operation.
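
The duplicate operation \(\diamond \) can be illustrated by a weighted median in which each sample is repeated according to its weight before the ordinary median is taken. The sketch below is an assumption-laden illustration, not the FWM filter of Eng and Ma (2001): it works on scalar samples, requires non-negative integer weights (fractional fuzzy weights would first have to be scaled and rounded), and the function name is hypothetical.

```python
from statistics import median_low

def weighted_median(samples, weights):
    """Median of the multiset in which samples[i] is repeated weights[i] times."""
    expanded = []
    for x, w in zip(samples, weights):
        expanded.extend([x] * w)   # the duplicate operation
    return median_low(expanded)    # always returns one of the input samples

samples = [10, 200, 12, 11, 13]
print(weighted_median(samples, [1, 1, 1, 1, 1]))  # plain median
print(weighted_median(samples, [1, 5, 1, 1, 1]))  # heavy weight on 200
```

With unit weights the result is the standard median; raising the weight of one sample pulls the output towards it, which is how the weights control the contribution of each window pixel.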

4.10 Result comparison for switching type filters

Table 2 Comparison of switching filters for color image denoising

The results for switching filters are described in Table 2. We note that different noise models and noise levels are considered in the switching type filters. Also, we have summarized the filters in terms of where they are utilized, as 1st or 2nd.

5 Wavelet filtering methods

In frequency domain based approaches, images are converted into a frequency domain by different transforms. Some of the well-known transformations are based on Fourier, sine, cosine or wavelet bases. In these methods, the transformation is applied to each pixel and its value is converted into a frequency. For color images, commonly known methods are the wavelet shrinkage methods introduced in Sun et al. (2017), Sun and Xu (Nov 2015), Saeedi and Abedi (2010), Gai (2018) and Kim et al. (2010). In the adaptive threshold algorithm, the noisy image is decomposed using the discrete wavelet transformation. At the first level, it decomposes the image into \(\text {LL}, \; \text {LH}, \text {HL}\) and \( \text {HH},\) called the low, horizontal high, vertical high and diagonal high frequencies, respectively, and the \(\text {LL}\) band is further decomposed. Similarly at the second level, it produces eight sub images. The wavelet threshold shrinking algorithm is used for image denoising based on information extracted from these frequencies. The shrinkage function is used for extracting the structure information in the adaptive wavelet transformation. A threshold range is determined using structure information such as energy and area.

5.1 Contribution summaries

Sun et al. (2017) introduced the energy based shrinking function. The energy for a square window with width R is defined as,

$$\begin{aligned} S^2_{j,k}= \frac{1}{R^2} \sum _{m=-R}^R \sum _{n=-R}^R d^2_{m,n}, \end{aligned}$$
(91)

and shrinkage function is defined as,

$$\begin{aligned} \hat{ d_{j,k}}= \left\{ \begin{array}{ll} d_{j,k}(1- \alpha *\frac{\lambda ^2}{ S^2_{j,k}} ), &{} \quad \text {if } S^2_{j,k}> \beta *\lambda ^2, \\ 0, &{} \quad \text {otherwise}, \end{array}\right. \end{aligned}$$
(92)

where \(\lambda \) is a constant that depends on the noise variance and \(\alpha , \beta \) are constants. The image is denoised by the adaptive wavelet threshold shrinkage function. In the next step, the image is reconstructed and the remaining noise is removed using guided filters.
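
Eqs. (91) and (92) can be sketched directly on a small array of wavelet coefficients. This is a minimal illustration under stated assumptions: the function names are hypothetical, the window is taken around the coefficient \((j,k)\) and clipped at the array border, and the normalization by \(R^2\) follows Eq. (91) as printed. The guided filtering step is not included.

```python
def local_energy(d, j, k, R):
    """S^2_{j,k}: sum of squared coefficients in the window of half-width R
    around (j, k), divided by R**2 as in Eq. (91). The window is clipped at
    the array border (an assumption of this sketch)."""
    rows, cols = len(d), len(d[0])
    total = 0.0
    for m in range(-R, R + 1):
        for n in range(-R, R + 1):
            r, c = j + m, k + n
            if 0 <= r < rows and 0 <= c < cols:
                total += d[r][c] ** 2
    return total / R ** 2

def energy_shrink(d, lam, alpha, beta, R=1):
    """Apply the shrinkage rule of Eq. (92) to every coefficient."""
    out = []
    for j in range(len(d)):
        row = []
        for k in range(len(d[0])):
            s2 = local_energy(d, j, k, R)
            if s2 > beta * lam ** 2:
                row.append(d[j][k] * (1.0 - alpha * lam ** 2 / s2))
            else:
                row.append(0.0)   # low-energy neighbourhood: treated as noise
        out.append(row)
    return out

noise_only = [[0.1] * 3 for _ in range(3)]
with_edge = [[0.1, 0.1, 0.1], [0.1, 5.0, 0.1], [0.1, 0.1, 0.1]]
print(energy_shrink(noise_only, lam=1.0, alpha=1.0, beta=1.0))
print(energy_shrink(with_edge, lam=1.0, alpha=1.0, beta=1.0)[1][1])
```

Coefficients in a low-energy neighbourhood are set to zero, while a strong coefficient is only slightly attenuated, by the factor \(1- \alpha \lambda ^2 / S^2_{j,k}.\)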

Sun and Xu (Nov 2015) proposed the iterative shrinkage model. The shrinkage model is formulated using the product of experts model (Hinton 1999). The maximum likelihood estimate is obtained using an iterative gradient method; the gradient descent step is interpreted as a wavelet shrinkage model and the noise as a product of experts model. The three components of the algorithm are patch update, 3D shrinkage over patch groups, and aggregation of patches for image reconstruction. Mathematically, these steps are represented as,

$$\begin{aligned} Y_k^t&= P_k I^t, \\ X_k^t&= Y_k^t - \sum _{i=1}^k \sum _{j=1}^k w_i \rho _{i,j} (w_i^{\textrm{T}} Y_k^t v_j)v_j^{\textrm{T}}, \\ I^{t+1}&= \left( \sum _k P_k P_k^{\textrm{T}}\right) ^{-1} \sum _k P_k^{\textrm{T}}X_k^t, \end{aligned}$$

where \(t= 0,1,2, \ldots , M-1.\) In discriminative learning, the 3D patch matrix is not updated at every iteration and the shrinkage function is defined as,

$$\begin{aligned} \rho _{ij}(x) = \frac{\sum _{n=1}^{N_s} G_h(x - x_n^{ij}) y_n^{ij} }{\sum _{k=1}^{N_s} G_h(x - x_k^{ij}) } = \sum _{n=1}^{N_s} \eta _n^{ij} (x) y_n^{ij}. \end{aligned}$$

Here, \(G_h\) is a Gaussian kernel with bandwidth \(h\) over discrete samples \((x_n^{ij}, y_n^{ij}).\) The shrinkage function is learned by finding a set of values \(\{y_n^{ij}\}^{N_s}_{ n=1}\) over given sample positions \(x^{ij}_ n\) for the bases \(w_i\) and \(v_j.\) The stochastic gradient descent algorithm is used to maximize the PSNR and determine the parameters, and models are trained for the luminance and chrominance channels.
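
The shrinkage function \(\rho _{ij}\) is a kernel-weighted average of the sample values \(y_n^{ij},\) which can be sketched as follows. The function names and the exact form of the Gaussian kernel (with the factor 0.5 in the exponent) are assumptions of this sketch, and the sample values are hand-picked for illustration rather than learned by stochastic gradient descent.

```python
import math

def gaussian_kernel(u, h):
    """Unnormalized Gaussian kernel with bandwidth h (assumed form)."""
    return math.exp(-0.5 * (u / h) ** 2)

def shrinkage(x, xs, ys, h):
    """rho(x) = sum_n G_h(x - x_n) y_n / sum_k G_h(x - x_k)."""
    weights = [gaussian_kernel(x - xn, h) for xn in xs]
    total = sum(weights)
    if total == 0.0:
        # x lies far outside the sample range: fall back to the nearest sample.
        nearest = min(range(len(xs)), key=lambda n: abs(x - xs[n]))
        return ys[nearest]
    return sum(w * y for w, y in zip(weights, ys)) / total

# Hypothetical sample positions and values resembling a soft threshold.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [-1.0, 0.0, 0.0, 0.0, 1.0]
print(shrinkage(0.0, xs, ys, h=0.5), shrinkage(2.0, xs, ys, h=0.5))
```

Because the normalized weights \(\eta _n^{ij}(x)\) sum to one, the function interpolates smoothly between the sample values, and learning the shrinkage function reduces to learning the values \(y_n^{ij}.\)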

Saeedi and Abedi (2010) proposed fuzzy rules for channel correlation. A noisy image is decomposed using the wavelet transformation, and neighborhood coefficients are used for designing two similarity measures, magnitude similarity and spatial similarity. For the central coefficient \(y_{s,d}( i, j,c)\) and neighborhood coefficients \(y_{s,d }( i + l, j + k,c)\) in the wavelet subbands of channel \(c,\) the fuzzy magnitude similarity is defined as:

$$\begin{aligned} m(l,k,c)= \exp \left( - \left( \frac{y_{s,d}(i,j,c)- y_{s,d}(i+l,j+k,c) }{\text {Thr} } \right) ^2 \right) . \end{aligned}$$

Similarly, the spatial similarity is

$$\begin{aligned} s(l,k,c)= \exp \left( -\frac{l^2+ k^2 }{N } \right) , \end{aligned}$$

where the threshold is noise dependent, \( \text {Thr} = K \times \hat{ \sigma }_n^c,\) with \(\hat{\sigma }_n^c\) the noise estimate for channel \(c,\) and the adaptive weight is defined as \(w(l,k,c)=m(l,k,c) \cdot s(l,k,c).\) Then, a fuzzy feature \(f_1\) for each wavelet subimage and a multichannel denoising fuzzy feature \(f_2\) are defined as:

$$\begin{aligned} f_1(i,j,c){=} \frac{\sum _{l=-k}^k \sum _{k=-k}^k w(l,k,c) | y_{s,d }( i + l, j + k,c) | }{\sum _{l=-k}^k \sum _{k=-k}^k w(l,k,c) }, \end{aligned}$$

and

$$\begin{aligned}{} & {} f_2(i,j){=}\\{} & {} \quad \frac{\sum _{c{=}1}^C( \sum _{l{=}{-}k}^k \sum _{k{=}{-}k}^k ( \prod _{c{=}1}^C w(l,k,c)) | y_{s,d }( i {+} l, j {+} k,c) | )}{\sum _{l{=}{-}k}^k \sum _{k{=}{-}k}^k \left( \prod _{c=1}^C w(l,k,c)\right) }. \end{aligned}$$

Then fuzzy membership functions are designed based on the above measures, a fuzzy decision rule is developed and a fuzzy conjunction is implemented. This rule is applied in each channel for coefficient shrinking. For the conjunction, the algebraic product is used, \(T(i,j,c)= f_1(i,j,c)\) AND \(f_2(i,j).\) The noise-free coefficients are estimated as \(x_{s,d}(i,j,c)= T(i,j,c) \times y_{s,d}(i,j,c),\) where \(s\) and \(d\) are the scale and orientation of the image wavelet subband, and artifacts are removed by fuzzy filters. Then inverse wavelet transformations are used to recover the final image.
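
The similarity weights and the single-channel feature \(f_1\) can be sketched for one wavelet subband as follows. This is only an illustration under stated assumptions: the magnitude similarity is taken as a Gaussian function of the difference between the central and the neighboring coefficient, the adaptive weight is the product of magnitude and spatial similarity, the window is clipped at the subband border, and all function names and the parameter `half` (window half-width) are hypothetical.

```python
import math

def magnitude_similarity(y_center, y_neighbor, thr):
    """Close to 1 when the two coefficients have similar values."""
    return math.exp(-((y_center - y_neighbor) / thr) ** 2)

def spatial_similarity(l, k, N):
    """Decreases with the squared distance l^2 + k^2 from the center."""
    return math.exp(-(l * l + k * k) / N)

def feature_f1(band, i, j, thr, N, half=1):
    """Weighted average of |y| around (i, j) within one subband and channel."""
    num = den = 0.0
    for l in range(-half, half + 1):
        for k in range(-half, half + 1):
            r, c = i + l, j + k
            if 0 <= r < len(band) and 0 <= c < len(band[0]):
                w = (magnitude_similarity(band[i][j], band[r][c], thr)
                     * spatial_similarity(l, k, N))
                num += w * abs(band[r][c])
                den += w
    return num / den   # den >= 1 because the center has weight 1

band = [[-2.0] * 3 for _ in range(3)]
print(feature_f1(band, 1, 1, thr=1.0, N=9.0))
```

For a constant subband the feature equals the common coefficient magnitude; neighbors that differ strongly from the center receive a small weight and barely influence the result.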

The monogenic 2D signal is defined through the Riesz transformation, which is defined in the \(L^2\) sense as \(\hat{R_v} f(w)= i\frac{w_v}{| w| }\hat{f}(w),\) where \(f\) and \(R_v f \) are real-valued functions on \(R^2.\) A pair of orthogonal transformations \(R_1 f, R_2f\) is used and

$$\begin{aligned} f_m= f- i R_1 f - j R_2 f= \left( \begin{array}{l} A \cos (\phi ) \\ A \sin (\phi ) \cos (\theta ) \\ A \sin (\phi ) \sin (\theta ) \end{array} \right) , \end{aligned}$$

where the amplitude is represented as \(A{=}\sqrt{ f^2 {+} | R_1f |^2 {+} | R_2f| ^2 },\) the phase as \(\theta = \text {Arg}\{R_1f, R_2 f\} \in [-\pi , \pi ]\) and the local phase as \( \phi = \text {arg}\{ f- i | R_1 f | - j | R_2 f| \} \in [ 0, \pi ].\) This definition is extended to color images, and the curvelet transformation is extended to color monogenic signals, called the Color monogenic curvelet transform (CMCT). These transformations are used for multichannel image denoising by Gai (2018). The 2D curvelet transforms are defined as

$$\begin{aligned} U_j(r, \theta ) = 2^{-\frac{3j}{4}} W(2^{- j} r) V \left( \frac{2^{ \lfloor j/2 \rfloor } \theta }{2\pi }\right) . \end{aligned}$$

This transformation represents data in polar coordinates, where \(U_j\) is defined in the Fourier domain and \(x\) and \(w\) are the spatial and frequency variables, respectively. The polar coordinates are represented by \(r, \theta ,\) and \(\lfloor j/2 \rfloor \) is the integer part of \(\frac{j}{2}.\) The radial window \(W\) is taken over \( r \in [1/2, 1]\) and the angular window \(V\) over \(t \in [-1, 1].\) The windows \(W\) and \(V\) obey admissibility conditions, and the curvelets are defined as \(\phi _{ j,k,l}(x) = \phi _ j(R_{\theta _l} (x-x_k^{(j,l)}))\) with

$$\begin{aligned} R_{\theta } =\left( \begin{array}{cc} \cos (\theta ) &{} \sin (\theta ) \\ -\sin (\theta ) &{} \cos (\theta ) \end{array} \right) , \end{aligned}$$

and \( R_{\theta }^{-1} =R_{\theta }^{\textrm{T}} = R_{-\theta },\) where \(R_{\theta }\) and \(R_{\theta }^{-1}\) are the orthogonal rotation matrix and its inverse. The inner product of an element \(f \in L^2\) and a curvelet \(\phi _{j,k,l}\) is called a curvelet coefficient. Just as monogenic signals are a generalization of one-dimensional analytic signals, monogenic curvelets are a generalization of analytic curvelets. The CMCT is designed by combining real-valued curvelets with their Riesz transforms as imaginary parts, yielding a new quaternion-valued transform. The frame of the color monogenic curvelet transform is designed with Riesz transforms, which project a frame of real-valued elements onto a frame of quaternion-valued elements. Finally, the inverse color monogenic curvelet transformation is applied to restore the noise-free image.
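
The orthogonality of the rotation matrix used to orient the curvelets, \(R_{\theta }^{-1} =R_{\theta }^{\textrm{T}} = R_{-\theta },\) is easy to verify numerically. The helper names below are hypothetical and the check is only a small sanity test, not part of the CMCT itself.

```python
import math

def rotation(theta):
    """2x2 rotation matrix [[cos, sin], [-sin, cos]] as used in the text."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

def close(A, B, tol=1e-12):
    return all(abs(A[i][j] - B[i][j]) <= tol
               for i in range(2) for j in range(2))

theta = 0.7
R = rotation(theta)
identity = [[1.0, 0.0], [0.0, 1.0]]
print(close(matmul(R, transpose(R)), identity))   # R R^T = I
print(close(rotation(-theta), transpose(R)))      # R_{-theta} = R^T
```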

Kim et al. (2010) used wavelet transformation coefficients to establish the alpha map,

$$\begin{aligned} V(x,y)= \frac{1}{PQ}\sum _{x,y \in S} ( f(x,y)- m_{x,y})^2, \end{aligned}$$

where \(S\) represents a rectangular region with dimensions \(P \times Q,\) the pixel values are represented by \(f(x,y)\) and \(m_{x,y}\) is the local mean. The weight for each pixel is computed as \(\alpha (x,y)= \frac{1}{1+ \sigma V(x,y)},\) where a uniform distribution is achieved by a tuning parameter \(\sigma ,\) which is adjusted depending on the noise ratio. The image is divided into two classes, flat (pure noise) and edge regions,

$$\begin{aligned} r_{\textrm{flat}}(m,n)= \alpha W_{\psi }^i(m,n) \quad \text {and} \quad r_{\textrm{edge}}(m,n)= (1-\alpha ) W_{\psi }^i (m,n), \end{aligned}$$

for \(\alpha \in [0,1],\) where \(W_{\psi }^i\) is a tri-dimensional subband. The adaptive wavelet shrinking method is implemented to denoise the flat region and entropy filters are used to denoise the edge regions. The local entropy of each wavelet coefficient in \(r_{\textrm{edge}}\) contains information about local complexity in the high-frequency region, and regions with higher entropy contain more noise. Thus, noise is removed by shrinking wavelet coefficients with higher entropy. A 2D scaling function and three 2D directional wavelet functions are generated from a 1D scaling and wavelet function, called symlet functions. The filter bases used are directly proportional to the noise ratio.
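
The alpha map and the resulting split of a coefficient into flat and edge parts can be sketched as follows. The function names are hypothetical, the region \(S\) is passed in as a small nested list, and the subsequent shrinkage and entropy filtering steps are not included.

```python
from statistics import fmean

def alpha_weight(region, sigma):
    """alpha = 1 / (1 + sigma * V), where V is the local variance of the
    P x Q region around the pixel."""
    flat = [v for row in region for v in row]
    m = fmean(flat)
    V = sum((v - m) ** 2 for v in flat) / len(flat)
    return 1.0 / (1.0 + sigma * V)

def split_coefficient(w, alpha):
    """Return (r_flat, r_edge) = (alpha * w, (1 - alpha) * w)."""
    return alpha * w, (1.0 - alpha) * w

flat_region = [[5, 5], [5, 5]]
edge_region = [[0, 0, 10, 10], [0, 0, 10, 10]]
print(alpha_weight(flat_region, sigma=0.04))   # zero variance: alpha = 1
print(alpha_weight(edge_region, sigma=0.04))   # variance 25: alpha = 0.5
print(split_coefficient(8.0, 0.25))
```

A region with zero variance gives \(\alpha = 1,\) so the coefficient is assigned entirely to the flat class, whereas a high-variance region lowers \(\alpha \) and moves the coefficient towards the edge class.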

Lian et al. (2005) proposed a minimum cut method with wavelets and represented the wavelet coefficient as \(x_i,\) the noisy wavelet coefficient as \(\tilde{x}_i\) and the noise as \(n_i.\) With the states represented by \(s_i,\) the MAP estimation over the states is

$$\begin{aligned} J(s)= - \log P(s \mid \tilde{x}) = -\sum _{i=1}^N \log f (\tilde{x}_i \mid s) - \log P(s), \end{aligned}$$
(93)

where conditional probability is defined as \(P(s_i| s)=P(s_i| \delta s_i)\) and the edges \(V_c \) are denoted by a set of potential functions \(V_c(s_c),\) then \(-\log P(s)=\log Z + \sum _{c \in V_c} V_c(s_c)\) where Z is the normalization constant in the above functional.

A stationary wavelet transform combined with an adaptive Wiener filter is described by Alwan (2012). This filter can also be used as a local Wiener filter. After one-level decomposition of the image, the noise is estimated by

$$\begin{aligned} \sigma _n= \frac{\text {median}( | y_i|)}{0.6745}, \end{aligned}$$
(94)

where \(y_i\) ranges over the coefficients of each subband and 0.6745 is the normalizing constant of the median estimator (the 0.75 quantile of the standard normal distribution). In the other subband images, soft thresholding is implemented, and the general shrinkage rule is \(\sigma _{\lambda }(w)= \text {sgn}(w) (| w| - \lambda )_{+}, \) where \(\lambda \) is a threshold, so that only coefficients whose magnitude exceeds the threshold are retained, shrunk by \(\lambda .\) The scaling parameter is defined as \(\beta = \sqrt{ \log (L_k/ J)},\) where \(L _k \) is the length of the subband at scale \(k,\) and the threshold \(T_{NS}= \frac{\beta \hat{\sigma }^2_x}{\hat{\sigma }^2_y}\) is computed for each subband. Finally, the image is reconstructed using the LL subband only. This algorithm is implemented in each channel separately. These methods assume that the true signals (images) can be reconstructed by a linear combination of some basis functions.
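
The median-based noise estimate and the soft-thresholding rule can be sketched as follows. The function names are hypothetical; the median estimate divided by 0.6745 is commonly interpreted as an estimate of the noise standard deviation, and the subband-dependent threshold \(T_{NS}\) is not computed here, so the threshold `lam` is simply passed in.

```python
import math
from statistics import median

def estimate_noise(coeffs):
    """Robust noise estimate: median(|y_i|) / 0.6745."""
    return median([abs(c) for c in coeffs]) / 0.6745

def soft_threshold(w, lam):
    """sgn(w) * max(|w| - lam, 0): zero small coefficients, shrink the rest."""
    return math.copysign(max(abs(w) - lam, 0.0), w)

detail = [-0.6745, 0.6745, 0.6745, -0.6745, 0.6745]
print(estimate_noise(detail))
print([soft_threshold(w, 1.0) for w in (3.0, -3.0, 0.5)])
```

Coefficients with magnitude below the threshold are set to zero, and the remaining ones are moved towards zero by \(\lambda ,\) which is the \((| w| - \lambda )_{+}\) part of the shrinkage rule.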

5.2 Comparison results of wavelet-based methods

The results for the wavelet-based algorithms are compared in Table 3. We note that the methods considered here all consider Gaussian noise as the standard noise model, albeit with varying noise levels reported in the experimental results.

Table 3 Result comparison for wavelet based filters for color image denoising. All these filters are applied on AWGN

6 Conclusion

Many image denoising algorithms have been developed using various disciplines and methodologies to represent the image in image processing systems. In this paper, we reviewed the articles on color image denoising using filtering methods published during the last two decades. These methods are classified into multiple classes and subclasses based on the filter type, application domain, filter stages, working methodology and wavelet methods. This review showed that fuzzy switching filters and quaternion filters are widely used in this field and that much less importance is given to classical methods such as graphical approaches or order statistics. The experimental results are compared using multiple image quality measures at varying noise levels on standard test images. The effectiveness of the methods is compared with that of other methods using the standard image quality measures discussed in the literature. The color image denoising filters covered in this review require coding-associated parameters that are dataset dependent and require modifications.

It is observed that, in the recent era, preference is given to state-of-the-art neural network and deep learning algorithms, and less attention has been paid to these traditional image denoising algorithms. However, these learning-based models require large-scale paired training data and domain-specific data across natural, medical and remote sensing areas. We envisage a hybrid setup wherein the advantages of filtering methods and trained deep learning models can be leveraged together to obtain meaningful denoising results on various color image processing problems. Thus, our clear classification of filters will be helpful for researchers working in the image processing domain for selecting appropriate filters for their image restoration tasks.