1 Introduction

Synthetic aperture radar (SAR) is used in a wide variety of applications, including military applications, geology, scientific discovery, mapping, and Earth surveillance. The main advantage of SAR is its ability to operate in darkness and under adverse conditions such as rain, snow, fog, and dust; however, SAR images exhibit speckle. It must be stressed that speckle is noise-like, but it is not noise; it is a real electromagnetic measurement. Speckle arises from the coherent superposition of the radar echoes within each resolution cell. Consequently, even when the radar scans a uniform surface, the SAR image shows strong gray-level fluctuations, with some resolution cells appearing as dark spots and others as bright spots, giving a granular appearance. These spots are called speckle noise.

For any coherent imaging system, such as SAR, sonar, and ultrasound, despeckling is an important image-enhancement step whose main goals are to remove speckle while preserving edges. In general, SAR despeckling is carried out either in the spatial domain or in a transform domain [1]. Despite their low computational complexity, spatial-domain filters often do not perform as well as transform-domain algorithms [2]. The wavelet transform [3] is a well-known multiscale transform that efficiently represents point singularities in one-dimensional signals. For images, two-dimensional separable wavelets have been used; however, their limited directionality makes them poor at representing linear singularities, which motivated the curvelet [4] and contourlet [5, 6] transforms, which combine a multiscale decomposition with a directional filter bank. Their basis functions, with wedge-shaped or rectangular support regions, provide good sparse representations of higher-dimensional singularities. More recently, the shearlet transform (ST), based on an affine system [7], was proposed [8]; it represents images sparsely and offers flexible orientation. This representation rests on a simple and rigorous mathematical framework that not only provides a flexible theoretical tool for the geometric representation of multidimensional data but is also easy to implement. In addition, shearlets are highly direction-sensitive and spatially localized [8,9,10]. ST has been applied to various practical problems, such as total-variation denoising [11], deconvolution [12], SAR despeckling [2, 13], and Bayesian shearlet shrinkage for SAR despeckling via sparse representation [14]. Further, Markarian and Ghofrani [15] proposed a compressive-sensing-based method for speckle reduction in SAR images. More broadly, image processing and video coding have made remarkable progress in recent years [16, 17].

Thresholding is a common denoising method in the transform domain [18], and finding the optimum threshold value is its main problem. VISUShrink [19] obtains a universal threshold, whereas SUREShrink [20], BayesShrink [21, 22], and bivariate shrinkage (BiShrink) [23,24,25,26] obtain adaptive thresholds for every subband. Among these approaches, BayesShrink is a well-known method that has been used in the nonsubsampled shearlet transform (NSST) domain [27], and BiShrink functions derived from Bayesian estimation theory have been applied in the wavelet [20, 23, 28], contourlet [24, 29], and shearlet [25] domains.

In this paper, we compare the performance of BayesShrink, BiShrink, weighted BayesShrink, and weighted BiShrink in the NSST and stationary wavelet transform (SWT) domains in terms of subjective and objective image assessment. BayesShrink seeks the optimum threshold for every subband; BiShrink uses the coefficients of a so-called parent subband to clean the coefficients of the corresponding child subband; and the weighted methods account for the coefficients' noise efficiency, i.e., the fact that different subbands of the transform may be affected by noise differently. Two models for choosing the parent in the NSST domain are proposed. In addition, for both BayesShrink and BiShrink, including a weighting factor (the coefficients' noise efficiency) can improve the performance of the corresponding method. A novel BiShrink despeckling method, named BI-NSST, is developed, and weighted BiShrink is applied in the NSST and SWT domains for the first time; these approaches are named WBI-NSST and WBI-SWT, respectively. The main contribution of this paper is the use of the coefficients' noise efficiency in SWT and NSST to obtain the weighting factor and the optimum threshold value. Finally, the three proposed methods in SWT and the four new approaches in NSST are compared with five state-of-the-art methods ([2, 13, 15, 27, 30]) in terms of subjective and objective image evaluation, for both artificially speckled and real SAR images.

The paper is organized as follows. Section 2 presents the preliminaries: the speckle noise model, the BayesShrink and BiShrink methods, and the estimation of noise and signal variances. Section 3 explains ST and its nonsubsampled version, NSST. The proposed methods based on BiShrink in the NSST domain are explained in Section 4. Section 5 presents the experimental results, and Section 6 concludes the paper.

2 Threshold-based SAR image despeckling

Determining the optimum threshold value is the main problem in any thresholding-based method. VISUShrink [19] obtains a universal threshold, whereas SUREShrink [20], BayesShrink [21, 22, 28], and BiShrink [23,24,25,26, 31] determine adaptive thresholds for every subband. In the following, the speckle noise model is introduced, the BayesShrink and BiShrink methods are explained, and finally, the median estimator of the noise power and the estimation of the signal variance in the transformed domain are described.

2.1 Speckle noise model

In general, multiplicative speckle noise degrades the images of any coherent imaging system, such as SAR. The speckle noise is modeled as

$$ {I}_y={I}_x+{I}_x{N}_s $$
(1)

where \( I_x \) and \( I_y \) denote the noise-free and observed signals, respectively, and \( N_s \) is the speckle noise in the spatial domain. Writing Eq. (1) as \( I_y=I_x\left(1+N_s\right) \), the multiplicative noise can be converted into additive noise with the homomorphic framework, i.e., a logarithmic transform before processing and an exponential transform at the end (see Fig. 1):

$$ y=x+n $$
(2)

where \( y=\log(I_y) \), \( x=\log(I_x) \), and \( n=\log(1+N_s) \) are the noisy signal, the noise-free signal, and the additive noise, respectively.
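A minimal Python sketch of the homomorphic step, assuming Gaussian \( N_s \) clipped so that \( 1+N_s>0 \) (the section does not fix the speckle distribution, and real SAR speckle is often modeled differently). The helper names are hypothetical, and plain nested lists keep it dependency-free. It shows that, after the logarithm, the noisy image equals the clean log-image plus \( n=\log(1+N_s) \), as in Eq. (2).

```python
import math
import random


def add_speckle(clean, noise_std, rng):
    """Multiplicative model of Eq. (1): I_y = I_x + I_x * N_s (hypothetical helper).

    Assumption: N_s is zero-mean Gaussian, clipped at -0.99 so that 1 + N_s > 0
    and the logarithm of Eq. (2) stays defined.
    """
    noisy = []
    for row in clean:
        noisy.append([v + v * max(rng.gauss(0.0, noise_std), -0.99) for v in row])
    return noisy


def to_log_domain(image):
    """Forward homomorphic step: y = log(I_y); pixels must be strictly positive."""
    return [[math.log(v) for v in row] for row in image]


def from_log_domain(image):
    """Inverse homomorphic step applied after despeckling: I = exp(y)."""
    return [[math.exp(v) for v in row] for row in image]


rng = random.Random(0)
clean = [[10.0, 20.0], [30.0, 40.0]]
noisy = add_speckle(clean, 0.3, rng)
y, x = to_log_domain(noisy), to_log_domain(clean)
# In the log domain the speckle is additive: n = y - x = log(1 + N_s).
n = [[yv - xv for yv, xv in zip(yr, xr)] for yr, xr in zip(y, x)]
```

Any transform-domain filter is then applied to `y`, and `from_log_domain` maps the result back to intensities.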

Fig. 1

Block diagram of BiShrink despeckling in transformed domain

2.2 BayesShrink denoising in transformed domain

As noted above, the logarithm converts the multiplicative noise into additive noise. Applying a linear transform to Eq. (2) therefore gives

$$ {Y}_k={X}_k+{N}_k $$
(3)

where k denotes the decomposition subband, and \( Y_k \), \( X_k \), and \( N_k \) are the noisy, clean, and noise coefficients, respectively. The goal of the Bayesian denoising method [21] is to estimate \( {\widehat{X}}_k \) from the observed data \( Y_k \). To simplify the notation, the subscript k, which indicates the subband, is dropped (i.e., Y = X + N). For Eq. (3), the Bayesian maximum a posteriori (MAP) estimator is [5, 23, 32]

$$ \widehat{X}(Y)=\arg \underset{X}{\max}\left[{p}_{X\mid Y}\left(X|Y\right)\right] $$
(4)

According to Bayes' rule, the posterior probability density function (PDF) is \( p\left(X|Y\right)=\frac{p\left(Y|X\right)p(X)}{p(Y)} \); since p(Y) does not depend on X, it can be ignored, and \( \widehat{X}(Y) \) becomes

$$ \widehat{X}(Y)=\arg \underset{X}{\max}\left[{p}_{Y\mid X}\left(Y|X\right){p}_X(X)\right]=\arg \underset{X}{\max}\left[{p}_N\left(Y-X\right){p}_X(X)\right] $$
(5)

where \( p_X \) is the prior distribution of the noise-free coefficients and \( p_N \) is the noise PDF, assumed zero-mean Gaussian with variance \( {\sigma}_N^2 \) [13, 27], \( {p}_N(N)=\frac{1}{\sigma_N\sqrt{2\pi }}\exp \left(-\frac{N^2}{2{\sigma}_N^2}\right) \).

By applying the logarithm function, Eq. (5) is written as,

$$ \widehat{X}(Y)=\arg \underset{X}{\max}\left[-\frac{{\left(Y-X\right)}^2}{2{\sigma}_N^2}+f(X)\right] $$
(6)

where \( f(X)=\log\left(p_X(X)\right) \) [33]. Setting the derivative of Eq. (6) with respect to X to zero shows that \( \widehat{X} \) solves \( \frac{Y-\widehat{X}}{\sigma_N^2}+{f}^{\prime}\left(\widehat{X}\right)=0 \).

If \( p_X(X) \) is assumed Gaussian with zero mean and variance \( \sigma^2 \), then \( f(X)=-\log \left(\sqrt{2\pi}\sigma \right)-X^2/\left(2\sigma^2\right) \), and the estimated \( \widehat{X}(Y) \) is [23, 34]

$$ \widehat{X}(Y)=\frac{\sigma^2}{\sigma^2+{\sigma}_N^2}Y $$
(7)
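Substituting \( f^{\prime}(X)=-X/\sigma^2 \) into the optimality condition \( \frac{Y-\widehat{X}}{\sigma_N^2}+f^{\prime}(\widehat{X})=0 \) gives Eq. (7) in one step:

$$ \frac{Y-\widehat{X}}{\sigma_N^2}-\frac{\widehat{X}}{\sigma^2}=0\;\Rightarrow\;\sigma^2 Y=\widehat{X}\left(\sigma^2+\sigma_N^2\right)\;\Rightarrow\;\widehat{X}=\frac{\sigma^2}{\sigma^2+\sigma_N^2}Y $$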

If \( p_X(X) \) is assumed Laplacian with zero mean and variance \( \sigma^2 \), i.e., \( {p}_X(X)=\frac{1}{\sqrt{2}\sigma}\exp \left(-\frac{\sqrt{2}\mid X\mid }{\sigma}\right) \), then \( f(X)=-\log \left(\sqrt{2}\sigma \right)-\sqrt{2}\mid X\mid /\sigma \), and the estimated \( \widehat{X}(Y) \) is [23]

$$ \widehat{X}(Y)=\operatorname{sign}(Y){\left(|Y|-\frac{\sqrt{2}{\sigma}_N^2}{\sigma}\right)}_{+} $$
(8)

Eq. (8) is the classical soft shrinkage function [33] defined as,

$$ \mathrm{soft}\left(g,\tau \right)=\operatorname{sign}(g){\left(|g|-\tau \right)}_{+} $$
(9)
$$ {\left(|g|-\tau \right)}_{+}=\begin{cases}|g|-\tau, & \mathrm{if}\ |g|\ge \tau \\ 0, & \mathrm{if}\ |g|<\tau \end{cases} $$
(10)

Using the soft shrinkage function, Eq. (8) can be written as \( \widehat{X}(Y)=\mathrm{soft}\left(Y,\frac{\sqrt{2}{\sigma}_N^2}{\sigma}\right) \).
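The soft shrinkage of Eqs. (9) and (10) and the Laplacian MAP estimate of Eq. (8) fit in a few lines. This is a hedged illustration with hypothetical function names; the handling of \( \sigma\le 0 \) is an assumption not stated in the text.

```python
import math


def soft(g, tau):
    """Soft shrinkage of Eqs. (9)-(10): sign(g) * (|g| - tau)_+."""
    magnitude = abs(g) - tau
    if magnitude <= 0.0:
        return 0.0
    return math.copysign(magnitude, g)


def laplace_map(y, sigma_n, sigma):
    """MAP estimate of Eq. (8): soft(Y, sqrt(2) * sigma_N^2 / sigma)."""
    if sigma <= 0.0:
        # Assumption (not from the text): with no estimated signal power,
        # the coefficient is treated as pure noise and set to zero.
        return 0.0
    return soft(y, math.sqrt(2.0) * sigma_n ** 2 / sigma)


print([soft(g, 1.0) for g in (-3.0, -0.5, 0.5, 3.0)])  # [-2.0, 0.0, 0.0, 2.0]
```

Coefficients inside \( [-\tau, \tau] \) are zeroed, and all others move toward zero by \( \tau \), which is why a larger noise variance (or a smaller signal deviation \( \sigma \)) suppresses more coefficients.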

2.3 BiShrink denoising in transformed domain

In 2002, a denoising method named BiShrink was proposed [23, 35]. BiShrink is a simple nonlinear shrinkage function in the transformed domain that exploits the dependency between a coefficient, called the child, and the corresponding coefficient of a related subband, called the parent: to obtain the threshold for denoising a child subband, BiShrink also uses the coefficients of the parent subband.

Let \( X_1 \) and \( X_2 \) be the child and parent coefficients in the transformed domain; then the vector form of Eq. (3) is

$$ Y=X+N $$
(11)

where \( X=(X_1,X_2) \), \( Y=(Y_1,Y_2) \), and \( N=(N_1,N_2) \). As with the Bayesian MAP estimator of Section 2.2, the PDFs of the clean coefficients \( p_X(X) \) and of the noise \( p_N(N) \) must be known to estimate the noise-free coefficients \( \widehat{X}(Y) \) from the observed data. In the literature [23, 24, 36], the noise is often assumed Gaussian,

$$ {p}_N(N)=\frac{1}{2{\pi \sigma}_N^2}\exp \left(-\frac{N_1^2+{N}_2^2}{2{\sigma}_N^2}\right) $$
(12)

and in [23], the PDF of the noise-free coefficients \( p_X(X) \) is modeled as

$$ {p}_X(X)=\frac{3}{2{\pi \sigma}^2}\exp \left(-\frac{\sqrt{3}}{\sigma}\sqrt{X_1^2+{X}_2^2}\right) $$
(13)

According to Eq. (6), the bivariate MAP estimator is,

$$ \widehat{X}(Y)=\arg \underset{X_1,{X}_2}{\max}\left[-\frac{{\left({Y}_1-{X}_1\right)}^2}{2{\sigma}_N^2}-\frac{{\left({Y}_2-{X}_2\right)}^2}{2{\sigma}_N^2}+f(X)\right] $$
(14)

Estimating \( {\widehat{X}}_1 \) and \( {\widehat{X}}_2 \) requires solving \( \frac{Y_1-{\widehat{X}}_1}{\sigma_N^2}+{f}_1\left(\widehat{X}\right)=0 \) and \( \frac{Y_2-{\widehat{X}}_2}{\sigma_N^2}+{f}_2\left(\widehat{X}\right)=0 \), where \( f_1 \) and \( f_2 \) are the partial derivatives of f(X) with respect to \( X_1 \) and \( X_2 \), respectively. From Eq. (13), \( f(X)=\log \left(\frac{3}{2\pi {\sigma}^2}\right)-\frac{\sqrt{3}}{\sigma}\sqrt{X_1^2+{X}_2^2} \), \( {f}_1(X)=\frac{\partial f(X)}{\partial X_1}=-\frac{\sqrt{3}{X}_1}{\sigma \sqrt{X_1^2+{X}_2^2}} \), and \( {f}_2(X)=\frac{\partial f(X)}{\partial X_2}=-\frac{\sqrt{3}{X}_2}{\sigma \sqrt{X_1^2+{X}_2^2}} \). Assuming that the noise power is the same in different subbands [36], i.e., \( {\sigma}_{N_1}={\sigma}_{N_2}={\sigma}_N \), and defining \( r=\sqrt{{\widehat{X}}_1^2+{\widehat{X}}_2^2} \), we obtain:

$$ {Y}_1={\widehat{X}}_1\left(1+\frac{\sqrt{3}{\sigma}_N^2}{\sigma \kern0em r}\right)\kern0.1em $$
(15)
$$ {Y}_2={\widehat{X}}_2\left(1+\frac{\sqrt{3}{\sigma}_N^2}{\sigma \kern0em r}\right) $$
(16)
$$ r={\left(\sqrt{Y_1^2+{Y}_2^2}-\frac{\sqrt{3}{\sigma}_N^2}{\sigma}\right)}_{+} $$
(17)
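Eq. (17) follows from Eqs. (15) and (16): squaring and adding them, with \( r=\sqrt{\widehat{X}_1^2+\widehat{X}_2^2} \) the magnitude of the estimated pair, gives the expression below, and since r cannot be negative, the difference is clipped at zero.

$$ \sqrt{Y_1^2+Y_2^2}=\sqrt{\widehat{X}_1^2+\widehat{X}_2^2}\left(1+\frac{\sqrt{3}\sigma_N^2}{\sigma r}\right)=r\left(1+\frac{\sqrt{3}\sigma_N^2}{\sigma r}\right)=r+\frac{\sqrt{3}\sigma_N^2}{\sigma} $$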

Substituting Eq. (17) into Eq. (15), the joint shrinkage function is [36],

$$ {\widehat{X}}_1(Y)=\frac{{\left(\sqrt{Y_1^2+{Y}_2^2}-\frac{\sqrt{3}{\sigma}_N^2}{\sigma}\right)}_{+}}{\sqrt{Y_1^2+{Y}_2^2}}{Y}_1 $$
(18)

In the BiShrink method of Eq. (18), the dead zone, i.e., the region where \( {\widehat{X}}_1(Y)=0 \), is

$$ \mathrm{Dead\ zone}=\left\{\left({Y}_1,{Y}_2\right):\sqrt{Y_1^2+{Y}_2^2}\le \frac{\sqrt{3}{\sigma}_N^2}{\sigma}\right\} $$
(19)
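The bivariate shrinkage of Eq. (18), including its dead zone from Eq. (19), can be sketched per coefficient as follows. The function name and the treatment of \( \sigma\le 0 \) are assumptions for illustration. The two printed cases use the same child value with a weak and a strong parent, showing how the parent decides whether a small child coefficient survives.

```python
import math


def bishrink(y1, y2, sigma_n, sigma):
    """Bivariate shrinkage of Eq. (18) for child y1 with parent y2."""
    if sigma <= 0.0:
        # Assumption (not from the text): no signal power -> treat as noise.
        return 0.0
    threshold = math.sqrt(3.0) * sigma_n ** 2 / sigma
    radius = math.hypot(y1, y2)  # sqrt(Y1^2 + Y2^2)
    if radius <= threshold:
        return 0.0  # (Y1, Y2) lies in the circular dead zone of Eq. (19)
    return (radius - threshold) / radius * y1


# Threshold sqrt(3) * sigma_N^2 / sigma equals 1 for sigma_N = 1, sigma = sqrt(3).
s = math.sqrt(3.0)
print(round(bishrink(1.0, 0.0, 1.0, s), 3))   # 0.0: weak parent, child zeroed
print(round(bishrink(1.0, 10.0, 1.0, s), 3))  # 0.9: strong parent, child kept
```

Unlike the scalar soft threshold, the decision depends on the joint magnitude of child and parent, so the dead zone is a disk of radius \( \sqrt{3}\sigma_N^2/\sigma \) in the \( (Y_1,Y_2) \) plane.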

To compare the NSST and SWT BiShrink functions and to visualize the circular dead zone, we chose the Barbara image (512 × 512 pixels, 256 gray levels). The test image is corrupted by additive Gaussian noise at two noise levels (\( \sigma_N=10 \) and 30), and the corresponding dead zones given by Eq. (19) are shown in Fig. 2. As Eq. (19) indicates and the figure confirms, the radius of the circular dead zone, \( \sqrt{3}\sigma_N^2/\sigma \), grows with the noise power. Since NSST is sparser than SWT, it yields a smaller dead-zone radius.

Fig. 2

(a–f) Dead zone of BiShrink for Barbara in the SWT and NSST domains for two noise variances

2.4 Estimating noise and signal variance

Both the BayesShrink [21] and BiShrink [23] algorithms require the noise variance \( {\widehat{\sigma}}_N^2 \) and the signal variance \( {\widehat{\sigma}}^2 \). The noise variance is commonly estimated in the transformed domain with the median estimator [34]:

$$ {\widehat{\sigma}}_N=\frac{\mathbf{median}\kern0.1em \left(|{Y}_{\mathrm{\ell}}|\right)}{0.6745} $$
(20)

where \( Y_{\mathrm{\ell}} \) denotes the coefficients of the ℓth decomposition level. For any subband in the transformed domain, \( {\sigma}_Y^2={\sigma}^2+{\sigma}_N^2 \). Since the observed signal in the transformed domain is modeled as zero mean [27], the variance of each subband is obtained by

$$ {\widehat{\sigma}}_Y^2=\frac{1}{M^2}\sum \limits_{i,j=1}^M{\left({Y}_{\mathrm{\ell}}^2\right)}_{i,j} $$
(21)

where the sum runs over a square window of M × M coefficients. The variance is computed in every window, and the window variances are then averaged. Using \( {\widehat{\sigma}}_Y^2 \) and \( {\widehat{\sigma}}_N^2 \), the signal standard deviation \( \widehat{\sigma} \) is [33]

$$ \widehat{\sigma}=\sqrt{{\left({\widehat{\sigma}}_Y^2-{\widehat{\sigma}}_N^2\right)}_{+}} $$
(22)
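The three estimators of Eqs. (20)–(22) can be sketched with the standard library alone. This is a hedged illustration with hypothetical helper names; each window power is normalized by the number of coefficients it contains (M × M), and the use of non-overlapping windows is an assumption, since the text does not say how the windows are placed.

```python
import math
import statistics


def noise_std_median(subband):
    """Robust noise estimate of Eq. (20): median(|Y|) / 0.6745."""
    return statistics.median([abs(v) for row in subband for v in row]) / 0.6745


def mean_window_power(subband, m):
    """Eq. (21) averaged over windows: mean of Y^2 in each M x M window.

    Assumption (the text does not say): windows are non-overlapping and
    border pixels that do not fill a whole window are ignored.
    """
    rows, cols = len(subband), len(subband[0])
    powers = []
    for r0 in range(0, rows - m + 1, m):
        for c0 in range(0, cols - m + 1, m):
            total = sum(subband[r][c] ** 2
                        for r in range(r0, r0 + m) for c in range(c0, c0 + m))
            powers.append(total / (m * m))
    return sum(powers) / len(powers)


def signal_std(sigma_y2, sigma_n):
    """Eq. (22): sigma = sqrt(max(sigma_Y^2 - sigma_N^2, 0))."""
    return math.sqrt(max(sigma_y2 - sigma_n ** 2, 0.0))
```

The clipping in `signal_std` matters: when the observed power does not exceed the noise power, the estimated signal deviation is zero and the whole subband is treated as noise.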

3 Nonsubsampled shearlet transform (NSST)

A novel multi-scale directional representation system called shearlets was proposed in 2005 [7]. Two properties, multi-resolution and sparsity, render the ST attractive in science and engineering [10, 11, 37]. In the following, the continuous and discrete STs are explained briefly.

The continuous shearlet transform for an arbitrary signal f is:

$$ {S}_f\left(a,s,t\right)=\left\langle f,{\psi}_{ast}\right\rangle $$
(23)

where \( a\in\mathbb{R}^+ \), \( s\in\mathbb{R} \), and \( t\in\mathbb{R}^2 \) are the scaling, shearing, and translation parameters, respectively; the shearlet function is \( {\psi}_{ast}(x)={a}^{-3/4}\psi \left({M}_{as}^{-1}\left(x-t\right)\right) \) with \( {M}_{as}=\left(\begin{array}{cc}1& s\\ 0& 1\end{array}\right)\left(\begin{array}{cc}a& 0\\ 0& \sqrt{a}\end{array}\right)=\left(\begin{array}{cc}a& \sqrt{a}s\\ 0& \sqrt{a}\end{array}\right) \).
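As a quick check on the normalization, the determinant of \( M_{as} \) explains the factor \( a^{-3/4} \): the dilation scales the two axes by a and \( \sqrt{a} \) (parabolic scaling), and \( a^{-3/4} \) is exactly the factor that keeps every shearlet at the same \( L^2 \) norm as \( \psi \) (substituting \( u=M_{as}^{-1}(x-t) \), so \( dx=|\det M_{as}|\,du \)):

$$ \det M_{as}=a\sqrt{a}=a^{3/2}\;\Rightarrow\;a^{-3/4}=\left|\det M_{as}\right|^{-1/2},\qquad \left\Vert\psi_{ast}\right\Vert_2^2=\left|\det M_{as}\right|^{-1}\int\left|\psi\left(M_{as}^{-1}(x-t)\right)\right|^2dx=\left\Vert\psi\right\Vert_2^2 $$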

The discrete version of ST [10] used in digital signal processing is

$$ {S}_f\left(j,\mathrm{\ell},k\right)=\left\langle f,{\psi}_{j,\mathrm{\ell},k}\right\rangle $$
(24)

where \( j,\ell\in\mathbb{Z} \), \( k\in\mathbb{Z}^2 \), \( {\psi}_{j,\ell,k}(x)={\left|\det A_0\right|}^{j/2}\psi\left({B}_0^{\ell}{A}_0^{j}x-k\right) \), \( {A}_0=\left(\begin{array}{cc}4& 0\\ 0& 2\end{array}\right) \), and \( {B}_0=\left(\begin{array}{cc}1& 1\\ 0& 1\end{array}\right) \).

In general, subsampling makes a transform shift-variant. Removing the up- and downsampling blocks led to the SWT [30] in 2003 and the NSST [10] in 2008. In a nonsubsampled transform, the coefficients are not decimated between decomposition levels, so every subband has the same size as the input image. Consequently, SWT and NSST require more computation and storage than the conventional WT and ST.
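The toy example below is a hedged sketch, not the SWT of [30] or the NSST of [10]: it uses a one-level undecimated 2-D Haar transform with periodic borders (both assumptions chosen for brevity; the function name is hypothetical) to show the two properties just described. Every subband keeps the input size, and a circular shift of the input only shifts the coefficients.

```python
def undecimated_haar_level(image):
    """One level of an undecimated 2-D Haar transform with periodic borders.

    Returns (approximation, row-difference, column-difference, diagonal)
    subbands. No downsampling is performed, so each subband has the same
    size as the input image.
    """
    rows, cols = len(image), len(image[0])
    bands = tuple([[0.0] * cols for _ in range(rows)] for _ in range(4))
    approx, d_rows, d_cols, d_diag = bands
    for r in range(rows):
        for c in range(cols):
            # 2 x 2 neighborhood with wrap-around (periodic extension).
            a = image[r][c]
            b = image[r][(c + 1) % cols]
            d = image[(r + 1) % rows][c]
            e = image[(r + 1) % rows][(c + 1) % cols]
            approx[r][c] = (a + b + d + e) / 2.0
            d_rows[r][c] = (a + b - d - e) / 2.0  # difference between rows
            d_cols[r][c] = (a - b + d - e) / 2.0  # difference between columns
            d_diag[r][c] = (a - b - d + e) / 2.0
    return bands


image = [[float((3 * r + 5 * c) % 7) for c in range(4)] for r in range(4)]
shifted = [row[-1:] + row[:-1] for row in image]  # circular shift right by one column
original_bands = undecimated_haar_level(image)
shifted_bands = undecimated_haar_level(shifted)
```

With a decimated transform, the same one-pixel shift would change the coefficient values rather than merely move them; avoiding this is what the extra computation and storage of SWT and NSST buy.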

As a multiscale directional representation, NSST gives a good sparse representation of an image. To show the sparsity of NSST in contrast with SWT, the Barbara image (512 × 512 pixels, 256 gray levels) is used. The original test image and a version corrupted by Gaussian noise are decomposed by SWT and NSST into three levels. As shown in Fig. 3, SWT has three subbands at each decomposition level, whereas NSST has 16, eight, and four subbands at the first, second, and third levels, respectively. Figure 3 shows the histograms of the SWT coefficients (1st and 4th subbands) and the NSST coefficients (9th and 1st subbands, and all subbands of the next coarser level). The histograms in Fig. 3 indicate that NSST is sparser than SWT for both the noisy and noise-free images. Sparsity, meaning that most coefficients are approximately zero, plays a key role in any thresholding algorithm. In a sparse transform domain, the noise is spread uniformly over all coefficients, whereas the data are represented by a small subset of large coefficients [38]. Coefficients with small magnitudes can therefore be considered noise and set to zero. This approach, in which each coefficient is compared with a threshold to decide whether it belongs to the desired part of the original data, is called thresholding. Thresholding is naturally more effective in a sparser transform, and since the basis functions of NSST have multidirectional wedge-shaped support regions [2], NSST provides a sparser representation than SWT.

Fig. 3

Histograms of noise-free and noisy (\( \sigma_N^2=0.05 \)) coefficients in the SWT and NSST domains

In addition to the histograms, we compute the average standard deviation (Sd) for an objective comparison of the sparsity of NSST and SWT. The Sd of every normalized subband is obtained, and the average Sd for noise-free and noisy “Lena” and Barbara images (512 × 512) is presented in Table 1. Higher noise power yields a larger Sd value; however, for both test images and under all noise powers, NSST is sparser than SWT. Therefore, as anticipated, threshold-based denoising is expected to perform better in the NSST domain than in the SWT domain.

Table 1 The Sd parameter for noise-free images and three different noise powers

4 Proposed methods

In the first part of this section, the image assessment parameters used to evaluate the denoising methods are explained, and the mutual information (MI), which measures the statistical dependency between a child and its parent coefficients, is introduced. Subsequently, parent models for BiShrink in the transformed domain are proposed, and weighted BiShrink is applied in the NSST and SWT domains for the first time.

For Figs. 4 and 5, we used eight test images, namely Barbara, Lena, House, Boat, Goldhill, Fingerprint, Cameraman, and Peppers, each of size 512 × 512 pixels with 256 gray levels. To obtain reliable measurements in a noisy environment, the algorithm was run 10 times for every image; the reported values are therefore averages over 80 (i.e., 8 × 10) independent trials.

Fig. 4

MI for (a) SWT and (b) NSST

Fig. 5

(a1, b1) Numbered subbands; (a2, b2) average MSEs for three noise variances over the eight test images; (a3, b3) computed weighting factor for each decomposition level and every subband in SWT and NSST

Figure 1 shows the block diagram of the BiShrink despeckling method in the NSST and SWT domains. In general, the BiShrink denoising algorithm consists of a three-step process: estimating \( {\widehat{\sigma}}_N \) for every subband according to Eq. (20), estimating \( {\widehat{\sigma}}_Y^2 \) and \( {\widehat{\sigma}}^2 \) based on Eqs. (21) and (22), and obtaining the noise-free coefficients using Eq. (18).
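The three steps can be sketched for a single child subband as follows. This is a hedged, dependency-free illustration, not the authors' code: the helper name `bishrink_subband`, the `noise_ref` argument (which coefficients feed the median estimate), the default window size of 7, and the non-overlapping windows are all assumptions.

```python
import math
import statistics


def bishrink_subband(child, parent, noise_ref, window=7):
    """Three-step BiShrink for one child subband (hypothetical helper).

    child, parent: equally sized 2-D lists, as in a nonsubsampled transform.
    noise_ref: coefficients used for the median noise estimate of Eq. (20).
    window: side M of the square window in Eq. (21); 7 is an assumed default.
    """
    rows, cols = len(child), len(child[0])
    m = min(window, rows, cols)
    # Step 1: noise standard deviation, Eq. (20).
    sigma_n = statistics.median([abs(v) for row in noise_ref for v in row]) / 0.6745
    # Step 2: observed power over non-overlapping M x M windows, Eq. (21),
    # averaged over the windows, then the signal deviation of Eq. (22).
    powers = [
        sum(child[r][c] ** 2 for r in range(r0, r0 + m) for c in range(c0, c0 + m)) / (m * m)
        for r0 in range(0, rows - m + 1, m)
        for c0 in range(0, cols - m + 1, m)
    ]
    sigma = math.sqrt(max(sum(powers) / len(powers) - sigma_n ** 2, 0.0))
    if sigma == 0.0:
        return [[0.0] * cols for _ in range(rows)]  # assumed: subband is all noise
    threshold = math.sqrt(3.0) * sigma_n ** 2 / sigma
    # Step 3: bivariate shrinkage of Eq. (18), coefficient by coefficient.
    out = []
    for child_row, parent_row in zip(child, parent):
        out_row = []
        for y1, y2 in zip(child_row, parent_row):
            radius = math.hypot(y1, y2)
            out_row.append(0.0 if radius <= threshold else (radius - threshold) / radius * y1)
        out.append(out_row)
    return out
```

In a full despeckler this function would run on every child subband in the log domain, with the parent chosen by one of the models of Section 4.2, before the inverse transform and the exponential of Fig. 1.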

4.1 Image assessment parameters

To evaluate the performance of the despeckling algorithms, we chose the peak signal-to-noise ratio (PSNR) [39] and structural similarity (SSIM) [40] as full-reference parameters, and the equivalent number of looks (ENL) [13], mean square difference (MSD) [41], and edge save index (ESI) [13] as no-reference parameters.

PSNR measures image quality:

$$ \mathrm{PSNR}=20{\log}_{10}\left(\frac{256}{\sqrt{MSE}}\right) $$
(25)

where \( \mathrm{MSE}=\frac{1}{mn}\sum \limits_{i=1}^m\sum \limits_{j=1}^n{\left[{\widehat{I}}_x\left(i,j\right)-{I}_x\left(i,j\right)\right]}^2 \) is the mean square error (MSE), mn is the image size, and \( I_x \) and \( {\widehat{I}}_x \) are the original (noise-free) and despeckled images, respectively (see Fig. 1 for the notation).
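A minimal sketch of Eq. (25) on nested lists, with hypothetical function names. The default `peak=256` follows Eq. (25) as written; for 8-bit images the peak value 255 is the more common choice, so the parameter is exposed rather than fixed.

```python
import math


def mse(a, b):
    """Mean square error between two equally sized images (2-D lists)."""
    count = sum(len(row) for row in a)
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / count


def psnr(despeckled, clean, peak=256.0):
    """PSNR of Eq. (25); peak=256 follows Eq. (25), 255 is the usual 8-bit peak."""
    error = mse(despeckled, clean)
    if error == 0.0:
        return math.inf  # identical images
    return 20.0 * math.log10(peak / math.sqrt(error))
```

For example, a constant error of 2 gray levels gives MSE = 4 and PSNR = 20 log10(128), about 42.1 dB, with the peak of Eq. (25).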

The SSIM index measures the similarity between the original and despeckled images through a local statistical analysis (i.e., the means, variances, and covariance of the original and despeckled pixel values within a sliding window). SSIM lies in [−1, 1]; values close to −1 indicate poor similarity between the original and despeckled images, whereas values close to 1 indicate good similarity.

ENL and MSD both measure the speckle suppression:

$$ \mathrm{ENL}=\frac{{\overline{{\widehat{I}}_x}}^2}{\mathrm{NV}} $$
(26)
$$ \mathrm{MSD}=\frac{1}{mn}\sum \limits_{i=1}^m\sum \limits_{j=1}^n{\left[{\widehat{I}}_x\left(i,j\right)-{I}_y\left(i,j\right)\right]}^2 $$
(27)

where \( \mathrm{NV}=\frac{1}{mn}\sum \limits_{i=1}^m\sum \limits_{j=1}^n{\left[{\widehat{I}}_x\left(i,j\right)-\overline{{\widehat{I}}_x}\right]}^2 \) and \( \overline{{\widehat{I}}_x}=\frac{1}{mn}\sum \limits_{i=1}^m\sum \limits_{j=1}^n{\widehat{I}}_x\left(i,j\right) \). Because ENL carries no information about resolution degradation, it is often used jointly with other parameters such as MSD. Large values of ENL and MSD indicate strong filtering. Since ENL should be computed over uniform regions, the image is divided into cells of 16 × 16 and of 25 × 25 pixels; the ENL is computed for every cell, and the results are averaged to obtain the ENL value.
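A hedged sketch of Eqs. (26) and (27), including the cell-wise averaging of ENL described above. The function names are hypothetical, non-overlapping cells are assumed, and skipping constant cells (where ENL is undefined) is an assumption not stated in the text.

```python
import math


def enl(region):
    """Eq. (26): squared mean of the region divided by its variance NV."""
    values = [v for row in region for v in row]
    mean = sum(values) / len(values)
    nv = sum((v - mean) ** 2 for v in values) / len(values)
    return mean * mean / nv


def blockwise_enl(image, block):
    """Average ENL over non-overlapping block x block cells (e.g. 16 or 25).

    Assumption: cells with zero variance (ENL undefined) are skipped;
    NaN is returned if every cell is constant.
    """
    scores = []
    for r0 in range(0, len(image) - block + 1, block):
        for c0 in range(0, len(image[0]) - block + 1, block):
            cell = [row[c0:c0 + block] for row in image[r0:r0 + block]]
            flat = [v for row in cell for v in row]
            if max(flat) > min(flat):
                scores.append(enl(cell))
    return sum(scores) / len(scores) if scores else math.nan


def msd(despeckled, noisy):
    """Eq. (27): mean square difference between despeckled and noisy images."""
    count = sum(len(row) for row in noisy)
    return sum((p - q) ** 2 for rd, rn in zip(despeckled, noisy)
               for p, q in zip(rd, rn)) / count
```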

The ESI [13] reflects the edge preservation capability of a despeckling technique and is measured in both the horizontal and vertical directions:

$$ {\mathrm{ESI}}^h=\frac{\sum \limits_{i=1}^m\sum \limits_{j=1}^{n-1}\mid {\widehat{I}}_x\left(i,j+1\right)-{\widehat{I}}_x\left(i,j\right)\mid }{\sum \limits_{i=1}^m\sum \limits_{j=1}^{n-1}\mid {I}_y\left(i,j+1\right)-{I}_y\left(i,j\right)\mid } $$
(28)
$$ {\mathrm{ESI}}^v=\frac{\sum \limits_{j=1}^n\sum \limits_{i=1}^{m-1}\mid {\widehat{I}}_x\left(i+1,j\right)-{\widehat{I}}_x\left(i,j\right)\mid }{\sum \limits_{j=1}^n\sum \limits_{i=1}^{m-1}\mid {I}_y\left(i+1,j\right)-{I}_y\left(i,j\right)\mid } $$
(29)
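Eqs. (28) and (29) compare the total absolute pixel-to-pixel variation of the despeckled image with that of the noisy image along each direction; a higher ESI indicates better edge preservation. A minimal sketch with a hypothetical function name:

```python
def edge_save_index(despeckled, noisy):
    """Eqs. (28)-(29): returns (ESI_h, ESI_v) for two equally sized 2-D lists."""
    def horizontal(img):
        # Sum of |I(i, j+1) - I(i, j)| over all rows.
        return sum(abs(row[j + 1] - row[j]) for row in img for j in range(len(row) - 1))

    def vertical(img):
        # Sum of |I(i+1, j) - I(i, j)| over all columns.
        return sum(abs(img[i + 1][j] - img[i][j])
                   for i in range(len(img) - 1) for j in range(len(img[0])))

    return (horizontal(despeckled) / horizontal(noisy),
            vertical(despeckled) / vertical(noisy))
```

Because the denominators come from the noisy image, a perfectly flat noisy input would make the ratios undefined; real SAR scenes always contain some variation.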

MI [25, 42,43,44], which measures the dependency between the child \( X_1 \) and the parent \( X_2 \), is defined as

$$ I\left({X}_1,{X}_2\right)=\sum \limits_{x_1\in {X}_1}\sum \limits_{x_2\in {X}_2}p\left({x}_1,{x}_2\right)\log\frac{p\left({x}_1,{x}_2\right)}{p\left({x}_1\right)p\left({x}_2\right)} $$
(30)

where \( (x_1,x_2) \) is a pair of random variables with joint distribution \( p(x_1,x_2) \) and marginal distributions \( p(x_1) \) and \( p(x_2) \). The MI \( I(X_1,X_2) \) is zero if the child and parent are independent, and a larger MI means a stronger dependency between the child and parent coefficients in the transformed domain. Therefore, the best parent for a given child is the one with the largest MI.
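Eq. (30) needs the joint and marginal distributions, which in practice are estimated from the coefficients. A minimal sketch, assuming equal-width histogram bins; the text does not specify the estimator, the bin count, or the log base, so the natural logarithm is used here (giving nats) and the function name is hypothetical.

```python
import math
from collections import Counter


def mutual_information(child, parent, bins=16):
    """Histogram estimate of Eq. (30) in nats (natural logarithm).

    Assumption: equal-width bins spanning each variable's own range.
    """
    xs = [v for row in child for v in row]
    ys = [v for row in parent for v in row]

    def quantize(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # constant input -> single bin
        return [min(int((v - lo) / width), bins - 1) for v in values]

    qx, qy = quantize(xs), quantize(ys)
    n = len(qx)
    joint, px, py = Counter(zip(qx, qy)), Counter(qx), Counter(qy)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi
```

The absolute MI value depends on the binning, so comparisons between parent models, as in Fig. 4, should use the same number of bins for every candidate parent.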

4.2 Parent and child coefficient models

Although a considerable amount of research on image denoising in the transformed domain has been carried out [31, 35], thresholding remains attractive because of its simplicity [19, 20]. In particular, bivariate MAP thresholding, which exploits the dependency between coefficients [25, 45], gives good results.

Implementing BiShrink for denoising, based on Eq. (18), requires a child coefficient X1 and its parent coefficient X2. In the SWT domain, three candidate parents, X2(N), X2(SS), and X2(NC), can be considered for an arbitrary child X1: X2(N) is a neighboring subband at the same level, X2(SS) is the subband with the same orientation at the next coarser level, and X2(NC) denotes all the subbands of the next coarser level. For example, referring to the numbered SWT subbands shown in Fig. 3, if the 1st subband is X1, the 2nd or 3rd subband can serve as X2(N), the 4th subband is X2(SS), and the 4th–6th subbands together form X2(NC). Similarly, in the NSST domain, three parents X2(N), X2(NC), and X2(OPP) are considered for an arbitrary child X1, where X2(OPP) is the subband at the same level as X1 but with the opposite orientation. For example, referring to the numbered NSST subbands shown in Fig. 3, if the 9th subband is X1, the 8th or 10th subband can serve as X2(N), the 17th–24th subbands together form X2(NC), and the 1st subband is X2(OPP). The child–parent pairs in the SWT and NSST domains for the sample image “Zone Plate” are shown in Fig. 6.

Fig. 6

Child–parent pairs for the sample image ‘Zone Plate’ (512 × 512 pixels) in the SWT and NSST domains

As mentioned in Section 4.1, the best parent for a child is the one with the highest MI. We now take the eight test images and add zero-mean Gaussian noise with standard deviation \( \sigma_N=30 \). The noisy images are decomposed by SWT and NSST into three levels, and the MI between each child subband and its parent under each of the three models is computed. As expected and shown in Fig. 4, the MI is nonzero in both transformed domains because a child depends on its inter- and intra-scale subbands. Figure 4 shows that I(X1, X2(NC)) > I(X1, X2(SS)) > I(X1, X2(N)) for SWT, and I(X1, X2(NC)) > I(X1, X2(N)) > I(X1, X2(OPP)) for NSST. Following previous studies [23, 26, 28, 35], we use X2(SS) as the classical parent in the SWT domain. For the NSST domain, we propose the following two models:

  • Model 1: considering X2(OPP) as the parent, the method is named BI-NSST (1).

  • Model 2: considering X2(NC) as the parent, the method is named BI-NSST (2).

Model 1 was originally proposed for the ST domain [25], and we apply it in the NSST domain as well. Model 2, motivated by the MI results in Fig. 4, is a contribution of this work, and it is expected to outperform model 1 in the NSST domain.

4.3 Weighted shrinkage method

Most shrinkage denoising techniques, including BiShrink [23, 25, 31, 35], assume that the noise power is the same in all subbands. This assumption holds for the WT [23], but it has been shown not to hold exactly for the nonsubsampled contourlet transform [42] and NSST [13, 27], in which different subbands have different noise powers. Although weighted BayesShrink has previously been used in the NSST domain [27], weighted BiShrink in the NSST domain, called the WBI-NSST method, is proposed in this paper. To examine the assumption above, the original noise-free test images and the noisy images are decomposed into three levels by SWT and NSST, and the MSE between the noise-free and noisy coefficients is computed for each decomposition level ℓ and subband k:

$$ {\mathrm{MSE}}_{\mathrm{\ell},k}=\frac{1}{mn}\sum \limits_{i=1}^m\sum \limits_{j=1}^n{\left[{Y}_{\mathrm{\ell},k}\left(i,j\right)-{X}_{\mathrm{\ell},k}\left(i,j\right)\right]}^2 $$
(31)

Because nonsubsampled transforms are used, every subband has the same size as the input image (i.e., mn). The MSEs of all subbands for the eight test images in the SWT and NSST domains are computed, and their averages are shown in Fig. 5. As expected, all SWT subbands are affected by noise approximately equally (Fig. 5a2), whereas in NSST some subbands are more robust against noise than others (Fig. 5b2).

Following Eq. (18), the BiShrink threshold [36], denoted TBI, for decomposition level ℓ and subband k is

$$ {\mathrm{TBI}}_{\mathrm{\ell},k}=\frac{\sqrt{3}{\sigma}_N^2}{\sigma_{X_{\mathrm{\ell},k}}} $$
(32)

where \( {\sigma}_{X_{\mathrm{\ell},k}}^2 \) (or \( \sigma^2 \)) is the power of the noise-free signal in the transformed domain. In this paper, the noise variance is approximated by the robust median estimator of Eq. (20), and the power of the noise-free signal is estimated using Eq. (22); the weighted BiShrink threshold, denoted TWBI, is then

$$ {\mathrm{TWBI}}_{\mathrm{\ell},k}={\alpha}_{\mathrm{\ell},k}{\mathrm{TBI}}_{\mathrm{\ell},k} $$
(33)

where α is the weighting factor that depends on the decomposition level ℓ and subband k, expressed as

$$ {\alpha}_{\mathrm{\ell},k}=\frac{{\mathrm{MSE}}_{\mathrm{\ell},k}}{\overline{{\mathrm{MSE}}_{\mathrm{\ell}}}} $$
(34)

where \( \overline{{\mathrm{MSE}}_{\mathrm{\ell}}}=\frac{1}{K_{\mathrm{\ell}}}\sum \limits_{k=1}^{K_{\mathrm{\ell}}}{\mathrm{MSE}}_{\mathrm{\ell},k} \) is the average MSE of all subbands at level ℓ, and \( K_{\mathrm{\ell}} \) is the number of subbands at level ℓ.

With the weighting factor \( \alpha_{\mathrm{\ell},k} \), Eq. (33) gives the optimum threshold, which then replaces the threshold \( \sqrt{3}\sigma_N^2/\sigma \) in the shrinkage of Eq. (18). The corresponding methods are named WBI-SWT, WBI-NSST(1), and WBI-NSST(2).
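Eqs. (32)–(34) combine into a small change to the BiShrink rule: compute \( \alpha_{\ell,k} \) from the per-subband MSEs of one level, then scale the threshold. A hedged sketch with hypothetical function names; the MSE values in the example are made up for illustration, whereas in the text they come from Eq. (31) on test images or on a flat image.

```python
import math


def weighting_factors(level_mse):
    """Eq. (34): alpha_{l,k} = MSE_{l,k} / mean_k(MSE_{l,k}) for one level l."""
    mean_mse = sum(level_mse) / len(level_mse)
    return [value / mean_mse for value in level_mse]


def weighted_bishrink(y1, y2, sigma_n, sigma, alpha):
    """Eq. (18) with the threshold TWBI = alpha * sqrt(3) * sigma_N^2 / sigma."""
    if sigma <= 0.0:
        return 0.0  # assumption: no signal power -> coefficient treated as noise
    threshold = alpha * math.sqrt(3.0) * sigma_n ** 2 / sigma  # Eqs. (32)-(33)
    radius = math.hypot(y1, y2)
    if radius <= threshold:
        return 0.0
    return (radius - threshold) / radius * y1


# Hypothetical per-subband MSEs for one decomposition level.
alphas = weighting_factors([1.0, 2.0, 3.0])  # [0.5, 1.0, 1.5]
```

Subbands with \( \alpha_{\ell,k}>1 \), i.e., those more affected by noise, are shrunk harder, while \( \alpha_{\ell,k}<1 \) relaxes the threshold; since the factors average to one within a level, the overall threshold level of that level is preserved.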

Obtaining the values of \( \alpha_{\mathrm{\ell},k} \) is the main challenge in implementing the weighted shrinkage method. In this paper, two approaches for determining \( \alpha_{\mathrm{\ell},k} \) are proposed: one for test images and one for real SAR images. For the test images, the MSE between the noiseless and noisy coefficients gives the optimum weighting factor \( \alpha_{\mathrm{\ell},k} \) at each decomposition level and for every subband (see Fig. 5a3, b3). In real applications, including SAR imaging, clean (noise-free) signals do not exist. Therefore, a white or flat image (whose pixels all have the same gray level) of the same size as the input image (512 × 512 pixels herein) is used as the noiseless signal, and, through Eq. (34), the MSE between its noiseless and noisy coefficients gives \( \alpha_{\mathrm{\ell},k} \) at each decomposition level and for every subband (see Fig. 7). From Fig. 5a3, b3, we conclude that the weighting factor is independent of the image type and the noise variance and depends only on the transform, i.e., on the coefficients' noise efficiency. In the next section, the weighting factors \( \alpha_{\mathrm{\ell},k} \) shown in Fig. 7 are used for all weighted shrinkage methods in the NSST and SWT domains, whether the purpose is to despeckle test images or real SAR images.

Fig. 7

Weighting factors obtained from a flat image over 80 independent trials: (a) SWT and (b) NSST

5 Experimental results and discussion

In this paper, we used Barbara, Lena, House, Boat, Goldhill, Fingerprint, Cameraman, and Peppers (512 × 512 pixels, 256 gray levels) as test images, and Farmland, Peninsula, and Shipping Terminal [46] (500 × 500 pixels) and Aircraft [47] (512 × 512 pixels) as real SAR images. The three images Farmland, Peninsula, and Shipping Terminal [46] are parts of SAR images collected by RADARSAT-1 in the Fine Beam 2 mode on June 16, 2002. Most of the illuminated scene is in Delta, British Columbia, Canada. The radar operated in the C-band with HH polarization. These three crops have useful image characteristics, such as granular texture and many high- and low-frequency regions. The mini SAR image “Aircraft” [47] was collected over the Kirtland AFB region on August 27, 2007, in the Ka-band and Ku-band.

An input signal is decomposed into three levels using SWT and NSST. According to the numbered subbands shown in Fig. 5a1, b1, SWT has three subbands at each level, whereas NSST has 16, eight, and four subbands at the 1st, 2nd, and 3rd decomposition levels, respectively. As the block diagram in Fig. 1 indicates, the homomorphic framework (a logarithm at the beginning and an exponential at the end) is used for both test images and real SAR images. The method for obtaining the threshold values for the shrinkage methods and their weighted versions is explained in detail in Section 4.3.

Here, in the SWT domain, we evaluate the performance of BayesShrink (B-SWT) [30], weighted BayesShrink (WB-SWT), BiShrink (BI-SWT), and weighted BiShrink (WBI-SWT); in the NSST domain, we present the results of BayesShrink (B-NSST) [2], weighted BayesShrink (WB-NSST) [27], BiShrink based on model 1 and model 2 (BI-NSST(1), BI-NSST(2)), and weighted BiShrink based on model 1 and model 2 (WBI-NSST(1), WBI-NSST(2)), in terms of both subjective and objective criteria. Furthermore, two methods in the NSST domain, GΓD-NSST [13] and NIG-NSST [13], and a high-order total variation method based on compressive sensing, High-TV [15], are applied under the same conditions to compare the proposed methods with the state of the art. For this purpose, we implemented all of these methods [2, 13, 15, 27, 30].
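BiShrink is assumed here to be the bivariate shrinkage rule of Sendur and Selesnick, which shrinks each child coefficient \( {y}_1 \) jointly with its parent \( {y}_2 \): \( {\hat{w}}_1={\left(\sqrt{y_1^2+{y}_2^2}-\sqrt{3}{\sigma}_n^2/\sigma \right)}_{+}{y}_1/\sqrt{y_1^2+{y}_2^2} \), with the signal standard deviation \( \sigma \) estimated in a local window. The sketch below implements that rule; the window size and helper names are assumptions, and the paper's two NSST child-parent models (which decide which subband acts as the parent) are not reproduced. In an undecimated transform such as the SWT or NSST, child and parent arrays have the same size, so the parent needs no upsampling.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_mean(x, window=7):
    # Mean over a window x window neighbourhood (window odd), reflected borders.
    pad = window // 2
    padded = np.pad(x, pad, mode="reflect")
    return sliding_window_view(padded, (window, window)).mean(axis=(-2, -1))


def bishrink(child, parent, sigma_n, window=7):
    """Bivariate shrinkage of child coefficients using same-size parents."""
    y1 = np.asarray(child, dtype=float)
    y2 = np.asarray(parent, dtype=float)
    # Local signal std: sqrt(max(local E[y1^2] - sigma_n^2, eps)).
    sigma = np.sqrt(np.maximum(local_mean(y1 ** 2, window) - sigma_n ** 2,
                               np.finfo(float).eps))
    r = np.sqrt(y1 ** 2 + y2 ** 2)
    gain = np.maximum(r - np.sqrt(3.0) * sigma_n ** 2 / sigma, 0.0)
    # Divide only where r > 0; where r == 0 the child is zero anyway.
    factor = np.divide(gain, r, out=np.zeros_like(r), where=r > 0)
    return factor * y1


rng = np.random.default_rng(0)
child = rng.normal(size=(64, 64))
parent = rng.normal(size=(64, 64))
shrunk = bishrink(child, parent, sigma_n=0.8)
print(np.abs(shrunk).mean(), np.abs(child).mean())
```

Since the gain never exceeds \( \sqrt{y_1^2+{y}_2^2} \), the rule only shrinks: each output keeps the sign of its child and never has a larger magnitude.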

Fig. 8 shows the results in the SWT and NSST domains for Barbara, a sample test image corrupted by speckle noise with variance \( {\sigma}_N^2=0.1 \). The full-reference objective metrics PSNR and SSIM [39, 40], averaged over 30 runs of the algorithm, are reported for the test images in Tables 2 and 3. Although all methods in the NSST domain outperform those in the SWT domain, the weighted versions show no considerable improvement over the direct ones (e.g., WB-NSST versus B-NSST) for artificial speckle noise.
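The averaging protocol behind the reported PSNR values can be sketched as follows. The speckle generator is an assumption (unit-mean multiplicative gamma noise whose variance is \( {\sigma}_N^2 \)), the helper names are hypothetical, and SSIM is omitted for brevity. The identity "despeckler" in the example simply measures the PSNR of the noisy input; a real method would replace the lambda.

```python
import numpy as np


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB for 8-bit images (peak = 255)."""
    err = np.asarray(reference, dtype=float) - np.asarray(estimate, dtype=float)
    mse = np.mean(err ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)


def add_speckle(img, variance, rng):
    """Assumption: unit-mean multiplicative gamma speckle with the given variance."""
    looks = 1.0 / variance  # Gamma(L, 1/L) has mean 1 and variance 1/L
    return img * rng.gamma(shape=looks, scale=1.0 / looks, size=img.shape)


def average_psnr(clean, despeckle, variance=0.1, runs=30, seed=0):
    """Mean PSNR of `despeckle` over `runs` independent speckle realizations."""
    rng = np.random.default_rng(seed)
    scores = [psnr(clean, despeckle(add_speckle(clean, variance, rng)))
              for _ in range(runs)]
    return float(np.mean(scores))


clean = np.full((64, 64), 100.0)
# Identity "despeckler": gives the PSNR of the noisy input itself.
print(average_psnr(clean, lambda noisy: noisy))
```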

Fig. 8

a Barbara as the original test image, b noisy image (\( {\sigma}_N^2=0.1 \)), c–f despeckled images by B-SWT [30], WB-SWT, BI-SWT, and WBI-SWT, and g–l despeckled images by B-NSST [2], WB-NSST [27], BI-NSST(1), WBI-NSST(1), BI-NSST(2), and WBI-NSST(2)

Table 2 Two objective assessment parameters for Barbara, comparing the methods in the SWT and NSST domains
Table 3 PSNRs obtained for despeckling the eight test images in the NSST domain when \( {\sigma}_N^2=0.1 \). The algorithm was run 30 times for every image, and the average PSNR is reported

For the real SAR image Aircraft of size 512 × 512 pixels (see Fig. 9), the proposed methods in the NSST domain are visually better than the approaches in the SWT domain in terms of both noise reduction and edge preservation. Table 4, which reports four no-reference parameters for the real SAR image Peninsula, also shows that not only do the methods in the NSST domain outperform those in the SWT domain, but the weighted versions also perform significantly better than the direct ones; compare, for example, WBI-NSST(2) with BI-NSST(2).

Fig. 9

a original SAR image Aircraft of size 512 × 512, b–e despeckled images by B-SWT [30], WB-SWT, BI-SWT, and WBI-SWT, and f–k despeckled images by B-NSST [2], WB-NSST [27], BI-NSST(1), WBI-NSST(1), BI-NSST(2), and WBI-NSST(2)

Table 4 Four no-reference parameters for the Peninsula SAR image, comparing the methods in the SWT and NSST domains

Given the results above for artificial speckle noise and for the shrinkage methods in the SWT domain, in the following we consider only the real SAR images, apply the four proposed methods in the NSST domain, and compare them with [13] and [15]. While the visual results in Fig. 10 do not clearly distinguish the methods, the no-reference parameters in Table 5 indicate that WBI-NSST(2) is the best approach. Since the equivalent number of looks (ENL) is one of the most important parameters for measuring speckle suppression in real SAR images, we compute it in two ways: (1) splitting an image into blocks of 16 × 16 and 25 × 25 pixels for the real SAR images of size 500 × 500 and 512 × 512, respectively, computing the ENL of every block, and reporting the mean ENL in Tables 4 and 5; and (2) selecting a homogeneous region of 50 × 50 pixels and computing its ENL, denoted ENLh (see Table 6).
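Both ENL variants can be sketched with the usual definition ENL = μ²/σ² of a region (squared mean over variance), computed directly on the pixel values. The helper names are hypothetical, and the block-wise version is an assumption about the tiling: it uses non-overlapping blocks and drops partial blocks at the right and bottom borders (a 500 × 500 image thus yields 31 × 31 full 16 × 16 blocks). The example checks the estimators on synthetic four-look gamma speckle over a flat scene, where the ENL should be close to 4.

```python
import numpy as np


def enl(region):
    """Equivalent number of looks: mean^2 / variance (inf if the region is flat)."""
    region = np.asarray(region, dtype=float)
    return region.mean() ** 2 / region.var()


def mean_block_enl(img, block):
    """Average ENL over non-overlapping block x block tiles (partial tiles dropped)."""
    img = np.asarray(img, dtype=float)
    rows, cols = img.shape[0] // block, img.shape[1] // block
    tiles = img[:rows * block, :cols * block].reshape(rows, block, cols, block)
    tiles = tiles.swapaxes(1, 2).reshape(-1, block, block)
    return float(np.mean([enl(t) for t in tiles]))


def enl_h(img, top, left, size=50):
    """ENL of one homogeneous size x size region whose upper-left corner is (top, left)."""
    return float(enl(np.asarray(img)[top:top + size, left:left + size]))


rng = np.random.default_rng(0)
speckled = 100.0 * rng.gamma(shape=4.0, scale=0.25, size=(500, 500))
print(mean_block_enl(speckled, block=16), enl_h(speckled, top=0, left=0))
```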

Fig. 10

The real SAR images Farmland, Peninsula, and Shipping Terminal are shown in the first row; the results of High-TV [15], GΓD-NSST [13], NIG-NSST [13], and our proposed methods BI-NSST(1), WBI-NSST(1), BI-NSST(2), and WBI-NSST(2) are shown in order from rows 2 to 7

Table 5 Four no-reference parameters comparing our proposed methods with references [13] and [15] for despeckling the real SAR image Farmland
Table 6 ENLh comparing our proposed methods with references [13] and [15] for despeckling the real SAR images Farmland, Peninsula, and Shipping Terminal

Finally, Fig. 11 shows the ratio images [2, 48] for Farmland, Peninsula, and Shipping Terminal obtained with the four proposed methods in the NSST domain and with [13] and [15], where \( {I}_{\mathrm{ratio}}={I}_y/{\overset{\frown }{I}}_x \), I y is the real SAR image, and \( {\overset{\frown }{I}}_x \) is the despeckled image. The ratio image provides significant information on speckle suppression and edge preservation. Any geometric structure or detail in the ratio image that is correlated with the original image indicates that relevant information (e.g., edges or bright scatterers) has been removed or modified by the despeckling method, mainly in the nonhomogeneous areas. An ideal filter would not alter such edges or bright scatterers, so its ratio image would show a pure speckle pattern. Therefore, an algorithm is appropriate if its ratio image shows no structure, edges, or details, i.e., if it looks like pure noise.
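The ratio-image test can be illustrated on a synthetic scene. In the sketch below, the scene (a vertical step edge under four-look gamma speckle), the edge-smearing "filter" (a circular 5-tap horizontal box blur), and the column-mean check are all hypothetical choices for illustration, not a standard metric or the evaluation used in the paper. An ideal filter returns the clean image, so its ratio image is pure speckle and every column mean stays near 1; the smearing filter leaves the edge visible in the ratio image as a clear departure from 1 next to the edge.

```python
import numpy as np


def ratio_image(noisy, despeckled, eps=1e-12):
    """I_ratio = I_y / I_x_hat, with the denominator floored at eps."""
    noisy = np.asarray(noisy, dtype=float)
    despeckled = np.asarray(despeckled, dtype=float)
    return noisy / np.maximum(despeckled, eps)


def column_profile_deviation(ratio):
    """Largest deviation of the column means from 1 (crude structure check)."""
    return float(np.max(np.abs(ratio.mean(axis=0) - 1.0)))


rng = np.random.default_rng(0)
clean = np.where(np.arange(64) < 32, 50.0, 200.0) * np.ones((64, 1))  # vertical edge
noisy = clean * rng.gamma(shape=4.0, scale=0.25, size=clean.shape)

ideal = clean  # an ideal filter recovers the clean image exactly
# Hypothetical edge-smearing filter: circular 5-tap horizontal box blur.
smeared = sum(np.roll(clean, s, axis=1) for s in range(-2, 3)) / 5.0

dev_ideal = column_profile_deviation(ratio_image(noisy, ideal))
dev_smeared = column_profile_deviation(ratio_image(noisy, smeared))
print(f"ideal: {dev_ideal:.3f}, edge-smearing: {dev_smeared:.3f}")
```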

Fig. 11

The real SAR images Farmland, Peninsula, and Shipping Terminal are shown in the first row; the ratio images of the results of High-TV [15], GΓD-NSST [13], NIG-NSST [13], and our proposed methods BI-NSST(1), WBI-NSST(1), BI-NSST(2), and WBI-NSST(2) are shown in order from rows 2 to 7

6 Conclusions

In this paper, three methods in the SWT domain and four approaches in the NSST domain are developed based on BayesShrink, BiShrink, weighted BayesShrink, and weighted BiShrink. For the BiShrink implementation in the NSST domain, two models for choosing the child-parent pairs are proposed with regard to the MI parameter. Although the model recommended by the MI value performs better for synthesized images with highly detailed content, it is not appropriate for real SAR images or for synthesized images with many smooth regions. Since this study showed that the thresholding-based methods considered here perform better in the NSST domain than in the SWT domain, finding new parameters for choosing suitable child-parent pairs in the NSST domain is left for future work.