1 Introduction

Images are indispensable in science and everyday life. Mirroring the capabilities of our human visual system, it is natural to display observations of the world in graphical form. Images are obtained in areas ranging from everyday photography to astronomy, medical imaging, remote sensing, and microscopy. In each case, there is an underlying object we wish to observe, which is the original or true image. Indeed, this true image is the ideal representation of the observed scene.

Yet, the observation process is never ideal: there is uncertainty in the measurements, such as blur, noise, and other types of degradation. Image restoration aims to recover an estimate of the true image from the degraded observations. The key to being able to solve this problem is proper incorporation of prior knowledge about the original image into the restoration process.

Classical image restoration seeks an estimate of the true image f(x,y) when the point spread function (PSF) h(x,y) is known a priori. The observed image g(x,y) is assumed to be the output of a linear, spatially invariant system

$$ g(x,y)=h(x,y)\ast f(x,y)+\epsilon(x,y), $$
(1)

where * represents the convolution operation and ε(x,y) the errors. Since the PSF is known, this is a non-blind deconvolution problem.

The discretized version of model (1) is given by

$$ g=Hf+\epsilon, $$
(2)

where H represents the 2D convolution matrix obtained from the PSF of the imaging system. For more details on this discretization, refer to [1].
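
For concreteness, the following short sketch (in Python/NumPy; not part of the original paper) simulates the forward model (2) by circular convolution with an assumed Gaussian PSF and additive Gaussian noise; the PSF size, its standard deviation, and the noise level are illustrative choices only.

```python
import numpy as np
from numpy.fft import fft2, ifft2

def gaussian_psf(size=20, sigma=3.0):
    # Illustrative Gaussian PSF, normalized to sum to 1.
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    h = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return h / h.sum()

def psf2otf(h, shape):
    # Zero-pad the PSF to the image size and circularly shift it so that
    # its center sits at the origin, then take the 2D FFT.
    otf = np.zeros(shape)
    otf[:h.shape[0], :h.shape[1]] = h
    otf = np.roll(otf, (-(h.shape[0] // 2), -(h.shape[1] // 2)), axis=(0, 1))
    return fft2(otf)

def blur_and_add_noise(f, h, noise_std=0.01):
    # Forward model g = H f + eps with circular (periodic) convolution.
    H = psf2otf(h, f.shape)
    g = np.real(ifft2(H * fft2(f)))
    return g + noise_std * np.random.randn(*f.shape)
```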

Image deconvolution is an ill-posed problem: the solution may not depend continuously on the data, may not be unique, or may not even exist. In practice, this means that the data g alone are not sufficient to define a unique and satisfactory solution. Practical approaches such as regularization theory and Bayesian inversion have been successful for this task [28].

The earliest classical methods for non-blind image deconvolution include the Wiener filter [9, 10] and the Richardson-Lucy algorithm [11, 12]. The Wiener filter works in the frequency domain, attempting to minimize the impact of deconvolved noise at frequencies where the signal-to-noise ratio is poor. It is in widespread use because the frequency spectrum of most visual images is fairly well behaved and may be estimated easily. It is simple and effective for some special images. However, it is not stable, produces serious ringing artifacts in many applications, and its restoration quality is limited (results are often too blurred for Gaussian-blurred images, where the kernel is dense).

The Richardson-Lucy algorithm maximizes the likelihood that the resulting image, when convolved with the PSF, is an instance of the blurred image, assuming Poisson noise statistics. A key limitation of this iterative method is the considerable computation needed to obtain a (visually) stable solution, due to the low convergence speed of this type of algorithm.

Moreover, increasing the number of iterations not only slows down the computation but also magnifies noise and introduces wave-like (ringing) artifacts near sharp edges.
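
For reference, a minimal sketch of the Richardson-Lucy iteration described above (our own Python illustration; the flat initialization, the flipped-PSF correlation, and the small constant guarding the division are implementation choices we assume):

```python
import numpy as np
from scipy.signal import fftconvolve

def richardson_lucy(g, h, num_iter=50, eps=1e-12):
    # Richardson-Lucy deconvolution: a multiplicative update that increases
    # the Poisson likelihood of g given the current estimate convolved with h.
    g = np.asarray(g, dtype=float)
    f = np.full(g.shape, g.mean())        # flat initial estimate
    h_flip = h[::-1, ::-1]                # correlation = convolution with the flipped PSF
    for _ in range(num_iter):
        est = fftconvolve(f, h, mode='same')
        ratio = g / np.maximum(est, eps)  # guard against division by zero
        f = f * fftconvolve(ratio, h_flip, mode='same')
    return f
```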

Another popular regularization method for denoising and deblurring is the total variation (TV) method. It was introduced to image denoising by Rudin, Osher, and Fatemi [13] and then applied to deconvolution by Rudin and Osher [14]. It preserves edges well but sometimes exhibits an undesirable staircase effect, namely the transformation of smooth regions into piecewise constant regions (stairs), which implies that finer details in the original image may not be recovered satisfactorily.

Based on the spirit of TV regularization and bilateral filtering, Elad et al. [15, 16] proposed bilateral TV (BTV) as a new regularization term that is computationally cheap to implement and preserves edges. This term is more robust to noise and preserves more details, but it tends to remove texture, create flat intensity regions, and introduce new contours that lead to cartoon-like images.

Alongside regularization theory, the supervised Bayesian approach has also been used successfully [17–22]. To increase stability and overcome some limitations of this method (the ringing effect), we propose a novel regularized version for image deconvolution. We consider an H1 regularization term, incorporate the non-negativity constraint on the restored image, and use the well-known generalized cross validation (GCV) method to choose suitable regularization parameters. Experiments show that the proposed approach is superior to the Wiener filter, the Richardson-Lucy algorithm, the TV method, and the BTV approach, which are widely used in the literature, in terms of restoration quality and stability.

The remainder of this article is organized as follows. In Section 2, we review the supervised Bayesian approach and then propose our deconvolution variational model. We present the experimental tests and discuss the obtained results in Section 3. Section 4 concludes the paper.

2 Methods

2.1 Bayesian approach

From this point, the main objective is to infer f given the forward model (2). In classical reconstruction problems with a known blur kernel h, we use Bayes' rule:

$$ p(f|g) = \frac{p(g|f)p(f)}{p(g)}\propto p(g|f)p(f) $$
(3)

to obtain the posterior law p(f|g) from the likelihood p(g|f) and the prior p(f); we then infer f using this law.

2.1.1 Bayesian estimation with simple prior

The Bayesian inference approach is based on the posterior law:

$$ p(f|g,\theta_{1},\theta_{2}) =\frac{p(g|f,\theta_{1})p(f|\theta_{2})}{p(g|\theta_{1},\theta_{2})} \propto p(g|f,\theta_{1})p(f|\theta_{2}) $$
(4)

where p(g|f,θ1) is the likelihood, p(f|θ2) is the prior model, θ=(θ1,θ2) are their corresponding parameters (often called the hyper-parameters of the problem), and p(g|θ1,θ2) is the evidence of the model. This relation is shown in the following scheme:

2.1.2 Assigning the likelihood p(g|f) and the prior p(f)

We consider the forward model (2) and suppose that we have some prior knowledge about the error term ε. In fact, if we can assign a probability law p(ε), then we can deduce the likelihood term p(g|f).

To account for possible non-stationarity of the noise, we propose to use

$$ p(\epsilon|v_{\epsilon})=\mathcal{N}(\epsilon|0,V_{\epsilon}) \text{~~with~~} {V_{\epsilon}=Dg(v_{\epsilon})}, $$
(5)

where \(v_{\epsilon}=[v_{\epsilon_{1}},\cdots,v_{\epsilon_{M}}]^{T}\) contains the unknown variances of the non-stationary noise and Dg(vε) denotes the diagonal matrix whose entries are the M elements of the vector vε.

From this, we can define the expression of the likelihood:

$$ p(g|f,v_{\epsilon})=\mathcal{N}(g|Hf,V_{\epsilon}). $$
(6)

To be able to estimate the unknown variances, we assign an Inverse Gamma conjugate prior on \(v_{\epsilon _{i}}\):

$$ p(v_{\epsilon})=\prod_{i} p(v_{\epsilon_{i}})\text{~with~} p(v_{\epsilon_{i}})=\mathcal{I}\mathcal{G}(v_{\epsilon_{i}}|\alpha_{\epsilon_{i}},\beta_{\epsilon_{i}}), \forall i, $$
(7)

where \(\alpha _{\epsilon _{i}}\) and \(\beta _{\epsilon _{i}}\) are two positive parameters. The shape parameter \(\alpha _{\epsilon _{i}}\) controls the height of the probability density function of the Inverse Gamma distribution, and the scale parameter \(\beta _{\epsilon _{i}}\) controls the spread [23].
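
For reference, with this shape-scale parameterization the Inverse Gamma density reads

$$ \mathcal{I}\mathcal{G}(v|\alpha,\beta)=\frac{\beta^{\alpha}}{\Gamma(\alpha)}\, v^{-\alpha-1}\exp\left(-\frac{\beta}{v}\right), \qquad v>0. $$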

The next step is to assign a prior to the unknown f. Here too, different approaches can be used; for example, we may have some prior information such as the mean and the variance of the unknown quantity. The objective is to assign a prior law p(f|θ2) that translates our incomplete prior knowledge of f.

We propose

$$ p(f|v_{f})=\mathcal{N}(f|0,V_{f})\text{~~with~~} V_{f}=Dg(v_{f}). $$
(8)

We also assign an Inverse Gamma conjugate prior on \(v_{f_{j}}\):

$$ p(v_{f})=\prod_{j} p(v_{f_{j}})\text{~with~} p(v_{f_{j}})=\mathcal{I}\mathcal{G}(v_{f_{j}}|\alpha_{f_{j}},\beta_{f_{j}}), \forall j. $$
(9)

2.1.3 Supervised Bayesian approach

We now consider the case where vε and vf are known. The likelihood can then be written as

$$ p(g|f,v_{\epsilon})=\mathcal{N}(g|Hf,v_{\epsilon}I). $$
(10)

We propose the prior for f:

$$ p(f|v_{f})=\mathcal{N}(f|0,v_{f}I), $$
(11)

With these assumptions on the parameters and hyper-parameters, the posterior of f becomes

$$ p(f|g,v_{\epsilon},v_{f})\propto p(g|f,v_{\epsilon})p(f|v_{f}) $$
(12)

We can obtain the estimate of f as below:

$$\begin{array}{@{}rcl@{}} \hat{f} &=& \underset{f}{\text{arg}\max}\left\{{p(f|g,v_{\epsilon},v_{f})}\right\} \\ &=& \underset{f}{\text{arg}\max}\left\{\text{exp}(-\frac{1}{2v_{\epsilon}} \parallel g-Hf \parallel_{2}^{2})\text{exp}(-\frac{1}{2v_{f}}\parallel f \parallel_{2}^{2})\right\} \\ &=& \underset{f}{\text{arg}\min}\left\{\frac{1}{2v_{\epsilon}} \parallel g-Hf\parallel_{2}^{2}+\frac{1}{2v_{f}}\parallel f \parallel_{2}^{2}\right\}. \end{array} $$
(13)

So the criterion to be optimized is a quadratic one:

$$\begin{array}{@{}rcl@{}} J(f) &=& \frac{1}{2v_{\epsilon}}\parallel g -Hf\parallel_{2}^{2}+\frac{1}{2v_{f}}\parallel f \parallel_{2}^{2} \\ &\propto& \parallel g -Hf\parallel_{2}^{2} +\lambda_{f}\parallel f \parallel_{2}^{2} \end{array} $$
(14)

where \(\lambda _{f}=\frac {v_{\epsilon }}{v_{f}}\). Setting the gradient of J to zero, this criterion admits the closed-form solution:

$$ \hat{f}=(H^{T}H +\lambda_{f}I)^{-1}H^{T}g $$
(15)

Assuming H to be circulant block-circulant (CBC), so that it is diagonalized by the discrete Fourier transform, the estimate \(\hat {f}\) in (15) can be computed using the Fourier transform. Denoting by \(\mathcal {H}\) the Fourier transform of the PSF (the eigenvalues of H), the result is comparable to the Wiener filter:

$$ \hat{\mathcal{F}}=\frac{\overline{\mathcal{H}}\mathcal{G}}{\overline{\mathcal{H}}\mathcal{H}+\lambda_{f}}, $$
(16)

where \(\mathcal {F}\) and \(\mathcal {G}\) denote the Fourier transforms of f and g, respectively, and \(\overline{\mathcal{H}}\) is the complex conjugate of \(\mathcal{H}\).
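
The following sketch shows how (16) can be evaluated with the FFT (a Python illustration under a periodic-boundary assumption, not taken from the paper; it reuses the psf2otf helper from the earlier sketch):

```python
import numpy as np
from numpy.fft import fft2, ifft2

def map_simple_prior(g, h, lam_f):
    # Closed-form estimate (16), computed under a circular-convolution assumption.
    H = psf2otf(h, g.shape)   # FFT of the zero-padded, centered PSF (see the earlier sketch)
    G = fft2(g)
    F_hat = np.conj(H) * G / (np.abs(H) ** 2 + lam_f)
    return np.real(ifft2(F_hat))
```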

2.2 The proposed method

In the situation where vε and vf are known, we can use the MAP estimation. As mentioned, this estimate is unstable and suffers from serious ringing artifacts. To overcome these drawbacks, we propose a regularized MAP method in which we minimize an energy functional combining the mean square error with an H1 regularization term:

$$ \underset{f}{\text{arg}\min}\left\{\parallel g-Hf\parallel_{2}^{2}+\lambda_{f}\parallel f \parallel_{2}^{2}+\lambda\parallel \nabla f \parallel_{2}^{2}\right\}, $$
(17)

where ∇=(∇h,∇v) is the gradient operator, composed of finite-difference operators along the horizontal and vertical directions. The last two terms are regularization terms that require f to be smooth in the H1 sense.

The staircase effect is partly due to the fact that the norms used (e.g., the TV norm) are biased neither against discontinuous nor against continuous functions. The term \(\parallel \nabla f \parallel _{2}^{2}\), by contrast, has a strong bias against discontinuous functions, so this model substantially reduces the staircase effect and recovers the values of smooth regions in the image.

We use the Neumann boundary condition, a natural choice in image processing [24], to discretize the gradient by a finite difference scheme. With this choice, λf≠0 is required in order to prove the coercivity of the proposed functional on the space H1(Ω), where Ω is a bounded Lipschitz domain in \(\mathbb {R}^{2}\), and hence the existence of a solution of the minimization problem (17). Numerically, instead of fixing λf at 0, we set it from large to small with an empirical reduction rule; this choice markedly reduces the appearance of a blurring effect in the resulting images. We give more details on this in the next section.

Solving the problem using Fourier-domain properties results in a simple algorithm that is easy to implement and converges in a very short time.

2.2.1 The convergence behavior

Let E be a Banach space, \(F:E \longrightarrow \mathbb {R}\), and consider the minimization problem

$$ \inf_{u\in E} F(u) $$
(18)

Theorem 1

Let E be a reflexive Banach space and \(F: E \longrightarrow \mathbb {R}\) a sequentially weakly lower semi-continuous and coercive function. Then problem (18) has a solution. Furthermore, if F is strictly convex, this solution is unique.

Proof

Proving the existence of a solution of problem (18) is usually achieved by the following steps, which constitute the direct method of the calculus of variations:

  • Step 1: One constructs a minimizing sequence un∈E, i.e., a sequence satisfying \(\lim \limits _{n\longrightarrow +\infty } F(u_{n})=\underset {u\in E} \inf {F(u)}\).

  • Step 2: Since F is coercive \(\left (\lim \limits _{\parallel u \parallel _{E}\longrightarrow +\infty } F(u)=+\infty \right)\), one obtains a uniform bound \(\parallel u_{n} \parallel _{E}\leq C\), C>0. Since E is reflexive, the theorem of weak sequential compactness [25] gives the existence of u0∈E and of a subsequence, still denoted (un)n, such that \(u_{n} \underset {E}{\rightharpoonup } u_{0}\).

  • Step 3: F is sequentially weakly lower semi-continuous (for every sequence \(x_{n} \underset {E}{\rightharpoonup } x\) we have \(\varliminf F(x_{n}) \geq F(x)\)), so \(\varliminf F(u_{n}) \geq F(u_{0})\), which implies that \(F(u_{0})=\underset {u\in E} \inf {F(u)}\).

F is strictly convex (F(λx+(1−λ)y)<λF(x)+(1−λ)F(y) for all x≠y∈E and λ∈]0;1[); therefore, the minimum is unique. □

So, we have

$$ \begin{array}{ccccc} F & : & H^{1}(\Omega) & \to & \mathbb{R} \\ & & f & \mapsto & \parallel g -Hf\parallel_{2}^{2}+\lambda_{f}\parallel f\parallel_{2}^{2}+\lambda\parallel \nabla f \parallel_{2}^{2} \end{array} $$
(19)

Existence: H1(Ω) is a Hilbert space and hence a reflexive Banach space.

∙ F is coercive:

$$\begin{array}{@{}rcl@{}} F(f) &\geq& {\lambda}_{f}\parallel f {\parallel}_{2}^{2}+\lambda\parallel \nabla f {\parallel}_{2}^{2} \\ &\geq& \min({\lambda}_{f},\lambda)(\parallel f{\parallel}_{2}^{2}+\parallel \nabla f {\parallel}_{2}^{2}) \\ & \geq & \alpha \parallel f\parallel_{H^{1}}^{2}, \end{array} $$
(20)

where α = min(λf,λ)>0.

∙ F is sequentially weakly lower semi-continuous.

Let fnf in H1, then \(\parallel f_{n}\parallel _{2}^{2}\longrightarrow \parallel f{\parallel }_{2}^{2}\) and \(\parallel \nabla f_{n}{\parallel }_{2}^{2}\longrightarrow \parallel \nabla f{\parallel }_{2}^{2}\).

Furthermore, gHfn=gH(fnf)−Hf, as fnf in H1, fnf→0 in H1gHfngHf in H1, finally \(\parallel g -{Hf}_{n}{\parallel }_{2}^{2}\longrightarrow \parallel g -Hf{\parallel }_{2}^{2}\).

So, the problem admits a solution.

Uniqueness: The function F is strictly convex; therefore, the solution is unique.

In the Fourier domain, our model (17) is equivalent to

$$ \underset{\mathcal{F}}{\text{arg}\min}\left\{\parallel \mathcal{G} -\mathcal{H}\mathcal{F}\parallel_{2}^{2}+\lambda_{f}\parallel \mathcal{F} \parallel_{2}^{2}+\lambda\parallel D \mathcal{F} \parallel_{2}^{2}\right\}, $$
(21)

where D=(Dh,Dv) denotes the Fourier transform of ∇=(∇h,∇v), and

$$ \parallel D \mathcal{F} \parallel_{2}^{2}=\parallel D_{h} \mathcal{F} \parallel_{2}^{2}+\parallel D_{v} \mathcal{F} \parallel_{2}^{2}. $$
(22)

Taking the Wirtinger derivative of the functional in (21) with respect to \( \mathcal {F}\) and setting the result to zero, we obtain the optimality condition for \( \mathcal {F}\):

$$ -\overline{\mathcal{H}}\mathcal{G}+|\mathcal{H}|^{2} \mathcal{F}+\lambda_{f}\mathcal{F}+\lambda |D|^{2} \mathcal{F}=0, $$
(23)

where |D|2=|Dh|2+|Dv|2. The above equality gives the solution of \(\mathcal {F}\)

$$ \hat{\mathcal{F}}=\frac{\overline{\mathcal{H}}\mathcal{G}}{|\mathcal{H}|^{2}+\lambda_{f}+\lambda |D|^{2}} $$
(24)

We then apply the inverse Fourier transform to obtain the estimate \(\hat{f}\).
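
The closed-form update (24) can be sketched as follows (a Python illustration assuming periodic boundary conditions rather than the Neumann condition used in the analysis; the forward-difference transfer functions and the psf2otf helper from the earlier sketch are our own implementation choices):

```python
import numpy as np
from numpy.fft import fft2, ifft2

def map_h1(g, h, lam_f, lam):
    # Closed-form solution (24): F_hat = conj(H) G / (|H|^2 + lam_f + lam |D|^2).
    M, N = g.shape
    H = psf2otf(h, g.shape)              # see the psf2otf sketch above
    G = fft2(g)
    # |D|^2 = |D_h|^2 + |D_v|^2 for periodic forward differences.
    u = np.arange(M).reshape(-1, 1)
    v = np.arange(N).reshape(1, -1)
    D2 = (2 - 2 * np.cos(2 * np.pi * u / M)) + (2 - 2 * np.cos(2 * np.pi * v / N))
    F_hat = np.conj(H) * G / (np.abs(H) ** 2 + lam_f + lam * D2)
    return np.real(ifft2(F_hat))
```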

2.2.2 A parameter selection method for H1 regularization

We now consider a parameter choice method. An appropriate selection of the regularization parameters λf and λ is important in the regularization. The well-known methods for this purpose are the L-curve [26] and the GCV method [27, 28]; here, we consider the GCV one. It is a widely used and very successful predictive method for choosing the smoothing parameter. The basic idea is that, if one datum point is dropped, then a good value of the regularization parameter should predict the missing datum value fairly well. In our case, the regularization parameter λf is first selected by the empirical reduction rule, then λ is chosen to minimize the GCV function

$$ {\rm{GCV}}(\lambda)=\frac{\parallel \left(I-{HH}_{\lambda}^{-1}H^{T}\right)g \parallel_{2}^{2}}{\left[tr\left(I-{HH}_{\lambda}^{-1}H^{T}\right)\right]^{2}}, $$
(25)

where \(H_{\lambda }=\mathcal {H}^{T}\mathcal {H}+\lambda _{f} I+\lambda D^{T}D\). This function can be simplified using the Generalized Singular Value Decomposition (GSVD) [29] of the pair \((\mathcal {H},D)\). Thus, there exist orthonormal matrices U, V and an invertible matrix X such that

$$ U^{T}\mathcal{H}X=C=Dg(c_{1},..., c_{N}), \qquad c_{i} \geq 0, $$
(26)
$$ V^{T}DX=S=Dg(s_{1},..., s_{N}), \qquad s_{i} \geq 0, $$
(27)

where N=mn. Therefore, the GCV function when used with this regularization can be simplified to

$$ GCV(\lambda)=\frac{\sum_{i=1}^{N}\left(s_{i}^{2}\tilde{g_{i}}/\left({c_{i}^{2}+\lambda s_{i}^{2}+\lambda_{f}}\right)\right)^{2}}{\left(\sum_{i=1}^{N}s_{i}^{2}/\left({c_{i}^{2}+\lambda s_{i}^{2}+\lambda_{f}}\right)\right)^{2}}, $$
(28)

with \(\tilde {g}=U^{T}g\). For the particular case where the matrix D reduces to the identity I, the GSVD of the pair \((\mathcal {H},I)\) reduces to the SVD of the matrix \(\mathcal {H}\) and the expression of GCV is given by the following formula

$$ {\rm{GCV}}(\lambda)=\frac{\sum_{i=1}^{N}(\tilde{g_{i}}/({\sigma_{i}^{2}+\lambda +\lambda_{f}}))^{2}}{(\sum_{i=1}^{N} 1/({\sigma_{i}^{2}+\lambda +\lambda_{f}}))^{2}}, $$
(29)

where σi is the ith singular value of the matrix \(\mathcal {H}\).

GCV(λ) in this case is a continuous function, so we use the MATLAB function fminbnd, which is based on a combination of golden section search and parabolic interpolation, to find the value of λ at which GCV(λ) is minimized.
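
As an illustration of this step, the following sketch minimizes the GCV function (29) with a bounded one-dimensional search; we use SciPy's minimize_scalar as an analog of MATLAB's fminbnd, and assume the singular values σi of \(\mathcal {H}\) and \(\tilde{g}=U^{T}g\) have been precomputed.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def gcv_identity(lam, sigma, g_tilde, lam_f):
    # GCV function (29) for the case D = I: sigma are the singular values of H
    # and g_tilde = U^T g (both assumed to be precomputed).
    denom = sigma**2 + lam + lam_f
    num = np.sum((g_tilde / denom)**2)
    den = np.sum(1.0 / denom)**2
    return num / den

def choose_lambda(sigma, g_tilde, lam_f, lam_max=10.0):
    # Bounded 1D minimization, analogous to MATLAB's fminbnd.
    res = minimize_scalar(gcv_identity, bounds=(0.0, lam_max), method='bounded',
                          args=(sigma, g_tilde, lam_f))
    return res.x
```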

Beginning with an initial image, the following sequence defines our algorithm:

3 Results and discussion

This section presents all the numerical tests we performed with the proposed approach for solving the image deconvolution problem and compares it with the Wiener filter, the Richardson-Lucy deconvolution [30], the TV approach, and the BTV method. We consider problem (2), where the blurring matrix H is determined from the PSF h [31] of the imaging system, which defines how each pixel is blurred. We use three different types of blur kernel: a binary blur kernel of size 21×21 with elements normalized to sum to 1 (Fig. 1), a Gaussian blur kernel of size 20×20 with standard deviation 3, and a motion blur kernel of length 15, the last two generated by the MATLAB routines fspecial('gaussian', 20, 3) and fspecial('motion', 15), respectively. These choices are made to show the effectiveness of our approach against different types of degradation.

Fig. 1
figure 1

First kernel of convolution. The first kernel used to blur images

For comparison, it is hard to determine whether one method is better than the others just by looking at the images; therefore, it is necessary to compute the peak signal-to-noise ratio (PSNR), which is defined as

$${\rm{PSNR}}(\hat{f},f)=10 \log_{10}\left(\frac{255^{2}}{\frac{1}{st}\parallel \hat{f}-f \parallel_{2}^{2}}\right) $$

where s and t are the numbers of rows and columns of the image. Note that PSNR is a standard image quality measure that is widely used in comparing image restoration results. We also use the Structural Similarity Index Measure (SSIM), an image quality metric that assesses the visual impact of three characteristics of an image: luminance, contrast, and structure [32].
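
For completeness, the PSNR defined above can be computed as follows (a small helper added for illustration; it assumes images on a 0-255 scale, as in the formula):

```python
import numpy as np

def psnr(f_hat, f):
    # PSNR as defined above, with a peak value of 255 (8-bit images assumed).
    mse = np.mean((f_hat - f)**2)
    return 10.0 * np.log10(255.0**2 / mse)
```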

A crucial issue in solving the problem is the determination of the regularization parameters λf and λ. A good selection of the parameters will result in a promising deblurring result, whereas a bad choice may lead to slow convergence and severe artifacts in the results. Larger values lead to smoother deblurred images and more stability of the algorithm. However, too large λf and λ will over-smooth the image details and decrease the restoration quality. Generally, when the degradation in the blurry image is significant, the values need to be set large to reduce the blur as much as possible. As the iterations proceed, however, the blurry effect decreases gradually; in this case, small values are required, since large values would damage the fine details in the image. Taking these effects into account, a direct implementation is to set λf from large to small according to an empirical reduction rule [33–35]

$$\lambda^{t+1}_{f}= {\rm{max}}\left(\lambda^{t}_{f}.r, \lambda_{f}^{{\rm{min}}}\right); $$

which depends on the initial value \(\lambda _{f}^{0}\), the minimal value \(\lambda _{f}^{\min}\), and the reduction factor r∈(0,1). We choose r=0.5. This setting improves the convergence speed of the algorithm. We compute the other regularization parameter λ using the estimated value of λf and the GCV function (28).

A projection onto the convex set C is required to impose the image constraint

$$C := \{f|0 \leq f(n) \leq 1, \forall n\} $$
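
Putting these ingredients together, the following is a plausible sketch of the overall iteration (the empirical reduction of λf, the choice of λ, the Fourier-domain update (24) from the earlier sketch, and the projection onto C); the initialization, the stopping rule, and the fixed illustrative value of λ are our own assumptions rather than the paper's exact algorithm.

```python
import numpy as np

def deconvolve(g, h, lam_f0=1.0, lam_f_min=1e-4, r=0.5, num_iter=10, lam=1e-2):
    # A plausible reading of the iteration: lam_f is driven from large to small
    # by the empirical reduction rule, (24) is re-solved with the current
    # parameters, and the result is projected onto C after each pass.
    lam_f = lam_f0
    f = np.clip(g, 0.0, 1.0)                 # assumed initialization: the observed image
    for _ in range(num_iter):
        # In the paper, lam is chosen by minimizing the GCV function (28);
        # here a fixed illustrative value is passed in instead.
        f = map_h1(g, h, lam_f, lam)         # closed-form Fourier update (24), sketched above
        f = np.clip(f, 0.0, 1.0)             # projection onto C = {f | 0 <= f(n) <= 1}
        lam_f = max(lam_f * r, lam_f_min)    # empirical reduction rule
    return f
```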

For a fair comparison, in the Wiener filter [36], the term \(< |\hat {b}|^{2} > \), the expected power spectrum \(|\hat {b} |^{2}\), can be estimated from the noise level. We set \(< |\hat b|^{2} >=\frac {\alpha }{\mathrm{variance}(g)}\) and search for the optimal α that gives the best result in each case. Furthermore, we use the number of iterations as the stopping criterion in the Richardson-Lucy algorithm.

The output images of both methods exhibit ringing introduced by the discrete Fourier transforms used; to reduce this effect, we apply the function edgetaper before processing.

In the TV and BTV-based image restoration methods, the computations are too intensive to run until convergence (even with a large tolerance). Instead, we run them up to a specified, reasonable maximum number of iterations.

We try to get the best result with each method and then compare it with our approach.

We use seven standard test images in our experiments (Fig. 2). They have different gray-level histograms. The blurry versions are obtained by convolving the original images with the three PSFs defined previously. We show the deconvolution results in Figs. 3, 4, 5, 6, 7, 8, and 9, and the PSNR and SSIM values in Tables 1, 2, and 3.

Fig. 2
figure 2

Set of seven images used in the tests. This is a set of seven images used as an original image in the tests

Fig. 3
figure 3

Deconvolution of Lena image. From left to right, the first column represents degradation by kernel 1, the second column for the degradation by kernel 2, and by kernel 3 in the third column. From top to bottom, the order of the rows is as follows: blurred images using the three kernels respectively, restored images using RL algorithm [11] (kernel 1 (iterations = 100), kernel 2 (iterations = 200) and kernel 3 (iterations = 40)), restored images using Wiener filter [9], restored images using TV method [14], restored images using BTV approach [16], and finally the restored images using our proposed MAP- H1 model

Fig. 4
figure 4

Deconvolution of Cameraman image. From left to right, the first column represents degradation by kernel 1, the second column for the degradation by kernel 2, and by kernel 3 in the third column. From top to bottom, the order of the rows is as follows: blurred images using the three kernels respectively, restored images using RL algorithm [11] (kernel 1 (iterations = 90), kernel 2 (iterations = 300), and kernel 3 (iterations = 200)), restored images using Wiener filter [9], restored images using TV method [14], restored images using BTV approach [16], and finally the restored images using our proposed MAP- H1 model

Fig. 5
figure 5

Deconvolution of House image. From left to right, the first column represents degradation by kernel 1, the second column for the degradation by kernel 2, and by kernel 3 in the third column. From top to bottom, the order of the rows is as follows: blurred images using the three kernels respectively, restored images using RL algorithm [11] (kernel 1 (iterations = 70), kernel 2 (iterations = 140), and kernel 3 (iterations = 40)), restored images using Wiener filter [9], restored images using TV method [14], restored images using BTV approach [16], and finally the restored images using our proposed MAP- H1 model

Fig. 6
figure 6

Deconvolution of Couple image. From left to right, the first column represents degradation by kernel 1, the second column for the degradation by kernel 2, and by kernel 3 in the third column. From top to bottom, the order of the rows is as follows: blurred images using the three kernels respectively, restored images using RL algorithm [11] (kernel 1 (iterations = 150), kernel 2 (iterations = 310), and kernel 3 (iterations = 90)), restored images using Wiener filter [9], restored images using TV method [14], restored images using BTV approach [16], and finally the restored images using our proposed MAP- H1 model

Fig. 7
figure 7

Deconvolution of Pirate image. From left to right, the first column represents degradation by kernel 1, the second column for the degradation by kernel 2, and by kernel 3 in the third column. From top to bottom, the order of the rows is as follows: blurred images using the three kernels respectively, restored images using RL algorithm [11] (kernel 1 (iterations = 130), kernel 2 (iterations = 230), and kernel 3 (iterations = 80)), restored images using Wiener filter [9], restored images using TV method [14], restored images using BTV approach [16], and finally the restored images using our proposed MAP- H1 model

Fig. 8
figure 8

Deconvolution of Boat image. From left to right, the first column represents degradation by kernel 1, the second column for the degradation by kernel 2, and by kernel 3 in the third column. From top to bottom, the order of the rows is as follows: blurred images using the three kernels respectively, restored images using RL algorithm [11] (kernel 1 (iterations = 150), kernel 2 (iterations = 330), and kernel 3 (iterations = 80)), restored images using Wiener filter [9], restored images using TV method [14], restored images using BTV approach [16], and finally the restored images using our proposed MAP- H1 model

Fig. 9
figure 9

Deconvolution of Fingerprint image. From left to right, the first column represents degradation by kernel 1, the second column for the degradation by kernel 2, and by kernel 3 in the third column. From top to bottom, the order of the rows is as follows: blurred images using the three kernels respectively, restored images using RL algorithm [11] (kernel 1 (iterations = 100), kernel 2 (iterations = 240), and kernel 3 (iterations = 90)), restored images using Wiener filter [9], restored images using TV method [14], restored images using BTV approach [16], and finally the restored images using our proposed MAP- H1 model

Table 1 PSNR and SSIM values obtained by five methods for seven different images degraded by kernel 1
Table 2 PSNR and SSIM values obtained by five methods for seven different images degraded by kernel 2
Table 3 PSNR and SSIM values obtained by five methods for seven different images degraded by kernel 3

By carefully observing these results, we find that, with our approach, the edges and details are well recovered and the ringing artifacts are well suppressed in each case, in contrast to the Wiener filter, the Richardson-Lucy algorithm, the TV method, and the BTV approach (slow-convergence methods). Also, the richer the structure and details of an image, the stronger the effect of the degradation and the lower the restoration quality. In terms of image quality measures, the proposed MAP-H1 model has the highest PSNR and SSIM values of all; the BTV method is, in general, the second best, followed by the TV approach, the Wiener filter, and finally the Richardson-Lucy algorithm.

Depending on the size of the image, the execution of the main proposed algorithm requires an average of 2 to 20 s on an Intel(R) Celeron(R) N2815 1.86 GHz CPU, making it faster than the other methods.

4 Conclusions

In this work, we have extended the supervised Bayesian approach by adding an H1 regularization term in an energy formulation and by proposing a method for choosing the regularization parameters. Numerical results show that the proposed algorithm is stable and suppresses ringing artifacts successfully, in contrast to the Wiener filter, the Richardson-Lucy algorithm, the TV method, and the BTV approach, robust methods used in the literature, for different types of blur kernel. Future work will focus on other blur kernels and on tests on real images.