1 Introduction

Previous studies have established that high doses of X-ray radiation can cause cancer, leukemia, and other genetic diseases [1,2,3]. To reduce the damage that X-rays inflict on the human body, it is necessary to study low-dose CT imaging without sacrificing clinical diagnostic information. Low-dose medical CT imaging reconstructs the CT image from incomplete or coarse projection data. However, reconstruction methods based on the Nyquist sampling theorem cannot handle such incomplete projection data. Moreover, CT image formation is often affected by the point spread function of the system, which causes image degradation and artifacts. In addition, when the image is reconstructed from incomplete projection data, the influence of noise becomes more serious. It is therefore of great significance to study how to eliminate the effects of the point spread function and achieve accurate low-dose medical CT reconstruction.

The basic idea of compressed sensing (CS) is that if a signal is sparse, or is sparse in some transform domain, the original signal can be reconstructed with high precision from a small number of samples [4, 5]. Applied to CT image reconstruction, the reduced number of measurements can greatly shorten the scanning time and reduce the scanning dose. However, because most CT images are not themselves sparse, an appropriate transform domain must be found to achieve sparsity. In CT imaging systems, the finite difference is often used as the sparsifying transform [6, 7], but it is only suitable for CT images with local smoothness; for actual human CT images, finite-difference sparse representation is far from ideal. To address this problem, other sparsifying methods have been applied to CT image reconstruction, such as the wavelet transform [8] and dictionary learning [9, 10]. However, these methods can easily lead to over-smoothing, because they lack statistical information about the specific signal and noise during reconstruction. Bayesian inference offers the potential to estimate the original signal accurately within the compressed sensing framework and thereby to reduce the radiation dose effectively, since the Bayesian method provides a variety of probability distributions for modeling and estimating the different parameters.

In low-dose medical CT image reconstruction, the ill-posedness of the problem is often aggravated by noise. Within the CS framework, many specialized CT denoising algorithms exist [11,12,13,14]. However, a separate denoising step increases the complexity of CT image processing; building noise robustness directly into the compressed sensing CT reconstruction algorithm would be more conducive to the development of CT technology. To mitigate the ill-posedness caused by noise, some algorithms improve performance by adding a prior regularization term to the objective function [15,16,17,18,19]. These methods are of great value for avoiding ambiguity of the solution and obtaining a high-precision reconstructed image. Bayesian inference methods based on statistical iteration can effectively exploit the physical effects of the system and the statistical properties of the projection data and the noise [20,21,22,23,24]. Statistical iterative reconstruction accounts for the statistical distribution of signal and noise, but it suffers from high computational complexity and slow convergence.

At present, CS-based CT reconstruction algorithms usually do not consider the influence of the point spread function. This assumption is unreasonable: the CT image is often affected by the point spread function of the system, which causes image degradation [25]. Some works use blind image restoration to eliminate the effects of the point spread function [26, 27]. Within the CS framework, however, existing CT reconstruction algorithms still ignore these effects.

Because the point spread function of a CT system is often unknown, its effects can be eliminated by blind image restoration in the Bayesian framework. To this end, this paper proposes a variational Bayesian blind restoration reconstruction method based on the shear wave transform for low-dose medical CT images. The key to medical CT image restoration and reconstruction by Bayesian compressed sensing is to establish accurate prior distribution models for the sparse coefficients, the point spread function, the model parameters, and so on. Therefore, a joint distribution model is established. During image reconstruction, the effect of the point spread function is considered and eliminated. The shear wave transform provides the sparse representation of the CT image, and the variational Bayesian method is used to estimate all unknown parameters and speed up convergence. In this way, low-dose CT image blind restoration reconstruction is realized.

2 Method

2.1 Medical CT image blind restoration reconstruction

The complete CT imaging model is written in the following vector form.

$$ \mathbf{P}=\mathbf{AH}\boldsymbol{\uppsi } \boldsymbol{\upalpha} +\mathbf{n} $$
(1)

where P is the noisy projection; A is the system matrix of the CT scan; H is the degradation matrix, composed of the point spread function (h); ψ is the sparse transform matrix; α is the sparse transform coefficient matrix; and n is the noise matrix.
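For concreteness, the vector model in Eq. (1) can be sketched numerically. The sizes, the random stand-in system matrix, and the identity choices for H and ψ below are illustrative assumptions, not a real CT geometry:

```python
# Toy instance of the imaging model P = A H psi alpha + n of Eq. (1).
import numpy as np

rng = np.random.default_rng(0)
n_pix, n_proj = 64, 48                      # toy image and projection sizes (assumed)

A = rng.standard_normal((n_proj, n_pix))    # stand-in system matrix
H = np.eye(n_pix)                           # degradation matrix (identity = no blur here)
psi = np.eye(n_pix)                         # sparse transform matrix (identity stand-in)
alpha = np.zeros(n_pix)
alpha[rng.choice(n_pix, 5, replace=False)] = rng.standard_normal(5)  # sparse coefficients

sigma_n = 0.01
P = A @ H @ psi @ alpha + sigma_n * rng.standard_normal(n_proj)      # noisy projection
print(P.shape)
```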

To obtain better performance, the reconstructed CT image is represented in the shear wave transform domain, i.e., f = ψα. The sparse coefficients retain the maximal amount of necessary information.

The goal of low-dose medical CT image blind restoration reconstruction based on Bayesian compressed sensing is to estimate the sparse coefficients (α) and the degradation matrix (H) from the noisy projection (P) and the parameters' prior distributions. Different prior models are adopted for the sparse coefficients (α) and the point spread function (h), and the model parameters are collected in Ω. After defining the prior models, a hierarchical Bayesian model is used to establish the joint distribution of the projection (P) and the other parameters.

The joint probability distribution model is given by

$$ p\left(\Omega, \boldsymbol{\upalpha}, \mathbf{h},\mathbf{P}\right)=p\left(\Omega \right)p\left(\boldsymbol{\upalpha} |\Omega \right)p\left(\mathbf{h}|\Omega \right)p\left(\mathbf{P}|\boldsymbol{\upalpha}, \mathbf{h},\Omega \right) $$
(2)

where p(Ω) is the prior distribution of the model parameters, p(α| Ω) is the prior distribution of the sparse coefficients, p(h| Ω) is the prior distribution of the point spread function, and p(P| α, h, Ω) is the conditional distribution of the noisy projection.

The prior models of the noisy projection, the unknown sparse coefficients, and the point spread function depend on the unknown model parameters (Ω). These model parameters are governed by hyperparameters, and the hierarchical Bayesian method is adopted for their estimation. Given the noisy projection (P), the posterior distribution p(α, h, Ω| P) of the unknown parameters is estimated by the variational Bayesian method. To make this estimation tractable, the paper assumes that the approximating distribution has a closed-form solution and that all parameters (α, h, and Ω) are mutually independent.

On the basis of maximum a posteriori estimation, the model parameters are obtained as

$$ \widehat{\Omega}=\arg \underset{\Omega}{\max }p\left(\Omega |\mathbf{P}\right)=\underset{\boldsymbol{\upalpha}}{\int}\underset{\mathbf{h}}{\int }p\left(\Omega, \boldsymbol{\upalpha}, \mathbf{h},\mathbf{P}\right)d\boldsymbol{\upalpha} d\mathbf{h} $$
(3)

Then, \( \widehat{\boldsymbol{\upalpha}} \) and \( \widehat{\mathbf{h}} \) are estimated by maximizing their posterior distribution given the estimated model parameters and the noisy projection. That is,

$$ \widehat{\boldsymbol{\upalpha}},\widehat{\mathbf{h}}=\arg \underset{\boldsymbol{\upalpha}, \mathbf{h}}{\max }p\left(\boldsymbol{\upalpha}, \mathbf{h}|\widehat{\Omega},\mathbf{P}\right) $$
(4)

2.2 Shear wave transform

The theoretical basis of the shear wave transform is the theory of composite wavelets. The shear wave transform gives better results than the traditional wavelet transform because it is multi-scale, multi-directional, and multi-precision. The matrix ψ represents a set of basis functions obtained by scaling, translation, and shearing of a generating function ψ. The image f can then be sparsely represented in the shear wave domain, and the optimization problem is described as follows:

$$ \underset{\boldsymbol{\upalpha}}{\min }{\left\Vert \boldsymbol{\upalpha} \right\Vert}_0\kern0.5em s.t.\kern0.5em \mathbf{f}=\boldsymbol{\uppsi} \boldsymbol{\upalpha} $$
(5)

where α is the sparse shear wave coefficient matrix, in which only a small number of elements are nonzero.

In this paper, the reconstructed image is projected into the shear wave domain, and the Laplacian distribution, which characterizes the statistics of the shear wave coefficients well, is used as their prior probability density function.

2.3 Prior distribution

CT image blind restoration reconstruction is realized within the Bayesian approach, so it is important to determine the likelihood function and the prior distributions of the parameters.

2.3.1 Priori distribution of the noisy projection

The probability density P(P|h, α) is derived from the noise model.

$$ P\left(\left.\mathbf{P}\right|\mathbf{h},\boldsymbol{\upalpha} \right)={\left(\frac{1}{2{\pi \sigma}_n^2}\right)}^{N/2}\exp \left\{-\frac{1}{2{\sigma}_n^2}{\left\Vert \mathbf{P}-\mathbf{AH}\boldsymbol{\uppsi } \boldsymbol{\upalpha} \right\Vert}^2\right\} $$
(6)

where \( {\sigma}_n^2 \) is the noise variance and N is the number of projection samples.
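A minimal sketch of evaluating the logarithm of the likelihood in Eq. (6); the function name and the combined operator `B = AHψ` are our own conventions:

```python
# Log of the projection likelihood in Eq. (6), including its normalization.
import numpy as np

def log_likelihood(P, B, alpha, sigma_n2):
    """B stands for the combined operator A H psi; sigma_n2 is the noise variance."""
    N = P.size
    resid = P - B @ alpha
    return -0.5 * N * np.log(2 * np.pi * sigma_n2) - resid @ resid / (2 * sigma_n2)
```

A perfect fit (zero residual) attains the maximum value −(N/2) log(2πσ_n²), which is a quick sanity check.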

2.3.2 Priori distribution of the coefficients from shear wave transform

For the shear wave transform, the coefficients are self-similar and heavy-tailed, and the Laplacian distribution matches their distribution well. The Laplacian density function is defined as follows:

$$ P\left(\boldsymbol{\upalpha} \right)=\frac{1}{\sqrt{2}{\sigma}_{co}^2}\exp \left\{-\frac{\sqrt{2}\left|\boldsymbol{\upalpha} \right|}{\sigma_{co}^2}\right\} $$
(7)

where \( {\sigma}_{co}^2 \) is the variance of α.
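The Laplacian log-prior of Eq. (7) can be evaluated elementwise and summed; the following sketch (function name is ours) follows the density exactly as written in Eq. (7):

```python
# Laplacian log-prior of Eq. (7) for the shear wave coefficients, summed over elements.
import numpy as np

def log_laplace_prior(alpha, sigma_co2):
    # log p(alpha_i) = -log(sqrt(2) * sigma_co^2) - sqrt(2)|alpha_i| / sigma_co^2
    return np.sum(-np.log(np.sqrt(2) * sigma_co2)
                  - np.sqrt(2) * np.abs(alpha) / sigma_co2)
```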

2.3.3 Priori distribution of the point spread function

The autoregressive model is used as the prior model of the point spread function. As the statistical distribution of the point spread function, the probability density function of the Gauss distribution is

$$ p\left(\mathbf{h}\left|{\sigma}_{bl}^2\right.\right)={\left(\frac{1}{\sigma_{bl}^2}\right)}^{M/2}\exp \left\{-\frac{1}{2{\sigma}_{bl}^2}{\left\Vert C\mathbf{h}\right\Vert}^2\right\} $$
(8)

where C is the Laplacian operator, \( {\sigma}_{bl}^2 \) is the variance of the Gauss distribution, and M is the size of the point spread function h.
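The smoothness prior of Eq. (8) can be sketched with a 1-D discrete Laplacian; the exact stencil for C is not specified in the paper, so the second-difference matrix below is an assumption:

```python
# A 1-D discrete Laplacian C and the smoothness log-prior of Eq. (8) for h.
import numpy as np

def laplacian_matrix(M):
    # Second-difference stencil [1, -2, 1] (one common discrete Laplacian).
    return -2.0 * np.eye(M) + np.eye(M, k=1) + np.eye(M, k=-1)

def log_psf_prior(h, sigma_bl2):
    # log p(h | sigma_bl^2) = -(M/2) log(sigma_bl^2) - ||C h||^2 / (2 sigma_bl^2)
    M = h.size
    Ch = laplacian_matrix(M) @ h
    return -0.5 * M * np.log(sigma_bl2) - (Ch @ Ch) / (2 * sigma_bl2)
```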

2.3.4 Priori distribution of the model parameters

To estimate the model parameters in the above expressions, the hierarchical Bayesian model is used to estimate \( {\sigma}_{co}^2 \), \( {\sigma}_{bl}^2 \), and \( {\sigma}_n^2 \). It is assumed that these unknown model parameters are independent. Then, the joint distribution is

$$ p\left({\sigma}_{co}^2,{\sigma}_{bl}^2,{\sigma}_n^2,\boldsymbol{\upalpha}, \mathbf{h},\mathbf{P}\right)=p\left({\sigma}_{co}^2,{\sigma}_{bl}^2,{\sigma}_n^2\right)p\left(\boldsymbol{\upalpha} \left|{\sigma}_{co}^2\right.\right)p\left(\mathbf{h}\left|{\sigma}_{bl}^2\right.\right)p\left(\mathbf{P}\left|{\sigma}_n^2,\boldsymbol{\upalpha}, \mathbf{h}\right.\right) $$
(9)

The posterior distribution is derived from the prior distribution. Therefore, when the unknown model parameters are estimated in the Bayesian framework, a conjugate prior distribution is chosen so that the posterior remains in the same family. In the algorithm, the Γ distribution is used as the prior distribution of the unknown model parameters; it is defined as follows:

$$ p\left(\omega \right)=\Gamma \left(\left.\omega \right|{a}_{\omega}^o,{b}_{\omega}^o\right)=\frac{{\left({b}_{\omega}^o\right)}^{a_{\omega}^o}}{\Gamma \left({a}_{\omega}^o\right)}{\omega}^{a_{\omega}^o-1}\exp \left[-{b}_{\omega}^o\omega \right] $$
(10)

where \( \omega \in \left\{{\sigma}_{co}^2,{\sigma}_{bl}^2,{\sigma}_n^2\right\} \) represents a parameter, \( {b}_{\omega}^o \) is the rate parameter (\( {b}_{\omega}^o>0 \)), and \( {a}_{\omega}^o \) is the shape parameter (\( {a}_{\omega}^o>0 \)).
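Eq. (10) and its mean a/b, which reappears in the updates of Section 2.4, can be sketched directly (function names are ours):

```python
# Gamma hyperprior of Eq. (10), in shape/rate form, and its mean a/b.
import math

def log_gamma_pdf(w, a, b):
    # log Gamma(w | a, b) = a log b - log Gamma(a) + (a - 1) log w - b w
    return a * math.log(b) - math.lgamma(a) + (a - 1) * math.log(w) - b * w

def gamma_mean(a, b):
    return a / b
```

With a = b = 1 the Γ distribution reduces to the unit exponential, so log p(1) = −1, which serves as a sanity check.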

2.4 Variational Bayesian medical CT image blind restoration reconstruction

Equation (9) can be rewritten as

$$ p\left({\sigma}_{co}^2,{\sigma}_{bl}^2,{\sigma}_n^2,\boldsymbol{\upalpha}, \mathbf{h},\mathbf{P}\right)=p\left({\sigma}_{co}^2,{\sigma}_{bl}^2,{\sigma}_n^2,\boldsymbol{\upalpha}, \mathbf{h}\left|\mathbf{P}\right.\right)p\left(\mathbf{P}\right) $$
(11)

To facilitate the mathematical derivation, all unknown parameters in the algorithm are collected as

$$ \Theta =\left({\sigma}_{co}^2,{\sigma}_{bl}^2,{\sigma}_n^2,\boldsymbol{\upalpha}, \mathbf{h}\right) $$
(12)

Based on the Bayesian paradigm, the posterior distribution is deduced.

$$ {\displaystyle \begin{array}{c}p\left(\Theta \left|\mathbf{P}\right.\right)=p\left({\sigma}_{co}^2,{\sigma}_{bl}^2,{\sigma}_n^2,\boldsymbol{\upalpha}, \mathbf{h}\left|\mathbf{P}\right.\right)=\frac{p\left({\sigma}_{co}^2,{\sigma}_{bl}^2,{\sigma}_n^2,\boldsymbol{\upalpha}, \mathbf{h},\mathbf{P}\right)}{p\left(\mathbf{P}\right)}\\ {}=\frac{p\left({\sigma}_{co}^2,{\sigma}_{bl}^2,{\sigma}_n^2\right)p\left(\left.\boldsymbol{\upalpha} \right|{\sigma}_{co}^2\right)p\left(\left.\mathbf{h}\right|{\sigma}_{bl}^2\right)p\left(\left.\mathbf{P}\right|\boldsymbol{\upalpha}, \mathbf{h},{\sigma}_n^2\right)}{p\left(\mathbf{P}\right)}\end{array}} $$
(13)

The posterior distribution p(Θ|P) needs to be computed so that the derivation can be carried out. According to Eq. (13), once p(Θ|P) is available, α and h can be estimated together with the posterior distribution \( p\left({\sigma}_{co}^2,{\sigma}_{bl}^2,{\sigma}_n^2\left|\mathbf{P}\right.\right) \) of the model parameters. In practice, however, the evidence p(P) cannot be obtained in closed form, so p(Θ|P) has no closed-form expression.

The variational Bayesian method is therefore used to find an approximation of the posterior distribution that does have a closed form. The variational approximation seeks a distribution q(Θ) close to p(Θ|P): when the Kullback-Leibler divergence between the two distributions attains its minimum, p(Θ|P) and q(Θ) are approximately equal. The Kullback-Leibler divergence between the two distributions is defined as follows:

$$ {C}_{KL}\left(q\left(\Theta \right)\left\Vert p\left(\Theta \left|\mathbf{P}\right.\right)\right.\right)=\underset{\Theta}{\int }q\left(\Theta \right)\log \left(\frac{q\left(\Theta \right)}{p\left(\Theta \left|\mathbf{P}\right.\right)}\right)d\Theta =\underset{\Theta}{\int }q\left(\Theta \right)\log \left(\frac{q\left(\Theta \right)}{p\left(\Theta, \mathbf{P}\right)}\right)d\Theta +\mathrm{const} $$
(14)

where \( {C}_{KL}\left(q\left(\Theta \right)\left\Vert p\left(\Theta \left|\mathbf{P}\right.\right)\right.\right) \) is nonnegative, and \( {C}_{KL}\left(q\left(\Theta \right)\left\Vert p\left(\Theta \left|\mathbf{P}\right.\right)\right.\right)=0 \) when q(Θ) = p(Θ|P).
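The two stated properties of the Kullback-Leibler divergence can be checked numerically; the sketch below uses the closed-form KL between one-dimensional Gaussians as a stand-in for q and p (an illustration, not the paper's actual distributions):

```python
# Closed-form KL divergence between 1-D Gaussians N(mu_q, var_q) and N(mu_p, var_p),
# used to verify nonnegativity and the equality condition KL = 0.
import math

def kl_gauss(mu_q, var_q, mu_p, var_p):
    return 0.5 * (math.log(var_p / var_q)
                  + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

assert kl_gauss(0.0, 1.0, 0.0, 1.0) == 0.0   # equal distributions -> KL = 0
assert kl_gauss(1.0, 2.0, 0.0, 1.0) > 0.0    # otherwise strictly positive
```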

The variational method estimates the posterior distribution p(Θ|P) through the approximate posterior distribution q(Θ), which must first be defined. Assuming that all unknown parameters are independent of each other,

$$ q\left(\Theta \right)=q\left(\boldsymbol{\upalpha} \right)q\left(\mathbf{h}\right)q\left({\sigma}_{co}^2\right)q\left({\sigma}_{bl}^2\right)q\left({\sigma}_n^2\right) $$
(15)

After defining the factorized form of q(Θ), the distribution must be optimized to obtain the best model. Let \( {\Theta}_{\theta} \) denote the subset of Θ that excludes the parameter θ; for example, if θ = α, then \( {\Theta}_{\boldsymbol{\upalpha}}=\left({\sigma}_{co}^2,{\sigma}_{bl}^2,{\sigma}_n^2,\mathbf{h}\right) \). Thus, the Kullback-Leibler divergence of q(Θ) and p(Θ|P) is as follows:

$$ {\displaystyle \begin{array}{c}{C}_{KL}\left(q\left(\Theta \right)\left\Vert p\left(\Theta \left|\mathbf{P}\right.\right)\right.\right)={C}_{KL}\left(q\left(\theta \right)q\left({\Theta}_{\theta}\right)\left\Vert p\left(\Theta \left|\mathbf{P}\right.\right)\right.\right)\\ {}=\underset{\Theta}{\int }q\left(\theta \right)q\left({\Theta}_{\theta}\right)\log \left(\frac{q\left(\theta \right)q\left({\Theta}_{\theta}\right)}{p\left(\Theta \left|\mathbf{P}\right.\right)}\right)d\Theta \\ {}=\underset{\theta }{\int }q\left(\theta \right)\times \left(\underset{\Theta_{\theta }}{\int }q\left({\Theta}_{\theta}\right)\log \left(\frac{q\left(\theta \right)q\left({\Theta}_{\theta}\right)}{p\left(\theta, {\Theta}_{\theta },\mathbf{P}\right)}\right)d{\Theta}_{\theta}\right) d\theta +\mathrm{const}\end{array}} $$
(16)

where \( q\left({\Theta}_{\theta}\right)={\prod}_{\rho \ne \theta }q\left(\rho \right) \), since the unknown parameters are independent of each other. If θ = α, then

$$ q\left({\Theta}_{\boldsymbol{\upalpha}}\right)=q\left({\sigma}_{co}^2,{\sigma}_{bl}^2,{\sigma}_n^2,\mathbf{h}\right)=q\left({\sigma}_{co}^2\right)q\left({\sigma}_{bl}^2\right)q\left({\sigma}_n^2\right)q\left(\mathbf{h}\right) $$
(17)

Thus, for each unknown parameter, the approximate posterior distribution q(θ) can be solved by means of Eq. (18).

$$ \widehat{q}\left(\theta \right)=\arg \underset{q\left(\theta \right)}{\min }{C}_{KL}\left(q\left(\theta \right)q\left({\Theta}_{\theta}\right)\left\Vert p\left(\Theta \left|\mathbf{P}\right.\right)\right.\right)=\mathrm{const}\times \exp \left(E{\left[\log p\left(\Theta \right)p\left(\mathbf{P}\left|\Theta \right.\right)\right]}_{q\left({\Theta}_{\theta}\right)}\right) $$
(18)

where \( E{\left[\log p\left(\Theta \right)p\left(\mathbf{P}\left|\Theta \right.\right)\right]}_{q\left({\Theta}_{\theta}\right)}=\int \log p\left(\Theta \right)p\left(\mathbf{P}\left|\Theta \right.\right)q\left({\Theta}_{\theta}\right)d{\Theta}_{\theta } \).

The smaller the value of \( {C}_{KL}\left(q\left(\theta \right)q\left({\Theta}_{\theta}\right)\left\Vert p\left(\Theta \left|\mathbf{P}\right.\right)\right.\right) \), the closer \( q\left(\theta \right)q\left({\Theta}_{\theta}\right) \) is to p(Θ|P). In this way, after initial values (\( {q}^1\left({\sigma}_{co}^2\right) \), \( {q}^1\left({\sigma}_{bl}^2\right) \), \( {q}^1\left({\sigma}_n^2\right) \)) of the model parameters \( {\sigma}_{co}^2 \), \( {\sigma}_{bl}^2 \), and \( {\sigma}_n^2 \) are defined, the posterior distributions of all unknown parameters are obtained by alternating Eqs. (19)-(23):

$$ {q}^{k+1}\left(\boldsymbol{\upalpha} \right)=\arg \underset{q\left(\boldsymbol{\upalpha} \right)}{\min}\times {C}_{KL}\left(q\left(\boldsymbol{\upalpha} \right){q}^k\left(\mathbf{h}\right){q}^k\left({\sigma}_{co}^2\right){q}^k\left({\sigma}_{bl}^2\right){q}^k\left({\sigma}_n^2\right)\left\Vert p\left(\Theta \left|\mathbf{P}\right.\right)\right.\right) $$
(19)
$$ {q}^{k+1}\left(\mathbf{h}\right)=\arg \underset{q\left(\mathbf{h}\right)}{\min}\times {C}_{KL}\left(q\left(\mathbf{h}\right){q}^k\left(\boldsymbol{\upalpha} \right){q}^k\left({\sigma}_{co}^2\right){q}^k\left({\sigma}_{bl}^2\right){q}^k\left({\sigma}_n^2\right)\left\Vert p\left(\Theta \left|\mathbf{P}\right.\right)\right.\right) $$
(20)
$$ {q}^{k+1}\left({\sigma}_{co}^2\right)=\arg \underset{q\left({\sigma}_{co}^2\right)}{\min}\times {C}_{KL}\left(q\left({\sigma}_{co}^2\right){q}^k\left(\boldsymbol{\upalpha} \right){q}^{k+1}\left(\mathbf{h}\right){q}^k\left({\sigma}_{bl}^2\right){q}^k\left({\sigma}_n^2\right)\left\Vert p\left(\Theta \left|\mathbf{P}\right.\right)\right.\right) $$
(21)
$$ {q}^{k+1}\left({\sigma}_{bl}^2\right)=\arg \underset{q\left({\sigma}_{bl}^2\right)}{\min}\times {C}_{KL}\left(q\left({\sigma}_{bl}^2\right){q}^k\left(\boldsymbol{\upalpha} \right){q}^{k+1}\left(\mathbf{h}\right){q}^k\left({\sigma}_{co}^2\right){q}^k\left({\sigma}_n^2\right)\left\Vert p\left(\Theta \left|\mathbf{P}\right.\right)\right.\right) $$
(22)
$$ {q}^{k+1}\left({\sigma}_n^2\right)=\arg \underset{q\left({\sigma}_n^2\right)}{\min}\times {C}_{KL}\left(q\left({\sigma}_n^2\right){q}^k\left(\boldsymbol{\upalpha} \right){q}^{k+1}\left(\mathbf{h}\right){q}^k\left({\sigma}_{co}^2\right){q}^k\left({\sigma}_{bl}^2\right)\left\Vert p\left(\Theta \left|\mathbf{P}\right.\right)\right.\right) $$
(23)

where k = 1, 2, 3, … until the stopping criterion is met.

The stopping criterion is

$$ {\left\Vert E{\left[\boldsymbol{\upalpha} \right]}_{q^k\left(\boldsymbol{\upalpha} \right)}-E{\left[\boldsymbol{\upalpha} \right]}_{q^{k-1}\left(\boldsymbol{\upalpha} \right)}\right\Vert}^2/{\left\Vert E{\left[\boldsymbol{\upalpha} \right]}_{q^{k-1}\left(\boldsymbol{\upalpha} \right)}\right\Vert}^2<\varepsilon $$
(24)

where ε is the specified accuracy. If the criterion is satisfied, the iteration stops and the corresponding results are obtained; otherwise, the iterations continue.
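The relative-change rule of Eq. (24) is straightforward to implement; the sketch below (function name is ours) uses the default ε = 10⁻⁴ reported in the experiments:

```python
# Stopping rule of Eq. (24): relative squared change of successive coefficient means.
import numpy as np

def converged(alpha_k, alpha_prev, eps=1e-4):
    num = np.linalg.norm(alpha_k - alpha_prev) ** 2
    den = np.linalg.norm(alpha_prev) ** 2
    return num / den < eps
```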

To compute the posterior distribution of the unknown parameters, the mean and covariance matrix of h in the kth iteration are assumed to be

$$ E{\left[\mathbf{h}\right]}_{q^k\left(\mathbf{h}\right)}={E}^k\left(\mathbf{h}\right),\operatorname{cov}{\left[\mathbf{h}\right]}_{q^k\left(\mathbf{h}\right)}={\operatorname{cov}}^k\left(\mathbf{h}\right) $$
(25)

Similarly, the mean values of the model parameters are

$$ E{\left[{\sigma}_{co}^2\right]}_{q^k\left({\sigma}_{co}^2\right)}={\left({\sigma}_{co}^2\right)}^k,E{\left[{\sigma}_{bl}^2\right]}_{q^k\left({\sigma}_{bl}^2\right)}={\left({\sigma}_{bl}^2\right)}^k,E{\left[{\sigma}_n^2\right]}_{q^k\left({\sigma}_n^2\right)}={\left({\sigma}_n^2\right)}^k $$
(26)

Then, according to Eq. (18), the posterior conditional distribution of the sparse transform coefficients α can be estimated. Using Eqs. (6) and (7) and taking the logarithm of both sides of Eq. (18) gives

$$ -2\log {q}^k\left(\boldsymbol{\upalpha} \right)=\mathrm{const}+{\left({\sigma}_{co}^2\right)}^k{\left\Vert \boldsymbol{\upalpha} \right\Vert}^2+{\left({\sigma}_n^2\right)}^kE{\left[{\left\Vert \mathbf{P}-\mathbf{AH}\boldsymbol{\uppsi } \boldsymbol{\upalpha} \right\Vert}^2\right]}_{q^k\left(\mathbf{h}\right)} $$
(27)

Suppose \( {q}^k\left(\boldsymbol{\upalpha} \right)=N\left(\boldsymbol{\upalpha} \left|{E}^k\left(\boldsymbol{\upalpha} \right),{\operatorname{cov}}^k\left(\boldsymbol{\upalpha} \right)\right.\right) \). The mean of the normal distribution is the solution of \( \frac{\partial \left(-2\log {q}^k\left(\boldsymbol{\upalpha} \right)\right)}{\partial \boldsymbol{\upalpha}}=0 \), and the covariance of the normal distribution is \( {\operatorname{cov}}^k\left(\boldsymbol{\upalpha} \right)={\left[-\frac{\partial^2\log {q}^k\left(\boldsymbol{\upalpha} \right)}{\partial {\boldsymbol{\upalpha}}^2}\right]}^{-1} \). This yields

$$ {E}^k\left(\boldsymbol{\upalpha} \right)={\left({M}^k\left(\boldsymbol{\upalpha} \right)\right)}^{-1}{\left({\sigma}_n^2\right)}^k{E}^k{\left(\mathbf{h}\right)}^t\mathbf{P} $$
(28)
$$ {\operatorname{cov}}^k\left(\boldsymbol{\upalpha} \right)={\left({\left({\sigma}_{co}^2\right)}^k{C}^tC+{\left({\sigma}_n^2\right)}^k{E}^k{\left(\mathbf{h}\right)}^t{E}^k\left(\mathbf{h}\right)+{\left({\sigma}_n^2\right)}^k{\operatorname{cov}}^k\left(\mathbf{h}\right)\right)}^{-1} $$
(29)

\( {q}^k\left(\boldsymbol{\upalpha} \right) \) is determined by the mean and covariance of α. Similarly, \( {q}^{k+1}\left(\mathbf{h}\right) \) can be calculated by the same procedure.

$$ {q}^{k+1}\left(\mathbf{h}\right)=N\left(\left.\mathbf{h}\right|{E}^{k+1}\left(\mathbf{h}\right),{\operatorname{cov}}^{k+1}\left(\mathbf{h}\right)\right) $$
(30)
$$ {E}^{k+1}\left(\mathbf{h}\right)={\left({\left({\sigma}_{bl}^2\right)}^k{C}^tC+{\left({\sigma}_n^2\right)}^k{\operatorname{cov}}^k\left(\boldsymbol{\upalpha} \right)\right)}^{-1}{\left({\sigma}_n^2\right)}^k{E}^k{\left(\boldsymbol{\upalpha} \right)}^t\mathbf{P} $$
(31)
$$ {\operatorname{cov}}^{k+1}\left(\mathbf{h}\right)={\left({\left({\sigma}_{bl}^2\right)}^k{C}^tC+{\left({\sigma}_n^2\right)}^k{\operatorname{cov}}^k\left(\boldsymbol{\upalpha} \right)\right)}^{-1} $$
(32)
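The structure of these Gaussian updates can be illustrated with a simplified sketch in which h is held fixed and B = AHψ is treated as one known matrix; it mirrors the form of Eqs. (28)-(29), where the parameters enter as precision-like weights, but it is not the paper's exact coupled update:

```python
# Simplified alpha update mirroring Eqs. (28)-(29): Gaussian working prior,
# h held fixed, B = A H psi treated as one known matrix (all assumptions).
import numpy as np

def update_alpha(P, B, sigma_co2, sigma_n2):
    n = B.shape[1]
    # Inverse covariance: prior weight plus data weight, as in the form of Eq. (29).
    precision = sigma_co2 * np.eye(n) + sigma_n2 * B.T @ B
    cov = np.linalg.inv(precision)
    mean = cov @ (sigma_n2 * B.T @ P)      # mean, as in the form of Eq. (28)
    return mean, cov
```

With B = I and unit weights, the update shrinks the observation by a factor of 1/2, which is the expected behavior of a ridge-type posterior mean.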

Based on the prior models defined above, Eq. (33) is used to solve the problem:

$$ {\displaystyle \begin{array}{l}\kern1em E{\left[\log p\left(\Theta \right)p\left(\mathbf{P}\left|\Theta \right.\right)\right]}_{q^k\left(\boldsymbol{\upalpha} \right){q}^{k+1}\left(\mathbf{h}\right)}\\ {}=E{\left[\log p\left({\sigma}_{co}^2,{\sigma}_{bl}^2,{\sigma}_n^2,\boldsymbol{\upalpha}, \mathbf{h},\mathbf{P}\right)\right]}_{q^k\left(\boldsymbol{\upalpha} \right){q}^{k+1}\left(\mathbf{h}\right)}\\ {}=E{\left[\log p\left({\sigma}_{co}^2,{\sigma}_{bl}^2,{\sigma}_n^2\right)p\left(\boldsymbol{\upalpha} \left|{\sigma}_{co}^2\right.\right)p\left(\left.\mathbf{h}\right|{\sigma}_{bl}^2\right)p\left(\left.\mathbf{P}\right|\boldsymbol{\upalpha}, \mathbf{h},{\sigma}_n^2\right)\right]}_{q^k\left(\boldsymbol{\upalpha} \right){q}^{k+1}\left(\mathbf{h}\right)}\\ {}=\mathrm{const}+\sum \limits_{\omega \in \left\{{\sigma}_{co}^2,{\sigma}_{bl}^2,{\sigma}_n^2\right\}}\left(\left({a}_{\omega}^o-1\right)\log \omega -\omega {b}_{\omega}^o\right)+N\log {\sigma}_{co}+M\log {\sigma}_{bl}+N\log {\sigma}_n\\ {}-\frac{1}{2}{\sigma}_{co}^2E{\left[{\left\Vert \boldsymbol{\upalpha} \right\Vert}^2\right]}_{q^k\left(\boldsymbol{\upalpha} \right)}-\frac{1}{2}{\sigma}_{bl}^2E{\left[{\left\Vert C\mathbf{h}\right\Vert}^2\right]}_{q^{k+1}\left(\mathbf{h}\right)}-\frac{1}{2}{\sigma}_n^2E{\left[{\left\Vert \mathbf{P}-\mathbf{AH}\boldsymbol{\uppsi } \boldsymbol{\upalpha} \right\Vert}^2\right]}_{q^k\left(\boldsymbol{\upalpha} \right){q}^{k+1}\left(\mathbf{h}\right)}\end{array}} $$
(33)

According to the properties of mathematical expectation in probability theory, \( E\left[{x}^2\right]={E}^2\left[x\right]+\operatorname{var}(x) \), where var(x) is the variance of x and trace(·) denotes the trace of a matrix. Then

$$ E{\left[{\left\Vert \boldsymbol{\upalpha} \right\Vert}^2\right]}_{q^k\left(\boldsymbol{\upalpha} \right)}={\left\Vert {E}^k\left(\boldsymbol{\upalpha} \right)\right\Vert}^2+\mathrm{trace}\left({\operatorname{cov}}^k\left(\boldsymbol{\upalpha} \right)\right) $$
(34)
$$ E{\left[{\left\Vert C\mathbf{h}\right\Vert}^2\right]}_{q^{k+1}\left(\mathbf{h}\right)}={\left\Vert {CE}^{k+1}\left(\mathbf{h}\right)\right\Vert}^2+\mathrm{trace}\left({CC}^t{\operatorname{cov}}^{k+1}\left(\mathbf{h}\right)\right) $$
(35)
$$ {\displaystyle \begin{array}{c}E{\left[{\left\Vert \mathbf{P}-\mathbf{A}\mathbf{H}\boldsymbol{\uppsi} \boldsymbol{\upalpha} \right\Vert}^2\right]}_{q^k\left(\boldsymbol{\upalpha} \right){q}^{k+1}\left(\mathbf{h}\right)}={\left\Vert \mathbf{P}-\mathbf{A}{E}^{k+1}\left(\mathbf{H}\right)\boldsymbol{\uppsi} {E}^k\left(\boldsymbol{\upalpha} \right)\right\Vert}^2+\mathrm{trace}\left({\operatorname{cov}}^k\left(\boldsymbol{\upalpha} \right){\operatorname{cov}}^{k+1}\left(\mathbf{h}\right)\right)\\ {}+\mathrm{trace}\left({E}^{k+1}\left(\mathbf{H}\right){E}^{k+1}{\left(\mathbf{H}\right)}^t{\operatorname{cov}}^k\left(\boldsymbol{\upalpha} \right)\right)\end{array}} $$
(36)
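The identity behind Eq. (34) can be verified by Monte Carlo sampling; the mean and covariance below are arbitrary illustrative values:

```python
# Monte-Carlo check of Eq. (34): E[||alpha||^2] = ||E(alpha)||^2 + trace(cov(alpha)).
import numpy as np

rng = np.random.default_rng(1)
mean = np.array([1.0, -2.0])
cov = np.array([[0.5, 0.1],
                [0.1, 0.3]])

samples = rng.multivariate_normal(mean, cov, size=200_000)
mc = np.mean(np.sum(samples ** 2, axis=1))      # Monte-Carlo estimate of E[||alpha||^2]
closed = np.sum(mean ** 2) + np.trace(cov)      # closed form of Eq. (34)
print(mc, closed)                               # the two agree to about 1e-2
```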

\( {E}^k\left(\boldsymbol{\upalpha} \right) \), \( {\operatorname{cov}}^k\left(\boldsymbol{\upalpha} \right) \), \( {E}^{k+1}\left(\mathbf{h}\right) \), and \( {\operatorname{cov}}^{k+1}\left(\mathbf{h}\right) \) have been obtained in Eqs. (28), (29), (31), and (32). \( {E}^{k+1}\left(\mathbf{H}\right) \) and \( {\operatorname{cov}}^{k+1}\left(\mathbf{H}\right) \) are composed of \( {E}^{k+1}\left(\mathbf{h}\right) \) and \( {\operatorname{cov}}^{k+1}\left(\mathbf{h}\right) \).

To calculate \( {q}^{k+1}\left(\omega \right) \) (\( \omega \in \left\{{\sigma}_{co}^2,{\sigma}_{bl}^2,{\sigma}_n^2\right\} \)), the corresponding means are needed. It is assumed that the model parameters \( {\sigma}_{co}^2 \), \( {\sigma}_{bl}^2 \), and \( {\sigma}_n^2 \) follow the Γ distribution, whose mean is \( E\left[\omega \right]={a}_{\omega}^o/{b}_{\omega}^o \). At iteration k + 1, the Γ distribution is \( {q}^{k+1}\left(\omega \right)=\Gamma \left(\left.\omega \right|{a}_{\omega}^{k+1},{b}_{\omega}^{k+1}\right) \), with mean \( E\left[\omega \right]={a}_{\omega}^{k+1}/{b}_{\omega}^{k+1} \). From Eq. (33), we know

$$ {a}_{\sigma_{co}^2}^{k+1}={a}_{\sigma_{co}^2}^o+\frac{N}{2} $$
(37)
$$ {b}_{\sigma_{co}^2}^{k+1}={b}_{\sigma_{co}^2}^o+\frac{1}{2}E{\left[{\left\Vert \boldsymbol{\upalpha} \right\Vert}^2\right]}_{q^k\left(\boldsymbol{\upalpha} \right)} $$
(38)
$$ {a}_{\sigma_{bl}^2}^{k+1}={a}_{\sigma_{bl}^2}^o+\frac{M}{2} $$
(39)
$$ {b}_{\sigma_{bl}^2}^{k+1}={b}_{\sigma_{bl}^2}^o+\frac{1}{2}E{\left[{\left\Vert C\mathbf{h}\right\Vert}^2\right]}_{q^{k+1}\left(\mathbf{h}\right)} $$
(40)
$$ {a}_{\sigma_n^2}^{k+1}={a}_{\sigma_n^2}^o+\frac{N}{2} $$
(41)
$$ {b}_{\sigma_n^2}^{k+1}={b}_{\sigma_n^2}^o+\frac{1}{2}E{\left[{\left\Vert \mathbf{P}-\mathbf{AH}\boldsymbol{\uppsi } \boldsymbol{\upalpha} \right\Vert}^2\right]}_{q^k\left(\boldsymbol{\upalpha} \right){q}^{k+1}\left(\mathbf{h}\right)} $$
(42)

Then, the mean values of the model parameters at iteration k + 1 are as follows:

$$ E{\left[{\sigma}_{co}^2\right]}_{q^{k+1}\left({\sigma}_{co}^2\right)}=\frac{a_{\sigma_{co}^2}^o+\frac{N}{2}}{b_{\sigma_{co}^2}^o+\frac{1}{2}E{\left[{\left\Vert \boldsymbol{\upalpha} \right\Vert}^2\right]}_{q^k\left(\boldsymbol{\upalpha} \right)}} $$
(43)
$$ E{\left[{\sigma}_{bl}^2\right]}_{q^{k+1}\left({\sigma}_{bl}^2\right)}=\frac{a_{\sigma_{bl}^2}^o+\frac{M}{2}}{b_{\sigma_{bl}^2}^o+\frac{1}{2}E{\left[{\left\Vert C\mathbf{h}\right\Vert}^2\right]}_{q^{k+1}\left(\mathbf{h}\right)}} $$
(44)
$$ E{\left[{\sigma}_n^2\right]}_{q^{k+1}\left({\sigma}_n^2\right)}=\frac{a_{\sigma_n^2}^o+\frac{N}{2}}{b_{\sigma_n^2}^o+\frac{1}{2}E{\left[{\left\Vert \mathbf{P}-\mathbf{AH}\boldsymbol{\uppsi } \boldsymbol{\upalpha} \right\Vert}^2\right]}_{q^k\left(\boldsymbol{\upalpha} \right){q}^{k+1}\left(\mathbf{h}\right)}} $$
(45)

The mean values of the above model parameters are used in the iterative process of calculating the approximate distribution of α and h.
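The hyperparameter updates of Eqs. (37)-(42), and the resulting posterior means of Eqs. (43)-(45), all share one shape/rate pattern, sketched below (the function name is ours):

```python
# Shared pattern of Eqs. (37)-(45): posterior Gamma shape a, rate b, and mean a/b.
def gamma_update(a0, b0, half_count, half_quadratic):
    """a0, b0: prior shape/rate; half_count: N/2 or M/2;
    half_quadratic: 0.5 * E[||.||^2] from Eqs. (34)-(36)."""
    a = a0 + half_count
    b = b0 + half_quadratic
    return a, b, a / b
```

For example, Eq. (43) corresponds to `gamma_update(a0, b0, N / 2, 0.5 * e_alpha2)` with `e_alpha2` the expectation of Eq. (34).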

The update equations above are used to estimate all unknown parameters, and f = ψα then yields the reconstructed image.

3 Discussion and experiment results

To verify the validity of the proposed algorithm, the low-dose CT noise model proposed in the literature [28] is adopted. According to that work, the noisy projection (P) of a low-dose CT image approximately follows a Gauss distribution, and the relationship between the mean and variance can be described as

$$ {\sigma}_j^2={g}_j\exp \left(\frac{p_j}{\kappa}\right) $$
(46)

where \( p_j \) and \( {\sigma}_j^2 \) represent the mean and variance of projection sample j, respectively, and \( g_j \) and κ are two system-dependent parameters. For a given system, both can be regarded as known quantities.
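Eq. (46) can be simulated directly by drawing Gaussian noise with the signal-dependent variance; the function name is ours, and the defaults follow the experimental settings g_j = 200, κ = 4 × 10⁴ reported below:

```python
# Simulating low-dose projection noise per Eq. (46): Gaussian noise whose
# variance grows exponentially with the mean projection value p_j.
import numpy as np

def noisy_projection(p, g=200.0, kappa=4e4, rng=None):
    rng = rng or np.random.default_rng()
    var = g * np.exp(p / kappa)                  # per-sample variance, Eq. (46)
    return p + rng.standard_normal(p.shape) * np.sqrt(var)
```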

According to the above model, a 512 × 512-pixel low-dose CT image simulating a digital model of the human thorax is designed; the simulated CT phantom image is shown in Fig. 1. Fan-beam scanning is used in the experiment. According to Eq. (46), Gauss noise is added to the ideal projection data to simulate low-dose noisy projection data (\( g_j=200 \), κ = 4 × 10⁴). The degradation of the CT image is approximated by a Gauss point spread function with σ² = 1, and Gauss white noise with noise power \( {\sigma}_n^2 \) is added. The signal-to-noise ratio (SNR) is defined as \( \mathrm{SNR}=10{\log}_{10}\left({\left\Vert \mathbf{f}\right\Vert}_2^2/{\sigma}_n^2\right) \). The experimental analysis is carried out at SNRs of 20 and 40 dB (ε = 10⁻⁴). Under the same test conditions, the proposed algorithm is compared with several commonly used algorithms that combine iterative blind image restoration with filtered back projection (FBP) [29], the simultaneous algebraic reconstruction technique (SART) [30], and total variation (TV) regularization [31]. All experiments were programmed in MATLAB 2016a on an Intel(R) i7-4770 CPU with 16 GB of memory.

Fig. 1
figure 1

CT image

In the case of sparse projection angles, combining the FBP algorithm with blind image restoration (BIR) remains highly sensitive to noise; at low signal-to-noise ratios, its performance degrades rapidly. Combining SART with BIR performs better, but serious artifacts remain when reconstructing from incomplete projections. Combining SART, TV regularization, and BIR can handle sparse projection angles, but it yields an over-smoothed image with missing details. The proposed algorithm eliminates the effect of the point spread function during low-dose medical CT reconstruction and improves the reconstructed image quality. With this approach, reconstruction becomes an iterative process in which a set of parameters is refined over successive optimizations. By comparison, the proposed algorithm performs best in terms of clarity, contrast, and detail preservation.

To examine the details more clearly, local regions of Figs. 2 and 3 are magnified for comprehensive evaluation, as shown in Figs. 4 and 5. As can be seen from these figures, the proposed method effectively suppresses noise while preserving image details, and it is superior to the other methods in removing artifacts and preserving edges.

Fig. 2
figure 2

Comparison of reconstructed images with different blind restoration reconstruction methods (SNR = 20 dB)

Fig. 3
figure 3

Comparison of reconstructed images with different blind restoration reconstruction methods (SNR = 40 dB)

Fig. 4
figure 4

Comparison of local amplification details with different blind restoration reconstruction methods (SNR = 20 dB)

Fig. 5
figure 5

Comparison of local amplification details with different blind restoration reconstruction methods (SNR = 40 dB)

To quantify the performance of the compared algorithms, four objective metrics are used: peak signal-to-noise ratio (PSNR), universal image quality index (UIQI) [32], structural similarity index metric (SSIM) [33], and sum of squared differences error (SSDE). The ideal value of PSNR is +∞, the ideal value of both UIQI and SSIM is 1, and the ideal value of SSDE is 0. These metrics require a reference image and can therefore only be used in simulated experiments. The values tabulated for each experiment are averages over 10 repetitions. The results are shown in Tables 1 and 2.
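For reference, PSNR, UIQI, and SSDE can be computed directly from their definitions (SSIM involves local windowed statistics and is omitted here; see [33]). This is a plain-NumPy sketch, not necessarily the exact implementation used in the experiments:

```python
import numpy as np

def psnr(ref, img, peak=1.0):
    # Peak signal-to-noise ratio in dB for a given peak intensity.
    mse = np.mean((ref - img) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def ssde(ref, img):
    # Sum of squared differences error; 0 for a perfect reconstruction.
    return np.sum((ref - img) ** 2)

def uiqi(ref, img):
    # Universal image quality index (Wang & Bovik); 1 for a perfect match.
    x, y = ref.ravel(), img.ravel()
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = np.mean((x - mx) * (y - my))
    return 4.0 * cov * mx * my / ((vx + vy) * (mx ** 2 + my ** 2))
```

Note that UIQI equals 1 only when the reconstruction matches the reference exactly in luminance, contrast, and structure, which is why it complements the purely pixel-wise PSNR and SSDE.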

Table 1 Comparison of image evaluation parameters of different reconstruction algorithms (SNR = 20 dB)
Table 2 Comparison of image evaluation parameters of different reconstruction algorithms (SNR = 40 dB)

Compared with the other methods, the proposed algorithm obtains a better visual effect and is more robust in reconstructing details. Tables 1 and 2 show that the proposed algorithm improves the objective image quality metrics PSNR, SSIM, UIQI, and SSDE, and it maintains a better restoration and reconstruction effect under different parameter settings. However, the initial parameter estimates are random. To obtain good estimates, a relatively sound estimation criterion should be established and a confidence parameter introduced; this helps eliminate the influence of random factors and enhances the robustness of the parameter estimation.

To verify the generality of the proposed method, a second simulation experiment reconstructs another CT image, shown in Fig. 6. The CT image size is 512 × 512 pixels.

Fig. 6
figure 6

The CT image in the second experiment

Limited-angle projection was used to simulate low-dose medical CT imaging, with the number of projection angles set to 180. FBP + BIR, SART + BIR, SART + TV + BIR, and the proposed method were each used to perform blind restoration reconstruction of the low-dose medical CT image.
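The projection-and-FBP baseline in this setup can be sketched with scikit-image. This is an approximation for illustration only: `radon`/`iradon` implement a parallel-beam geometry standing in for the fan-beam geometry used in the paper, and the built-in Shepp-Logan phantom stands in for the thorax phantom.

```python
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon

f = shepp_logan_phantom()                             # stand-in 400x400 phantom
theta = np.linspace(0.0, 180.0, 180, endpoint=False)  # 180 projection angles
sinogram = radon(f, theta=theta)                      # forward projection
fbp = iradon(sinogram, theta=theta)                   # baseline FBP (ramp filter)
```

With 180 evenly spaced angles the FBP baseline is reasonably accurate on a clean sinogram; the artifacts discussed above appear once the projections are noisy or the angular range is truncated.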

The reconstructed images are shown in Fig. 7. To observe the simulation procedure more directly, a part of an image is amplified, as shown in Fig. 8. As shown in Figs. 7 and 8, the reconstructed images by FBP + BIR and SART + BIR have some artifacts in the case of low-dose CT imaging. The problem with SART + TV + BIR is that the reconstructed image is too smooth, and some visible details will be lost. The proposed algorithm has the best effects such as sharpness, contrast, and detail preservation.

Fig. 7
figure 7

Comparison of reconstructed images with different blind restoration reconstruction methods (the second CT phantom image)

Fig. 8
figure 8

Comparison of local amplification details with different blind restoration reconstruction methods (the second CT phantom image)

To quantitatively evaluate the effectiveness of the proposed algorithm, the reconstructed images in Fig. 7 are compared with the ideal phantom in Fig. 6 using SSIM, PSNR, UIQI, and SSDE, as shown in Table 3.

Table 3 Comparison of image evaluation parameters of different reconstruction algorithms (the CT phantom image in the second simulation experiment)

As shown in Table 3, the SSIM value of the proposed algorithm is closest to 1, and its PSNR, UIQI, and SSDE are also improved. This means the proposed method yields a reconstructed image that is highly similar to the ideal CT image.

4 Conclusion

This paper proposes a variational Bayesian blind restoration reconstruction method based on the shear wave transform for low-dose medical CT images. The shear wave coefficients are modeled by a Laplacian distribution. Following Bayesian statistical theory, the hyperparameters of the shear wave coefficients, the parameters of the point spread function, and the inverse variance of the noise are treated as random variables following gamma distributions, and their values are estimated by maximum a posteriori estimation. Finally, within the Bayesian framework, the restoration reconstruction of the low-dose medical CT image is formulated as an optimization problem, which is solved by variational approximation. Experiments show that the Bayesian model describes local structure information adaptively. The proposed algorithm preserves image edge details and obtains a satisfactory visual effect; it accounts for the noisy projections and eliminates the effects of the point spread function. The experimental results show that the proposed algorithm is superior to the other algorithms in subjective visual effect and can reconstruct high-quality images even at low signal-to-noise ratios. In terms of objective evaluation, it improves the image quality metrics PSNR, SSIM, UIQI, and SSDE. The convergence of the algorithm is, however, strongly affected by the initial values. Several experiments show that similar initial values can improve the computational speed for the same type of image; therefore, to accelerate the algorithm, reference values drawn from images of the same kind, i.e., expert experience, can be provided.
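To make the variational treatment of the noise concrete, the sketch below shows the standard conjugate gamma update for a noise precision under a Gaussian likelihood. This is a generic schematic of one variational step, not the paper's full model; the broad-prior hyperparameters a0 and b0 are our own illustrative choices.

```python
import numpy as np

def vb_noise_precision_update(residual, a0=1e-6, b0=1e-6):
    # One variational update for a noise precision (inverse variance)
    # with a Gamma(a0, b0) prior under a Gaussian likelihood: the
    # variational factor q(beta) is again a gamma distribution, and
    # we return its posterior mean E[beta] = a_n / b_n.
    n = residual.size
    a_n = a0 + 0.5 * n
    b_n = b0 + 0.5 * np.sum(residual ** 2)
    return a_n / b_n
```

Larger residuals inflate b_n and thus shrink the estimated precision, which is how the framework adapts the implied noise level to the data at each iteration.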