1 Introduction

Many applications rely on recovering temporal image sequences from noisy, indirect, and incomplete data [6, 28, 42, 43]. We can often formulate this task as a sequence of linear inverse problems,

$$\begin{aligned} {\textbf{y}}^{(j)} = F^{(j)} {\textbf{x}}^{(j)} + {\textbf{e}}^{(j)}, \quad j=1,\dots ,J, \end{aligned}$$
(1)

where \({\textbf{y}}^{(j)}\) is a given data vector, \({\textbf{x}}^{(j)}\) is the (unknown) vectorized image, \(F^{(j)}\) is a known linear forward operator, and \({\textbf{e}}^{(j)}\) corresponds to unknown noise. The individual recovery of each image by separately solving the linear inverse problems (1) is a well-studied, although challenging, problem in itself [25, 27, 40]. A prominent approach is to replace (1) with a nearby regularized inverse problem that promotes some prior belief about the unknown image. In imaging applications, it is often reasonable to assume that some linear transform of the unknown image \({\textbf{x}}\), say \(R {\textbf{x}}\), is sparse. This prior belief yields the \(\ell ^1\)-regularized inverse problems

$$\begin{aligned} \min _{{\textbf{x}}^{(j)}} \left\{ \Vert F^{(j)} {\textbf{x}}^{(j)} - {\textbf{y}}^{(j)} \Vert _2^2 + \lambda _j \Vert R {\textbf{x}}^{(j)} \Vert _1 \right\} , \quad j=1,\dots ,J, \end{aligned}$$
(2)

where R is the regularization operator and \(\lambda _j > 0\) are regularization parameters. Usual choices for R are discrete (high-order) total variation (TV) [37], total generalized [8] and total directional [35, 36] variation, and polynomial annihilation [2, 3, 23] operators. The rationale behind considering (2) is that the \(\ell ^1\)-norm, \(\Vert \cdot \Vert _1\), serves as a convex surrogate for the \(\ell ^0\)-“norm”, \(\Vert \cdot \Vert _0\); an observation that lies at the heart of compressed sensing [16, 18, 21]. Another prominent approach, closely related to regularization, is Bayesian inverse problems [11, 29, 38]. In this setting, we express our lack of information about some of the quantities in (1) by modeling them as random variables, whose relations are characterized by certain density functions. The fidelity and regularization terms correspond to the negative logarithm of the likelihood and prior density, respectively, and the regularized solution (2) corresponds to the maximizer of the posterior density. The class of conditionally Gaussian priors [9, 10, 12, 13, 39] is particularly suited to promote sparsity.

Here, we consider the case that each data set \({\textbf{y}}^{(j)}\) in (1) is missing vital information, which prevents the accurate individual recovery of the images \({\textbf{x}}^{(j)}\). A common strategy is to restore the missing information in each image by “borrowing” it from the other images. The works [1, 15, 17, 41, 45] considered deterministic and Bayesian methods of this flavor that rely on a common sparsity assumption, meaning that \(R {\textbf{x}}^{(1)},\dots ,R {\textbf{x}}^{(J)}\) have the same support. Unfortunately, temporally changing image sequences often violate the common sparsity assumption. Recently, [42] addressed this problem by locally coupling the images in no-change regions while decoupling them in change regions. This was done by first computing a diagonal change mask \(C^{(j-1,j)}\) directly from the consecutive data sets \({\textbf{y}}^{(j-1)}, {\textbf{y}}^{(j)}\) and using this change mask to penalize any difference between \({\textbf{x}}^{(j-1)}\) and \({\textbf{x}}^{(j)}\) in no-change regions. Let \([C^{(j-1,j)}]_{n,n} = 0\) in change regions and \([C^{(j-1,j)}]_{n,n} = 1\) in no-change regions. The joint \(\ell ^1\)-regularized inverse problem used in [42] is

$$\begin{aligned} \min _{{\textbf{x}}^{(1)},\dots ,{\textbf{x}}^{(J)}} \left\{ \sum _{j=1}^J \left\| F^{(j)} {\textbf{x}}^{(j)} - {\textbf{y}}^{(j)} \right\| _2^2 + \sum _{j=1}^J \lambda _j \left\| R {\textbf{x}}^{(j)} \right\| _1 + \sum _{j=2}^J \mu _j \left\| C^{(j-1,j)} \left( {\textbf{x}}^{(j-1)} - {\textbf{x}}^{(j)} \right) \right\| _2^2 \right\} , \end{aligned}$$
(3)

where the \(\mu _j \ge 0\) are fixed coupling parameters. Equation (3) balances data fidelity, intra-image regularization (sparsity of \(R {\textbf{x}}^{(j)}\)), and inter-image regularization (coupling in no-change regions). Roughly speaking, the last term in (3),

$$\begin{aligned} \left\| C^{(j-1,j)} ( {\textbf{x}}^{(j-1)} - {\textbf{x}}^{(j)} ) \right\| _2^2 = \sum _{n=1}^N \left[ C^{(j-1,j)}\right] _{n,n} \left| x^{(j-1)}_n - x^{(j)}_n \right| ^2, \end{aligned}$$
(4)

penalizes any change between the two subsequent images \({\textbf{x}}^{(j-1)}\) and \({\textbf{x}}^{(j)}\) in no-change regions, where \([C^{(j-1,j)}]_{n,n} = 1\), since the value of the objective function in (3) increases with \(|x^{(j-1)}_n - x^{(j)}_n|\) in this case. At the same time, in change regions, the difference \(|x^{(j-1)}_n - x^{(j)}_n|\) does not influence the value of the objective function since \([C^{(j-1,j)}]_{n,n} = 0\) in this case. Figure 1 illustrates the advantage of jointly recovering a temporal sequence of magnetic resonance images from noisy and under-sampled Fourier data (see Sect. 4 for more details).

Fig. 1

Three images of a temporal magnetic resonance image sequence with a rotating left (green) and a down-moving right (yellow) ellipse. The first row shows the reference images, the second row the separately recovered images using (2), and the third row the jointly recovered images using (3). The data sets are noisy and miss different Fourier samples (Color figure online)

While the joint recovery of changing images using (3) can yield improved accuracy, there remain issues with robustness and the range of applications: (I1) Selecting appropriate regularization and coupling parameters is a non-trivial task, and their choice can critically influence the quality of the recovered images. Although (re-)weighted [1, 14, 22] \(\ell ^1\)-regularization can increase the robustness w. r. t. the intra-image regularization parameters \(\lambda _j\), selecting suitable inter-image coupling parameters \(\mu _j\) remains an open problem. (I2) The method proposed in [42, Section 3] to pre-compute the change masks uses Fourier data and assumes that the sequential images only include objects with closed boundaries. These requirements prevent the application to problems with other types of data acquisition. Finally, one has to tune some problem-dependent free parameters by hand on a case-by-case basis.

1.1 Our Contribution

We propose a joint hierarchical Bayesian learning (JHBL) method for sequential image recovery. The method is easy to implement, efficient, and simple to parallelize. Our method avoids issues (I1) and (I2) by reinterpreting all involved parameters—including the change mask—as random variables, which we then estimate together with the recovered images. In particular, for the random variables responsible for the inter-image coupling, we use weakly-informative gamma distributions that favor small pixel-wise differences (in no-change regions) but allow for occasional outliers (in change regions). Our approach does not rely on Fourier data or images showing objects with closed boundaries. We demonstrate that our JHBL method improves the accuracy of all individual images. Another advantage is that our method quantifies the uncertainty in the recovered images, which is often desirable. Some preliminary results for magnetic resonance imaging and road-traffic monitoring indicate the advantage of JHBL for sequential image recovery.

1.2 Outline

In Sect. 2, we present the JHBL model that promotes intra-image sparsity and inter-image coupling. In Sect. 3, we propose an efficient method for Bayesian inference. In Sect. 4, we demonstrate the performance of the resulting JHBL method for test cases from sequential deblurring and magnetic resonance imaging. Section 5 offers some concluding thoughts.

2 The Joint Hierarchical Bayesian Model

We now describe the hierarchical Bayesian model we use to develop our JHBL method.

2.1 The Likelihood

Consider the data model (1) and assume that \({\textbf{y}}^{(j)} \in {\mathbb {R}}^{M_j}\), \(F^{(j)} \in {\mathbb {R}}^{M_j \times N}\), \({\textbf{x}}^{(j)} \in {\mathbb {R}}^N\), and that \({\textbf{e}}^{(j)} \in {\mathbb {R}}^{M_j}\) is independent and identically distributed (i.i.d.) zero-mean normal noise, \(e_m^{(j)} \sim {\mathcal {N}}(0,\alpha _j)\) for \(m=1,\dots ,M_j\), with noise precision \(\alpha _j > 0\).Footnote 1 The jth likelihood function, which is the conditional probability density of \({\textbf{y}}^{(j)}\) given \({\textbf{x}}^{(j)}\) and \(\alpha _j\), is

$$\begin{aligned} p({\textbf{y}}^{(j)} | {\textbf{x}}^{(j)}, \alpha _j) \propto \alpha _j^{M_j/2} \exp \left\{ -\frac{\alpha _j}{2} \left\| F^{(j)} {\textbf{x}}^{(j)} - {\textbf{y}}^{(j)} \right\| _2^2 \right\} , \end{aligned}$$
(5)

where “\(\propto \)” means that the two sides are equal up to a multiplicative constant. We also treat the noise precision \(\alpha _j\) as a random variable that is learned together with the images \({\textbf{x}}^{(j)}\).Footnote 2 We assume that \(\alpha _j\) is gamma distributed,

$$\begin{aligned} p(\alpha _j) = \Gamma (\alpha _j|\eta _{\alpha _j},\theta _{\alpha _j}) \propto \alpha _j^{\eta _{\alpha _j}-1} \exp \{-\theta _{\alpha _j} \alpha _j\}, \quad j=1,\dots ,J. \end{aligned}$$
(6)

For simplicity, we use the same shape and rate parameter for all noise precisions, denoted by \(\eta _{\alpha }\) and \(\theta _{\alpha }\). We use the gamma distribution because it is conditionally conjugate to the normal distribution, which is convenient for Bayesian inference (see Sect. 3). Moreover, we can make the noise hyper-priors (6) flat and therefore uninformative by choosing \(\theta _{\alpha } \approx 0\). Figure 2 illustrates the gamma density function for different shape and rate parameters. Remark 1 addresses the possibility of using informative hyper-priors.

Fig. 2

Gamma density functions for different shape and rate parameters \(\eta , \theta \). Recall that the mode, expected value, and variance of a gamma distribution \(\Gamma (\eta ,\theta )\) are \((\eta -1)/\theta \), \(\eta /\theta \), and \(\eta /\theta ^2\), respectively. In particular, decreasing the rate \(\theta \) makes the density flatter, while \(\eta \) influences its peak (Color figure online)
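As a quick numerical sanity check of these quantities (a minimal sketch; note that scipy.stats parameterizes the gamma distribution by the shape \(\eta \) and the scale \(1/\theta \) rather than the rate):

```python
import numpy as np
from scipy import stats

def gamma_summary(eta, theta):
    """Mode, mean, and variance of Gamma(shape=eta, rate=theta)."""
    dist = stats.gamma(a=eta, scale=1.0 / theta)  # scipy uses scale = 1/rate
    mode = max(0.0, (eta - 1.0) / theta)          # mode formula also used in Sect. 3.2
    return mode, dist.mean(), dist.var()

# Decreasing the rate theta flattens the density (the variance eta/theta^2 grows).
for eta, theta in [(1.0, 1.0), (2.0, 1.0), (2.0, 1e-3)]:
    print((eta, theta), gamma_summary(eta, theta))
```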

If we denote the collection of all images by \({\textbf{x}} = [{\textbf{x}}^{(1)};\dots ;{\textbf{x}}^{(J)}]\), of all data sets by \({\textbf{y}} = [{\textbf{y}}^{(1)};\dots ;{\textbf{y}}^{(J)}]\), and of all noise precisions by \(\varvec{\alpha } = [\alpha _1;\dots ;\alpha _J]\), then the joint likelihood function is

$$\begin{aligned} p( {\textbf{y}} | {\textbf{x}}, \varvec{\alpha } ) = \prod _{j=1}^J p({\textbf{y}}^{(j)} | {\textbf{x}}^{(j)}, \alpha _j) \propto \prod _{j=1}^J \alpha _j^{M_j/2} \exp \left\{ -\frac{\alpha _j}{2} \Vert F^{(j)} {\textbf{x}}^{(j)} - {\textbf{y}}^{(j)} \Vert _2^2 \right\} , \end{aligned}$$
(7)

assuming that the data sets are conditionally independent. A few remarks are in order.

Remark 1

For simplicity, we use the same hyper-prior \(\Gamma (\cdot |\eta _{\alpha },\theta _{\alpha })\) and parameters \(\eta _{\alpha },\theta _{\alpha }\) for all components of \(\varvec{\alpha }\). We do this since we assume no prior knowledge about the noise variances of the different measurement vectors. However, if one has a reasonable a priori notion of the underlying noise variances, the choice of hyper-prior could be modified accordingly [5, 13].

Remark 2

The data sets being conditionally independent means that if we know the images \({\textbf{x}}\) and the noise precisions \(\varvec{\alpha }\), then knowledge of one data set \({\textbf{y}}^{(j)}\) provides no additional information about the likelihood of another data set \({\textbf{y}}^{(i)}\) with \(i \ne j\).

2.2 The Intra-image Prior

We assume that some linear transform of the images, say \(R {\textbf{x}}^{(j)}\) with \(R \in {\mathbb {R}}^{K \times N}\), is sparse. One can model sparsity by various priors, including TV [4, 31], mixture-of-Gaussian [19], Laplace [20], and hyper-Laplace [32, 34] priors. Here, we use conditionally Gaussian priors, which are particularly suited to promote sparsity and allow for efficient Bayesian inference [9, 13, 24]. The jth intra-image prior is

$$\begin{aligned} p( {\textbf{x}}^{(j)} | \varvec{\beta }^{(j)} ) \propto \det \left( B^{(j)} \right) ^{1/2} \exp \left\{ - \frac{1}{2} ( {\textbf{x}}^{(j)} )^T R^T B^{(j)} R {\textbf{x}}^{(j)} \right\} , \end{aligned}$$
(8)

where \(B^{(j)} = {{\,\textrm{diag}\,}}(\varvec{\beta }^{(j)})\) and \(\varvec{\beta }^{(j)} = [\beta ^{(j)}_1,\dots ,\beta ^{(j)}_K]\) is treated as a random vector with i. i. d. gamma distributed components,

$$\begin{aligned} p(\beta ^{(j)}_k) = \Gamma (\beta ^{(j)}_k|\eta _{\beta ^{(j)}_k},\theta _{\beta ^{(j)}_k}), \quad k=1,\dots ,K. \end{aligned}$$
(9)

For simplicity, we use the same shape and rate parameter for all prior precisions, denoted by \(\eta _{\beta }\) and \(\theta _{\beta }\). We choose \(\theta _{\beta } \approx 0\) to make the hyper-prior flat and thus uninformative. Again, if one has a reasonable a priori notion of the support of \(R {\textbf{x}}^{(j)}\), the choice for the hyper-prior (9) and the parameters \(\eta _{\beta ^{(j)}_k},\theta _{\beta ^{(j)}_k}\) could be modified correspondingly. If we denote the collection of all precision vectors by \(\varvec{\beta } = [\varvec{\beta }^{(1)};\dots ;\varvec{\beta }^{(J)}]\), then the joint intra-image prior is

$$\begin{aligned} \begin{aligned} p( {\textbf{x}} | \varvec{\beta } )&= \prod _{j=1}^J p( {\textbf{x}}^{(j)} | \varvec{\beta }^{(j)} ) \\&\propto \prod _{j=1}^J \det \left( B^{(j)} \right) ^{1/2} \exp \left\{ - \frac{1}{2}( {\textbf{x}}^{(j)} )^T R^T B^{(j)} R {\textbf{x}}^{(j)} \right\} , \end{aligned} \end{aligned}$$
(10)

assuming that the images are conditionally independent.

Remark 3

The images being conditionally independent means that if we know the parameters \(\varvec{\beta }\), then knowledge of one image \({\textbf{x}}^{(j)}\) provides no additional information about the likelihood of another image \({\textbf{x}}^{(i)}\) with \(i \ne j\). Although we assume that the images form a temporal sequence with only parts of the images changing, we treat this as qualitative information; we want to quantify neither the exact location nor the size of the (no-)change regions, which is why we assume that the images are conditionally independent.

2.3 The Inter-image Prior

Assume for the moment that we had a pre-computed diagonal change mask \(C^{(j-1,j)}\) with \([C^{(j-1,j)}]_{n,n} = 0\) in change regions and \([C^{(j-1,j)}]_{n,n} = 1\) in no-change regionsFootnote 3 as in [42]. We could then translate the coupling term in (3) into the empirical conditionally Gaussian prior

$$\begin{aligned} \begin{aligned} p( {\textbf{x}} | \mu _2,\dots ,\mu _J )&\propto \prod _{j=2}^J \mu _j^{N/2} \exp \left\{ - \frac{\mu _j}{2} \left\| C^{(j-1,j)} \left( {\textbf{x}}^{(j-1)} - {\textbf{x}}^{(j)} \right) \right\| _2^2 \right\} \\&= \prod _{j=2}^J \mu _j^{N/2} \exp \left\{ - \frac{\mu _j}{2} \left( {\textbf{x}}^{(j-1)} - {\textbf{x}}^{(j)}\right) ^T C^{(j-1,j)} \left( {\textbf{x}}^{(j-1)} - {\textbf{x}}^{(j)} \right) \right\} , \end{aligned} \end{aligned}$$
(11)

where we have used that \(\left( C^{(j-1,j)}\right) ^2 = C^{(j-1,j)}\). However, as mentioned before, the method proposed in [42, Section 3] to pre-compute the change masks uses Fourier data and assumes that the sequential images only include objects with closed boundaries. We overcome these restrictions by replacing the pre-computed diagonal elements of the change mask with random variables, which we then estimate together with the other parameters and images. We propose to use the inter-image prior

$$\begin{aligned} p( {\textbf{x}} | \varvec{\gamma } ) \propto \prod _{j=2}^J \, \det \left( C^{(j-1,j)} \right) ^{1/2} \exp \left\{ - \frac{1}{2} \left( {\textbf{x}}^{(j-1)} - {\textbf{x}}^{(j)} \right) ^T C^{(j-1,j)} \left( {\textbf{x}}^{(j-1)} - {\textbf{x}}^{(j)} \right) \right\} \end{aligned}$$
(12)

with \(C^{(j-1,j)} = {{\,\textrm{diag}\,}}(\varvec{\gamma }^{(j-1,j)})\) and \(\varvec{\gamma }^{(j-1,j)} = [\gamma ^{(j-1,j)}_1, \dots , \gamma ^{(j-1,j)}_N]^T\). Furthermore, \(\varvec{\gamma }\) in (12) denotes the collection of all coupling vectors \(\varvec{\gamma }^{(1,2)},\dots ,\varvec{\gamma }^{(J-1,J)}\). The random variables \(\gamma ^{(j-1,j)}_n\) introduce an adaptive and spatially varying weighting in the change mask. Another advantage of (12) is that we stay within the class of conditionally Gaussian densities, which is convenient for Bayesian inference. This further motivates us to assume that the elements of \(\varvec{\gamma }^{(j-1,j)}\) are i. i. d. gamma distributed,

$$\begin{aligned} p\left( \gamma ^{(j-1,j)}_n \right) = \Gamma \left( \gamma ^{(j-1,j)}_n | \eta _{\gamma ^{(j-1,j)}_n}, \theta _{\gamma ^{(j-1,j)}_n}\right) , \quad n=1,\dots ,N. \end{aligned}$$
(13)

For simplicity, we use the same shape and rate parameter for all elements, denoted by \(\eta _{\gamma }\) and \(\theta _{\gamma }\). To make the hyper-prior flat and therefore uninformative again, we choose \(\theta _{\gamma } \approx 0\). As will be discussed in greater detail in Sect. 4, the choice of \(\eta _{\gamma }\) is influenced by the magnitude of change we expect to occur between consecutive pairs of images. Moreover, if one has a reasonable a priori notion of the location or amount of change between subsequent images, the choice for the hyper-prior (13) and the parameters \(\eta _{\gamma ^{(j-1,j)}_n}, \theta _{\gamma ^{(j-1,j)}_n}\) could be modified correspondingly. We will investigate such informative hyper-priors in future works.

2.4 The Combined Prior

We now discuss how promoting intra-image sparsity as in Sect. 2.2 and inter-image coupling as in Sect. 2.3 can be combined in a single prior. To this end, we can re-write the joint intra-image prior (10) more compactly as

$$\begin{aligned} p( {\textbf{x}} | \varvec{\beta } ) \propto \det ( B )^{1/2} \exp \left\{ - \frac{1}{2} {\textbf{x}}^T {\tilde{R}}^T B {\tilde{R}} {\textbf{x}} \right\} , \end{aligned}$$
(14)

where \(B = {{\,\textrm{diag}\,}}(B^{(1)},\dots ,B^{(J)})\) is a diagonal matrix, \({\tilde{R}} = {{\,\textrm{diag}\,}}(R,\dots ,R)\) is a block-diagonal matrix, and \({\textbf{x}} = [{\textbf{x}}^{(1)};\dots ;{\textbf{x}}^{(J)}]\) again denotes the collection of all images. We can now note that the joint intra-image prior (14) encodes the assumption that

$$\begin{aligned} {\tilde{R}} {\textbf{x}} \sim {\mathcal {N}}({\textbf{0}},B^{-1}), \end{aligned}$$
(15)

i.e., \({\tilde{R}} {\textbf{x}}\) is zero-mean normal distributed with diagonal precision matrix B. At the same time, we can re-write the inter-image prior (12) more compactly as

$$\begin{aligned} p( {\textbf{x}} | \varvec{\gamma } ) \propto \det ( C )^{1/2} \exp \left\{ - \frac{1}{2} {\textbf{x}}^T S^T C S {\textbf{x}} \right\} , \end{aligned}$$
(16)

where \(C = {{\,\textrm{diag}\,}}(C^{(1,2)},\dots ,C^{(J-1,J)})\) is a diagonal matrix and

$$\begin{aligned} S = \begin{bmatrix} I & -I & & \\ & \ddots & \ddots & \\ & & I & -I \end{bmatrix} \end{aligned}$$
(17)

with I denoting the \(N \times N\) identity matrix. Basically, S transforms the collection of images \({\textbf{x}} = [{\textbf{x}}^{(1)};\dots ;{\textbf{x}}^{(J)}]\) into the collection of differences of sequential images \([{\textbf{x}}^{(1)} - {\textbf{x}}^{(2)};\dots ;{\textbf{x}}^{(J-1)} - {\textbf{x}}^{(J)}]\). Similar to before, we can therefore note that the inter-image prior (16) encodes the assumption that

$$\begin{aligned} S {\textbf{x}} \sim {\mathcal {N}}({\textbf{0}},C^{-1}), \end{aligned}$$
(18)

Having (15) and (18) at hand is convenient since it positions us to combine these two assumptions in a single prior. To this end, note that (15) and (18) holding simultaneously is equivalent to

$$\begin{aligned} \begin{bmatrix} {\tilde{R}} \\ S \end{bmatrix} {\textbf{x}} \sim {\mathcal {N}}\left( {\textbf{0}}, \begin{bmatrix} B & 0 \\ 0 & C \end{bmatrix}^{-1} \right) , \end{aligned}$$
(19)

which corresponds to the combined prior

$$\begin{aligned} p( {\textbf{x}} | \varvec{\beta }, \varvec{\gamma } ) \propto \det ( {\tilde{R}}^T B {\tilde{R}} + S^T C S )^{1/2} \exp \left\{ - \frac{1}{2} {\textbf{x}}^T ( {\tilde{R}}^T B {\tilde{R}} + S^T C S ) {\textbf{x}} \right\} . \end{aligned}$$
(20)

The combined prior (20) is a desirable model, but it can have issues when the precision matrix \({\tilde{R}}^T B {\tilde{R}} + S^T C S\) is non-invertibleFootnote 4, rendering the normalizing constant \(\det ( {\tilde{R}}^T B {\tilde{R}} + S^T C S )^{1/2}\) zero. In this case, one may consider making the precision matrix invertible by adding a small multiple of the identity to it. This approximation would allow some variability along the directions that collapse without substantially changing the variance along the dominant directions. Still, even when the precision matrix is invertible, Bayesian inference can become computationally intractable due to the complicated relationship between the eigenvalues of \({\tilde{R}}^T B {\tilde{R}} + S^T C S\) and the hyper-parameters \(\varvec{\beta }\) and \(\varvec{\gamma }\). To overcome these challenges, we propose the following modification to the combined prior (20):

$$\begin{aligned} p( {\textbf{x}} | \varvec{\beta }, \varvec{\gamma } ) \propto \det ( B )^{1/2} \det ( C )^{1/2} \exp \left\{ - \frac{1}{2} {\textbf{x}}^T ( {\tilde{R}}^T B {\tilde{R}} + S^T C S ) {\textbf{x}} \right\} . \end{aligned}$$
(21)

The modified combined prior (21) offers computational benefits and produces satisfactory numerical results. We chose this particular form as the resulting fully conditional distributions for the hyper-parameters \(\varvec{\beta }\) and \(\varvec{\gamma }\) are gamma distributions, making Bayesian maximum a posteriori estimation computationally efficient. Further details on the proposed model’s inference are provided in Sect. 3. We acknowledge that this modification is ad-hoc, and future research could explore other potential modifications. Overall, the modified combined prior strikes a balance between modeling considerations and computational efficiency. Figure 3 provides a graphical summary of the JHBL model described in this section.
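To make the block structure behind (14), (16), and (21) concrete, the following minimal sketch (in Python with scipy.sparse) assembles \({\tilde{R}}\), S, B, C, and the matrix \({\tilde{R}}^T B {\tilde{R}} + S^T C S\); the sizes and the placeholder choice \(R = I\) are illustrative assumptions, not the operators used in our experiments.

```python
import numpy as np
import scipy.sparse as sp

J, N = 3, 16                        # illustrative: J images with N pixels each
R = sp.identity(N, format="csr")    # placeholder sparsifying transform (e.g., a TV operator)
K = R.shape[0]

# Block-diagonal R_tilde = diag(R, ..., R) as in (14).
R_tilde = sp.block_diag([R] * J, format="csr")

# S from (17): differences of consecutive images, built via a Kronecker product.
D1 = sp.diags([np.ones(J - 1), -np.ones(J - 1)], offsets=[0, 1], shape=(J - 1, J))
S = sp.kron(D1, sp.identity(N), format="csr")

# Diagonal precision matrices B = diag(beta) and C = diag(gamma); values are placeholders.
beta = np.ones(J * K)               # intra-image precisions beta^{(j)}_k
gamma = np.ones((J - 1) * N)        # inter-image precisions gamma^{(j-1,j)}_n
B, C = sp.diags(beta), sp.diags(gamma)

# The matrix R_tilde^T B R_tilde + S^T C S appearing in (20) and (21).
P = R_tilde.T @ B @ R_tilde + S.T @ C @ S
print(P.shape)                      # (J*N, J*N)
```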

Fig. 3

Graphical representation of the JHBL model for two images. Shaded and plain circles represent observed and unobserved random variables, respectively. The arrows indicate how the random variables influence each other. More specifically, the noise precisions \(\alpha _1, \alpha _2\) and images \({\textbf{x}}^{(1)},{\textbf{x}}^{(2)}\) are connected to the data vectors \({\textbf{y}}^{(1)},{\textbf{y}}^{(2)}\) via the likelihood (5); the hyper-parameters \(\varvec{\beta }^{(1)},\varvec{\beta }^{(2)}\) are connected to the images \({\textbf{x}}^{(1)},{\textbf{x}}^{(2)}\) via the intra-image prior (10); the images \({\textbf{x}}^{(1)}\) and \({\textbf{x}}^{(2)}\) are connected to each other and to the hyper-parameter \(\varvec{\gamma }^{(1,2)}\) via the inter-image prior (12), where \(\varvec{\gamma }^{(1,2)}\) encodes the (change and no-change) regions and the amount of coupling

Remark 4

The joint intra-image and inter-image priors, (14) and (16), are not necessarily consistent. That is, the marginal distributions of \({\textbf{x}}\) derived from the conditional distributions of \({\textbf{x}}|\varvec{\beta }\) and \({\textbf{x}}|\varvec{\gamma }\), respectively, can differ. For this reason, we advise against performing Bayesian inference using the marginal prior. Instead, our Bayesian inference algorithm outlined in Sect. 3 will utilize the hierarchical structure of the Bayesian model.

3 Bayesian Inference

Assume that we are interested in the most probable value (mode) of the posterior \(p( {\textbf{x}}, \varvec{\alpha }, \varvec{\beta }, \varvec{\gamma } | {\textbf{y}} )\), i. e., a combination of images and parameters for which the given data sets are most likely. To this end, we adopt the Bayesian coordinate descent (BCD) algorithm [24], which efficiently approximates the posterior mode by alternatingly updating the images and parameters based on the mode of the fully conditional distributions.

3.1 The Fully Conditional Distributions

We chose conditionally Gaussian priors and gamma hyper-priors because they are conditionally conjugate. This allows us to derive analytical expressions for the fully conditional densities, which is convenient for Bayesian inference. Bayes’ theorem states that

$$\begin{aligned} p( {\textbf{x}}, \varvec{\alpha }, \varvec{\beta }, \varvec{\gamma } | {\textbf{y}} )&\propto p( {\textbf{y}} | {\textbf{x}}, \varvec{\alpha }, \varvec{\beta }, \varvec{\gamma } ) p( {\textbf{x}}, \varvec{\alpha }, \varvec{\beta }, \varvec{\gamma } ) \nonumber \\&= p( {\textbf{y}} | {\textbf{x}}, \varvec{\alpha } ) p( {\textbf{x}} | \varvec{\beta }, \varvec{\gamma } ) p( \varvec{\alpha } ) p( \varvec{\beta } ) p( \varvec{\gamma } ), \end{aligned}$$
(22)

where \(p( {\textbf{y}} | {\textbf{x}}, \varvec{\alpha } )\) is the likelihood, \(p( {\textbf{x}} | \varvec{\beta }, \varvec{\gamma } )\) is the prior, and \(p( \varvec{\alpha } ), p( \varvec{\beta } ), p( \varvec{\gamma } )\) are the hyper-priors. If we substitute the joint likelihood (7) and combined prior (21) together with the hyper-priors (6), (9), (13) into (22), we get

$$\begin{aligned} \begin{aligned}&p( {\textbf{x}}, \varvec{\alpha }, \varvec{\beta }, \varvec{\gamma } | {\textbf{y}} ) \\ \propto&\left( \prod _{j=1}^J \alpha _j^{M_j/2} \exp \left\{ -\frac{\alpha _j}{2} \left\| F^{(j)} {\textbf{x}}^{(j)} - {\textbf{y}}^{(j)} \right\| _2^2 \right\} \right) \cdot \\&\left( \prod _{j=1}^J \det \left( B^{(j)} \right) ^{1/2} \exp \left\{ -\frac{1}{2} ( {\textbf{x}}^{(j)} )^T R^T B^{(j)} R {\textbf{x}}^{(j)} \right\} \right) \cdot \\&\left( \prod _{j=2}^J \, \det \left( C^{(j-1,j)} \right) ^{1/2} \exp \left\{ -\frac{1}{2} \left( {\textbf{x}}^{(j-1)} - {\textbf{x}}^{(j)} \right) ^T C^{(j-1,j)} \left( {\textbf{x}}^{(j-1)} - {\textbf{x}}^{(j)} \right) \right\} \right) \cdot \\&\left( \prod _{j=1}^J \alpha _j^{\eta _{\alpha }-1} \exp \{ -\theta _{\alpha } \alpha _j \} \right) \left( \prod _{j=1}^J \prod _{k=1}^K \left( \beta ^{(j)}_k \right) ^{\eta _{\beta }-1} \exp \{ -\theta _{\beta } \beta ^{(j)}_k \} \right) \left( \prod _{j=2}^J \prod _{n=1}^N \left( \gamma ^{(j-1,j)}_n \right) ^{\eta _{\gamma }-1} \exp \{ -\theta _{\gamma } \gamma ^{(j-1,j)}_n \} \right) . \end{aligned} \end{aligned}$$
(23)

We can note from (23) that

$$\begin{aligned} \begin{aligned} p\left( \alpha _j | {\textbf{x}}^{(j)}, {\textbf{y}}^{(j)} \right)&\propto \alpha _j^{M_j/2 + \eta _\alpha - 1} \exp \left\{ - \left( \left\| F^{(j)} {\textbf{x}}^{(j)} - {\textbf{y}}^{(j)} \right\| _2^2/2 + \theta _\alpha \right) \alpha _j \right\} , \\ p\left( \beta ^{(j)}_k | {\textbf{x}}^{(j)} \right)&\propto \left( \beta ^{(j)}_k \right) ^{1/2 + \eta _\beta - 1} \exp \left\{ - \left( \left[ R{\textbf{x}}^{(j)}\right] _k^2/2 + \theta _\beta \right) \beta ^{(j)}_k \right\} , \\ p\left( \gamma _n^{(j-1,j)} | {\textbf{x}}^{(j-1)}, {\textbf{x}}^{(j)} \right)&\propto \left( \gamma _n^{(j-1,j)} \right) ^{1/2 + \eta _\gamma - 1} \exp \left\{ - \left( \left[ {\textbf{x}}^{(j-1)} - {\textbf{x}}^{(j)}\right] _n^2/2 + \theta _\gamma \right) \gamma _n^{(j-1,j)} \right\} , \end{aligned} \end{aligned}$$
(24)

and thus

$$\begin{aligned} p\left( \alpha _j | {\textbf{x}}^{(j)}, {\textbf{y}}^{(j)} \right)&\propto \Gamma \left( \alpha _j | \eta _{\alpha } + M_j/2, \theta _{\alpha } + \left\| F^{(j)} {\textbf{x}}^{(j)} - {\textbf{y}}^{(j)} \right\| _2^2/2 \right) , \end{aligned}$$
(25)
$$\begin{aligned} p\left( \beta ^{(j)}_k | {\textbf{x}}^{(j)} \right)&\propto \Gamma \left( \beta ^{(j)}_k | \eta _{\beta } + 1/2, \theta _{\beta } + \left[ R{\textbf{x}}^{(j)}\right] _k^2/2 \right) , \end{aligned}$$
(26)
$$\begin{aligned} p\left( \gamma _n^{(j-1,j)} | {\textbf{x}}^{(j-1)}, {\textbf{x}}^{(j)} \right)&\propto \Gamma \left( \gamma _n^{(j-1,j)} | \eta _{\gamma } + 1/2, \theta _{\gamma } + \left[ {\textbf{x}}^{(j-1)} - {\textbf{x}}^{(j)} \right] _n^2/2 \right) , \end{aligned}$$
(27)

where (25), (26) hold for \(j=1,\dots ,J\) and \(k=1,\dots ,K\), while (27) holds for \(j=2,\dots ,J\) and \(n=1,\dots ,N\). Similarly, with the middle expressions below holding for \(j=2,\dots ,J-1\), we have

$$\begin{aligned} \begin{aligned} p\left( {\textbf{x}}^{(1)} | {\textbf{x}}^{(2)}, {\textbf{y}}^{(1)}, \alpha _1, \varvec{\beta }^{(1)}, \varvec{\gamma }^{(1,2)} \right)&\propto {\mathcal {N}}\left( {\textbf{x}}^{(1)} | \varvec{\mu }^{(1)}, \Sigma ^{(1)} \right) , \\ p\left( {\textbf{x}}^{(j)} | {\textbf{x}}^{(j-1)}, {\textbf{x}}^{(j+1)}, {\textbf{y}}^{(j)}, \alpha _j, \varvec{\beta }^{(j)}, \varvec{\gamma }^{(j-1,j)}, \varvec{\gamma }^{(j,j+1)} \right)&\propto {\mathcal {N}}\left( {\textbf{x}}^{(j)} | \varvec{\mu }^{(j)}, \Sigma ^{(j)} \right) , \\ p\left( {\textbf{x}}^{(J)} | {\textbf{x}}^{(J-1)}, {\textbf{y}}^{(J)}, \alpha _J, \varvec{\beta }^{(J)}, \varvec{\gamma }^{(J-1,J)} \right)&\propto {\mathcal {N}}\left( {\textbf{x}}^{(J)} | \varvec{\mu }^{(J)}, \Sigma ^{(J)} \right) , \end{aligned} \end{aligned}$$
(28)

with covariance matrices

$$\begin{aligned} \begin{aligned} \Sigma ^{(1)}&= \left( \alpha _1 \left( F^{(1)} \right) ^T F^{(1)} + R^T B^{(1)} R + C^{(1,2)} \right) ^{-1}, \\ \Sigma ^{(j)}&= \left( \alpha _j \left( F^{(j)} \right) ^T F^{(j)} + R^T B^{(j)} R + C^{(j-1,j)} + C^{(j,j+1)} \right) ^{-1}, \\ \Sigma ^{(J)}&= \left( \alpha _J \left( F^{(J)} \right) ^T F^{(J)} + R^T B^{(J)} R + C^{(J-1,J)} \right) ^{-1}, \\ \end{aligned} \end{aligned}$$
(29)

and means

$$\begin{aligned} \begin{aligned} \varvec{\mu }^{(1)}&= \Sigma ^{(1)} \left( \alpha _1 \left( F^{(1)} \right) ^T {\textbf{y}}^{(1)} + C^{(1,2)} {\textbf{x}}^{(2)} \right) , \\ \varvec{\mu }^{(j)}&= \Sigma ^{(j)} \left( \alpha _j \left( F^{(j)} \right) ^T {\textbf{y}}^{(j)} + C^{(j-1,j)} {\textbf{x}}^{(j-1)} + C^{(j,j+1)} {\textbf{x}}^{(j+1)} \right) , \\ \varvec{\mu }^{(J)}&= \Sigma ^{(J)} \left( \alpha _J \left( F^{(J)} \right) ^T {\textbf{y}}^{(J)} + C^{(J-1,J)} {\textbf{x}}^{(J-1)} \right) . \end{aligned} \end{aligned}$$
(30)

Since \(\alpha _j\), \(\beta ^{(j)}_k\), and \(\gamma ^{(j-1,j)}_n\) are gamma distributed and only take on positive values, the covariance matrices in (29) are symmetric and positive definite (SPD).

3.2 Proposed Method: Joint Hierarchical Bayesian Learning

We are now positioned to formulate our JHBL method by adapting the BCD algorithm [24] to our model in Sect. 2. The BCD algorithm is described in Algorithm 1 and approximates the mode (or mean) of the posterior \(p( {\textbf{x}}, \varvec{\alpha }, \varvec{\beta }, \varvec{\gamma } | {\textbf{y}} )\) by alternatingly updating the images and parameters based on the mode (or mean) of the fully conditional distributions (25), (26), (27), (28).

Algorithm 1

Algorithm 1 is efficient and straightforward to implement because of the analytical expressions for the fully conditional distributions we derived in (25), (26), (27), (28). The mode of a gamma distribution \(\Gamma (\eta ,\theta )\) with \(\eta ,\theta > 0\) is \(\max \{0,(\eta -1)/\theta \}\). Thus, (25), (26), (27) imply that the \(\varvec{\alpha }\)-, \(\varvec{\beta }\)-, and \(\varvec{\gamma }\)-updates in Algorithm 1 are equivalent to

$$\begin{aligned} \left\{ \alpha _j \right\} ^{l+1}&= \frac{ \eta _{\alpha } + M_j/2 - 1}{ \theta _{\alpha } + \left\| F^{(j)} \{ {\textbf{x}}^{(j)} \}^{l} - {\textbf{y}}^{(j)} \right\| _2^2/2}, \quad j=1,\ldots ,J, \end{aligned}$$
(31)
$$\begin{aligned} \left\{ \beta ^{(j)}_k \right\} ^{l+1}&= \frac{ \eta _{\beta } - 1/2 }{ \theta _{\beta } + \left[ R \{ {\textbf{x}}^{(j)} \}^{l} \right] _k^2/2 }, \quad j=1,\ldots ,J,\ k=1,\ldots ,K, \end{aligned}$$
(32)
$$\begin{aligned} \left\{ \gamma ^{(j-1,j)}_n \right\} ^{l+1}&= \frac{ \eta _{\gamma } - 1/2 }{ \theta _{\gamma } + \left[ \{ {\textbf{x}}^{(j-1)} \}^{l} - \{ {\textbf{x}}^{(j)} \}^{l} \right] _n^2/2}, \quad j=2,\ldots ,J,\ n=1,\ldots ,N, \end{aligned}$$
(33)

assuming nonnegative numerators, where \(\{ \cdot \}^{l}\) denotes the l-th iteration of the term inside the curly brackets. Further, (28) implies that the \({\textbf{x}}\)-update in Algorithm 1 is equivalent to solving the linear systems

$$\begin{aligned} \left\{ G^{(j)} \right\} ^{l+1} \left\{ {\textbf{x}}^{(j)} \right\} ^{l+1} = \left\{ {\textbf{b}}^{(j)} \right\} ^{l+1}, \quad j=1,\dots ,J, \end{aligned}$$
(34)

with SPD coefficient matrices

$$\begin{aligned} \begin{aligned} \left\{ G^{(1)} \right\} ^{l+1} =&\ \{ \alpha _1 \}^{l+1} (F^{(1)})^T F^{(1)} + R^T \{ B^{(1)} \}^{l+1} R + \{ C^{(1,2)} \}^{l+1}, \\ \left\{ G^{(j)} \right\} ^{l+1} =&\ \{ \alpha _j \}^{l+1} (F^{(j)})^T F^{(j)} + R^T \{ B^{(j)} \}^{l+1} R + \{ C^{(j-1,j)} \}^{l+1} \\&+ \{ C^{(j,j+1)} \}^{l+1}, \\ \left\{ G^{(J)} \right\} ^{l+1} =&\ \{ \alpha _J \}^{l+1} (F^{(J)})^T F^{(J)} + R^T \{ B^{(J)} \}^{l+1} R + \{ C^{(J-1,J)} \}^{l+1}, \end{aligned} \end{aligned}$$
(35)

and right-hand sides

$$\begin{aligned} \begin{aligned} \left\{ {\textbf{b}}^{(1)} \right\} ^{l+1} =&\ \{ \alpha _1 \}^{l+1} ( F^{(1)} )^T {\textbf{y}}^{(1)} + \{ C^{(1,2)} \}^{l+1} \{ {\textbf{x}}^{(2)} \}^{l}, \\ \left\{ {\textbf{b}}^{(j)} \right\} ^{l+1} =&\ \{ \alpha _j \}^{l+1} ( F^{(j)} )^T {\textbf{y}}^{(j)} + \{ C^{(j-1,j)} \}^{l+1} \{ {\textbf{x}}^{(j-1)} \}^{l} \\&+ \{ C^{(j,j+1)} \}^{l+1} \{ {\textbf{x}}^{(j+1)} \}^{l}, \\ \left\{ {\textbf{b}}^{(J)} \right\} ^{l+1}&= \ \{ \alpha _J \}^{l+1} ( F^{(J)} )^T {\textbf{y}}^{(J)} + \{ C^{(J-1,J)} \}^{l+1} \{ {\textbf{x}}^{(J-1)} \}^{l}, \end{aligned} \end{aligned}$$
(36)

for \(j=2,\dots ,J-1\). Notably, we compute the \((l+1)\)-th iteration of the \({\textbf{x}}^{(j)}\)’s using the l-th iteration of the neighboring images in (36). This allows us to parallelize the \({\textbf{x}}^{(j)}\)-updates (34), making our method efficient even for large image sequences. We can now summarize our JHBL method for sequential image recovery as in Algorithm 2.

Algorithm 2
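Since Algorithm 2 is only summarized above, the following minimal sketch illustrates one JHBL sweep, combining the hyper-parameter updates (31)-(33) with the image updates (34)-(36). It assumes that all forward and regularization operators are given as scipy.sparse matrices, that the numerators in (31)-(33) are positive, and it uses a conjugate gradient solve for the SPD systems (34) in place of the gradient descent steps described in Remark 7 below.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

def jhbl_sweep(x, F, y, R, eta_a, th_a, eta_b, th_b, eta_g, th_g):
    """One (parallelizable) sweep of the JHBL iteration: hyper-parameter
    updates (31)-(33) followed by the image updates (34)-(36).
    x, F, y are lists over j; R is the shared regularization operator."""
    J = len(x)
    # (31): noise precisions alpha_j.
    alpha = [(eta_a + F[j].shape[0] / 2 - 1)
             / (th_a + np.sum((F[j] @ x[j] - y[j]) ** 2) / 2) for j in range(J)]
    # (32): intra-image precisions beta^{(j)} (componentwise).
    beta = [(eta_b - 0.5) / (th_b + (R @ x[j]) ** 2 / 2) for j in range(J)]
    # (33): inter-image precisions gamma^{(j-1,j)} (componentwise).
    gamma = [(eta_g - 0.5) / (th_g + (x[j - 1] - x[j]) ** 2 / 2) for j in range(1, J)]

    x_new = []
    for j in range(J):
        # Coefficient matrix G^{(j)} and right-hand side b^{(j)} from (35)-(36),
        # using the previous iterates of the neighboring images.
        G = alpha[j] * (F[j].T @ F[j]) + R.T @ sp.diags(beta[j]) @ R
        b = alpha[j] * (F[j].T @ y[j])
        if j > 0:
            G = G + sp.diags(gamma[j - 1])
            b = b + gamma[j - 1] * x[j - 1]
        if j < J - 1:
            G = G + sp.diags(gamma[j])
            b = b + gamma[j] * x[j + 1]
        xj, _ = cg(G, b, x0=x[j])   # G^{(j)} is SPD, so CG applies
        x_new.append(xj)
    return x_new, alpha, beta, gamma
```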

3.3 Separate Recovery as a Special Case

We can recover the generalized sparse Bayesian learning (GSBL) method [24] from our JHBL procedure as a special/limit case. If \(\eta _{\gamma } \le 1/2\) or \(\theta _{\gamma } \rightarrow \infty \), then the \(\varvec{\gamma }^{(j)}\)-update (33) becomes

$$\begin{aligned} \left\{ \varvec{\gamma }^{(j-1,j)} \right\} ^{l+1} = 0, \quad j=2,\dots ,J, \end{aligned}$$
(37)

and the \({\textbf{x}}^{(j)}\)-update (34) reduces to

$$\begin{aligned} \left( \{ \alpha _j \}^{l+1} (F^{(j)})^T F^{(j)} + R^T \{ B^{(j)} \}^{l+1} R \right) \left\{ {\textbf{x}}^{(j)} \right\} ^{l+1} = \{ \alpha _j \}^{l+1} ( F^{(j)} )^T {\textbf{y}}^{(j)}, \end{aligned}$$
(38)

for \(j=1,\dots ,J\). In this case, Algorithm 2 corresponds to using the GSBL algorithm [24] to recover the images separately.

3.4 Efficient Implementation

A few remarks on Algorithm 2 are in order.

Remark 5

We initialize the images \(\{ {\textbf{x}}^{(1)} \}^{0},\dots ,\{ {\textbf{x}}^{(J)} \}^{0}\) in Algorithm 2 as the separately recovered images using the GSBL algorithm [24], which we efficiently implemented in parallel.

Remark 6

The different \({\textbf{x}}^{(j)}\)-, \(\alpha _j\)-, \(\varvec{\beta }^{(j)}\)-, and \(\varvec{\gamma }^{(j-1,j)}\)-updates can be easily parallelized. This makes Algorithm 2 efficient even for large image sequences.

Remark 7

The coefficient matrices in the \({\textbf{x}}^{(j)}\)-updates (34) can become prohibitively large in imaging applications. To avoid storage and efficiency issues, we identify the solutions of (34) with the unique minimizers of the quadratic functionals

$$\begin{aligned} L^{(j)}({\textbf{x}}) = {\textbf{x}}^T G^{(j)} {\textbf{x}} - 2 {\textbf{x}}^T {\textbf{b}}^{(j)}, \quad j=1,\dots ,J, \end{aligned}$$
(39)

which is possible since the \(G^{(j)}\)’s are (almost surely) SPD. We then efficiently solve for the minimizers of (39) using a gradient descent method [24]. In our implementation, we use five gradient descent steps for every iteration of the \({\textbf{x}}^{(j)}\)-updates.
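For illustration, here is a minimal matrix-free sketch of such an \({\textbf{x}}^{(j)}\)-update, assuming user-supplied functions that apply \(F^{(j)}\), its adjoint, R, and \(R^T\) (the names below are hypothetical); each step is a steepest-descent step with exact line search for the quadratic functional (39).

```python
import numpy as np

def x_update(x0, apply_G, b, steps=5):
    """A few steepest-descent steps with exact line search for the quadratic
    L(x) = x^T G x - 2 x^T b from (39); apply_G(v) returns G @ v matrix-free."""
    x = x0.copy()
    for _ in range(steps):
        r = b - apply_G(x)            # negative gradient direction (up to a factor 2)
        Gr = apply_G(r)
        denom = r @ Gr
        if denom <= 0:                # G is SPD, so this only guards against round-off
            break
        x = x + (r @ r) / denom * r   # exact line-search step for a quadratic
    return x

def make_apply_G(alpha_j, apply_F, apply_Ft, apply_R, apply_Rt, beta_j, c_diag):
    """Matrix-free G^{(j)} from (35): alpha_j F^T F + R^T diag(beta) R + diag(c)."""
    return lambda v: (alpha_j * apply_Ft(apply_F(v))
                      + apply_Rt(beta_j * apply_R(v))
                      + c_diag * v)
```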

Remark 8

We stop the iterations in Algorithm 2 if the average relative and absolute change between two subsequent image sequence iterations w. r. t. the \(\Vert \cdot \Vert _2\)-norm are less than \(10^{-3}\) or if a maximum number of \(10^{3}\) iterations is reached.

Remark 9

A detailed analysis regarding the convexity of the cost function, \(-\log p({\textbf{x}},\varvec{\alpha },\varvec{\beta },\varvec{\gamma }| {\textbf{y}})\), and the convergence of Algorithm 2 exceeds the scope of this paper and will be addressed in future works.

4 Numerical Tests

We consider two numerical tests to demonstrate the performance of our JHBL method.

4.1 Sequential Magnetic Resonance Imaging

We are given a temporal sequence of six \(128 \times 128\) phantom images obtained by a GE HTXT 1.5T clinical magnetic resonance imaging (MRI) scanner [33], from which we generate under-sampled and indirect Fourier data. Figure 4a, b and c show the first three reference images, where the change consists of the rotating left (green) ellipse and the down-moving right (yellow) ellipse. The Fourier samples contained in the data sets are

$$\begin{aligned} y^{(j)}_{k,l} = \int ^1_0\int ^1_0 x^{(j)}(s,t)e^{-i2\pi (ks+lt)}\, \textrm{d}s \, \textrm{d}t, \quad - \left\lceil \frac{N_1}{2} \right\rceil \le k,l < \left\lceil \frac{N_1}{2} \right\rceil , \end{aligned}$$
(40)

for \(j = 1,\dots ,J\), where \(N = N_1^2\) and \(x^{(j)}\) is a function describing the jth image. We use the discrete Fourier transform as a linear forward operator, thereby introducing model discrepancy and avoiding the inverse crime [30]. Further, each data set is missing Fourier samples for the symmetric bands

$$\begin{aligned} {\mathcal {K}}_j = \left[ \pm (10j + 1), \pm (10j+10) \right] ^2, \quad j=1,\dots ,6, \end{aligned}$$
(41)

and the remaining Fourier samples contain additive i. i. d. zero-mean normal noise. The amount of noise is measured using the signal-to-noise ratio (SNR)

$$\begin{aligned} \textrm{SNR}^{(j)} = 10 \log _{10} \left( \alpha _j y^{(j)}_{0,0} \right) , \quad j=1,\dots ,J, \end{aligned}$$
(42)

where \(y^{(j)}_{0,0}\) is the average of the jth image. The SNR for all images in Fig. 4 is 2.
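For concreteness, a small sketch of the sampling pattern (our reading of (41); array conventions are illustrative): it marks the Fourier indices removed by the band \({\mathcal {K}}_j\) for a \(128 \times 128\) image.

```python
import numpy as np

def missing_band_mask(N1=128, j=1):
    """Boolean mask over the Fourier indices (k, l) in (40) that is True where
    the sample is removed by the band K_j in (41)."""
    k = np.arange(-(N1 // 2), (N1 + 1) // 2)      # k, l = -N1/2, ..., N1/2 - 1 for even N1
    K, L = np.meshgrid(k, k, indexing="ij")
    in_band = lambda m: (np.abs(m) >= 10 * j + 1) & (np.abs(m) <= 10 * j + 10)
    return in_band(K) & in_band(L)

mask = missing_band_mask(N1=128, j=1)
print(mask.sum(), "of", mask.size, "Fourier samples are removed for j = 1")  # 400 of 16384
```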

Fig. 4

Three temporal phantom images by a GE HTXT 1.5T clinical MRI scanner [33] (first row) and their separate reconstructions from noisy and under-sampled Fourier data solving the deterministic \(\ell ^1\)-regularized inverse problems (2) (second row) and the Bayesian GSBL algorithm (third row)

Figure 4d, e and f illustrate the separately recovered images from the noisy and under-sampled Fourier data by solving the (weighted) \(\ell ^1\)-regularized inverse problems (2) using the alternating direction method of multipliers (ADMM) [7, 42]. Figure 4g, h and i visualize the separately recovered images from the same data sets using the GSBL algorithm [24] with hyper-parameters \(\eta _{\alpha } = \eta _{\beta } = 1\) and \(\theta _{\alpha } = \theta _{\beta } = 10^{-3}\). In both cases, we used an anisotropic first-order TV regularization operator

$$\begin{aligned} R = \begin{bmatrix} I \otimes D \\ D \otimes I \end{bmatrix} \quad \text {with} \quad D = \begin{bmatrix} -1 & 1 & & \\ & \ddots & \ddots & \\ & & -1 & 1 \end{bmatrix} \in {\mathbb {R}}^{(N_1-1) \times N_1}, \end{aligned}$$
(43)

to promote the images being piecewise constant. Figure 4 demonstrates that the Bayesian GSBL algorithm separately recovers the images more accurately than the deterministic algorithm, although we still observe some smeared features in Fig. 4g and h.
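For completeness, the anisotropic operator (43) is straightforward to assemble with sparse Kronecker products; a minimal sketch (scipy.sparse assumed, \(N_1 = 128\)):

```python
import numpy as np
import scipy.sparse as sp

def first_order_tv(N1):
    """Anisotropic first-order TV operator R = [I (x) D; D (x) I] from (43)."""
    D = sp.diags([-np.ones(N1 - 1), np.ones(N1 - 1)], offsets=[0, 1],
                 shape=(N1 - 1, N1))              # 1D forward differences
    I = sp.identity(N1)
    return sp.vstack([sp.kron(I, D), sp.kron(D, I)], format="csr")

R = first_order_tv(128)
print(R.shape)   # (2 * 128 * 127, 128 ** 2)
```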

Fig. 5

Three temporal phantom images by a GE HTXT 1.5T clinical MRI scanner [33] (first row) and their joint reconstructions from noisy and under-sampled Fourier data using the deterministic weighted \(\ell ^1\)-regularized inverse problems (3) (second row) and the proposed JHBL algorithm (third row)

Figure 5 illustrates the corresponding jointly recovered images. Specifically, Fig. 5a, b and c visualize the jointly recovered images using ADMM to solve the joint \(\ell ^1\)-regularized inverse problem (3) as proposed in [42], which we use as a benchmark. Figure 5d, e and f show the jointly recovered images using our JHBL algorithm (Algorithm 2). Following the discussion in Sect. 2, we chose the hyper-parameters as \(\eta _{\alpha } = \eta _{\beta } = 1\), \(\eta _{\gamma } = 2\), and \(\theta _{\alpha } = \theta _{\beta } = \theta _{\gamma } = 10^{-3}\).

Remark 10

Following [24], \(\eta _{\alpha } = \eta _{\beta } = 1\) and \(\theta _{\alpha } = \theta _{\beta } = 10^{-3}\) are usual choices for the GSBL algorithm. Initially, we also tried to use \(\eta _{\gamma } = 1\) and \(\theta _{\gamma } = 10^{-3}\) for the parameters of the inter-image hyper-prior, but observed that the inter-image coupling was sometimes too weak. We thus used \(\eta _{\gamma } = 2\) and \(\theta _{\gamma } = 10^{-3}\) instead. The heuristic behind increasing the shape parameter \(\eta _{\gamma }\) is as follows: We assume that every consecutive pair of images contains more no-change regions than change regions. We thus expect \(\gamma ^{(j-1,j)}_n \gg 0\) for most of the \(\gamma ^{(j-1,j)}_n\)'s in (12), indicating no change, and \(\gamma ^{(j-1,j)}_n \approx 0\) for only a few of the \(\gamma ^{(j-1,j)}_n\)'s. We increasingly promote this behavior for the \(\gamma ^{(j-1,j)}_n\)'s by increasing the shape parameter \(\eta _{\gamma }\) in (13). We further investigate the influence of the shape and rate parameters \(\eta _{\gamma }\) and \(\theta _{\gamma }\) on the inter-image coupling in Sect. 4.3.

Both joint methods yield more accurate recovered images than the respective separate method. At the same time, our JHBL algorithm provides notably more accurate recovered images than the deterministic joint method proposed in [42].

Fig. 6

Average relative log-errors for the separate deterministic (blue strokes), the separate Bayesian (orange circles), the joint deterministic (yellow triangles), and our joint Bayesian (purple diamonds) algorithm. In Fig. 6a, the SNR is 2. In Fig. 6b, we started with \(128 \times 128\) Fourier samples and then removed the bands in (41) (Color figure online)

We observed the proposed JHBL method to yield the most accurate recovered images for all combinations of Fourier samples and SNRs we considered. Figure 6a and b show the average relative log-error for different numbers of Fourier samples and SNRs. The relative log-error of the jth image is

$$\begin{aligned} E_{\log }^{(j)} = \log _{10}\left( \frac{\left\| {\textbf{x}}^{(j)}_{\textrm{ref}} - {\textbf{x}}^{(j)} \right\| _2}{\left\| {\textbf{x}}^{(j)}_{\textrm{ref}}\right\| _2} \right) , \quad j=1,\dots ,J, \end{aligned}$$
(44)

where \({\textbf{x}}^{(j)}_{\textrm{ref}}\) and \({\textbf{x}}^{(j)}\) are the reference and recovered image, respectively. The average relative log-error is \(\left( E_{\log }^{(1)}+\dots +E_{\log }^{(J)}\right) /J\).
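A one-to-one translation of (44) and its average (a small sketch; inputs are lists of vectorized reference and recovered images):

```python
import numpy as np

def avg_rel_log_error(x_ref, x_rec):
    """Average of the relative log-errors (44) over the image sequence."""
    errs = [np.log10(np.linalg.norm(ref - rec) / np.linalg.norm(ref))
            for ref, rec in zip(x_ref, x_rec)]
    return float(np.mean(errs))
```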

Fig. 7

Pixelwise variances of the recovered first image. Jointly recovering the images by “borrowing” missing information from the other images reduces uncertainty in no-change regions

Another advantage of our JHBL method is that it allows us to quantify uncertainty. Indeed, we recover a full Gaussian distribution, \({\mathcal {N}}(\varvec{\mu }^{(j)}, \Sigma ^{(j)})\), for every individual image, which is conditioned on the observed data sets and the other estimated parameters. To demonstrate this, Fig. 7a and b illustrate the pixelwise variance of the separately and jointly recovered first image using the GSBL and our JHBL algorithm. The pixelwise variance corresponds to the diagonal elements of the covariance matrices (29). Comparing Fig. 7a and b, we observe reduced uncertainty in the jointly recovered first image, except for the change regions around the two ellipses. The reduced uncertainty of the recovered first image by our JHBL method away from the change regions is due to the method “borrowing” information from the neighboring images.

Fig. 8

The final estimates for the intra-image regularization parameters in the vertical and horizontal direction for the first image in Fig. 5 and their pixelwise average

Another convenient by-product of our JHBL method is that it allows for edge and change detection, which we illustrate in Figs. 8 and 9. Figure 8a and b respectively visualize the final estimate of the first and second half of the intra-image regularization parameter \(\varvec{\beta }^{(1)}\) of our JHBL method for the first recovered image. Since we used an anisotropic first-order TV operator (43), the intra-image regularization parameter values indicate edges in the vertical and the horizontal direction. We can combine them by considering the pixelwise average of the images in Fig. 8a and b to obtain the edge profile in Fig. 8c.

Fig. 9

The final estimates for the Bayesian change masks (top row) and pre-computed binary change masks (bottom row) used by the Bayesian and deterministic method to jointly recover the temporal image sequence in Fig. 5

Figure 9a and b visualize the final estimate of the conditional change mask used by our JHBL method for the first and second pair of images, while Fig. 9c and d illustrate the corresponding pre-computed binary change masks used in [42]. The pre-computed binary change masks rely on the data being Fourier samples and on the sequential images containing only objects with closed boundaries. By contrast, the change masks used in our JHBL method neither rely on Fourier data nor on the sequential images containing only objects with closed boundaries.

4.2 Sequential Image Deblurring

Fig. 10

The first three reference images of a temporal image sequence coming from the GRAM road-traffic monitoring data set [26] (first row) and their noisy blurred versions (second row)

We next consider the deconvolution of a temporal sequence of six \(400 \times 400\) images from the GRAM road-traffic monitoring data set [26]. Figure 10 illustrates the first three reference images and their noisy blurred versions. The i. i. d. normal noise \({\textbf{e}}^{(j)}\) in the corresponding linear data model (1) has \(\textrm{SNR} = 2+j\), \(j=1,\dots ,J\). The forward operator F is obtained by applying the tensor-product midpoint quadrature to the convolution equations

$$\begin{aligned} y^{(j)}(s,t) = \int _0^1 \int _0^1 k(s-s',t-t') x^{(j)}(s',t') \, \textrm{d}s' \, \textrm{d}t', \quad j=1,\dots ,J, \end{aligned}$$
(45)

where \(x^{(j)}\) is a function describing the jth image. We assume a Gaussian convolution kernel

$$\begin{aligned} k(s,t)=\frac{1}{2\pi \gamma ^2}\exp {\left( -\frac{s^2+t^2}{2\gamma ^2}\right) } \end{aligned}$$
(46)

with blurring parameter \(\gamma = 5 \cdot 10^{-3}\), which makes F highly ill-conditioned. We use an anisotropic second-order TV regularization operator

$$\begin{aligned} R = \begin{bmatrix} I \otimes D \\ D \otimes I \end{bmatrix} \quad \text {with} \quad D = \begin{bmatrix} -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \end{bmatrix} \in {\mathbb {R}}^{(N_1-2) \times N_1}, \end{aligned}$$
(47)

where \(N_1 = 400\) is the number of pixels in each direction, to promote the images being piecewise smooth but not necessarily piecewise constant.
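For reference, a hedged sketch of the blurring model: it approximates the quadrature-discretized convolution (45) with the kernel (46) by an FFT-based periodic convolution with the kernel sampled on the pixel grid. This is an illustration under the stated (periodic) assumption, not the exact forward operator used in the experiments.

```python
import numpy as np

def gaussian_blur_operator(N1=400, gamma=5e-3):
    """Approximate forward map for (45)-(46): periodic convolution with the
    Gaussian kernel sampled on a uniform grid of [0, 1]^2 (midpoint-rule weights)."""
    h = 1.0 / N1
    shift = np.arange(N1) * h                 # grid shifts s - s' (modulo 1)
    d = np.minimum(shift, 1.0 - shift)        # periodic distance to zero
    S, T = np.meshgrid(d, d, indexing="ij")
    k = np.exp(-(S ** 2 + T ** 2) / (2 * gamma ** 2)) / (2 * np.pi * gamma ** 2)
    k_hat = np.fft.fft2(k) * h ** 2           # quadrature weight h^2 per pixel
    def apply_F(x):                           # x is an (N1, N1) image
        return np.real(np.fft.ifft2(np.fft.fft2(x) * k_hat))
    return apply_F

apply_F = gaussian_blur_operator()
blurred = apply_F(np.ones((400, 400)))        # a constant image stays (nearly) constant
```

The second-order difference operator D in (47) can be assembled analogously to the first-order operator in Sect. 4.1, using the stencil \([-1, 2, -1]\) on an \((N_1-2) \times N_1\) matrix.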

Fig. 11

Separately recovered images using GSBL (top row) and the jointly recovered images using our JHBL algorithm (bottom row)

We can no longer use the deterministic method [42] to pre-compute binary change masks directly from the indirect data sets. However, we can still use our JHBL method to jointly recover the sequential images in a Bayesian setting. Figure 11 illustrates the separately (top row) and jointly (bottom row) recovered images from the noisy blurred images in Fig. 10d, e and f using the GSBL and our JHBL algorithm, respectively. The hyper-parameters are \(\eta _{\alpha } = \eta _{\beta } = 1\), \(\eta _{\gamma } = 1\), \(\theta _{\alpha } = \theta _{\beta } = 10^{-3}\), and \(\theta _{\gamma }=10^{-1}\). The jointly recovered images using our JHBL algorithm are more accurate than the separately recovered images.

Fig. 12

Pixelwise variances of the recovered first image in Fig. 11. Jointly recovering the images by “borrowing” missing information from the other images reduces uncertainty

As mentioned, the proposed JHBL method can quantify the uncertainty in the recovered images, which is often desirable in applications with no reference images. We demonstrate this in Fig. 12, which visualizes the pixelwise variances of the recovered first image in Fig. 11. We again see that jointly recovering the images by “borrowing” missing information from the other images reduces uncertainty.

Fig. 13

The final estimates for the intra-image regularization parameters in the vertical and horizontal direction for the first recovered image in Fig. 11 using our JHBL algorithm and their pixelwise average

Moreover, Fig. 13a and b illustrate the final estimate of the first and second half of the intra-image regularization parameter \(\varvec{\beta }^{(1)}\) of our JHBL method for the first recovered image in Fig. 11d. Since we used an anisotropic second-order TV regularization operator (47), the intra-image regularization parameter values indicate edges in the vertical and horizontal direction. We can again combine them, e. g., by considering the pixelwise average of the images in Fig. 13a and b to obtain the edge profile in Fig. 13c.

Fig. 14

The final estimates for the Bayesian change masks for the first and second pair of images in Fig. 11 using our JHBL algorithm

Finally, Fig. 14 provides the final estimates for the Bayesian change masks for the first and second pair of images in Fig. 11. These change masks are available only for our JHBL algorithm, since the deterministic method cannot pre-compute binary change masks from the indirect data sets.

4.3 Investigating the Influence of the Shape and Rate Parameter

We end this section by briefly demonstrating how different choices for the shape and rate parameter, \(\eta _{\gamma }\) and \(\theta _{\gamma }\), of the inter-image hyper-prior (see Sect. 2.3) influence the jointly recovered image. Recall (e.g., from Remark 10) that we expect the coupling between images to increase/decrease as larger/smaller values for the inter-image hyper-parameters \(\gamma _n^{(j-1,j)}\) become more likely. At the same time, the expected value and variance of the inter-image hyper-prior (13) are \(\eta _{\gamma }/\theta _{\gamma }\) and \(\eta _{\gamma }/\theta _{\gamma }^2\), respectively. Hence, if \(\eta _{\gamma }\) is increased, we expect larger values for \(\gamma _n^{(j-1,j)}\) to become more likely and the inter-image coupling to increase. On the other hand, if \(\theta _{\gamma }\) is increased, we expect smaller values for \(\gamma _n^{(j-1,j)}\) to become more likely and the inter-image coupling to decrease.

Fig. 15

Demonstrating the influence of the shape and rate parameter, \(\eta _{\gamma }\) and \(\theta _{\gamma }\), on the inter-image coupling. First row: First phantom reference image, its joint reconstruction using the JHBL algorithm with \(\eta _{\gamma } = 2\) and \(\theta _{\gamma } = 10^{-3}\), and its separate reconstruction using the GSBL algorithm. Second row: Joint reconstructions for fixed \(\theta _{\gamma } = 10^{-3}\) and varying \(\eta _{\gamma }\). Third row: Joint reconstructions for fixed \(\eta _{\gamma } = 2\) and varying \(\theta _{\gamma }\)

Figure 15 illustrates the connection between \(\eta _{\gamma }, \theta _{\gamma }\) and the strength of the inter-image coupling for the first phantom image from the sequential MRI test case previously discussed in Sect. 4.1. Specifically, the second row of Fig. 15 shows that the jointly recovered image is visibly close to the separately recovered image for \(\eta _{\gamma } = 0.8\). At the same time, the inter-image coupling becomes so strong that some spurious artifacts from the subsequent images are introduced in change regions for \(\eta _{\gamma } = 4\). The third row of Fig. 15 demonstrates a similar behavior for fixed \(\eta _{\gamma } = 2\) and decreasing \(\theta _{\gamma }\); The jointly recovered image is visibly close to the separately recovered image for \(\theta _{\gamma } = 10^{-1}\), while the inter-image coupling becomes so strong that some spurious artifacts are introduced in change regions for \(\theta _{\gamma } = 10^{-3.5}\). Future work will optimize the shape and rate parameter selection of the inter-image hyper-prior to further increase the advantage of inter-image coupling between sequential images.

5 Summary

We presented a new method to jointly recover temporal image sequences by “borrowing” missing information in each image from the other images. Our JHBL method is simple to implement, easily parallelized, and efficient. We found our method to yield more accurate recovered images than both separately recovering the images using the Bayesian GSBL algorithm and jointly recovering them using the deterministic method from [42]. In addition, our method avoids exhaustive parameter fine-tuning. Moreover, to pre-compute a binary change mask, the deterministic method proposed in [42] is limited to Fourier data sets and to images containing only objects with closed boundaries. By contrast, we treat the change mask that steers the coupling between neighboring images as a random variable, which we estimate together with the images and other parameters, making our method applicable to general modalities. We demonstrated this by considering a sequential image deblurring problem based on the GRAM road-traffic monitoring data set. Another distinct advantage of our method is that it allows us to quantify uncertainty, which is vital in applications without reference images. An additional valuable by-product is that our method allows for edge and change detection.

Future work will address the selection of hyper-parameters. Future efforts will also investigate the structural properties of the cost function and convergence of the proposed JHBL algorithm. Finally, a comparison or combination of the proposed BCD algorithm for Bayesian inference with other existing methods would be of interest.