1 Introduction

Many applications rely on recovering temporal image sequences from noisy, indirect, and incomplete data [6, 28, 42, 43]. We can often formulate this task as a sequence of linear inverse problems,

$$\begin{aligned} {\textbf{y}}^{(j)} = F^{(j)} {\textbf{x}}^{(j)} + {\textbf{e}}^{(j)}, \quad j=1,\dots ,J, \end{aligned}$$
(1)

where \({\textbf{y}}^{(j)}\) is a given data vector, \({\textbf{x}}^{(j)}\) is the (unknown) vectorized image, \(F^{(j)}\) is a known linear forward operator, and \({\textbf{e}}^{(j)}\) corresponds to unknown noise. The individual recovery of each image by separately solving the linear inverse problems (1) is a well-studied, although challenging, problem in itself [25, 27, 40]. A prominent approach is to replace (1) with a nearby regularized inverse problem that promotes some prior belief about the unknown image. In imaging applications, it is often reasonable to assume that some linear transform of the unknown image \({\textbf{x}}\), say \(R {\textbf{x}}\), is sparse. This prior belief yields the \(\ell ^1\)-regularized inverse problems

$$\begin{aligned} \min _{{\textbf{x}}^{(j)}} \left\{ \Vert F^{(j)} {\textbf{x}}^{(j)} - {\textbf{y}}^{(j)} \Vert _2^2 + \lambda _j \Vert R {\textbf{x}}^{(j)} \Vert _1 \right\} , \quad j=1,\dots ,J, \end{aligned}$$
(2)

where R is the regularization operator and \(\lambda _j > 0\) are regularization parameters. Usual choices for R are discrete (high-order) total variation (TV) [37], total generalized [8] and total directional [35, 36] variation, and polynomial annihilation [2, 3, 23] operators. The rationale behind considering (2) is that the \(\ell ^1\)-norm, \(\Vert \cdot \Vert _1\), serves as a convex surrogate for the \(\ell ^0\)-“norm”, \(\Vert \cdot \Vert _0\); an observation that lies at the heart of compressed sensing [16, 18, 21]. Another prominent approach, closely related to regularization, is Bayesian inverse problems [11, 29, 38]. In this setting, we express our lack of information about some of the quantities in (1) by modeling them as random variables, whose relations are characterized by certain density functions. The fidelity and regularization terms correspond to the negative logarithm of the likelihood and prior density, respectively, and the regularized solution (2) corresponds to the maximizer of the posterior density. The class of conditionally Gaussian priors [9, 10, 12, 13, 39] is particularly suited to promote sparsity.

Here, we consider the case that each data set \({\textbf{y}}^{(j)}\) in (1) is missing vital information, which prevents the accurate individual recovery of the images \({\textbf{x}}^{(j)}\). A common strategy is to restore the missing information in each image by “borrowing” it from the other images. The works [1, 15, 17, 41, 45] considered deterministic and Bayesian methods of this flavor that rely on a common sparsity assumption, meaning that \(R {\textbf{x}}^{(1)},\dots ,R {\textbf{x}}^{(J)}\) have the same support. Unfortunately, temporally changing image sequences often violate the common sparsity assumption. Recently, [42] addressed this problem by locally coupling the images in no-change regions while decoupling them in change regions. This was done by first computing a diagonal change mask \(C^{(j-1,j)}\) directly from the consecutive data sets \({\textbf{y}}^{(j-1)}, {\textbf{y}}^{(j)}\) and using this change mask to penalize any difference between \({\textbf{x}}^{(j-1)}\) and \({\textbf{x}}^{(j)}\) in no-change regions. Let \([C^{(j-1,j)}]_{n,n} = 0\) in change regions and \([C^{(j-1,j)}]_{n,n} = 1\) in no-change regions. The joint \(\ell ^1\)-regularized inverse problem used in [42] is

$$\begin{aligned} \min _{{\textbf{x}}^{(1)},\dots ,{\textbf{x}}^{(J)}} \left\{ \sum _{j=1}^J \left\| F^{(j)} {\textbf{x}}^{(j)} - {\textbf{y}}^{(j)} \right\| _2^2 + \sum _{j=1}^J \lambda _j \left\| R {\textbf{x}}^{(j)} \right\| _1 + \sum _{j=2}^J \mu _j \left\| C^{(j-1,j)} \left( {\textbf{x}}^{(j-1)} - {\textbf{x}}^{(j)} \right) \right\| _2^2 \right\} , \end{aligned}$$
(3)

where the \(\mu _j \ge 0\) are fixed coupling parameters. Equation (3) balances data fidelity, intra-image regularization (sparsity of \(R {\textbf{x}}^{(j)}\)), and inter-image regularization (coupling in no-change regions). Roughly speaking, the last term in (3),

$$\begin{aligned} \left\| C^{(j-1,j)} ( {\textbf{x}}^{(j-1)} - {\textbf{x}}^{(j)} ) \right\| _2^2 = \sum _{n=1}^N \left[ C^{(j-1,j)}\right] _{n,n} \left| x^{(j-1)}_n - x^{(j)}_n \right| ^2, \end{aligned}$$
(4)

penalizes any change between the two subsequent images \({\textbf{x}}^{(j-1)}\) and \({\textbf{x}}^{(j)}\) in no-change regions, where \([C^{(j-1,j)}]_{n,n} = 1\), since the value of the objective function in (3) increases with \(|x^{(j-1)}_n - x^{(j)}_n|\) in this case. At the same time, in change regions, the difference \(|x^{(j-1)}_n - x^{(j)}_n|\) does not influence the value of the objective function since \([C^{(j-1,j)}]_{n,n} = 0\) in this case. Figure 1 illustrates the advantage of jointly recovering a temporal sequence of magnetic resonance images from noisy and under-sampled Fourier data (see Sect. 4 for more details).

Fig. 1

Three images of a temporal magnetic resonance image sequence with a rotating left (green) and a down-moving right (yellow) ellipse. The first row shows the reference images, the second row the separately recovered images using (2), and the third row the jointly recovered images using (3). The data sets are noisy and miss different Fourier samples (Color figure online)

While the joint recovery of changing images using (3) can yield improved accuracy, there remain issues with robustness and the range of applications: (I1) Selecting appropriate regularization and coupling parameters is a non-trivial task, and their choice can critically influence the quality of the recovered images. Although (re-)weighted [1, 14, 22] \(\ell ^1\)-regularization can increase the robustness w. r. t. the intra-image regularization parameters \(\lambda _j\), selecting suitable inter-image coupling parameters \(\mu _j\) remains an open problem. (I2) The method proposed in [42, Section 3] to pre-compute the change masks uses Fourier data and assumes that the sequential images only include objects with closed boundaries. These requirements prevent the application to problems with other types of data acquisition. Finally, one has to tune some problem-dependent free parameters by hand on a case-by-case basis.

1.1 Our Contribution

We propose a joint hierarchical Bayesian learning (JHBL) method for sequential image recovery. The method is easy to implement, efficient, and simple to parallelize. Our method avoids issues (I1) and (I2) by reinterpreting all involved parameters—including the change mask—as random variables, which we then estimate together with the recovered images. In particular, for the random variables responsible for the inter-image coupling, we use weakly-informative gamma distributions that favor small pixel-wise differences (in no-change regions) but allow for occasional outliers (in change regions). Our approach does not rely on Fourier data or images showing objects with closed boundaries. We demonstrate that our JHBL method improves the accuracy of all individual images. Another advantage is that our method quantifies the uncertainty in the recovered images, which is often desirable. Some preliminary results for magnetic resonance imaging and road-traffic monitoring indicate the advantage of JHBL for sequential image recovery.

1.2 Outline

In Sect. 2, we present the JHBL model that promotes intra-image sparsity and inter-image coupling. In Sect. 3, we propose an efficient method for Bayesian inference. In Sect. 4, we demonstrate the performance of the resulting JHBL method for test cases from sequential deblurring and magnetic resonance imaging. Section 5 offers some concluding thoughts.

2 The Joint Hierarchical Bayesian Model

We now describe the hierarchical Bayesian model we use to develop our JHBL method.

2.1 The Likelihood

Consider the data model (1) and assume that \({\textbf{y}}^{(j)} \in {\mathbb {R}}^{M_j}\), \(F^{(j)} \in {\mathbb {R}}^{M_j \times N}\), \({\textbf{x}}^{(j)} \in {\mathbb {R}}^N\), and that \({\textbf{e}}^{(j)} \in {\mathbb {R}}^{M_j}\) is independent and identically distributed (i.i.d.) zero-mean normal noise, \(e_m^{(j)} \sim {\mathcal {N}}(0,\alpha _j)\) for \(m=1,\dots ,M_j\), with noise precision \(\alpha _j > 0\).Footnote 1 The jth likelihood function, which is the conditional probability density of \({\textbf{y}}^{(j)}\) given \({\textbf{x}}^{(j)}\) and \(\alpha _j\), is

$$\begin{aligned} p({\textbf{y}}^{(j)} | {\textbf{x}}^{(j)}, \alpha _j) \propto \alpha _j^{M_j/2} \exp \left\{ -\frac{\alpha _j}{2} \left\| F^{(j)} {\textbf{x}}^{(j)} - {\textbf{y}}^{(j)} \right\| _2^2 \right\} , \end{aligned}$$
(5)

where “\(\propto \)” means that the two sides are equal up to a multiplicative constant. We also treat the noise precision \(\alpha _j\) as a random variable that is learned together with the images \({\textbf{x}}^{(j)}\).Footnote 2 We assume that \(\alpha _j\) is gamma distributed,

$$\begin{aligned} p(\alpha _j) = \Gamma (\alpha _j|\eta _{\alpha _j},\theta _{\alpha _j}) \propto \alpha _j^{\eta _{\alpha _j}-1} \exp \{-\theta _{\alpha _j} \alpha _j\}, \quad j=1,\dots ,J. \end{aligned}$$
(6)

For simplicity, we use the same shape and rate parameter for all noise precisions, denoted by \(\eta _{\alpha }\) and \(\theta _{\alpha }\). We use the gamma distribution because it is conditionally conjugate to the normal distribution, which is convenient for Bayesian inference (see Sect. 3). Moreover, we can make the noise hyper-priors (6) flat and therefore uninformative by choosing \(\theta _{\alpha } \approx 0\). Figure 2 illustrates the gamma density function for different shape and rate parameters. Remark 1 addresses the possibility of using informative hyper-priors.

Fig. 2

Gamma density functions for different shape and rate parameters \(\eta , \theta \). Recall that the mode, expected value, and variance of a gamma distribution \(\Gamma (\eta ,\theta )\) are \((\eta -1)/\theta \), \(\eta /\theta \), and \(\eta /\theta ^2\), respectively. In particular, decreasing the rate \(\theta \) makes the density flatter, while \(\eta \) influences its peak (Color figure online)
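As a quick numerical sanity check of these quantities (a minimal sketch; note that scipy.stats parameterizes the gamma distribution by the shape \(\eta \) and the scale \(1/\theta \) rather than the rate):

```python
import numpy as np
from scipy import stats

def gamma_summary(eta, theta):
    """Mode, mean, and variance of Gamma(shape=eta, rate=theta)."""
    dist = stats.gamma(a=eta, scale=1.0 / theta)  # scipy uses scale = 1/rate
    mode = max(0.0, (eta - 1.0) / theta)          # mode formula also used in Sect. 3.2
    return mode, dist.mean(), dist.var()

# Decreasing the rate theta flattens the density (the variance eta/theta^2 grows).
for eta, theta in [(1.0, 1.0), (2.0, 1.0), (2.0, 1e-3)]:
    print((eta, theta), gamma_summary(eta, theta))
```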

If we denote the collection of all images by \({\textbf{x}} = [{\textbf{x}}^{(1)};\dots ;{\textbf{x}}^{(J)}]\), of all data sets by \({\textbf{y}} = [{\textbf{y}}^{(1)};\dots ;{\textbf{y}}^{(J)}]\), and of all noise precisions by \(\varvec{\alpha } = [\alpha _1;\dots ;\alpha _J]\), then the joint likelihood function is

$$\begin{aligned} p( {\textbf{y}} | {\textbf{x}}, \varvec{\alpha } ) = \prod _{j=1}^J p({\textbf{y}}^{(j)} | {\textbf{x}}^{(j)}, \alpha _j) \propto \prod _{j=1}^J \alpha _j^{M_j/2} \exp \left\{ -\frac{\alpha _j}{2} \Vert F^{(j)} {\textbf{x}}^{(j)} - {\textbf{y}}^{(j)} \Vert _2^2 \right\} , \end{aligned}$$
(7)

assuming that the data sets are conditionally independent. A few remarks are in order.

Remark 1

For simplicity, we use the same hyper-prior \(\Gamma (\cdot |\eta _{\alpha },\theta _{\alpha })\) and parameters \(\eta _{\alpha },\theta _{\alpha }\) for all components of \(\varvec{\alpha }\). We do this since we assume no prior knowledge about the noise variances of the different measurement vectors. However, if one has a reasonable a priori notion of the underlying noise variances, the choice of hyper-prior could be modified accordingly [5, 13].

Remark 2

The data sets being conditionally independent means that if we know the images \({\textbf{x}}\) and the noise precisions \(\varvec{\alpha }\), then knowledge of one data set \({\textbf{y}}^{(j)}\) provides no additional information about the likelihood of another data set \({\textbf{y}}^{(i)}\) with \(i \ne j\).

2.2 The Intra-image Prior

We assume that some linear transform of the images, say \(R {\textbf{x}}^{(j)}\) with \(R \in {\mathbb {R}}^{K \times N}\), is sparse. One can model sparsity by various priors, including TV [4, 31], mixture-of-Gaussian [19], Laplace [20], and hyper-Laplace [32, 34] priors. Here, we use conditionally Gaussian priors, which are particularly suited to promote sparsity and allow for efficient Bayesian inference [9, 13, 24]. The jth intra-image prior is

$$\begin{aligned} p( {\textbf{x}}^{(j)} | \varvec{\beta }^{(j)} ) \propto \det \left( B^{(j)} \right) ^{1/2} \exp \left\{ - \frac{1}{2} ( {\textbf{x}}^{(j)} )^T R^T B^{(j)} R {\textbf{x}}^{(j)} \right\} , \end{aligned}$$
(8)

where \(B^{(j)} = {{\,\textrm{diag}\,}}(\varvec{\beta }^{(j)})\) and \(\varvec{\beta }^{(j)} = [\beta ^{(j)}_1,\dots ,\beta ^{(j)}_K]\) is treated as a random vector with i. i. d. gamma distributed components,

$$\begin{aligned} p(\beta ^{(j)}_k) = \Gamma (\beta ^{(j)}_k|\eta _{\beta ^{(j)}_k},\theta _{\beta ^{(j)}_k}), \quad k=1,\dots ,K. \end{aligned}$$
(9)

For simplicity, we use the same shape and rate parameter for all prior precisions, denoted by \(\eta _{\beta }\) and \(\theta _{\beta }\). We choose \(\theta _{\beta } \approx 0\) to make the hyper-prior flat and thus uninformative. Again, if one has a reasonable a priori notion of the support of \(R {\textbf{x}}^{(j)}\), the choice for the hyper-prior (9) and the parameters \(\eta _{\beta ^{(j)}_k},\theta _{\beta ^{(j)}_k}\) could be modified correspondingly. If we denote the collection of all precision vectors by \(\varvec{\beta } = [\varvec{\beta }^{(1)};\dots ;\varvec{\beta }^{(J)}]\), then the joint intra-image prior is

$$\begin{aligned} \begin{aligned} p( {\textbf{x}} | \varvec{\beta } )&= \prod _{j=1}^J p( {\textbf{x}}^{(j)} | \varvec{\beta }^{(j)} ) \\&\propto \prod _{j=1}^J \det \left( B^{(j)} \right) ^{1/2} \exp \left\{ - \frac{1}{2}( {\textbf{x}}^{(j)} )^T R^T B^{(j)} R {\textbf{x}}^{(j)} \right\} , \end{aligned} \end{aligned}$$
(10)

assuming that the images are conditionally independent.

Remark 3

The images being conditionally independent means that if we know the parameters \(\varvec{\beta }\), then knowledge of one image \({\textbf{x}}^{(j)}\) provides no additional information about the likelihood of another image \({\textbf{x}}^{(i)}\) with \(i \ne j\). Although we assume that the images form a temporal sequence with only parts of the images changing, we treat this as qualitative information; we want to quantify neither the exact location nor the size of the (no-)change regions, which is why we assume that the images are conditionally independent.

2.3 The Inter-image Prior

Assume for the moment that we had a pre-computed diagonal change mask \(C^{(j-1,j)}\) with \([C^{(j-1,j)}]_{n,n} = 0\) in change regions and \([C^{(j-1,j)}]_{n,n} = 1\) in no-change regionsFootnote 3 as in [42]. We could then translate the coupling term in (3) into the empirical conditionally Gaussian prior

$$\begin{aligned} \begin{aligned} p( {\textbf{x}} | \mu _2,\dots ,\mu _J )&\propto \prod _{j=2}^J \mu _j^{N/2} \exp \left\{ - \frac{\mu _j}{2} \left\| C^{(j-1,j)} \left( {\textbf{x}}^{(j-1)} - {\textbf{x}}^{(j)} \right) \right\| _2^2 \right\} \\&= \prod _{j=2}^J \mu _j^{N/2} \exp \left\{ - \frac{\mu _j}{2} \left( {\textbf{x}}^{(j-1)} - {\textbf{x}}^{(j)}\right) ^T C^{(j-1,j)} \left( {\textbf{x}}^{(j-1)} - {\textbf{x}}^{(j)} \right) \right\} , \end{aligned} \end{aligned}$$
(11)

where we have used that \(\left( C^{(j-1,j)}\right) ^2 = C^{(j-1,j)}\). However, as mentioned before, the method proposed in [42, Section 3] to pre-compute the change masks uses Fourier data and assumes that the sequential images only include objects with closed boundaries. We overcome these restrictions by replacing the pre-computed diagonal elements of the change mask with random variables, which we then estimate together with the other parameters and images. We propose to use the inter-image prior

$$\begin{aligned} p( {\textbf{x}} | \varvec{\gamma } ) \propto \prod _{j=2}^J \, \det \left( C^{(j-1,j)} \right) ^{1/2} \exp \left\{ - \frac{1}{2} \left( {\textbf{x}}^{(j-1)} - {\textbf{x}}^{(j)} \right) ^T C^{(j-1,j)} \left( {\textbf{x}}^{(j-1)} - {\textbf{x}}^{(j)} \right) \right\} \end{aligned}$$
(12)

with \(C^{(j-1,j)} = {{\,\textrm{diag}\,}}(\varvec{\gamma }^{(j-1,j)})\) and \(\varvec{\gamma }^{(j-1,j)} = [\gamma ^{(j-1,j)}_1, \dots , \gamma ^{(j-1,j)}_N]^T\). Furthermore, \(\varvec{\gamma }\) in (12) denotes the collection of all coupling vectors \(\varvec{\gamma }^{(1,2)},\dots ,\varvec{\gamma }^{(J-1,J)}\). The random variables \(\gamma ^{(j-1,j)}_n\) introduce an adaptive and spatially varying weighting in the change mask. Another advantage of (12) is that we stay within the class of conditionally Gaussian densities, which is convenient for Bayesian inference. This further motivates us to assume that the elements of \(\varvec{\gamma }^{(j-1,j)}\) are i. i. d. gamma distributed,

$$\begin{aligned} p\left( \gamma ^{(j-1,j)}_n \right) = \Gamma \left( \gamma ^{(j-1,j)}_n | \eta _{\gamma ^{(j-1,j)}_n}, \theta _{\gamma ^{(j-1,j)}_n}\right) , \quad n=1,\dots ,N. \end{aligned}$$
(13)

For simplicity, we use the same shape and rate parameter for all elements, denoted by \(\eta _{\gamma }\) and \(\theta _{\gamma }\). To make the hyper-prior flat and therefore uninformative again, we choose \(\theta _{\gamma } \approx 0\). As will be discussed in greater detail in Sect. 4, the choice of \(\eta _{\gamma }\) is influenced by the magnitude of change we expect to occur between consecutive pairs of images. Moreover, if one has a reasonable a priori notion of the location or amount of change between subsequent images, the choice for the hyper-prior (13) and the parameters \(\eta _{\gamma ^{(j-1,j)}_n}, \theta _{\gamma ^{(j-1,j)}_n}\) could be modified correspondingly. We will investigate such informative hyper-priors in future works.

2.4 The Combined Prior

We now discuss how promoting intra-image sparsity as in Sect. 2.2 and inter-image coupling as in Sect. 2.3 can be combined in a single prior. To this end, we can re-write the joint intra-image prior (10) more compactly as

$$\begin{aligned} p( {\textbf{x}} | \varvec{\beta } ) \propto \det ( B )^{1/2} \exp \left\{ - \frac{1}{2} {\textbf{x}}^T {\tilde{R}}^T B {\tilde{R}} {\textbf{x}} \right\} , \end{aligned}$$
(14)

where \(B = {{\,\textrm{diag}\,}}(B^{(1)},\dots ,B^{(J)})\) is a diagonal matrix, \({\tilde{R}} = {{\,\textrm{diag}\,}}(R,\dots ,R)\) is a block-diagonal matrix, and \({\textbf{x}} = [{\textbf{x}}^{(1)};\dots ;{\textbf{x}}^{(J)}]\) again denotes the collection of all images. We can now note that the joint intra-image prior (14) encodes the assumption that

$$\begin{aligned} {\tilde{R}} {\textbf{x}} \sim {\mathcal {N}}({\textbf{0}},B^{-1}), \end{aligned}$$
(15)

i.e., \({\tilde{R}} {\textbf{x}}\) is zero-mean normal distributed with diagonal precision matrix B. At the same time, we can re-write the inter-image prior (12) more compactly as

$$\begin{aligned} p( {\textbf{x}} | \varvec{\gamma } ) \propto \det ( C )^{1/2} \exp \left\{ - \frac{1}{2} {\textbf{x}}^T S^T C S {\textbf{x}} \right\} , \end{aligned}$$
(16)

where \(C = {{\,\textrm{diag}\,}}(C^{(1,2)},\dots ,C^{(J-1,J)})\) is a diagonal matrix and

$$\begin{aligned} S = \begin{bmatrix} I & -I & & \\ & \ddots & \ddots & \\ & & I & -I \end{bmatrix} \end{aligned}$$
(17)

with I denoting the \(N \times N\) identity matrix. Basically, S transforms the collection of images \({\textbf{x}} = [{\textbf{x}}^{(1)};\dots ;{\textbf{x}}^{(J)}]\) into the collection of differences of sequential images \([{\textbf{x}}^{(1)} - {\textbf{x}}^{(2)};\dots ;{\textbf{x}}^{(J-1)} - {\textbf{x}}^{(J)}]\). Similar to before, we can therefore note that the inter-image prior (16) encodes the assumption that

$$\begin{aligned} S {\textbf{x}} \sim {\mathcal {N}}({\textbf{0}},C^{-1}), \end{aligned}$$
(18)

Having (15) and (18) at hand is convenient since it positions us to combine these two assumptions in a single prior. To this end, note that (15) and (18) holding simultaneously is equivalent to

$$\begin{aligned} \begin{bmatrix} {\tilde{R}} \\ S \end{bmatrix} {\textbf{x}} \sim {\mathcal {N}}\left( {\textbf{0}}, \begin{bmatrix} B & 0 \\ 0 & C \end{bmatrix}^{-1} \right) , \end{aligned}$$
(19)

which corresponds to the combined prior

$$\begin{aligned} p( {\textbf{x}} | \varvec{\beta }, \varvec{\gamma } ) \propto \det ( {\tilde{R}}^T B {\tilde{R}} + S^T C S )^{1/2} \exp \left\{ - \frac{1}{2} {\textbf{x}}^T ( {\tilde{R}}^T B {\tilde{R}} + S^T C S ) {\textbf{x}} \right\} . \end{aligned}$$
(20)

The combined prior (20) is a desirable model, but it can have issues when the precision matrix \({\tilde{R}}^T B {\tilde{R}} + S^T C S\) is non-invertibleFootnote 4, rendering the normalizing constant \(\det ( {\tilde{R}}^T B {\tilde{R}} + S^T C S )^{1/2}\) zero. In this case, one may consider making the precision matrix invertible by adding a small multiple of the identity to it. This approximation would allow some variability along the directions that collapse without substantially changing the variance along the dominant directions. Still, even when the precision matrix is invertible, Bayesian inference can become computationally intractable due to the complicated relationship between the eigenvalues of \({\tilde{R}}^T B {\tilde{R}} + S^T C S\) and the hyper-parameters \(\varvec{\beta }\) and \(\varvec{\gamma }\). To overcome these challenges, we propose the following modification to the combined prior (20):

$$\begin{aligned} p( {\textbf{x}} | \varvec{\beta }, \varvec{\gamma } ) \propto \det ( B )^{1/2} \det ( C )^{1/2} \exp \left\{ - \frac{1}{2} {\textbf{x}}^T ( {\tilde{R}}^T B {\tilde{R}} + S^T C S ) {\textbf{x}} \right\} . \end{aligned}$$
(21)

The modified combined prior (21) offers computational benefits and produces satisfactory numerical results. We chose this particular form as the resulting fully conditional distributions for the hyper-parameters \(\varvec{\beta }\) and \(\varvec{\gamma }\) are gamma distributions, making Bayesian maximum a posteriori estimation computationally efficient. Further details on the proposed model’s inference are provided in Sect. 3. We acknowledge that this modification is ad-hoc, and future research could explore other potential modifications. Overall, the modified combined prior strikes a balance between modeling considerations and computational efficiency. Figure 3 provides a graphical summary of the JHBL model described in this section.
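To make the block structure behind (14), (16), and (21) concrete, the following minimal sketch (in Python with scipy.sparse) assembles \({\tilde{R}}\), S, B, C, and the matrix \({\tilde{R}}^T B {\tilde{R}} + S^T C S\); the sizes and the placeholder choice \(R = I\) are illustrative assumptions, not the operators used in our experiments.

```python
import numpy as np
import scipy.sparse as sp

J, N = 3, 16                        # illustrative: J images with N pixels each
R = sp.identity(N, format="csr")    # placeholder sparsifying transform (e.g., a TV operator)
K = R.shape[0]

# Block-diagonal R_tilde = diag(R, ..., R) as in (14).
R_tilde = sp.block_diag([R] * J, format="csr")

# S from (17): differences of consecutive images, built via a Kronecker product.
D1 = sp.diags([np.ones(J - 1), -np.ones(J - 1)], offsets=[0, 1], shape=(J - 1, J))
S = sp.kron(D1, sp.identity(N), format="csr")

# Diagonal precision matrices B = diag(beta) and C = diag(gamma); values are placeholders.
beta = np.ones(J * K)               # intra-image precisions beta^{(j)}_k
gamma = np.ones((J - 1) * N)        # inter-image precisions gamma^{(j-1,j)}_n
B, C = sp.diags(beta), sp.diags(gamma)

# The matrix R_tilde^T B R_tilde + S^T C S appearing in (20) and (21).
P = R_tilde.T @ B @ R_tilde + S.T @ C @ S
print(P.shape)                      # (J*N, J*N)
```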

Fig. 3

Graphical representation of the JHBL model for two images. Shaded and plain circles represent observed and unobserved random variables, respectively. The arrows indicate how the random variables influence each other. More specifically, the noise precisions \(\alpha _1, \alpha _2\) and images \({\textbf{x}}^{(1)},{\textbf{x}}^{(2)}\) are connected to the data vectors \({\textbf{y}}^{(1)},{\textbf{y}}^{(2)}\) via the likelihood (5); the hyper-parameters \(\varvec{\beta }^{(1)},\varvec{\beta }^{(2)}\) are connected to the images \({\textbf{x}}^{(1)},{\textbf{x}}^{(2)}\) via the intra-image prior (10); the images \({\textbf{x}}^{(1)}\) and \({\textbf{x}}^{(2)}\) are connected to each other and to the hyper-parameter \(\varvec{\gamma }^{(1,2)}\) via the inter-image prior (12), where \(\varvec{\gamma }^{(1,2)}\) encodes the (change and no-change) regions and the amount of coupling

Remark 4

The joint intra-image and inter-image priors, (14) and (16), are not necessarily consistent. That is, the marginal distributions of \({\textbf{x}}\) derived from the conditional distributions of \({\textbf{x}}|\varvec{\beta }\) and \({\textbf{x}}|\varvec{\gamma }\), respectively, can differ. For this reason, we advise against performing Bayesian inference using the marginal prior. Instead, our Bayesian inference algorithm outlined in Sect. 3 will utilize the hierarchical structure of the Bayesian model.

3 Bayesian Inference

Assume that we are interested in the most probable value (mode) of the posterior \(p( {\textbf{x}}, \varvec{\alpha }, \varvec{\beta }, \varvec{\gamma } | {\textbf{y}} )\), i. e., a combination of images and parameters for which the given data sets are most likely. To this end, we adopt the Bayesian coordinate descent (BCD) algorithm [24], which efficiently approximates the posterior mode by alternatingly updating the images and parameters based on the mode of the fully conditional distributions.

3.1 The Fully Conditional Distributions

We chose conditionally Gaussian priors and gamma hyper-priors because they are conditionally conjugate. This allows us to derive analytical expressions for the fully conditional densities, which is convenient for Bayesian inference. Bayes’ theorem states that

$$\begin{aligned} p( {\textbf{x}}, \varvec{\alpha }, \varvec{\beta }, \varvec{\gamma } | {\textbf{y}} )&\propto p( {\textbf{y}} | {\textbf{x}}, \varvec{\alpha }, \varvec{\beta }, \varvec{\gamma } ) p( {\textbf{x}}, \varvec{\alpha }, \varvec{\beta }, \varvec{\gamma } ) \nonumber \\&= p( {\textbf{y}} | {\textbf{x}}, \varvec{\alpha } ) p( {\textbf{x}} | \varvec{\beta }, \varvec{\gamma } ) p( \varvec{\alpha } ) p( \varvec{\beta } ) p( \varvec{\gamma } ), \end{aligned}$$
(22)

where \(p( {\textbf{y}} | {\textbf{x}}, \varvec{\alpha } )\) is the likelihood, \(p( {\textbf{x}} | \varvec{\beta }, \varvec{\gamma } )\) is the prior, and \(p( \varvec{\alpha } ), p( \varvec{\beta } ), p( \varvec{\gamma } )\) are the hyper-priors. If we substitute the joint likelihood (7) and combined prior (21) together with the hyper-priors (6), (9), (13) into (22), we get

$$\begin{aligned} \begin{aligned}&p( {\textbf{x}}, \varvec{\alpha }, \varvec{\beta }, \varvec{\gamma } | {\textbf{y}} ) \\ \propto&\left( \prod _{j=1}^J \alpha _j^{M_j/2} \exp \left\{ -\frac{\alpha _j}{2} \left\| F^{(j)} {\textbf{x}}^{(j)} - {\textbf{y}}^{(j)} \right\| _2^2 \right\} \right) \cdot \\&\left( \prod _{j=1}^J \det \left( B^{(j)} \right) ^{1/2} \exp \left\{ -\frac{1}{2} ( {\textbf{x}}^{(j)} )^T R^T B^{(j)} R {\textbf{x}}^{(j)} \right\} \right) \cdot \\&\left( \prod _{j=2}^J \, \det \left( C^{(j-1,j)} \right) ^{1/2} \exp \left\{ -\frac{1}{2} \left( {\textbf{x}}^{(j-1)} - {\textbf{x}}^{(j)} \right) ^T C^{(j-1,j)} \left( {\textbf{x}}^{(j-1)} - {\textbf{x}}^{(j)} \right) \right\} \right) \cdot \\&\left( \prod _{j=1}^J \alpha _j^{\eta _{\alpha }-1} \exp \{ -\theta _{\alpha } \alpha _j \} \right) \left( \prod _{j=1}^J \prod _{k=1}^K \left( \beta ^{(j)}_k \right) ^{\eta _{\beta }-1} \exp \{ -\theta _{\beta } \beta ^{(j)}_k \} \right) \left( \prod _{j=2}^J \prod _{n=1}^N \left( \gamma ^{(j-1,j)}_n \right) ^{\eta _{\gamma }-1} \exp \{ -\theta _{\gamma } \gamma ^{(j-1,j)}_n \} \right) . \end{aligned} \end{aligned}$$
(23)

We can note from (23) that

$$\begin{aligned} \begin{aligned} p\left( \alpha _j | {\textbf{x}}^{(j)}, {\textbf{y}}^{(j)} \right)&\propto \alpha _j^{M_j/2 + \eta _\alpha - 1} \exp \left\{ - \left( \left\| F^{(j)} {\textbf{x}}^{(j)} - {\textbf{y}}^{(j)} \right\| _2^2/2 + \theta _\alpha \right) \alpha _j \right\} , \\ p\left( \beta ^{(j)}_k | {\textbf{x}}^{(j)} \right)&\propto \left( \beta ^{(j)}_k \right) ^{1/2 + \eta _\beta - 1} \exp \left\{ - \left( \left[ R{\textbf{x}}^{(j)}\right] _k^2/2 + \theta _\beta \right) \beta ^{(j)}_k \right\} , \\ p\left( \gamma _n^{(j-1,j)} | {\textbf{x}}^{(j-1)}, {\textbf{x}}^{(j)} \right)&\propto \left( \gamma _n^{(j-1,j)} \right) ^{1/2 + \eta _\gamma - 1} \exp \left\{ - \left( \left[ {\textbf{x}}^{(j-1)} - {\textbf{x}}^{(j)}\right] _n^2/2 + \theta _\gamma \right) \gamma _n^{(j-1,j)} \right\} , \end{aligned} \end{aligned}$$
(24)

and thus

$$\begin{aligned} p\left( \alpha _j | {\textbf{x}}^{(j)}, {\textbf{y}}^{(j)} \right)&\propto \Gamma \left( \alpha _j | \eta _{\alpha } + M_j/2, \theta _{\alpha } + \left\| F^{(j)} {\textbf{x}}^{(j)} - {\textbf{y}}^{(j)} \right\| _2^2/2 \right) , \end{aligned}$$
(25)
$$\begin{aligned} p\left( \beta ^{(j)}_k | {\textbf{x}}^{(j)} \right)&\propto \Gamma \left( \beta ^{(j)}_k | \eta _{\beta } + 1/2, \theta _{\beta } + \left[ R{\textbf{x}}^{(j)}\right] _k^2/2 \right) , \end{aligned}$$
(26)
$$\begin{aligned} p\left( \gamma _n^{(j-1,j)} | {\textbf{x}}^{(j-1)}, {\textbf{x}}^{(j)} \right)&\propto \Gamma \left( \gamma _n^{(j-1,j)} | \eta _{\gamma } + 1/2, \theta _{\gamma } + \left[ {\textbf{x}}^{(j-1)} - {\textbf{x}}^{(j)} \right] _n^2/2 \right) , \end{aligned}$$
(27)

where (25), (26) hold for \(j=1,\dots ,J\) and \(k=1,\dots ,K\), while (27) holds for \(j=2,\dots ,J\) and \(n=1,\dots ,N\). Similarly, with the middle expressions below holding for \(j=2,\dots ,J-1\), we have

$$\begin{aligned} \begin{aligned} p\left( {\textbf{x}}^{(1)} | {\textbf{x}}^{(2)}, {\textbf{y}}^{(1)}, \alpha _1, \varvec{\beta }^{(1)}, \varvec{\gamma }^{(1,2)} \right)&\propto {\mathcal {N}}\left( {\textbf{x}}^{(1)} | \varvec{\mu }^{(1)}, \Sigma ^{(1)} \right) , \\ p\left( {\textbf{x}}^{(j)} | {\textbf{x}}^{(j-1)}, {\textbf{x}}^{(j+1)}, {\textbf{y}}^{(j)}, \alpha _j, \varvec{\beta }^{(j)}, \varvec{\gamma }^{(j-1,j)}, \varvec{\gamma }^{(j,j+1)} \right)&\propto {\mathcal {N}}\left( {\textbf{x}}^{(j)} | \varvec{\mu }^{(j)}, \Sigma ^{(j)} \right) , \\ p\left( {\textbf{x}}^{(J)} | {\textbf{x}}^{(J-1)}, {\textbf{y}}^{(J)}, \alpha _J, \varvec{\beta }^{(J)}, \varvec{\gamma }^{(J-1,J)} \right)&\propto {\mathcal {N}}\left( {\textbf{x}}^{(J)} | \varvec{\mu }^{(J)}, \Sigma ^{(J)} \right) , \end{aligned} \end{aligned}$$
(28)

with covariance matrices

$$\begin{aligned} \begin{aligned} \Sigma ^{(1)}&= \left( \alpha _1 \left( F^{(1)} \right) ^T F^{(1)} + R^T B^{(1)} R + C^{(1,2)} \right) ^{-1}, \\ \Sigma ^{(j)}&= \left( \alpha _j \left( F^{(j)} \right) ^T F^{(j)} + R^T B^{(j)} R + C^{(j-1,j)} + C^{(j,j+1)} \right) ^{-1}, \\ \Sigma ^{(J)}&= \left( \alpha _J \left( F^{(J)} \right) ^T F^{(J)} + R^T B^{(J)} R + C^{(J-1,J)} \right) ^{-1}, \\ \end{aligned} \end{aligned}$$
(29)

and means

$$\begin{aligned} \begin{aligned} \varvec{\mu }^{(1)}&= \Sigma ^{(1)} \left( \alpha _1 \left( F^{(1)} \right) ^T {\textbf{y}}^{(1)} + C^{(1,2)} {\textbf{x}}^{(2)} \right) , \\ \varvec{\mu }^{(j)}&= \Sigma ^{(j)} \left( \alpha _j \left( F^{(j)} \right) ^T {\textbf{y}}^{(j)} + C^{(j-1,j)} {\textbf{x}}^{(j-1)} + C^{(j,j+1)} {\textbf{x}}^{(j+1)} \right) , \\ \varvec{\mu }^{(J)}&= \Sigma ^{(J)} \left( \alpha _J \left( F^{(J)} \right) ^T {\textbf{y}}^{(J)} + C^{(J-1,J)} {\textbf{x}}^{(J-1)} \right) . \end{aligned} \end{aligned}$$
(30)

Since \(\alpha _j\), \(\beta ^{(j)}_k\), and \(\gamma ^{(j-1,j)}_n\) are gamma distributed and only take on positive values, the covariance matrices in (29) are symmetric and positive definite (SPD).

3.2 Proposed Method: Joint Hierarchical Bayesian Learning

We are now positioned to formulate our JHBL method by adapting the BCD algorithm [24] to our model in Sect. 2. The BCD algorithm is described in Algorithm 1 and approximates the mode (or mean) of the posterior \(p( {\textbf{x}}, \varvec{\alpha }, \varvec{\beta }, \varvec{\gamma } | {\textbf{y}} )\) by alternatingly updating the images and parameters based on the mode (or mean) of the fully conditional distributions (25), (26), (27), (28).

Algorithm 1

Algorithm 1 is efficient and straightforward to implement because of the analytical expressions for the fully conditional distributions we derived in (25), (26), (27), (28). The mode of a gamma distribution \(\Gamma (\eta ,\theta )\) with \(\eta ,\theta > 0\) is \(\max \{0,(\eta -1)/\theta \}\). Thus, (25), (26), (27) imply that the \(\varvec{\alpha }\)-, \(\varvec{\beta }\)-, and \(\varvec{\gamma }\)-updates in Algorithm 1 are equivalent to

$$\begin{aligned} \left\{ \alpha _j \right\} ^{l+1}&= \frac{ \eta _{\alpha } + M_j/2 - 1}{ \theta _{\alpha } + \left\| F^{(j)} \{ {\textbf{x}}^{(j)} \}^{l} - {\textbf{y}}^{(j)} \right\| _2^2/2}, \quad j=1,\ldots ,J, \end{aligned}$$
(31)
$$\begin{aligned} \left\{ \beta ^{(j)}_k \right\} ^{l+1}&= \frac{ \eta _{\beta } - 1/2 }{ \theta _{\beta } + \left[ R \{ {\textbf{x}}^{(j)} \}^{l} \right] _k^2/2 }, \quad j=1,\ldots ,J,\ k=1,\ldots ,K, \end{aligned}$$
(32)
$$\begin{aligned} \left\{ \gamma ^{(j-1,j)}_n \right\} ^{l+1}&= \frac{ \eta _{\gamma } - 1/2 }{ \theta _{\gamma } + \left[ \{ {\textbf{x}}^{(j-1)} \}^{l} - \{ {\textbf{x}}^{(j)} \}^{l} \right] _n^2/2}, \quad j=2,\ldots ,J,\ n=1,\ldots ,N, \end{aligned}$$
(33)

assuming nonnegative numerators, where \(\{ \cdot \}^{l}\) denotes the l-th iteration of the term inside the curly brackets. Further, (28) implies that the \({\textbf{x}}\)-update in Algorithm 1 is equivalent to solving the linear systems

$$\begin{aligned} \left\{ G^{(j)} \right\} ^{l+1} \left\{ {\textbf{x}}^{(j)} \right\} ^{l+1} = \left\{ {\textbf{b}}^{(j)} \right\} ^{l+1}, \quad j=1,\dots ,J, \end{aligned}$$
(34)

with SPD coefficient matrices

$$\begin{aligned} \begin{aligned} \left\{ G^{(1)} \right\} ^{l+1} =&\ \{ \alpha _1 \}^{l+1} (F^{(1)})^T F^{(1)} + R^T \{ B^{(1)} \}^{l+1} R + \{ C^{(1,2)} \}^{l+1}, \\ \left\{ G^{(j)} \right\} ^{l+1} =&\ \{ \alpha _j \}^{l+1} (F^{(j)})^T F^{(j)} + R^T \{ B^{(j)} \}^{l+1} R + \{ C^{(j-1,j)} \}^{l+1} \\&+ \{ C^{(j,j+1)} \}^{l+1}, \\ \left\{ G^{(J)} \right\} ^{l+1} =&\ \{ \alpha _J \}^{l+1} (F^{(J)})^T F^{(J)} + R^T \{ B^{(J)} \}^{l+1} R + \{ C^{(J-1,J)} \}^{l+1}, \end{aligned} \end{aligned}$$
(35)

and right-hand sides

$$\begin{aligned} \begin{aligned} \left\{ {\textbf{b}}^{(1)} \right\} ^{l+1} =&\ \{ \alpha _1 \}^{l+1} ( F^{(1)} )^T {\textbf{y}}^{(1)} + \{ C^{(1,2)} \}^{l+1} \{ {\textbf{x}}^{(2)} \}^{l}, \\ \left\{ {\textbf{b}}^{(j)} \right\} ^{l+1} =&\ \{ \alpha _j \}^{l+1} ( F^{(j)} )^T {\textbf{y}}^{(j)} + \{ C^{(j-1,j)} \}^{l+1} \{ {\textbf{x}}^{(j-1)} \}^{l} \\&+ \{ C^{(j,j+1)} \}^{l+1} \{ {\textbf{x}}^{(j+1)} \}^{l}, \\ \left\{ {\textbf{b}}^{(J)} \right\} ^{l+1}&= \ \{ \alpha _J \}^{l+1} ( F^{(J)} )^T {\textbf{y}}^{(J)} + \{ C^{(J-1,J)} \}^{l+1} \{ {\textbf{x}}^{(J-1)} \}^{l}, \end{aligned} \end{aligned}$$
(36)

for \(j=2,\dots ,J-1\). Notably, we compute the \((l+1)\)-th iteration of the \({\textbf{x}}^{(j)}\)’s using the l-th iteration of the neighboring images in (36). This allows us to parallelize the \({\textbf{x}}^{(j)}\)-updates (34), making our method efficient even for large image sequences. We can now summarize our JHBL method for sequential image recovery as in Algorithm 2.

Algorithm 2
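Since Algorithm 2 is only summarized above, the following minimal sketch illustrates one JHBL sweep, combining the hyper-parameter updates (31)-(33) with the image updates (34)-(36). It assumes that all forward and regularization operators are given as scipy.sparse matrices, that the numerators in (31)-(33) are positive, and it uses a conjugate gradient solve for the SPD systems (34) in place of the gradient descent steps described in Remark 7 below.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

def jhbl_sweep(x, F, y, R, eta_a, th_a, eta_b, th_b, eta_g, th_g):
    """One (parallelizable) sweep of the JHBL iteration: hyper-parameter
    updates (31)-(33) followed by the image updates (34)-(36).
    x, F, y are lists over j; R is the shared regularization operator."""
    J = len(x)
    # (31): noise precisions alpha_j.
    alpha = [(eta_a + F[j].shape[0] / 2 - 1)
             / (th_a + np.sum((F[j] @ x[j] - y[j]) ** 2) / 2) for j in range(J)]
    # (32): intra-image precisions beta^{(j)} (componentwise).
    beta = [(eta_b - 0.5) / (th_b + (R @ x[j]) ** 2 / 2) for j in range(J)]
    # (33): inter-image precisions gamma^{(j-1,j)} (componentwise).
    gamma = [(eta_g - 0.5) / (th_g + (x[j - 1] - x[j]) ** 2 / 2) for j in range(1, J)]

    x_new = []
    for j in range(J):
        # Coefficient matrix G^{(j)} and right-hand side b^{(j)} from (35)-(36),
        # using the previous iterates of the neighboring images.
        G = alpha[j] * (F[j].T @ F[j]) + R.T @ sp.diags(beta[j]) @ R
        b = alpha[j] * (F[j].T @ y[j])
        if j > 0:
            G = G + sp.diags(gamma[j - 1])
            b = b + gamma[j - 1] * x[j - 1]
        if j < J - 1:
            G = G + sp.diags(gamma[j])
            b = b + gamma[j] * x[j + 1]
        xj, _ = cg(G, b, x0=x[j])   # G^{(j)} is SPD, so CG applies
        x_new.append(xj)
    return x_new, alpha, beta, gamma
```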

3.3 Separate Recovery as a Special Case

We can recover the generalized sparse Bayesian learning (GSBL) method [24] from our JHBL procedure as a special/limit case. If \(\eta _{\gamma } \le 1/2\) or \(\theta _{\gamma } \rightarrow \infty \), then the \(\varvec{\gamma }^{(j)}\)-update (33) becomes

$$\begin{aligned} \left\{ \varvec{\gamma }^{(j-1,j)} \right\} ^{l+1} = 0, \quad j=2,\dots ,J, \end{aligned}$$
(37)

and the \({\textbf{x}}^{(j)}\)-update (34) reduces to

$$\begin{aligned} \left( \{ \alpha _j \}^{l+1} (F^{(j)})^T F^{(j)} + R^T \{ B^{(j)} \}^{l+1} R \right) \left\{ {\textbf{x}}^{(j)} \right\} ^{l+1} = \{ \alpha _j \}^{l+1} ( F^{(j)} )^T {\textbf{y}}^{(j)}, \end{aligned}$$
(38)

for \(j=1,\dots ,J\). In this case, Algorithm 2 corresponds to using the GSBL algorithm [24] to recover the images separately.

3.4 Efficient Implementation

A few remarks on Algorithm 2 are in order.

Remark 5

We initialize the images \(\{ {\textbf{x}}^{(1)} \}^{0},\dots ,\{ {\textbf{x}}^{(J)} \}^{0}\) in Algorithm 2 as the separately recovered images using the GSBL algorithm [24], which we efficiently implemented in parallel.

Remark 6

The different \({\textbf{x}}^{(j)}\)-, \(\alpha _j\)-, \(\varvec{\beta }^{(j)}\)-, and \(\varvec{\gamma }^{(j-1,j)}\)-updates can be easily parallelized. This makes Algorithm 2 efficient even for large image sequences.

Remark 7

The coefficient matrices in the \({\textbf{x}}^{(j)}\)-updates (34) can become prohibitively large in imaging applications. To avoid storage and efficiency issues, we identify the solutions of (34) with the unique minimizers of the quadratic functionals

$$\begin{aligned} L^{(j)}({\textbf{x}}) = {\textbf{x}}^T G^{(j)} {\textbf{x}} - 2 {\textbf{x}}^T {\textbf{b}}^{(j)}, \quad j=1,\dots ,J, \end{aligned}$$
(39)

which is possible since the \(G^{(j)}\)’s are (almost surely) SPD. We then efficiently solve for the minimizers of (39) using a gradient descent method [24]. In our implementation, we use five gradient descent steps for every iteration of the \({\textbf{x}}^{(j)}\)-updates.
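For illustration, here is a minimal matrix-free sketch of such an \({\textbf{x}}^{(j)}\)-update, assuming user-supplied functions that apply \(F^{(j)}\), its adjoint, R, and \(R^T\) (the names below are hypothetical); each step is a steepest-descent step with exact line search for the quadratic functional (39).

```python
import numpy as np

def x_update(x0, apply_G, b, steps=5):
    """A few steepest-descent steps with exact line search for the quadratic
    L(x) = x^T G x - 2 x^T b from (39); apply_G(v) returns G @ v matrix-free."""
    x = x0.copy()
    for _ in range(steps):
        r = b - apply_G(x)            # negative gradient direction (up to a factor 2)
        Gr = apply_G(r)
        denom = r @ Gr
        if denom <= 0:                # G is SPD, so this only guards against round-off
            break
        x = x + (r @ r) / denom * r   # exact line-search step for a quadratic
    return x

def make_apply_G(alpha_j, apply_F, apply_Ft, apply_R, apply_Rt, beta_j, c_diag):
    """Matrix-free G^{(j)} from (35): alpha_j F^T F + R^T diag(beta) R + diag(c)."""
    return lambda v: (alpha_j * apply_Ft(apply_F(v))
                      + apply_Rt(beta_j * apply_R(v))
                      + c_diag * v)
```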

Remark 8

We stop the iterations in Algorithm 2 if the average relative and absolute change between two subsequent image sequence iterations w. r. t. the \(\Vert \cdot \Vert _2\)-norm are less than \(10^{-3}\) or if a maximum number of \(10^{3}\) iterations is reached.

Remark 9

A detailed analysis regarding the convexity of the cost function, \(-\log p({\textbf{x}},\varvec{\alpha },\varvec{\beta },\varvec{\gamma }| {\textbf{y}})\), and the convergence of Algorithm 2 exceeds the scope of this paper and will be addressed in future works.

4 Numerical Tests

We consider two numerical tests to demonstrate the performance of our JHBL method.

4.1 Sequential Magnetic Resonance Imaging

We are given a temporal sequence of six \(128 \times 128\) phantom images obtained by a GE HTXT 1.5T clinical magnetic resonance imaging (MRI) scanner [33], from which we generate under-sampled and indirect Fourier data. Figure 4a, b and c show the first three reference images, where the change consists of the rotating left (green) ellipse and the down-moving right (yellow) ellipse. The Fourier samples contained in the data sets are

$$\begin{aligned} y^{(j)}_{k,l} = \int ^1_0\int ^1_0 x^{(j)}(s,t)e^{-i2\pi (ks+lt)}\, \textrm{d}s \, \textrm{d}t, \quad - \left\lceil \frac{N_1}{2} \right\rceil \le k,l < \left\lceil \frac{N_1}{2} \right\rceil , \end{aligned}$$
(40)

for \(j = 1,\dots ,J\), where \(N = N_1^2\) and \(x^{(j)}\) is a function describing the jth image. We use the discrete Fourier transform as a linear forward operator, thereby introducing model discrepancy and avoiding the inverse crime [30]. Further, each data set is missing Fourier samples for the symmetric bands

$$\begin{aligned} {\mathcal {K}}_j = \left[ \pm (10j + 1), \pm (10j+10) \right] ^2, \quad j=1,\dots ,6, \end{aligned}$$
(41)

and the remaining Fourier samples contain additive i. i. d. zero-mean normal noise. The amount of noise is measured using the signal-to-noise ratio (SNR)

$$\begin{aligned} \textrm{SNR}^{(j)} = 10 \log _{10} \left( \alpha _j y^{(j)}_{0,0} \right) , \quad j=1,\dots ,J, \end{aligned}$$
(42)

where \(y^{(j)}_{0,0}\) is the average of the jth image. The SNR for all images in Fig. 4 is 2.
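For concreteness, a small sketch of the sampling pattern (our reading of (41); array conventions are illustrative): it marks the Fourier indices removed by the band \({\mathcal {K}}_j\) for a \(128 \times 128\) image.

```python
import numpy as np

def missing_band_mask(N1=128, j=1):
    """Boolean mask over the Fourier indices (k, l) in (40) that is True where
    the sample is removed by the band K_j in (41)."""
    k = np.arange(-(N1 // 2), (N1 + 1) // 2)      # k, l = -N1/2, ..., N1/2 - 1 for even N1
    K, L = np.meshgrid(k, k, indexing="ij")
    in_band = lambda m: (np.abs(m) >= 10 * j + 1) & (np.abs(m) <= 10 * j + 10)
    return in_band(K) & in_band(L)

mask = missing_band_mask(N1=128, j=1)
print(mask.sum(), "of", mask.size, "Fourier samples are removed for j = 1")  # 400 of 16384
```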

Fig. 4

Three temporal phantom images by a GE HTXT 1.5T clinical MRI scanner [33] (first row) and their separate reconstructions from noisy and under-sampled Fourier data solving the deterministic \(\ell ^1\)-regularized inverse problems (2) (second row) and the Bayesian GSBL algorithm (third row)

Figure 4d, e and f illustrate the separately recovered images from the noisy and under-sampled Fourier data by solving the (weighted) \(\ell ^1\)-regularized inverse problems (2) using the alternating direction method of multipliers (ADMM) [7, 42]. Figure 4g, h and i visualize the separately recovered images from the same data sets using the GSBL algorithm [24] with hyper-parameters \(\eta _{\alpha } = \eta _{\beta } = 1\) and \(\theta _{\alpha } = \theta _{\beta } = 10^{-3}\). In both cases, we used an anisotropic first-order TV regularization operator

$$\begin{aligned} R = \begin{bmatrix} I \otimes D \\ D \otimes I \end{bmatrix} \quad \text {with} \quad D = \begin{bmatrix} -1 & 1 & & \\ & \ddots & \ddots & \\ & & -1 & 1 \end{bmatrix} \in {\mathbb {R}}^{(N_1-1) \times N_1}, \end{aligned}$$
(43)

to promote the images being piecewise constant. Figure 4 demonstrates that the Bayesian GSBL algorithm separately recovers the images more accurately than the deterministic algorithm, although we still observe some smeared features in Fig. 4g and h.
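For completeness, the anisotropic operator (43) is straightforward to assemble with sparse Kronecker products; a minimal sketch (scipy.sparse assumed, \(N_1 = 128\)):

```python
import numpy as np
import scipy.sparse as sp

def first_order_tv(N1):
    """Anisotropic first-order TV operator R = [I (x) D; D (x) I] from (43)."""
    D = sp.diags([-np.ones(N1 - 1), np.ones(N1 - 1)], offsets=[0, 1],
                 shape=(N1 - 1, N1))              # 1D forward differences
    I = sp.identity(N1)
    return sp.vstack([sp.kron(I, D), sp.kron(D, I)], format="csr")

R = first_order_tv(128)
print(R.shape)   # (2 * 128 * 127, 128 ** 2)
```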

Fig. 5

Three temporal phantom images by a GE HTXT 1.5T clinical MRI scanner [33] (first row) and their joint reconstructions from noisy and under-sampled Fourier data using the deterministic weighted \(\ell ^1\)-regularized inverse problems (3) (second row) and the proposed JHBL algorithm (third row)

Figure 5 illustrates the corresponding jointly recovered images. Specifically, Fig. 5a, b and c visualize the jointly recovered images using ADMM to solve the joint \(\ell ^1\)-regularized inverse problem (3) as proposed in [42], which we use as a benchmark. Figure 5d, e and f show the jointly recovered images using our JHBL algorithm (Algorithm 2). Following the discussion in Sect. 2, we chose the hyper-parameters as \(\eta _{\alpha } = \eta _{\beta } = 1\), \(\eta _{\gamma } = 2\), and \(\theta _{\alpha } = \theta _{\beta } = \theta _{\gamma } = 10^{-3}\).

Remark 10

Following [24], \(\eta _{\alpha } = \eta _{\beta } = 1\) and \(\theta _{\alpha } = \theta _{\beta } = 10^{-3}\) are usual choices for the GSBL algorithm. Initially, we also tried to use \(\eta _{\gamma } = 1\) and \(\theta _{\gamma } = 10^{-3}\) for the parameters of the inter-image hyper-prior, but observed that the inter-image coupling was sometimes too weak. We thus used \(\eta _{\gamma } = 2\) and \(\theta _{\gamma } = 10^{-3}\) instead. The heuristic behind increasing the shape parameter \(\eta _{\gamma }\) is as follows: We assume that every consecutive pair of images contains more no-change regions than change regions. We thus expect \(\gamma ^{(j-1,j)}_n \gg 0\) for most of the \(\gamma ^{(j-1,j)}_n\)'s in (12), indicating no change, and \(\gamma ^{(j-1,j)}_n \approx 0\) for only a few of the \(\gamma ^{(j-1,j)}_n\)'s. We increasingly promote this behavior for the \(\gamma ^{(j-1,j)}_n\)'s by increasing the shape parameter \(\eta _{\gamma }\) in (13). We further investigate the influence of the shape and rate parameters \(\eta _{\gamma }\) and \(\theta _{\gamma }\) on the inter-image coupling in Sect. 4.3.

Both joint methods yield more accurate recovered images than the respective separate method. At the same time, our JHBL algorithm provides notably more accurate recovered images than the deterministic joint method proposed in [42].

Fig. 6

Average relative log-errors for the separate deterministic (blue strokes), the separate Bayesian (orange circles), the joint deterministic (yellow triangles), and our joint Bayesian (purple diamonds) algorithm. In Fig. 6a, the SNR is 2. In Fig. 6b, we started with \(128 \times 128\) Fourier samples and then removed the bands in (41) (Color figure online)

We observed the proposed JHBL method to yield the most accurate recovered images for all combinations of Fourier samples and SNRs we considered. Figure 6a and b show the average relative log-error for different numbers of Fourier samples and SNRs. The relative log-error of the jth image is

$$\begin{aligned} E_{\log }^{(j)} = \log _{10}\left( \frac{\left\| {\textbf{x}}^{(j)}_{\textrm{ref}} - {\textbf{x}}^{(j)} \right\| _2}{\left\| {\textbf{x}}^{(j)}_{\textrm{ref}}\right\| _2} \right) , \quad j=1,\dots ,J, \end{aligned}$$
(44)

where \({\textbf{x}}^{(j)}_{\textrm{ref}}\) and \({\textbf{x}}^{(j)}\) are the reference and recovered image, respectively. The average relative log-error is \(\left( E_{\log }^{(1)}+\dots +E_{\log }^{(J)}\right) /J\).
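A one-to-one translation of (44) and its average (a small sketch; inputs are lists of vectorized reference and recovered images):

```python
import numpy as np

def avg_rel_log_error(x_ref, x_rec):
    """Average of the relative log-errors (44) over the image sequence."""
    errs = [np.log10(np.linalg.norm(ref - rec) / np.linalg.norm(ref))
            for ref, rec in zip(x_ref, x_rec)]
    return float(np.mean(errs))
```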

Fig. 7

Pixelwise variances of the recovered first image. Jointly recovering the images by “borrowing” missing information from the other images reduces uncertainty in no-change regions

Another advantage of our JHBL method is that it allows us to quantify uncertainty. Indeed, we recover a full Gaussian distribution, \({\mathcal {N}}(\varvec{\mu }^{(j)}, \Sigma ^{(j)})\), for every individual image, which is conditioned on the observed data sets and the other estimated parameters. To demonstrate this, Fig. 7a and b illustrate the pixelwise variance of the separately and jointly recovered first image using the GSBL and our JHBL algorithm. The pixelwise variance corresponds to the diagonal elements of the covariance matrices (29). Comparing Fig. 7a and b, we observe reduced uncertainty in the jointly recovered first image, except for the change regions around the two ellipses. The reduced uncertainty of the recovered first image by our JHBL method away from the change regions is due to the method “borrowing” information from the neighboring images.

Fig. 8

The final estimates for the intra-image regularization parameters in the vertical and horizontal direction for the first image in Fig. 5 and their pixelwise average

Another convenient by-product of our JHBL method is that it allows for edge and change detection, which we illustrate in Figs. 8 and 9. Figure 8a and b respectively visualize the final estimate of the first and second half of the intra-image regularization parameter \(\varvec{\beta }^{(1)}\) of our JHBL method for the first recovered image. Since we used an anisotropic first-order TV operator (43), the intra-image regularization parameter values indicate edges in the vertical and the horizontal direction. We can combine them by considering the pixelwise average of the images in Fig. 8a and b to obtain the edge profile in Fig. 8c.

Fig. 9

The final estimates for the Bayesian change masks (top row) and pre-computed binary change masks (bottom row) used by the Bayesian and deterministic method to jointly recover the temporal image sequence in Fig. 5

Figure 9a and b visualize the final estimate of the conditional change mask used by our JHBL method for the first and second pair of images, while Fig. 9c and d illustrate the corresponding pre-computed binary change masks used in [42]. The pre-computed binary change masks rely on the data being Fourier samples and on the sequential images containing only objects with closed boundaries. By contrast, the change masks used in our JHBL method neither rely on Fourier data nor on the sequential images containing only objects with closed boundaries.

4.2 Sequential Image Deblurring

Fig. 10

The first three reference images of a temporal image sequence coming from the GRAM road-traffic monitoring data set [26] (first row) and their noisy blurred versions (second row)

We next consider the deconvolution of a temporal sequence of six \(400 \times 400\) images from the GRAM road-traffic monitoring data set [26]. Figure 10 illustrates the first three reference images and their noisy blurred versions. The i. i. d. normal noise \({\textbf{e}}^{(j)}\) in the corresponding linear data model (1) has \(\textrm{SNR} = 2+j\), \(j=1,\dots ,J\). The forward operator F is obtained by applying the tensor-product midpoint quadrature to the convolution equations

$$\begin{aligned} y^{(j)}(s,t) = \int _0^1 \int _0^1 k(s-s',t-t') x^{(j)}(s',t') \, \textrm{d}s' \, \textrm{d}t', \quad j=1,\dots ,J, \end{aligned}$$
(45)

where \(x^{(j)}\) is a function describing the jth image. We assume a Gaussian convolution kernel

$$\begin{aligned} k(s,t)=\frac{1}{2\pi \gamma ^2}\exp {\left( -\frac{s^2+t^2}{2\gamma ^2}\right) } \end{aligned}$$
(46)

with blurring parameter \(\gamma = 5 \cdot 10^{-3}\), which makes F highly ill-conditioned. We use an anisotropic second-order TV regularization operator

$$\begin{aligned} R = \begin{bmatrix} I \otimes D \\ D \otimes I \end{bmatrix} \quad \text {with} \quad D = \begin{bmatrix} -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \end{bmatrix} \in {\mathbb {R}}^{(N_1-2) \times N_1}, \end{aligned}$$
(47)

where \(N_1 = 400\) is the number of pixels in each direction, to promote the images being piecewise smooth but not necessarily piecewise constant.
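For reference, a hedged sketch of the blurring model: it approximates the quadrature-discretized convolution (45) with the kernel (46) by an FFT-based periodic convolution with the kernel sampled on the pixel grid. This is an illustration under the stated (periodic) assumption, not the exact forward operator used in the experiments.

```python
import numpy as np

def gaussian_blur_operator(N1=400, gamma=5e-3):
    """Approximate forward map for (45)-(46): periodic convolution with the
    Gaussian kernel sampled on a uniform grid of [0, 1]^2 (midpoint-rule weights)."""
    h = 1.0 / N1
    shift = np.arange(N1) * h                 # grid shifts s - s' (modulo 1)
    d = np.minimum(shift, 1.0 - shift)        # periodic distance to zero
    S, T = np.meshgrid(d, d, indexing="ij")
    k = np.exp(-(S ** 2 + T ** 2) / (2 * gamma ** 2)) / (2 * np.pi * gamma ** 2)
    k_hat = np.fft.fft2(k) * h ** 2           # quadrature weight h^2 per pixel
    def apply_F(x):                           # x is an (N1, N1) image
        return np.real(np.fft.ifft2(np.fft.fft2(x) * k_hat))
    return apply_F

apply_F = gaussian_blur_operator()
blurred = apply_F(np.ones((400, 400)))        # a constant image stays (nearly) constant
```

The second-order difference operator D in (47) can be assembled analogously to the first-order operator in Sect. 4.1, using the stencil \([-1, 2, -1]\) on an \((N_1-2) \times N_1\) matrix.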

Fig. 11

Separately recovered images using GSBL (top row) and the jointly recovered images using our JHBL algorithm (bottom row)

We can no longer use the deterministic method [42] to pre-compute binary change masks directly from the indirect data sets. However, we can still use our JHBL method to jointly recover the sequential images in a Bayesian setting. Figure 11 illustrates the separately (top row) and jointly (bottom row) recovered images from the noisy blurred images in Fig. 10d, e and f using the GSBL and our JHBL algorithm, respectively. The hyper-parameters are \(\eta _{\alpha } = \eta _{\beta } = 1\), \(\eta _{\gamma } = 1\), \(\theta _{\alpha } = \theta _{\beta } = 10^{-3}\), and \(\theta _{\gamma }=10^{-1}\). The jointly recovered images using our JHBL algorithm are more accurate than the separately recovered images.

Fig. 12

Pixelwise variances of the recovered first image in Fig. 11. Jointly recovering the images by “borrowing” missing information from the other images reduces uncertainty

As mentioned, the proposed JHBL method can quantify the uncertainty in the recovered images, which is often desirable in applications with no reference images. We demonstrate this in Fig. 12, which visualizes the pixelwise variances of the recovered first image in Fig. 11. We again see that jointly recovering the images by “borrowing” missing information from the other images reduces uncertainty.

Fig. 13

The final estimates for the intra-image regularization parameters in the vertical and horizontal direction for the first recovered image in Fig. 11 using our JHBL algorithm and their pixelwise average

Moreover, Fig. 13a and b illustrate the final estimate of the first and second half of the intra-image regularization parameter \(\varvec{\beta }^{(1)}\) of our JHBL method for the first recovered image in Fig. 11d. Since we used an anisotropic second-order TV regularization operator (47), the intra-image regularization parameter values indicate edges in the vertical and horizontal direction. We can again combine them, e. g., by considering the pixelwise average of the images in Fig. 13a and b to obtain the edge profile in Fig. 13c.

Fig. 14

The final estimates for the Bayesian change masks for the first and second pair of images in Fig. 11 using our JHBL algorithm

Finally, Fig. 14 provides the final estimates for the Bayesian change masks for the first and second pair of images in Fig. 11. These change masks are available only for our JHBL algorithm, since the deterministic method cannot pre-compute binary change masks from the indirect data sets.

4.3 Investigating the Influence of the Shape and Rate Parameter

We end this section by briefly demonstrating how different choices for the shape and rate parameter, \(\eta _{\gamma }\) and \(\theta _{\gamma }\), of the inter-image hyper-prior (see Sect. 2.3) influence the jointly recovered image. Recall (e.g., from Remark 10) that we expect the coupling between images to increase/decrease as larger/smaller values for the inter-image hyper-parameters \(\gamma _n^{(j-1,j)}\) become more likely. At the same time, the expected value and variance of the inter-image hyper-prior (13) are \(\eta _{\gamma }/\theta _{\gamma }\) and \(\eta _{\gamma }/\theta _{\gamma }^2\), respectively. Hence, if \(\eta _{\gamma }\) is increased, we expect larger values for \(\gamma _n^{(j-1,j)}\) to become more likely and the inter-image coupling to increase. On the other hand, if \(\theta _{\gamma }\) is increased, we expect smaller values for \(\gamma _n^{(j-1,j)}\) to become more likely and the inter-image coupling to decrease.

Fig. 15

Demonstrating the influence of the shape and rate parameter, \(\eta _{\gamma }\) and \(\theta _{\gamma }\), on the inter-image coupling. First row: First phantom reference image, its joint reconstruction using the JHBL algorithm with \(\eta _{\gamma } = 2\) and \(\theta _{\gamma } = 10^{-3}\), and its separate reconstruction using the GSBL algorithm. Second row: Joint reconstructions for fixed \(\theta _{\gamma } = 10^{-3}\) and varying \(\eta _{\gamma }\). Third row: Joint reconstructions for fixed \(\eta _{\gamma } = 2\) and varying \(\theta _{\gamma }\)

Figure 15 illustrates the connection between \(\eta _{\gamma }, \theta _{\gamma }\) and the strength of the inter-image coupling for the first phantom image from the sequential MRI test case previously discussed in Sect. 4.1. Specifically, the second row of Fig. 15 shows that the jointly recovered image is visibly close to the separately recovered image for \(\eta _{\gamma } = 0.8\). At the same time, the inter-image coupling becomes so strong that some spurious artifacts from the subsequent images are introduced in change regions for \(\eta _{\gamma } = 4\). The third row of Fig. 15 demonstrates a similar behavior for fixed \(\eta _{\gamma } = 2\) and decreasing \(\theta _{\gamma }\); The jointly recovered image is visibly close to the separately recovered image for \(\theta _{\gamma } = 10^{-1}\), while the inter-image coupling becomes so strong that some spurious artifacts are introduced in change regions for \(\theta _{\gamma } = 10^{-3.5}\). Future work will optimize the shape and rate parameter selection of the inter-image hyper-prior to further increase the advantage of inter-image coupling between sequential images.

5 Summary

We presented a new method to jointly recover temporal image sequences by “borrowing” missing information in each image from the other images. Our JHBL method is simple to implement, easily parallelized, and efficient. We found our method to yield more accurate recovered images than both separately recovering the images using the Bayesian GSBL algorithm and jointly recovering them using the deterministic method from [42]. In addition, our method avoids exhaustive parameter fine-tuning. Moreover, to pre-compute a binary change mask, the deterministic method proposed in [42] is limited to Fourier data sets and to images containing only objects with closed boundaries. By contrast, we treat the change mask that steers the coupling between neighboring images as a random variable, which we estimate together with the images and other parameters, making our method applicable to general modalities. We demonstrated this by considering a sequential image deblurring problem based on the GRAM road-traffic monitoring data set. Another distinct advantage of our method is that it allows us to quantify uncertainty, which is vital in applications without reference images. An additional valuable by-product is that our method allows for edge and change detection.

Future work will address the selection of hyper-parameters. Future efforts will also investigate the structural properties of the cost function and convergence of the proposed JHBL algorithm. Finally, a comparison or combination of the proposed BCD algorithm for Bayesian inference with other existing methods would be of interest.