We now build the model for applying the deterministic error modelling theory to diffusion tensor imaging. We start with the forward model based on the Stejskal–Tanner equation, and then briefly introduce the regularisers we use.
The Forward Model
For \(u: {\varOmega }\rightarrow {{\mathrm{Sym}}}^2(\mathbb {R}^3)\), \({\varOmega }\subset \mathbb {R}^3\), a mapping from \({\varOmega }\) to symmetric second-order tensors, let us introduce non-linear operators \(T_j\), defined by
$$\begin{aligned}{}[T_j(u)](x) :=s_0(x) \exp (-\langle b_j,u(x)b_j\rangle ), \quad (j=1,\ldots ,N). \end{aligned}$$
Their role is to model the so-called Stejskal–Tanner equation [4]
$$\begin{aligned} s_j(x)=s_0(x) \exp (-\langle b_j,u(x)b_j\rangle ), \quad (j=1,\ldots ,N). \end{aligned}$$
(6)
Each tensor u(x) models the covariance of a Gaussian probability distribution at x for the diffusion of water molecules. The data \(s_j \in L^2({\varOmega })\), (\(j=1,\ldots ,N\)), are the diffusion-weighted MRI images. Each of them is obtained by performing the MRI scan with a different non-zero diffusion-sensitising gradient \(b_j\), while \(s_0\) is obtained with a zero gradient. After correcting the original k-space data for coil sensitivities, each \(s_j\) is assumed real. As a consequence, any measurement \({\hat{s}}_j\) of \(s_j\) has—in theory—Rician noise distribution [24].
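For concreteness, the forward operators \(T_j\) are straightforward to evaluate on a voxel grid. The following minimal NumPy sketch (the array layout, names and toy values are our own illustrative choices, not part of the model above) computes \([T_j(u)](x) = s_0(x)\exp(-\langle b_j, u(x) b_j\rangle)\) for all voxels and gradients at once:

```python
import numpy as np

def stejskal_tanner_signal(u, s0, b):
    """Evaluate [T_j(u)](x) = s0(x) * exp(-<b_j, u(x) b_j>) on a voxel grid.

    u  : (..., 3, 3) array of symmetric diffusion tensors u(x)
    s0 : (...)       array, the image obtained with zero gradient
    b  : (N, 3)      array of diffusion-sensitising gradients b_1, ..., b_N
    Returns an (N, ...) array of modelled signals s_1, ..., s_N.
    """
    # Quadratic form <b_j, u(x) b_j> for every gradient j and voxel x.
    q = np.einsum('ji,...ik,jk->j...', b, u, b)
    return s0[None, ...] * np.exp(-q)

# Toy usage: a 2x2 slice of isotropic tensors and three gradients.
u = np.tile(1e-3 * np.eye(3), (2, 2, 1, 1))
s0 = np.full((2, 2), 100.0)
b = np.array([[30.0, 0.0, 0.0], [0.0, 30.0, 0.0], [0.0, 0.0, 30.0]])
s = stejskal_tanner_signal(u, s0, b)      # shape (3, 2, 2)
```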
Our goal is to reconstruct u with simultaneous denoising. Following [31, 55], we consider, with a suitable regulariser R, the Tikhonov model
$$\begin{aligned} \min _{u \geqq 0}~ \sum _{j=1}^N \frac{1}{2} \Vert {\hat{s}}_j-T_j(u)\Vert ^2 + \alpha R(u). \end{aligned}$$
(7)
The constraint \(u\geqq 0\) is to be understood in the sense that u(x) is positive semidefinite for \(\mathcal {L}^n\)-a.e. \(x \in {\varOmega }\) (see Appendix 2 for more details). Due to the Rician noise of \({\hat{s}}_j\), the Gaussian noise model implied by the \(L^2\)-norm in (7) is not entirely correct. However, in some cases the \(L^2\) model may be accurate enough, as for suitable parameters the Rician distribution is not too far from a Gaussian distribution. If one were to model the problem correctly, one should either modify the fidelity term to model Rician noise or include the (complex, unit-modulus) coil sensitivities in the model. The Rician noise model is highly non-linear due to the logarithms of Bessel functions involved. Its approximations have been studied in [5, 22, 40] for single MR images and DTI. Coil sensitivities could be included either by knowing them in advance or by simultaneous estimation as in [30]. Either way, significant complexity is introduced into the model, and for the present work, we are content with the simple \(L^2\) model.
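As an aside, the Rician measurement model referred to above is easy to simulate: a magnitude image is the modulus of the true (real) signal perturbed by complex Gaussian noise. A small sketch (signal value and noise level are illustrative assumptions):

```python
import numpy as np

def rician_sample(s, sigma, rng):
    """Magnitude measurement of a real signal s under complex Gaussian noise:
    sqrt((s + n_re)^2 + n_im^2), which follows a Rician distribution."""
    n_re = rng.normal(0.0, sigma, size=np.shape(s))
    n_im = rng.normal(0.0, sigma, size=np.shape(s))
    return np.sqrt((s + n_re) ** 2 + n_im ** 2)

rng = np.random.default_rng(0)
# At high signal-to-noise ratio the Rician sample is close to Gaussian around s,
# which is what the L^2 fidelity in (7) implicitly assumes.
s_hat = rician_sample(80.0, sigma=5.0, rng=rng)
```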
We may also consider, as is often done, and as was done with TGV in [57], the linearised model
$$\begin{aligned} \min _{u \geqq 0}~ \Vert f-u\Vert ^2 + \alpha R(u), \end{aligned}$$
(8)
where, for each \(x \in {\varOmega }\), f(x) is obtained by solving for u(x), by regression, the system of equations (6) with \(s_j(x)={\hat{s}}_j(x)\). Further, as in [58], we may also consider
$$\begin{aligned} \min _{u \geqq 0}~ \sum _{j=1}^N \frac{1}{2} \Vert g_j-A_j u\Vert ^2 + \alpha R(u), \end{aligned}$$
(9)
with
$$\begin{aligned}{}[A_j u](x)&:=\langle b_j,u(x)b_j\rangle , \quad \text {and} \nonumber \\ g_j(x)&:=\log ({\hat{s}}_j(x)/{\hat{s}}_0(x)). \end{aligned}$$
(10)
In both of these linearised models, the assumption of Gaussian noise is in principle even more remote from the truth than in the non-linear model (7). We will employ (8) and (7) as benchmark models.
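For a single voxel, the regression underlying (8) reduces to an ordinary least-squares fit of the six independent tensor components to the log-ratios of the measured signals. A minimal sketch of this per-voxel fit (component ordering and helper names are our own):

```python
import numpy as np

def design_row(b):
    """Row of the design matrix such that row @ c = <b, U b>, where
    c = (U11, U22, U33, U12, U13, U23) collects the six components of U."""
    bx, by, bz = b
    return np.array([bx*bx, by*by, bz*bz, 2*bx*by, 2*bx*bz, 2*by*bz])

def fit_tensor_voxel(s_hat, s0_hat, b):
    """Least-squares fit of one diffusion tensor from N >= 6 measurements,
    i.e. the per-voxel regression producing f(x) in the linearised model (8)."""
    y = -np.log(s_hat / s0_hat)          # equals <b_j, u b_j> up to noise, by (6)
    B = np.stack([design_row(bj) for bj in b])
    c, *_ = np.linalg.lstsq(B, y, rcond=None)
    return np.array([[c[0], c[3], c[4]],
                     [c[3], c[1], c[5]],
                     [c[4], c[5], c[2]]])
```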
We want to simplify the model further and forgo accurate noise modelling. After all, we often do not know the real noise model of the data available in practice: it can be corrupted by processing artefacts from black-box algorithms in the MRI devices. This problem of black-box devices has been discussed extensively in [44], in the context of computed tomography. Moreover, as we have discussed above, even without such artefacts, the correct model may be difficult to realise numerically. So we might be best off choosing the least assuming model of all, namely that of error bounds. This is what we propose in the reconstruction model
$$\begin{aligned} \min _u~ R(u) \quad \text {s.t.} \quad&u \geqq 0, \nonumber \\&g_j^l \leqslant A_j u \leqslant g_j^u, \nonumber \\&\quad \mathcal {L}^n\text {-a.e.},\ (j=1,\ldots ,N). \end{aligned}$$
(11)
Here \(g_j^l :=\log ({\hat{s}}_j^l/{\hat{s}}_0^u)\) and \(g_j^u :=\log ({\hat{s}}_j^u/{\hat{s}}_0^l)\), \(g_j^l, g_j^u \in L^2({\varOmega })\), are our upper and lower bounds on \(g_j\) that we derive from the data.
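Computing these bounds from interval bounds on the measured images is a direct pointwise operation. A short sketch, assuming strictly positive per-voxel bounds \({\hat{s}}_j^l \leqslant {\hat{s}}_j \leqslant {\hat{s}}_j^u\) and \({\hat{s}}_0^l \leqslant {\hat{s}}_0 \leqslant {\hat{s}}_0^u\) are already available:

```python
import numpy as np

def log_ratio_bounds(s_lo, s_hi, s0_lo, s0_hi):
    """Pointwise bounds g_j^l = log(s_j^l / s_0^u) and g_j^u = log(s_j^u / s_0^l)
    entering the constraints of (11).

    s_lo, s_hi   : (N, ...) lower/upper bounds on the diffusion-weighted images
    s0_lo, s0_hi : (...)    lower/upper bounds on the zero-gradient image
    All inputs are assumed strictly positive.
    """
    g_lo = np.log(s_lo / s0_hi[None, ...])   # smallest admissible log-ratio
    g_hi = np.log(s_hi / s0_lo[None, ...])   # largest admissible log-ratio
    return g_lo, g_hi
```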
Choice of the Regulariser R
A prototypical regulariser in image processing is the total variation, first studied in this context in [45]. It can be defined for a \(u \in L^1({\varOmega }; {{\mathrm{Sym}}}^k(\mathbb {R}^m))\) as
$$\begin{aligned} \begin{aligned} {{\mathrm{TV}}}(u)&:=\Vert Eu\Vert _{\mathcal {M}({\varOmega }; {{\mathrm{Sym}}}^{k+1}(\mathbb {R}^m))} \\&:=\sup \left\{ \int _{\varOmega }\langle {{\mathrm{div}}}\phi (x),u(x)\rangle {{\mathrm{d}}}x \right. \\&\left. \qquad \qquad \qquad \Bigm | \begin{array}{l} \phi \in C_c^\infty ({\varOmega }; {{\mathrm{Sym}}}^{k+1}(\mathbb {R}^m)) \\ \sup _x \Vert \phi (x)\Vert _F \le 1 \end{array} \right\} . \end{aligned} \end{aligned}$$
Observe that for scalar or vector fields, i.e. the cases \(k=0,1\), we have \({{\mathrm{Sym}}}^0(\mathbb {R}^m)=\mathcal {T}^0(\mathbb {R}^m)=\mathbb {R}\), and \({{\mathrm{Sym}}}^1(\mathbb {R}^m)=\mathcal {T}^1(\mathbb {R}^m)=\mathbb {R}^m\). Therefore, for scalars in particular, this gives the usual isotropic total variation
$$\begin{aligned} {{\mathrm{TV}}}(u) = \Vert Du\Vert _{\mathcal {M}({\varOmega })}. \end{aligned}$$
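In computations, TV is typically approximated by finite differences and a pointwise Frobenius norm. The sketch below (a 2D grid, forward differences and replication at the boundary are our own choices; for tensor fields it also uses the full rather than the symmetrised gradient, for simplicity) illustrates such a discretisation:

```python
import numpy as np

def tv_discrete(u):
    """Forward-difference approximation of TV(u) for a field of shape
    (H, W, d1, ..., dk): sum over pixels of the Frobenius norm of the
    discrete gradient, with replication (zero difference) at the boundary."""
    dx = np.diff(u, axis=0, append=u[-1:])
    dy = np.diff(u, axis=1, append=u[:, -1:])
    # Frobenius norm over the two difference directions and all components.
    pointwise = np.sqrt(np.sum(dx**2 + dy**2, axis=tuple(range(2, u.ndim))))
    return pointwise.sum()
```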
Total generalised variation was introduced in [9] as a higher-order extension of \({{\mathrm{TV}}}\). Following [11, 57], the second-order variant may be defined with the differentiation cascade formulation for symmetric tensor fields \(u \in L^1({\varOmega }; {{\mathrm{Sym}}}^{k}(\mathbb {R}^m))\) as the marginal
$$\begin{aligned} {{\mathrm{TGV}}}^2_{(\beta ,\alpha )}(u):= & {} \min \Big \{ {\varPhi }_{(\beta ,\alpha )}(u, w)\nonumber \\&\mid w \in L^1({\varOmega }; {{\mathrm{Sym}}}^{k+1}(\mathbb {R}^m)) \Big \} \end{aligned}$$
(12)
of
$$\begin{aligned} \begin{aligned} {\varPhi }_{(\beta ,\alpha )}(u, w)&:=\alpha \Vert E u - w\Vert _{F,\mathcal {M}({\varOmega }; {{\mathrm{Sym}}}^{k+1}(\mathbb {R}^m))}\\&\phantom {:=} +\beta \Vert E w\Vert _{F,\mathcal {M}({\varOmega }; {{\mathrm{Sym}}}^{k+2}(\mathbb {R}^m))}. \end{aligned} \end{aligned}$$
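The value of the cascade functional \({\varPhi }_{(\beta ,\alpha )}(u,w)\) can be approximated in the same finite-difference spirit; \({{\mathrm{TGV}}}^2_{(\beta ,\alpha )}(u)\) is then the minimum of this value over w. A sketch for a scalar 2D image (the discretisation choices are again our own):

```python
import numpy as np

def tgv2_energy(u, w, alpha, beta):
    """Discrete Phi_{(beta,alpha)}(u, w) for a scalar image u (H, W) and an
    auxiliary vector field w (H, W, 2):
        alpha * sum |grad u - w|_F + beta * sum |E w|_F,
    where E w is the symmetrised Jacobian of w."""
    ux = np.diff(u, axis=0, append=u[-1:])
    uy = np.diff(u, axis=1, append=u[:, -1:])
    term1 = np.sqrt((ux - w[..., 0])**2 + (uy - w[..., 1])**2).sum()

    # Symmetrised Jacobian of w: two diagonal entries and the off-diagonal one.
    w1x = np.diff(w[..., 0], axis=0, append=w[-1:, :, 0])
    w2y = np.diff(w[..., 1], axis=1, append=w[:, -1:, 1])
    off = 0.5 * (np.diff(w[..., 0], axis=1, append=w[:, -1:, 0])
                 + np.diff(w[..., 1], axis=0, append=w[-1:, :, 1]))
    term2 = np.sqrt(w1x**2 + w2y**2 + 2 * off**2).sum()

    return alpha * term1 + beta * term2
```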
It turns out that the standard BV-norm
$$\begin{aligned} \Vert u\Vert _{{{\mathrm{BV}}}({\varOmega }; {{\mathrm{Sym}}}^k(\mathbb {R}^m))} :=\Vert u\Vert _{L^1({\varOmega }; {{\mathrm{Sym}}}^k(\mathbb {R}^m))}+{{\mathrm{TV}}}(u) \end{aligned}$$
and the “BGV norm” [9]
$$\begin{aligned} \Vert u\Vert ' :=\Vert u\Vert _{L^1({\varOmega }; {{\mathrm{Sym}}}^k(\mathbb {R}^m))}+{{\mathrm{TGV}}}^2_{(\beta ,\alpha )}(u) \end{aligned}$$
are topologically equivalent norms [10, 11] on the space \({{\mathrm{BV}}}({\varOmega }; {{\mathrm{Sym}}}^k(\mathbb {R}^m))\), yielding the same convergence results for TGV regularisation as for TV regularisation. The geometrical regularisation behaviour is, however, different, and TGV tends to avoid the staircasing observed in TV regularisation.
Regarding topologies, we say that a sequence \(\{u^i\}\) in \({{\mathrm{BV}}}({\varOmega }; {{\mathrm{Sym}}}^k(\mathbb {R}^m))\) converges weakly* to u, if \(u^i \rightarrow u\) strongly in \(L^1\), and \(Eu^i \rightarrow Eu\) weakly* as Radon measures [2, 51, 57]. The latter means that for all \(\phi \in C_c^\infty ({\varOmega }; {{\mathrm{Sym}}}^{k+1}(\mathbb {R}^m))\) we have \(\int _{\varOmega }\langle {{\mathrm{div}}}\phi (x),u^i(x)\rangle {{\mathrm{d}}}x \rightarrow \int _{\varOmega }\langle {{\mathrm{div}}}\phi (x),u(x)\rangle {{\mathrm{d}}}x\).
Compact Subspaces
Now, for a weak* lower semi-continuous seminorm R on \({{\mathrm{BV}}}({\varOmega }; {{\mathrm{Sym}}}^k(\mathbb {R}^m))\), let us set
$$\begin{aligned} {{\mathrm{BV}}}_{0,R}({\varOmega }; {{\mathrm{Sym}}}^k(\mathbb {R}^m)) :={{\mathrm{BV}}}({\varOmega }; {{\mathrm{Sym}}}^k(\mathbb {R}^m)) / \ker R. \end{aligned}$$
That is, we identify elements \(u, {\tilde{u}} \in {{\mathrm{BV}}}({\varOmega }; {{\mathrm{Sym}}}^k(\mathbb {R}^m))\), such that \(R(u - {\tilde{u}})=0\). Now R is a norm on the space \({{\mathrm{BV}}}_{0,R}({\varOmega }; {{\mathrm{Sym}}}^k(\mathbb {R}^m))\); compare, e.g. [42] for the case of \(R={{\mathrm{TV}}}\).
Suppose
$$\begin{aligned} \Vert u\Vert ' :=\Vert u\Vert _{L^1({\varOmega })} + R(u) \end{aligned}$$
is a norm on \({{\mathrm{BV}}}({\varOmega }; {{\mathrm{Sym}}}^k(\mathbb {R}^m))\), equivalent to the standard norm. If also the R-Sobolev–Korn–Poincaré inequality
$$\begin{aligned} \inf _{R(v)=0} \Vert u-v\Vert _{L^1({\varOmega })} \le C R(u) \end{aligned}$$
(13)
holds, we may then bound
$$\begin{aligned}&\inf _{R(v)=0} \Vert u-v\Vert _{{{\mathrm{BV}}}({\varOmega }; {{\mathrm{Sym}}}^k(\mathbb {R}^m))} \le \inf _{R(v)=0} C' \Vert u-v\Vert ' \\&\quad = \inf _{R(v)=0} C' \bigl ( \Vert u-v\Vert _{L^1({\varOmega })} + R(u-v)\bigr ) \\&\quad \le C' (1+C)R(u). \end{aligned}$$
Now, using the weak* lower semicontinuity of the BV-norm, and the weak* compactness of the unit ball in \({{\mathrm{BV}}}({\varOmega }; {{\mathrm{Sym}}}^k(\mathbb {R}^m))\) (we refer to [2] for these and other basic properties of BV spaces), we may thus find a representative \({\tilde{u}}\) in the \({{\mathrm{BV}}}_{0,R}({\varOmega }; {{\mathrm{Sym}}}^k(\mathbb {R}^m))\) equivalence class of u, satisfying
$$\begin{aligned} \Vert {\tilde{u}}\Vert _{{{\mathrm{BV}}}({\varOmega }; {{\mathrm{Sym}}}^k(\mathbb {R}^m))} \le C' (1+C)R(u). \end{aligned}$$
Again using the weak* compactness of the unit ball in \({{\mathrm{BV}}}({\varOmega }; {{\mathrm{Sym}}}^k(\mathbb {R}^m))\), and the weak* lower semicontinuity of R, it follows that the sets
$$\begin{aligned} {{\mathrm{lev}}}_a R:= & {} \left\{ u \in {{\mathrm{BV}}}_{0,R}({\varOmega }; {{\mathrm{Sym}}}^k(\mathbb {R}^m)) \mid R(u) \le a \right\} , \nonumber \\&\quad (a>0), \end{aligned}$$
(14)
are weak* compact in \({{\mathrm{BV}}}_{0,R}({\varOmega }; {{\mathrm{Sym}}}^k(\mathbb {R}^m))\), in the topology inherited from \({{\mathrm{BV}}}({\varOmega }; {{\mathrm{Sym}}}^k(\mathbb {R}^m))\). Consequently, they are strongly compact subsets of the space \(L^1({\varOmega }; {{\mathrm{Sym}}}^k(\mathbb {R}^m))\). This feature is crucial for the application of the regularisation theory in Banach lattices above.
On a connected domain \({\varOmega }\), in particular
$$\begin{aligned} {{\mathrm{BV}}}_{0,{{\mathrm{TV}}}}({\varOmega }) \simeq \left\{ u \in {{\mathrm{BV}}}({\varOmega }) \big | \int _{\varOmega }u {{\mathrm{d}}}x=0 \right\} . \end{aligned}$$
That is, the space consists of zero-mean functions. Then \(u \mapsto \Vert Du\Vert _{\mathcal {M}({\varOmega }; \mathbb {R}^m)}\) is a norm on \({{\mathrm{BV}}}_{0,{{\mathrm{TV}}}}({\varOmega })\) [42], and the sets \({{\mathrm{lev}}}_a {{\mathrm{TV}}}\) are weak* compact in this space. In particular, they are compact in \(L^1({\varOmega })\).
More generally, we know from [8] that on a connected domain \({\varOmega }\), \(\ker {{\mathrm{TV}}}\) consists of \({{\mathrm{Sym}}}^k(\mathbb {R}^m)\)-valued polynomials of maximal degree k. By extension, the kernel of \({{\mathrm{TGV}}}^2\) consists of \({{\mathrm{Sym}}}^k(\mathbb {R}^m)\)-valued polynomials of maximal degree \(k+1\). In both cases, (13), weak* lower semicontinuity of R and the equivalence of \(\Vert \,\varvec{\cdot }\,\Vert '\) to \(\Vert \,\varvec{\cdot }\,\Vert _{{{\mathrm{BV}}}({\varOmega }; {{\mathrm{Sym}}}^k(\mathbb {R}^m))}\) hold by the results in [8, 11, 51]. Therefore, we have proved the following.
Lemma 1
Let \({\varOmega }\subset \mathbb {R}^m\) and \(k \ge 0\). Then the sets \({{\mathrm{lev}}}_a {{\mathrm{TV}}}\) and \({{\mathrm{lev}}}_a {{\mathrm{TGV}}}^2\) are weak* compact in \({{\mathrm{BV}}}({\varOmega }; {{\mathrm{Sym}}}^k(\mathbb {R}^m))\) and strongly compact in \(L^1({\varOmega }; {{\mathrm{Sym}}}^k(\mathbb {R}^m))\).
Now, in the above cases, \(\ker R\) is finite-dimensional, and we may write
$$\begin{aligned} {{\mathrm{BV}}}({\varOmega }; {{\mathrm{Sym}}}^k(\mathbb {R}^m)) \simeq {{\mathrm{BV}}}_{0,R}({\varOmega }; {{\mathrm{Sym}}}^k(\mathbb {R}^m)) \oplus \ker R. \end{aligned}$$
Denoting by
$$\begin{aligned} B_X(r) :=\{x \in X \mid \Vert x\Vert \le r\}, \end{aligned}$$
the closed ball of radius r in a normed space X, we obtain by the finite-dimensionality of \(\ker R\) the following result.
Proposition 1
Let \({\varOmega }\subset \mathbb {R}^m\) and \(k \ge 0\). Pick \(a>0\). Then the sets
$$\begin{aligned} V :={{\mathrm{lev}}}_a R \oplus B_{\ker R}(a) \end{aligned}$$
for both regularisers \(R={{\mathrm{TV}}}\) and \(R={{\mathrm{TGV}}}^2\), are weak* compact in \({{\mathrm{BV}}}({\varOmega }; {{\mathrm{Sym}}}^k(\mathbb {R}^m))\) and strongly compact in \(L^1({\varOmega }; {{\mathrm{Sym}}}^k(\mathbb {R}^m))\).
The next result summarises Theorem 3 and Proposition 1.
Theorem 4
With \(U=L^1({\varOmega }; {{\mathrm{Sym}}}^k(\mathbb {R}^m))\), let the operator \(A: U \rightarrow F\) be linear, continuous and injective. Let \(f^l_n\) and \(f^u_n\) be sequences of lower and upper bounds for the right-hand side such that
$$\begin{aligned} f^l_{n+1} \geqslant f^l_n, \quad f^u_{n+1} \leqslant f^u_n, \quad f^l_n \leqslant f \leqslant f^u_n, \quad \Vert f^l_n - f^u_n \Vert \rightarrow 0 \quad \text {as } n \rightarrow \infty . \end{aligned}$$
Supposing that there are no errors in the operator A and that the exact solution \({\bar{u}}\) exists, define the feasible sets
$$\begin{aligned} U_n = \left\{ u \in U :\quad f^l_n \leqslant _F A u \leqslant _F f^u_n \right\} . \end{aligned}$$
Decomposing \(u \in U\) as \(u=u_0+u^\perp \) with \(u^\perp \in \ker R\), suppose
$$\begin{aligned} u \in U_n \implies \Vert u^\perp \Vert \le a \end{aligned}$$
(15)
for some constant \(a>0\). Then, for \(R={{\mathrm{TV}}}\) and \(R={{\mathrm{TGV}}}^2\), the sequence
$$\begin{aligned} u_n = \mathop {\hbox {arg min}}\limits _{u \in U_n} R(u) \end{aligned}$$
converges strongly in \(L^1({\varOmega }; {{\mathrm{Sym}}}^k(\mathbb {R}^m))\) to the exact solution \({\bar{u}}\) and \(R(u_n) \rightarrow R({\bar{u}})\).
Proof
With the decomposition \(u_n = u_{0,n} + u_n^\perp \), where \(u_n^\perp \in \ker R\), we have \(u_{0,n} \in {{\mathrm{lev}}}_a R\) for suitably large \(a>0\) through
$$\begin{aligned} R(u_{0,n}) = R(u_n) = \min _{u' \in U_n} R(u') \le R({\bar{u}}). \end{aligned}$$
The assumption (15) bounds \(\Vert u_n^\perp \Vert \le a\). Thus \(u_n \in V\) for V as in Proposition 1. The proposition thus implies the necessary compactness in \(U=L^1({\varOmega }; {{\mathrm{Sym}}}^k(\mathbb {R}^m))\) for the application of Theorem 3.
Remark 1
The condition (15) simply says for \(R={{\mathrm{TV}}}\) that the data have to bound the solution in mean. This is very reasonable to expect for practical data; anything else would be highly degenerate. For \(R={{\mathrm{TGV}}}^2\) we also need the data to bound the entire affine part of the solution. Again, this is very likely for real data. Indeed, in DTI practice, with at least 6 independent diffusion-sensitising gradients, A is an invertible or even over-determined linear operator. In that typical case, the bounds \(f^l_n\) and \(f^u_n\) translate into \(U_n\) being a bounded set.
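The invertibility claim at the end of the remark can be checked numerically: with six diffusion-sensitising gradients in general position, the per-voxel matrix mapping the six tensor components to \((A_1 u, \ldots, A_N u)\) has full rank, so pointwise bounds on \(A_j u\) indeed bound u itself. A quick sketch (the gradient scheme is an illustrative choice):

```python
import numpy as np

def design_matrix(b):
    """Per-voxel matrix B with (B c)_j = <b_j, U b_j> = A_j u at one voxel,
    where c = (U11, U22, U33, U12, U13, U23)."""
    return np.array([[bx*bx, by*by, bz*bz, 2*bx*by, 2*bx*bz, 2*by*bz]
                     for bx, by, bz in b])

# Six gradient directions in general position (an illustrative scheme).
b = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
              [1, 1, 0], [1, 0, 1], [0, 1, 1]], dtype=float)
print(np.linalg.matrix_rank(design_matrix(b)))   # 6: invertible per voxel
```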