# Diffusion Tensor Imaging with Deterministic Error Bounds

## Abstract

Errors in the data and the forward operator of an inverse problem can be handily modelled using partial order in Banach lattices. We present some existing results of the theory of regularisation in this novel framework, where errors are represented as bounds by means of the appropriate partial order. We apply the theory to diffusion tensor imaging, where correct noise modelling is challenging: it involves the Rician distribution and the non-linear Stejskal–Tanner equation. Linearisation of the latter in the statistical framework would complicate the noise model even further. We avoid this using the error bounds approach, which preserves simple error structure under monotone transformations.

## Keywords

Diffusion tensor imaging, Noise modelling, Total generalised variation, Error bounds, Deterministic

## Mathematics Subject Classification

92C55, 94A08

## 1 Introduction

*R* on the feasible set. The tensor field *u* is our unknown image. It is subject to a positivity constraint, as well as partial order constraints imposed through the operators \([A_j u](x) :=-\langle b_j,u(x)b_j\rangle \), and the lower and upper bounds \(g_j^l :=\log ({\hat{s}}_j^l/{\hat{s}}_0^u)\) and \(g_j^u :=\log ({\hat{s}}_j^u/{\hat{s}}_0^l)\). These arise from the linearisation (via the monotone logarithmic transformation) of the Stejskal–Tanner equation

\[s_j(x) = s_0(x)\exp \left( -\langle b_j,u(x)b_j\rangle \right) , \qquad j=1,\ldots ,N. \tag{1}\]

To shed more light on *u* and the equation (1), let us briefly outline the diffusion tensor imaging process. As a first step towards DTI, diffusion-weighted magnetic resonance imaging (DWI) is performed. This process measures the anisotropic diffusion of water molecules. To capture the diffusion information, the magnetic resonance images have to be measured with diffusion-sensitising gradients in multiple directions. These are the different \(b_j\)’s in (1). The resulting DWI images \(\{s_j\}\) are related through the Stejskal–Tanner equation (1) to the symmetric positive-definite diffusion tensor field \(u: {\varOmega }\rightarrow {{\mathrm{Sym}}}^2(\mathbb {R}^3)\) [4, 29]. At each point \(x \in {\varOmega }\), the tensor *u*(*x*) is the covariance matrix of a normal distribution describing the probability of water diffusing in different spatial directions.
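As a small illustration of (1) and of the bounds \(g_j^l, g_j^u\), the following Python sketch evaluates the forward model at a single voxel and maps intensity bounds to bounds on \(A_j u\) through the logarithm. The tensor, gradient, tolerance and helper names are hypothetical choices for this example, not the authors' implementation.

```python
import math

def quad_form(b, u):
    """Return <b, u b> for a 3-vector b and a symmetric 3x3 tensor u (nested lists)."""
    return sum(b[r] * u[r][c] * b[c] for r in range(3) for c in range(3))

def stejskal_tanner(s0, b, u):
    """Forward model (1) at a single voxel: s_j = s_0 * exp(-<b_j, u b_j>)."""
    return s0 * math.exp(-quad_form(b, u))

def log_bounds(sj_lo, sj_hi, s0_lo, s0_hi):
    """Bounds [g_j^l, g_j^u] on g_j = log(s_j / s_0) from positive intensity bounds.

    log is increasing, and s_j / s_0 increases in s_j and decreases in s_0,
    so the extreme ratios pair s_j^l with s_0^u and s_j^u with s_0^l.
    """
    return math.log(sj_lo / s0_hi), math.log(sj_hi / s0_lo)

# Hypothetical voxel: diagonal tensor, b-value 1000 folded into the gradient vector.
u = [[1.5e-3, 0.0, 0.0], [0.0, 0.5e-3, 0.0], [0.0, 0.0, 0.5e-3]]
b = [math.sqrt(1000.0), 0.0, 0.0]
s0 = 100.0
sj = stejskal_tanner(s0, b, u)

# Suppose both intensities are only known to within +-2 units.
g_lo, g_hi = log_bounds(sj - 2.0, sj + 2.0, s0 - 2.0, s0 + 2.0)
a_ju = -quad_form(b, u)          # [A_j u](x) = -<b_j, u(x) b_j>
print(f"{g_lo:.4f} <= {a_ju:.4f} <= {g_hi:.4f}")
```

Because the logarithm is monotone, the enclosure \(g_j^l \leqslant A_j u \leqslant g_j^u\) holds whenever the intensity bounds enclose the exact intensities, so no noise model has to be propagated through the non-linearity.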

The fact that multiple \(b_j\)’s are needed to recover *u* leads to very long acquisition times, even with ultra-fast sequences like echo planar imaging (EPI). Therefore, DTI is inherently a low-resolution and low-SNR method. In theory, the magnitude DWI images exhibit Rician noise [24]. However, as the histogram of an in vivo measurement in Fig. 1 illustrates, this may not be the case for practical datasets from black-box devices. Moreover, the DWI process is prone to eddy-current distortions [53], and, owing to its slowness, it is very sensitive to patient motion [1, 27]. We therefore have to use techniques that remove these artefacts in solving for *u*(*x*). We also need to ensure the positivity of *u*, as non-positive-definite diffusion tensors are non-physical. One proposed approach for satisfying this constraint is that of log-Euclidean metrics [3]. This approach has several theoretically desirable aspects, but some practical shortcomings [57]. Special Perona–Malik-type constructions on Riemannian manifolds can also be used to maintain the structure of the tensor field [14, 54]. Such anisotropic diffusion is, however, severely ill-posed [60]. Recently, manifold-valued discrete-domain total variation models have also been applied to diffusion tensor imaging [6].

Our approach is also in the total variation family, first considered for diffusion tensor imaging in [48]. Namely, we follow up on the work in [55, 57, 58, 59] on the application of total generalised variation regularisation [9] to DTI. We should note that in all of these works, the fidelity function was the ROF-type [45] \(L^2\) fidelity. Under the assumption that the noise of MRI measurements is Gaussian, this would only be correct if we had access to the original complex k-space MRI data. The noise of the inverse Fourier-transformed magnitude data \(s_j\), which is what we have access to in practice, is however Rician under the Gaussian assumption on the original complex data [24]. This is not modelled by the \(L^2\) fidelity.
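To see why magnitude data are not Gaussian, the Monte Carlo sketch below adds independent \(N(0,\sigma ^2)\) noise to the real and imaginary parts of a real signal, takes the modulus, and compares the sample mean with the true signal. For zero signal the modulus is Rayleigh distributed with mean \(\sigma \sqrt{\pi /2}\), a special case of the Rician distribution. The signal levels and sample size are arbitrary choices for illustration.

```python
import math
import random

def magnitude_samples(signal, sigma, count, rng):
    """Magnitudes |signal + n_re + i n_im| with n_re, n_im ~ N(0, sigma^2) independent."""
    return [abs(complex(signal + rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)))
            for _ in range(count)]

rng = random.Random(0)          # fixed seed so the sketch is reproducible
sigma = 2.0
means = {}
for signal in (0.0, 2.0, 20.0):
    samples = magnitude_samples(signal, sigma, 100_000, rng)
    means[signal] = sum(samples) / len(samples)
    print(f"true signal {signal:5.1f}  mean magnitude {means[signal]:6.3f}")

# For zero signal the magnitude is Rayleigh with mean sigma * sqrt(pi / 2):
# at low SNR the magnitude data are biased upwards, unlike zero-mean Gaussian noise.
rayleigh_mean = sigma * math.sqrt(math.pi / 2.0)
print(f"Rayleigh mean for zero signal: {rayleigh_mean:.3f}")
```

At high SNR the bias becomes small relative to the signal, which is why a Gaussian model can be an acceptable approximation there.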

Numerical implementation of Rician noise modelling has been studied in [22, 40]. As already discussed, in this work we take the other direction. Instead of modelling the errors in a statistically accurate fashion, which would require knowing the exact noise model, we represent them by means of pointwise bounds. The details of the model are presented in Sect. 3. We study the practical performance in Sect. 5 using the numerical method presented in Sect. 4. First, however, we start with the general error modelling theory in Sect. 2. Readers who are not familiar with the notation for Banach lattices or symmetric tensor fields are advised to start with Appendix 2, where we introduce our mathematical notation and techniques.

## 2 Deterministic Error Modelling

### 2.1 Mathematical Basis

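Consider the operator equation

\[Au = f, \tag{2}\]

where *U* and *F* are Banach lattices, and \(A :U \rightarrow F\) is a regular injective operator. The inaccuracies in the right-hand side *f* and the operator *A* are represented as bounds by means of appropriate partial orders, i.e.

\[f^l \leqslant _F f \leqslant _F f^u, \qquad A^l \leqslant _{L^\sim (U,F)} A \leqslant _{L^\sim (U,F)} A^u. \tag{3}\]

Here we write \(\leqslant _F\) for the partial order in *F* and \(\leqslant _{L^\sim (U,F)}\) for the partial order for regular operators induced by the partial orders in *U* and *F*. Further, we will drop the subscripts at inequality signs where this will not cause confusion.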

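The exact right-hand side *f* and operator *A* are not available. Given the approximate data \((f^l,f^u,A^l,A^u)\), we need to find an approximate solution *u* that converges to the exact solution \({\bar{u}}\) as the inaccuracies in the data diminish. This statement needs to be formalised. We consider monotone convergent sequences of lower and upper bounds

\[\begin{aligned} f^l_n \leqslant f^l_{n+1} \leqslant f \leqslant f^u_{n+1} \leqslant f^u_n, &\qquad \Vert f^u_n - f^l_n\Vert \rightarrow 0,\\ A^l_n \leqslant A^l_{n+1} \leqslant A \leqslant A^u_{n+1} \leqslant A^u_n, &\qquad \Vert A^u_n - A^l_n\Vert \rightarrow 0 \qquad (n \rightarrow \infty ). \end{aligned} \tag{4}\]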

Let us ask the following question. What are the elements \(u \in U\) that could have produced data within the tolerances (4)? Obviously, the exact solution is one of such elements. Let us call the set containing all such elements the feasible set \(U_n \subset U\).
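In finite dimensions with the componentwise order, which makes \(\mathbb {R}^n\) a Banach lattice, regular operators are simply matrices ordered entrywise. For a nonnegative *u* and interval bounds \(A^l \leqslant A \leqslant A^u\), \(f^l \leqslant f \leqslant f^u\), the products \(Au\) over all admissible *A* fill the box \([A^l u, A^u u]\), so *u* could have produced data within the tolerances exactly when \(A^l u \leqslant f^u\) and \(A^u u \geqslant f^l\). The toy sketch below tests this condition; the matrices and helper names are hypothetical.

```python
def matvec(m, v):
    """Matrix-vector product for nested lists."""
    return [sum(a * x for a, x in zip(row, v)) for row in m]

def leq(x, y):
    """Componentwise partial order on R^n."""
    return all(a <= b for a, b in zip(x, y))

def is_feasible(u, a_lo, a_hi, f_lo, f_hi):
    """For u >= 0: some A in [a_lo, a_hi] maps u into [f_lo, f_hi] exactly when
    a_lo u <= f_hi and a_hi u >= f_lo (componentwise)."""
    return (all(x >= 0 for x in u)
            and leq(matvec(a_lo, u), f_hi)
            and leq(f_lo, matvec(a_hi, u)))

a_true = [[2.0, 1.0], [0.5, 3.0]]
u_true = [1.0, 2.0]
f_true = matvec(a_true, u_true)                 # [4.0, 6.5]

# Entrywise operator bounds and data bounds that enclose the exact values.
a_lo = [[x - 0.1 for x in row] for row in a_true]
a_hi = [[x + 0.1 for x in row] for row in a_true]
f_lo = [y - 0.2 for y in f_true]
f_hi = [y + 0.2 for y in f_true]

print(is_feasible(u_true, a_lo, a_hi, f_lo, f_hi))      # True: exact solution
print(is_feasible([3.0, 0.0], a_lo, a_hi, f_lo, f_hi))  # False
```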

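If we know *a priori* that the exact solution is positive (by means of the appropriate partial order in *U*), then it is easy to verify that the following inequalities hold for all \(n \in {\mathbb {N}}\):

\[A^l_n {\bar{u}} \leqslant f^u_n, \qquad A^u_n {\bar{u}} \geqslant f^l_n.\]

Indeed, \({\bar{u}} \geqslant 0\) and \(A^l_n \leqslant A\) give \(A^l_n {\bar{u}} \leqslant A{\bar{u}} = f \leqslant f^u_n\), and the second inequality follows in the same way. We therefore take the feasible set to be

\[U_n = \{u \in U :u \geqslant 0,\ A^l_n u \leqslant f^u_n,\ A^u_n u \geqslant f^l_n\}.\]

For each *n* we wish to choose an element \(u_n\) of the set \(U_n\) such that the sequence \(u_n \in U_n\) strongly converges to the exact solution \({\bar{u}}\). We do so by minimising an appropriate regularisation functional *R*(*u*) on \(U_n\):

\[u_n = \mathop {\mathrm {arg\,min}}_{u \in U_n} R(u). \tag{5}\]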

### **Theorem 1**

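Let the regulariser *R*(*u*) satisfy the following conditions:

1. *R*(*u*) is bounded from below on *U*,
2. *R*(*u*) is lower semi-continuous,
3. level sets \(\{u :R(u) \leqslant C\}\) (\(C=\mathrm{const}\)) are sequentially compact in *U* (in the strong topology induced by the norm).

Then the sequence \(u_n\) defined in (5) strongly converges to the exact solution \({\bar{u}}\), and \(R(u_n) \rightarrow R({\bar{u}})\) as \(n \rightarrow \infty \).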

Examples of regularisation functionals that satisfy the conditions of Theorem 1 are as follows. Total Variation in \(L^1({\varOmega })\), where \({\varOmega }\) is a subset of \({\mathbb {R}}^n\), assures strong convergence in \(L^1\), given that the \(L^1\)-norm of the solution is bounded. The Sobolev norm \(\Vert u \Vert _{W^{1,q}({\varOmega })}\) yields strong convergence in the spaces \(L^p({\varOmega })\), where \(p \geqslant 1\), \(q > \frac{np}{p+n}\). The latter fact follows from the compact embedding of the corresponding Sobolev \(W^{1,q}({\varOmega })\) space into \(L^p({\varOmega })\) [16].

However, the assumption that the sets \(\{u :R(u) \leqslant C\}\) are strongly compact in *U* is quite strong. It can be replaced by the assumption of weak compactness, provided that the regularisation functional possesses the so-called Radon–Riesz property.

### **Definition 1**

A functional \(F :U \rightarrow {\mathbb {R}}\) has the Radon–Riesz property (sometimes called the *H*-property), if for any sequence \(u_n \in U\) weak convergence \(u_n \rightharpoonup u_0\) and simultaneous convergence of the values \(F(u_n) \rightarrow F(u_0)\) imply strong convergence \(u_n \rightarrow u_0\).

### **Theorem 2**

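Let the regulariser *R*(*u*) satisfy the following conditions:

1. *R*(*u*) is bounded from below on *U*,
2. *R*(*u*) is weakly lower semi-continuous,
3. level sets \(\{u :R(u) \leqslant C\}\) (\(C=\mathrm{const}\)) are weakly sequentially compact in *U*,
4. *R*(*u*) possesses the Radon–Riesz property.

Then the sequence \(u_n\) defined in (5) strongly converges to the exact solution \({\bar{u}}\), and \(R(u_n) \rightarrow R({\bar{u}})\) as \(n \rightarrow \infty \).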

It is easy to verify that the norm in any Hilbert space possesses the Radon–Riesz property. More generally, this holds for the norm in any uniformly convex Banach space, such as \(L^p\) with \(1<p<\infty \) [16].
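For the Hilbert-space case the verification is a one-line computation. Suppose \(u_n \rightharpoonup u_0\) and \(\Vert u_n\Vert \rightarrow \Vert u_0\Vert \) in a real Hilbert space. Weak convergence gives \(\langle u_n, u_0\rangle \rightarrow \Vert u_0\Vert ^2\), and expanding the square yields

```latex
\Vert u_n - u_0\Vert ^2 = \Vert u_n\Vert ^2 - 2\langle u_n, u_0\rangle + \Vert u_0\Vert ^2
  \;\longrightarrow\; \Vert u_0\Vert ^2 - 2\Vert u_0\Vert ^2 + \Vert u_0\Vert ^2 = 0 ,
```

so \(u_n \rightarrow u_0\) strongly.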

As we explain in Appendix 2, \(L^p({\varOmega }; {{\mathrm{Sym}}}^{2}(\mathbb {R}^m))\) with the positive semidefinite order is not a Banach lattice. Therefore, Theorems 1 and 2 cannot be applied directly. Further theoretical work will be undertaken to extend the framework to the non-lattice case. For the moment, however, we will prove that if there are no errors in the operator *A* in (2), the requirement that the solution space *U* is a lattice can be dropped.

### **Theorem 3**

Let *U* be a Banach space, and let *F* be a Banach lattice. Let the operator *A* in (2) be a linear, continuous and injective operator. Let \(f^l_n\) and \(f^u_n\) be sequences of lower and upper bounds for the right-hand side defined in (4), and suppose that there are no errors in the operator *A*. Let us redefine the feasible set in the following way:

\[U_n = \{u \in U :f^l_n \leqslant Au \leqslant f^u_n\}.\]

Suppose that the regulariser *R*(*u*) satisfies the conditions of either Theorem 1 or Theorem 2. Then the sequence defined in (5) strongly converges to the exact solution \({\bar{u}}\) and \(R(u_n) \rightarrow R({\bar{u}})\).

### *Proof*

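Now we will proceed with the proof of convergence \(\Vert u_n-{\bar{u}}\Vert \rightarrow 0\). We will prove it for the case when the regulariser *R*(*u*) satisfies the conditions of Theorem 1. Suppose that the sequence \(u_n\) does not converge to the exact solution \({\bar{u}}\); then it contains a subsequence \(u_{n_k}\) such that \(\Vert u_{n_k} - {\bar{u}}\Vert \geqslant \varepsilon \) for any \(k \in {\mathbb {N}}\) and some fixed \(\varepsilon >0\).

Since \(f^l_n \leqslant A{\bar{u}} = f \leqslant f^u_n\), we have \({\bar{u}} \in U_n\), so \(R(u_{n_k}) \leqslant R({\bar{u}})\) for all *k*. The subsequence thus lies in the level set \(\{u :R(u) \leqslant R({\bar{u}})\}\), which is sequentially compact, and we may extract a further subsequence converging strongly to some \(u_0 \in U\) with \(\Vert u_0 - {\bar{u}}\Vert \geqslant \varepsilon \). Moreover, both \(Au_{n_k}\) and *f* lie in \([f^l_{n_k}, f^u_{n_k}]\), so \(|Au_{n_k} - f| \leqslant f^u_{n_k} - f^l_{n_k}\) and hence, by the monotonicity of the lattice norm, \(\Vert Au_{n_k} - f\Vert \leqslant \Vert f^u_{n_k} - f^l_{n_k}\Vert \rightarrow 0\). Passing to the limit along the further subsequence, by continuity of *A* and \(\Vert \cdot \Vert \), we get \(Au_0 = f = A{\bar{u}}\). Therefore \(u_0 = {\bar{u}}\), since *A* is an injective operator, which contradicts \(\Vert u_0 - {\bar{u}}\Vert \geqslant \varepsilon \). By contradiction, we get \(\Vert u_n - {\bar{u}}\Vert \rightarrow 0\).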

Finally, since the regulariser *R*(*u*) is lower semi-continuous and \(u_n \rightarrow {\bar{u}}\), we get \(\liminf R(u_n) \geqslant R({\bar{u}})\). However, for any *n* we have \(R(u_n) \leqslant R({\bar{u}})\), because \({\bar{u}} \in U_n\); therefore, we get the convergence \(R(u_n) \rightarrow R({\bar{u}})\) as \(n \rightarrow \infty \). \(\square \)

### 2.2 Philosophical Discussion and Statistical Interpretation

*f* individual *random* upper and lower bounds \({\hat{f}}^u\) and \({\hat{f}}^l\) such that

*f*), the interval \([{\hat{f}}^{l,i}, {\hat{f}}^{u,i}]\) will converge in probability to the true data \(f^i\) as the number of experiments *m* increases. Thus we obtain a probabilistic version of the convergences in (4).

*n* increases. As an example, for a rather typical single \(128 \times 128\) slice of a DTI measurement, the probability that exactly \(\phi =5\,\%\) (to the closest discrete value possible) of the \(1-\theta =95\,\%\) confidence intervals do not cover the true parameter would be about \(1.4\,\%\), namely \(\binom{n}{m}\theta ^m(1-\theta )^{n-m}\) with \(n=128^2\) and \(m \approx \phi n\), assuming independence across voxels. Similarly, the probability of *at least* \(\phi =5\,\%\) of the pointwise \(95\,\%\) confidence intervals not covering the true parameter is in this setting approximately \(49\,\%\). This can be verified by summing the above estimates over \(m=\lceil \phi n \rceil ,\ldots ,n\).
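The two probabilities above can be reproduced with a short computation. The sketch assumes, as the binomial model does, that the *n* pointwise intervals miss independently, each with probability \(\theta \); it works with log-probabilities via `math.lgamma` so that the huge binomial coefficients never overflow.

```python
import math

def binom_logpmf(m, n, p):
    """log P(X = m) for X ~ Binomial(n, p), computed with log-gamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(m + 1) - math.lgamma(n - m + 1)
            + m * math.log(p) + (n - m) * math.log1p(-p))

def miss_probabilities(n, theta, phi):
    """Probabilities that exactly round(phi*n), and at least ceil(phi*n), of n
    independent pointwise intervals miss, each missing with probability theta."""
    p_exact = math.exp(binom_logpmf(round(phi * n), n, theta))
    p_tail = sum(math.exp(binom_logpmf(m, n, theta))
                 for m in range(math.ceil(phi * n), n + 1))
    return p_exact, p_tail

n = 128 * 128                      # one 128 x 128 slice
p_exact, p_tail = miss_probabilities(n, theta=0.05, phi=0.05)
print(f"exactly 5% miss: {p_exact:.4f}; at least 5% miss: {p_tail:.4f}")
```

With these parameters the printed values should be close to 0.014 and 0.49, matching the figures quoted in the text.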

In summary, unless the pointwise confidence \(1-\theta \) simultaneously goes to 1, the product intervals are very unlikely to cover the true parameter. Based on a single experiment, the deterministic approach as interpreted statistically through confidence intervals is therefore very likely to fail to discover the true solution as the data size *n* increases, unless the pointwise confidence is very high. But if we let the pointwise confidences be arbitrarily high, such that the intervals are very large, the discovered solution in our applications of interest would be just a constant!

*n* and \(\theta \), it is easy to see that the solution of the “deterministic” error model is an asymptotically consistent and hence asymptotically unbiased estimator of the true *f*. That is, the estimates converge in probability to *f* as the experiment count *m* increases. Indeed, the error bounds-based estimator \({\tilde{f}}_m\), based on *m* experiments, by definition satisfies \({\tilde{f}}_m \in \prod _{i=1}^n I_i\). Therefore, we have

*is* the Bayes estimator for certain Bregman distances. One possible critique of the result is that these distances are not universal and do depend on the regulariser *R*, unlike the squared distance for the conditional mean (CM) estimate. The CM estimate, however, has other problems in the setting of total variation and its discretisation [35, 36].

## 3 Application to Diffusion Tensor Imaging

We now build our model for applying the deterministic error modelling theory to diffusion tensor imaging. We start by building our forward model based on the Stejskal–Tanner equation, and then briefly introduce the regularisers we use.

### 3.1 The Forward Model

The diffusion-weighted data are related to the unknown tensor field through the Stejskal–Tanner equation

\[s_j(x) = s_0(x)\exp \left( -\langle b_j,u(x)b_j\rangle \right) , \qquad j=1,\ldots ,N, \tag{6}\]

where the tensor *u*(*x*) models the covariance of a Gaussian probability distribution at *x* for the diffusion of water molecules. The data \(s_j \in L^2({\varOmega })\), (\(j=1,\ldots ,N\)), are the diffusion-weighted MRI images. Each of them is obtained by performing the MRI scan with a different non-zero diffusion-sensitising gradient \(b_j\), while \(s_0\) is obtained with a zero gradient. After correcting the original *k*-space data for coil sensitivities, each \(s_j\) is assumed real. As a consequence, any measurement \({\hat{s}}_j\) of \(s_j\) has, in theory, a Rician noise distribution [24].

*u* with simultaneous denoising. Following [31, 55], we consider, for a suitable regulariser *R*, the Tikhonov model

*u*(*x*) is positive semidefinite for \(\mathcal {L}^n\)-a.e. \(x \in {\varOmega }\) (see Appendix 2 for more details). Due to the Rician noise of \({\hat{s}}_j\), the Gaussian noise model implied by the \(L^2\)-norm in (7) is not entirely correct. However, in some cases the \(L^2\) model may be accurate enough, as for suitable parameters (high signal-to-noise ratio) the Rician distribution is close to a Gaussian distribution. To model the problem correctly, one would have to either modify the fidelity term to model Rician noise or include the (unit-magnitude complex) coil sensitivities in the model. The Rician noise model is highly non-linear owing to the logarithms of Bessel functions involved. Approximations of it have been studied in [5, 22, 40] for single MR images and DTI. Coil sensitivities could be included either by knowing them in advance or by estimating them simultaneously as in [30]. Either way, significant complexity is introduced into the model, and for the present work we are content with the simple \(L^2\) model.
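A standard way to invert the Stejskal–Tanner system voxel by voxel is ordinary least squares on the log-linearised equations \(-\log (s_j/s_0) = \langle b_j, u\, b_j\rangle \), with the six independent entries of the symmetric tensor as unknowns. The pure-Python sketch below assumes positive intensities and at least six non-degenerate gradient directions; the directions, tensor values and function names are our own illustrative choices, and this is not necessarily the exact regression used for the results in Sect. 5.

```python
import math

def design_row(b):
    """Coefficients of <b, u b> in the unknowns (u_xx, u_yy, u_zz, u_xy, u_xz, u_yz)."""
    bx, by, bz = b
    return [bx * bx, by * by, bz * bz, 2 * bx * by, 2 * bx * bz, 2 * by * bz]

def solve(mat, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [r] for row, r in zip(mat, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def fit_tensor(s0, s, bs):
    """Least-squares fit of u(x) from s_j = s0 * exp(-<b_j, u b_j>), j = 1..N (N >= 6)."""
    rows = [design_row(b) for b in bs]
    y = [-math.log(sj / s0) for sj in s]          # y_j = <b_j, u b_j>
    # Normal equations (M^T M) p = M^T y for the six independent tensor entries p.
    mtm = [[sum(r[i] * r[k] for r in rows) for k in range(6)] for i in range(6)]
    mty = [sum(r[i] * yj for r, yj in zip(rows, y)) for i in range(6)]
    xx, yy, zz, xy, xz, yz = solve(mtm, mty)
    return [[xx, xy, xz], [xy, yy, yz], [xz, yz, zz]]

# Hypothetical noise-free voxel: seven unit directions scaled by sqrt(b) with b = 1000.
dirs = [(1, 1, 0), (1, -1, 0), (1, 0, 1), (1, 0, -1), (0, 1, 1), (0, 1, -1), (1, 1, 1)]
bs = [[math.sqrt(1000.0) * c / math.sqrt(sum(d * d for d in v)) for c in v] for v in dirs]
u_true = [[1.5e-3, 0.2e-3, 0.1e-3], [0.2e-3, 0.6e-3, 0.05e-3], [0.1e-3, 0.05e-3, 0.4e-3]]
p_true = [u_true[0][0], u_true[1][1], u_true[2][2], u_true[0][1], u_true[0][2], u_true[1][2]]
s0 = 100.0
s = [s0 * math.exp(-sum(a * p for a, p in zip(design_row(b), p_true))) for b in bs]
u_fit = fit_tensor(s0, s, bs)
print(u_fit)
```

With noise-free data the fit recovers the tensor up to rounding; with noisy data the same code gives the unregularised regression estimate.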

*f*(*x*) is solved by regression for *u*(*x*) from the system of equations (6) with \(s_j(x)={\hat{s}}_j(x)\). Further, as in [58], we may also consider

### 3.2 Choice of the Regulariser *R*

Regarding topologies, we say that a sequence \(\{u^i\}\) in \({{\mathrm{BV}}}({\varOmega }; {{\mathrm{Sym}}}^k(\mathbb {R}^m))\) converges weakly* to *u* if \(u^i \rightarrow u\) strongly in \(L^1\), and \(Eu^i \overset{*}{\rightharpoonup } Eu\) weakly* as Radon measures [2, 51, 57]. The latter means that for all \(\phi \in C_c^\infty ({\varOmega }; {{\mathrm{Sym}}}^{k+1}(\mathbb {R}^m))\) it holds that \(\int _{\varOmega }\langle {{\mathrm{div}}}\phi (x),u^i(x)\rangle {{\mathrm{d}}}x \rightarrow \int _{\varOmega }\langle {{\mathrm{div}}}\phi (x),u(x)\rangle {{\mathrm{d}}}x\).

### 3.3 Compact Subspaces

*R* on \({{\mathrm{BV}}}({\varOmega }; {{\mathrm{Sym}}}^k(\mathbb {R}^m))\), let us set

*R* is a norm on the space \({{\mathrm{BV}}}_{0,R}({\varOmega }; {{\mathrm{Sym}}}^k(\mathbb {R}^m))\); compare, e.g. [42] for the case of \(R={{\mathrm{TV}}}\).

*R-Sobolev–Korn–Poincaré inequality*

*u*, satisfying

*R*, it follows that the sets

More generally, we know from [8] that on a connected domain \({\varOmega }\), \(\ker {{\mathrm{TV}}}\) consists of \({{\mathrm{Sym}}}^k(\mathbb {R}^m)\)-valued polynomials of maximal degree *k*. By extension, the kernel of \({{\mathrm{TGV}}}^2\) consists of \({{\mathrm{Sym}}}^k(\mathbb {R}^m)\)-valued polynomials of maximal degree \(k+1\). In both cases, (13), weak* lower semicontinuity of *R* and the equivalence of \(\Vert \,\varvec{\cdot }\,\Vert '\) to \(\Vert \,\varvec{\cdot }\,\Vert _{{{\mathrm{BV}}}({\varOmega }; {{\mathrm{Sym}}}^k(\mathbb {R}^m))}\) hold by the results in [8, 11, 51]. Therefore, we have proved the following.

### **Lemma 1**

Let \({\varOmega }\subset \mathbb {R}^m\) and \(k \ge 0\). Then the sets \({{\mathrm{lev}}}_a {{\mathrm{TV}}}\) and \({{\mathrm{lev}}}_a {{\mathrm{TGV}}}^2\) are weak* compact in \({{\mathrm{BV}}}({\varOmega }; {{\mathrm{Sym}}}^k(\mathbb {R}^m))\) and strongly compact in \(L^1({\varOmega }; {{\mathrm{Sym}}}^k(\mathbb {R}^m))\).

*r* in a normed space *X*, we obtain by the finite-dimensionality of \(\ker R\) the following result.

### **Proposition 1**

The next result summarises Theorem 3 and Proposition 1.

### **Theorem 4**

*A* and the exact solution \({\bar{u}}\) exists, define the feasible set as follows

### *Proof*

*V* as in Proposition 1. The proposition thus implies the necessary compactness in \(U=L^1({\varOmega }; {{\mathrm{Sym}}}^k(\mathbb {R}^m))\) for the application of Theorem 3.

### *Remark 1*

The condition (15) simply says for \(R={{\mathrm{TV}}}\) that the data have to bound the solution in mean. This is very reasonable to expect for practical data; anything else would be highly degenerate. For \(R={{\mathrm{TGV}}}^2\) we also need the data to bound the entire affine part of the solution. Again, this is very likely for real data. Indeed, in DTI practice, with at least 6 independent diffusion-sensitising gradients, *A* is an injective linear operator: invertible for exactly 6 gradients and over-determined for more. In that typical case, the bounds \(f^l_n\) and \(f^u_n\) translate into \(U_n\) being a bounded set.
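The injectivity claim can be checked pointwise: the map \(u(x) \mapsto (\langle b_j, u(x) b_j\rangle )_j\) acts on the six independent entries of a symmetric \(3\times 3\) tensor, so it is injective exactly when its \(N \times 6\) matrix has rank 6. The sketch below compares six non-coplanar directions (our illustrative choice) with six directions confined to a plane, which cannot see the out-of-plane tensor entries; the helper names are hypothetical.

```python
import math

def design_row(b):
    """Coefficients of <b, u b> in the six independent entries of a symmetric 3x3 u."""
    bx, by, bz = b
    return [bx * bx, by * by, bz * bz, 2 * bx * by, 2 * bx * bz, 2 * by * bz]

def matrix_rank(rows, tol=1e-10):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    m = [list(r) for r in rows]
    rank = 0
    for col in range(len(m[0])):
        if rank == len(m):
            break
        piv = max(range(rank, len(m)), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < tol:
            continue                       # no usable pivot in this column
        m[rank], m[piv] = m[piv], m[rank]
        for r in range(rank + 1, len(m)):
            f = m[r][col] / m[rank][col]
            m[r] = [a - f * p for a, p in zip(m[r], m[rank])]
        rank += 1
    return rank

# Six non-coplanar directions ...
six_dirs = [(1, 1, 0), (1, -1, 0), (1, 0, 1), (1, 0, -1), (0, 1, 1), (0, 1, -1)]
# ... versus six directions all lying in the xy-plane.
planar_dirs = [(math.cos(k * math.pi / 6), math.sin(k * math.pi / 6), 0.0) for k in range(6)]

print(matrix_rank([design_row(d) for d in six_dirs]))     # 6: injective
print(matrix_rank([design_row(d) for d in planar_dirs]))  # 3: z-entries invisible
```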

## 4 Solving the Optimisation Problem

### 4.1 The Chambolle–Pock Method

*X* and *Y*. The operator \(K:X \rightarrow Y\) is linear, although an extension of the method to non-linear *K* has recently been derived [55]. The PDHGM can also be seen as a preconditioned ADMM (alternating directions method of multipliers); we refer to [18, 47, 56] for reviews of optimisation methods popular in image processing. For step sizes \(\tau ,\sigma >0\), and an over-relaxation parameter \(\omega >0\), each iteration of the algorithm consists of the updates

\[\begin{aligned} x^{i+1}&:=(I+\tau \partial G)^{-1}(x^i-\tau K^*y^i),\\ {\bar{x}}^{i+1}&:=x^{i+1}+\omega (x^{i+1}-x^i),\\ y^{i+1}&:=(I+\sigma \partial F^*)^{-1}(y^i+\sigma K{\bar{x}}^{i+1}). \end{aligned} \tag{17}\]

Note that the order of the primal (*x*) and dual (*y*) updates here is reversed from the original presentation in [13]. The reason is that, when reordered, the updates can, as discovered in [26], be easily written in a proximal point form.

The first and last update are the backward (proximal) steps for the primal (*x*) and dual (*y*) variables, respectively, keeping the other fixed. However, the dual step includes some “inertia” or over-relaxation, as specified by the parameter \(\omega \). Usually \(\omega =1\), which is required for convergence proofs of the method. If *G* or \(F^*\) is uniformly convex, by smartly choosing for each iteration the step length parameters \(\tau ,\sigma \), and the inertia \(\omega \), the method can be shown to have convergence rate \(O(1/N^2)\). This is similar to Nesterov’s optimal gradient method [43]. In the general case the rate is *O*(1 / *N*). In practice the method produces visually pleasing solutions in rather few iterations, when applied to image processing problems.
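The iteration described above (primal backward step, over-relaxation, dual backward step) can be sketched on a toy problem much simpler than the DTI models of this paper: one-dimensional total variation denoising, \(\min _x \frac{1}{2}\Vert x-f\Vert ^2 + \alpha \Vert Dx\Vert _1\) with *D* the forward-difference operator. Here \(G(x)=\frac{1}{2}\Vert x-f\Vert ^2\) and \(F^*\) is the indicator of the box \(|y_i| \leqslant \alpha \), so both proximal maps are explicit. The step sizes satisfy \(\tau \sigma \Vert D\Vert ^2 < 1\) because \(\Vert D\Vert ^2 \leqslant 4\); all names and parameter values are illustrative.

```python
def forward_diff(x):
    """K x = D x: forward differences with h = 1 (one entry per edge)."""
    return [x[i + 1] - x[i] for i in range(len(x) - 1)]

def forward_diff_adjoint(y, n):
    """K^* y = D^T y for the forward-difference operator above, on n nodes."""
    out = [0.0] * n
    for i, yi in enumerate(y):
        out[i] -= yi
        out[i + 1] += yi
    return out

def tv_denoise_pdhgm(f, alpha, tau=0.45, sigma=0.45, omega=1.0, iters=5000):
    """PDHGM for min_x 0.5*||x - f||^2 + alpha*||D x||_1."""
    n = len(f)
    x = list(f)
    y = [0.0] * (n - 1)
    for _ in range(iters):
        # Primal backward step: x = (I + tau dG)^{-1}(x - tau K^* y).
        v = [xi - tau * ki for xi, ki in zip(x, forward_diff_adjoint(y, n))]
        x_new = [(vi + tau * fi) / (1.0 + tau) for vi, fi in zip(v, f)]
        # Over-relaxation with parameter omega.
        x_bar = [xn + omega * (xn - xo) for xn, xo in zip(x_new, x)]
        # Dual backward step: (I + sigma dF^*)^{-1} is the projection onto the box.
        y = [max(-alpha, min(alpha, yi + sigma * ki))
             for yi, ki in zip(y, forward_diff(x_bar))]
        x = x_new
    return x

step = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
denoised = tv_denoise_pdhgm(step, alpha=0.3)
print([round(v, 4) for v in denoised])
```

For this step signal the exact minimiser can be checked by hand to be 0.1 on the first three entries and 0.9 on the last three, so the printed values should be close to those.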

As a further implementation note, since the algorithm (17) is formulated in Hilbert spaces (see however [28]) while our problems are formulated in the Banach space \({{\mathrm{BV}}}({\varOmega }; {{\mathrm{Sym}}}^2(\mathbb {R}^3))\), we have to discretise our problems before application of the algorithm. We do this by simple forward-differences discretisation of the operator *E* with cell width \(h=1\) on a regular rectangular grid corresponding to the image voxels.
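The algorithm (17) needs the adjoint \(K^*\) of the discretised operator. For a scalar image, the forward-difference gradient with \(h=1\) and the matching backward-difference divergence can be sketched as below. This is only the scalar analogue: the discretisation of *E* for symmetric tensor fields is built from the same kind of differences but is not reproduced here, and the boundary convention (zero difference on the last row and column) is our assumption.

```python
import random

def grad(u):
    """Forward differences with h = 1; the difference is set to zero on the last column/row."""
    ny, nx = len(u), len(u[0])
    gx = [[u[i][j + 1] - u[i][j] if j + 1 < nx else 0.0 for j in range(nx)] for i in range(ny)]
    gy = [[u[i + 1][j] - u[i][j] if i + 1 < ny else 0.0 for j in range(nx)] for i in range(ny)]
    return gx, gy

def div(px, py):
    """Backward-difference divergence, defined so that <grad u, p> = -<u, div p>."""
    ny, nx = len(px), len(px[0])
    d = [[0.0] * nx for _ in range(ny)]
    for i in range(ny):
        for j in range(nx):
            if j + 1 < nx:
                d[i][j] += px[i][j]
                d[i][j + 1] -= px[i][j]
            if i + 1 < ny:
                d[i][j] += py[i][j]
                d[i + 1][j] -= py[i][j]
    return d

# Numerical adjointness check on random fields.
rng = random.Random(1)
ny, nx = 4, 5
u = [[rng.uniform(-1, 1) for _ in range(nx)] for _ in range(ny)]
px = [[rng.uniform(-1, 1) for _ in range(nx)] for _ in range(ny)]
py = [[rng.uniform(-1, 1) for _ in range(nx)] for _ in range(ny)]
gx, gy = grad(u)
d = div(px, py)
lhs = sum(gx[i][j] * px[i][j] + gy[i][j] * py[i][j] for i in range(ny) for j in range(nx))
rhs = -sum(u[i][j] * d[i][j] for i in range(ny) for j in range(nx))
print(abs(lhs - rhs))   # zero up to rounding error
```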

### 4.2 Implementation of Deterministic Constraints

*u* to *g*, there in fact exists a solution to \(g=Au\). The condition (21) becomes \(g^l \leqslant g^u\), which is immediately guaranteed by the monotonicity of (18) and the trivial conditions \(s_j^l \leqslant s_j^u\).
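One common way to feed a two-sided constraint \(g^l \leqslant A_j u \leqslant g^u\) to (17) is to take *F* as the indicator of the box \([g^l, g^u]\); we assume this formulation for illustration and do not claim it is the exact splitting used here. The dual step then needs the proximal map of \(\sigma F^*\), which by Moreau's identity reduces to clipping:

```python
def prox_sigma_fstar_box(y, sigma, lo, hi):
    """Proximal map of sigma * F^*, where F is the indicator of {z : lo <= z <= hi}.

    Moreau's identity gives prox_{sigma F^*}(y) = y - sigma * proj_box(y / sigma),
    and the projection onto a box is componentwise clipping.
    """
    proj = [min(max(yi / sigma, li), ui) for yi, li, ui in zip(y, lo, hi)]
    return [yi - sigma * pi for yi, pi in zip(y, proj)]

lo, hi = [-1.0, -1.0, -1.0], [1.0, 1.0, 1.0]
print(prox_sigma_fstar_box([3.0, 0.5, -4.0], 1.0, lo, hi))   # [2.0, 0.0, -3.0]
```

In the dual update of (17) this map would be applied to \(y^i+\sigma K{\bar{x}}^{i+1}\); components whose scaled values already lie inside the bounds are sent to zero, reflecting an inactive constraint.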

Table 1. Numerical results for the synthetic data

Method | Parameter choice | Frobenius PSNR | Principal eigenvalue PSNR | Principal eigenvector angle PSNR |
---|---|---|---|---|
Regression | – | 33.90 dB | 25.04 dB | 47.86 dB |
Linear \(L^2\) | Discrepancy principle | 32.93 dB | 27.81 dB | 61.89 dB |
Linear \(L^2\) | Frobenius error-optimal | 34.51 dB | 28.42 dB | 60.93 dB |
Non-linear \(L^2\) | Discrepancy principle | 37.33 dB | 27.81 dB | 61.89 dB |
Non-linear \(L^2\) | Frobenius error-optimal | 37.44 dB | 28.03 dB | 61.12 dB |
Constraints | 90 % | 32.28 dB | 28.86 dB | 65.65 dB |
Constraints | 95 % | 30.97 dB | 28.14 dB | 64.80 dB |
Constraints | 99 % | 27.86 dB | 24.51 dB | 61.41 dB |

## 5 Experimental Results

Table 2. Numerical results for the in vivo brain data. For the linear \(L^2\) and non-linear \(L^2\) reconstruction models, the free parameter chosen by the parameter choice criterion is the regularisation parameter \(\alpha \), and for the constrained problem it is the confidence level of the intervals.

Method | Parameter choice | Frobenius PSNR | Principal eigenvalue PSNR | Principal eigenvector angle PSNR |
---|---|---|---|---|
Regression | – | 32.35 dB | 33.67 dB | 28.56 dB |
Linear \(L^2\) | Discrepancy principle | 34.80 dB | 36.35 dB | 24.81 dB |
Linear \(L^2\) | Frobenius error-optimal | 34.81 dB | 36.32 dB | 24.97 dB |
Non-linear \(L^2\) | Discrepancy principle | 33.53 dB | 35.87 dB | 27.12 dB |
Non-linear \(L^2\) | Frobenius error-optimal | 33.57 dB | 36.03 dB | 27.58 dB |
Constraints | 90 % | 33.71 dB | 34.93 dB | 27.00 dB |
Constraints | 95 % | 33.70 dB | 34.97 dB | 26.91 dB |
Constraints | 99 % | 33.67 dB | 34.89 dB | 26.88 dB |

### 5.1 Estimating Lower and Upper Bounds from Real Data

As we have already discussed, in practice the noise in the measurement signals \({\hat{s}}_j\) is neither Gaussian nor Rician; in fact, we know neither the true noise distribution nor the other corruptions. Therefore, we have to estimate the noise distribution from the image background. To do this, we require a known correspondence between the measurement, the noise and the true value. Lacking better assumptions, we use the standard one of additive noise. Continuing in the statistical setting of Sect. 2.2, we now describe the procedure, working on discrete images expressed as vectors \({\hat{f}}={\hat{s}}_j \in \mathbb {R}^n\) for some fixed \(j \in \{0,1,\dots ,N\}\). We use superscripts to denote the voxel indices, that is \({\hat{f}}=({\hat{f}}^1,\ldots ,{\hat{f}}^n)\).

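That is, we assume that at the *i*-th voxel, the measured value \({\hat{f}}^i\) is the sum of the true value \(f^i\) and additive noise \(\nu ^i\):

\[{\hat{f}}^i = f^i + \nu ^i.\]

If we knew the cumulative distribution function *F* of the noise, we could choose a confidence parameter \(\theta \in (0,1)\) and use its inverse, the quantile function (see Footnote 1), to calculate \(\nu _{\theta /2}, \nu _{1-\theta /2}\) such that

\[P(\nu _{\theta /2} \leqslant \nu ^i \leqslant \nu _{1-\theta /2}) = 1-\theta , \tag{22}\]

so that, with probability \(1-\theta \), we have \({\hat{f}}^i - \nu _{1-\theta /2} \leqslant f^i \leqslant {\hat{f}}^i - \nu _{\theta /2}\).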

The Dvoretzky–Kiefer–Wolfowitz inequality implies that the interval estimates converge to the true intervals, determined by (22), as the number of background pixels *k* increases with the image size *n*. This procedure, with large *k*, will therefore provide an estimate of a single-experiment (\(m=1\)) confidence interval for \(f^i\). We note that this procedure will, however, not yield the convergence of the interval estimate \([{\hat{f}}^{l,i}, {\hat{f}}^{u,i}]\) to the true data; for that we would need multiple experiments, i.e. multiple sample images (\(m>1\)), not just agglomeration of the background voxels into a single noise distribution estimate. In practice, however, we can only afford a single experiment (\(m=1\)), and cannot go to the limit.
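The estimation procedure can be sketched as follows, under the simplifying assumption that background voxels contain noise only, so that their values are samples of \(\nu \). The empirical quantile is the inverse of the empirical distribution function, in line with the quantile convention of Footnote 1, and the bounds follow from \(f^i = {\hat{f}}^i - \nu ^i\) and (22). Function names and the toy background are illustrative.

```python
import math

def empirical_quantile(sorted_samples, q):
    """Smallest sample x with empirical CDF F_k(x) >= q (inverse empirical CDF)."""
    k = len(sorted_samples)
    idx = min(max(math.ceil(q * k) - 1, 0), k - 1)
    return sorted_samples[idx]

def voxel_bounds(measured, background, theta):
    """Pointwise bounds [f_hat - nu_{1-theta/2}, f_hat - nu_{theta/2}] for the
    additive model f_hat = f + nu, with noise quantiles estimated from the
    background samples (assumed to contain noise only)."""
    noise = sorted(background)
    nu_lo = empirical_quantile(noise, theta / 2)
    nu_hi = empirical_quantile(noise, 1 - theta / 2)
    lower = [m - nu_hi for m in measured]
    upper = [m - nu_lo for m in measured]
    return lower, upper

# Toy example: 101 evenly spread "background" noise values and two voxels.
background = [float(v) for v in range(-50, 51)]
lower, upper = voxel_bounds([100.0, 20.0], background, theta=0.1)
print(lower, upper)     # [55.0, -25.0] [145.0, 65.0]
```

In a real dataset the background sample would be the roughly *k* border voxels described in Sect. 5.3, and the resulting bounds for each \({\hat{s}}_j\) would then be transformed to \(g_j^l, g_j^u\) through the logarithm.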

### 5.2 Verification of the Approach with Synthetic Data

We apply the forward operators \(T_j(u_{g.t.})\) for each \(j=0, \ldots , 6\) to obtain the data \(s_j(x)\). We then add Rician noise to these data, \(\bar{s}_j=|s_j+\delta _1+\mathrm {i}\,\delta _2|\) with \(\delta _1,\delta _2\) independent and normally distributed with standard deviation \(\sigma =2\), which corresponds to \(PSNR \approx 27\hbox {dB}\).

We apply several models for solving the inverse problem of reconstructing *u*: the linear and non-linear \(L^2\) approaches (8) and (7), and the constrained problem (11). As the regulariser we use \(R={{\mathrm{TGV}}}^2_{(0.9\alpha ,\alpha )}\), where the choice \(\beta =0.9\alpha \) was made somewhat arbitrarily, but yields good results for all the models. This is slightly lower than the range \([1, 1.5]\alpha \) found in comprehensive experiments for other imaging modalities [7, 38].

We find \(\alpha \) by solving this equation numerically using bisection method. We start by finding such \(\alpha _1,\alpha _2\) that \({\varDelta }\rho (\alpha _1)>0\) and \({\varDelta }\rho (\alpha _2)<0\). We calculate \({\varDelta }\rho (\alpha _3)\) for \(\alpha _3=\frac{\alpha _1+\alpha _2}{2}\) and depending on its sign replace either \(\alpha _1\) or \(\alpha _2\) with \(\alpha _3\). We repeat this procedure until the stopping criteria are reached.
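A minimal sketch of this bisection, stopping once \(|{\varDelta }\rho (\alpha )|<\epsilon \). Here `delta_rho` stands for \({\varDelta }\rho \); its actual evaluation requires a full reconstruction for each \(\alpha \) and is not reproduced, so the toy function in the usage example is a hypothetical stand-in.

```python
import math

def bisect_alpha(delta_rho, alpha_pos, alpha_neg, eps, max_iter=200):
    """Bisection for delta_rho(alpha) = 0, given delta_rho(alpha_pos) > 0 and
    delta_rho(alpha_neg) < 0; stops as soon as |delta_rho(alpha)| < eps."""
    if not delta_rho(alpha_pos) > 0 > delta_rho(alpha_neg):
        raise ValueError("need delta_rho(alpha_pos) > 0 > delta_rho(alpha_neg)")
    alpha_mid = 0.5 * (alpha_pos + alpha_neg)
    for _ in range(max_iter):
        alpha_mid = 0.5 * (alpha_pos + alpha_neg)
        value = delta_rho(alpha_mid)
        if abs(value) < eps:
            break
        if value > 0:
            alpha_pos = alpha_mid      # the sign change lies between alpha_mid and alpha_neg
        else:
            alpha_neg = alpha_mid
    return alpha_mid

def toy_delta_rho(alpha):
    """Hypothetical stand-in for Delta rho: decreasing, with its root at alpha = 1."""
    return 1.0 - math.sqrt(alpha)

alpha = bisect_alpha(toy_delta_rho, alpha_pos=0.01, alpha_neg=4.0, eps=1e-6)
print(alpha)
```

The iteration cap is a safeguard we added; since each evaluation of \({\varDelta }\rho \) is expensive in practice, a loose tolerance such as \(\epsilon =0.01\) keeps the number of reconstructions small.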

As stopping criterion we use \(|{\varDelta }\rho (\alpha )|<\epsilon \). We use \(\tau =1.05\), \(\epsilon =0.01\) for the linear and \(\tau =1.2\), \(\epsilon =0.0001\) for the non-linear \(L^2\) solution. A value of \(\tau \) yielding a reasonable degree of smoothness has been chosen by trial and error, and is different for the non-linear model, reflecting a different non-linear objective in the discrepancy principle. For the constrained problem we calculate \(1-\theta =90\), 95 and \(99\,\%\) confidence intervals to generate the upper and lower bounds. We do, however, digress a little from the approach of Sect. 2.2. Since we do not know the true underlying distribution, which fails to be Rician as illustrated in Fig. 1, we do not use it to calculate the confidence intervals, but use the estimation procedure described in Sect. 5.1. We stress that we only have a single sample of each signal \(s_j\), so we are unable to verify any asymptotic estimation properties.

*u* of 2-tensors on \({\varOmega }\subset \mathbb {R}^m\) as

*u*(*x*) at a point \(x \in {\varOmega }\). It measures how far the ellipsoid prescribed by the eigenvalues and eigenvectors is from a sphere, with \(FA _u(x)=0\) corresponding to a full sphere, and \(FA _u(x)=1\) corresponding to a degenerate object not having full dimension.
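As a concrete sketch, assuming the commonly used normalisation of fractional anisotropy (not necessarily identical to the definition above), FA can be computed without an eigendecomposition as \(\sqrt{m/(m-1)}\,\Vert u-(\mathrm{tr}\,u/m)I\Vert _F/\Vert u\Vert _F\). This agrees with the usual eigenvalue formula because Frobenius norms depend only on the eigenvalues, and it gives 0 for an isotropic tensor and 1 for a rank-one tensor. The function name is illustrative.

```python
import math

def fractional_anisotropy(u):
    """FA of a nonzero symmetric m x m tensor (nested lists), m >= 2, computed as
    sqrt(m / (m - 1)) * ||u - (tr u / m) I||_F / ||u||_F."""
    m = len(u)
    mean_diag = sum(u[i][i] for i in range(m)) / m
    dev_sq = sum((u[i][j] - (mean_diag if i == j else 0.0)) ** 2
                 for i in range(m) for j in range(m))
    norm_sq = sum(u[i][j] ** 2 for i in range(m) for j in range(m))
    return math.sqrt(m / (m - 1) * dev_sq / norm_sq)

print(fractional_anisotropy([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))   # sphere: 0.0
print(fractional_anisotropy([[1, 0, 0], [0, 0, 0], [0, 0, 0]]))   # rank one: close to 1
print(fractional_anisotropy([[1, 0, 0], [0, 1, 0], [0, 0, 0]]))   # flat disc: about 0.707
```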

As we can see, the non-linear approach (7) performs best overall by a wide margin in terms of the pointwise Frobenius error, i.e. the error in \(\Vert \cdot \Vert _{F,2}\), which is expressed as a PSNR in Table 1. What is interesting, however, is that the constraint-based approach (11) has a much better reconstruction of the principal eigenvector angle, and a comparable reconstruction of its magnitude (the principal eigenvalue). Indeed, the reconstruction with 95 % confidence intervals in Figs. 3(g) and 4(g) suggests a nearly perfect reconstruction in terms of smoothness. However, the Frobenius PSNR in Table 1 for this approach is worse than that of the simple unregularised inversion by regression. The problem is revealed by Fig. 5(f): the large white cloudy areas indicate huge fractional anisotropy errors, while at the same time, the principal eigenvector angle errors expressed in colour are much lower than for other approaches. Good reconstruction of the principal eigenvector is important for tractography, i.e. the reconstruction of neural pathways in the brain. One explanation for our good results is that the regulariser completely governs the solution in areas where the error bounds are inactive due to generally low errors. This results in very smooth reconstructions, which is desirable in the present case, as our synthetic tensor field is also smooth within the helix.

### 5.3 Results with In Vivo Brain Imaging Data

We now wish to study the proposed regularisation model on a real in vivo diffusion tensor image. Our data are that of a human brain, with the measurements of a volunteer performed on a clinical 3T system (Siemens Magnetom TIM Trio, Erlangen, Germany) with a 32-channel head coil. A 2D diffusion-weighted single-shot EPI sequence was used, with diffusion-sensitising gradients applied in 12 independent directions (\(b = 1000\,\hbox {s}/\hbox {mm}^2\)). An additional reference scan without diffusion weighting was used, with the parameters \(\hbox {TR}=7900\,\hbox {ms}\), \(\hbox {TE}=94\,\hbox {ms}\) and flip angle \(90^\circ \). Each slice of the 3D dataset has an in-plane resolution of \(1.95\,\hbox {mm} \times 1.95\,\hbox {mm}\), with a total of \(128 \times 128\) pixels. The total number of slices is 60, with a slice thickness of 2 mm. The dataset consists of 4 repeated measurements. The GRAPPA acceleration factor is 2. Prior to the reconstruction of the diffusion tensor, eddy-current correction was performed with FSL [50]. Written informed consent was obtained from the volunteer before the examination.

For error bounds calculation according to the procedure of Sect. 5.1, to avoid systematic bias near the brain, we only use about 0.6 % of the total volume near the borders, or roughly \(k \approx 6000\) voxels.

To estimate errors for all the considered reconstruction models, for each gradient direction \(b_i\) we use only one of the four repeated measurements. We then calculate the errors against a somewhat less than ideal pseudo-ground-truth: the linear regression reconstruction from all the available measurements.

The results are in Table 2 and Figs. 6, 7, and 8, again with the first of the figures showing the colour-coded principal eigenvector of the reconstruction, the second showing the fractional anisotropy and principal eigenvectors and the last one the errors in the latter two, in a colour-coded manner. Again, all plots are masked to represent only the non-zero region. In the figures, we concentrate on error bounds based on 95 % confidence intervals, as the results for the 90 and 99 % cases do not differ significantly according to Table 2.

This time, the linear \(L^2\) approach (8) has the best overall reconstruction (Frobenius PSNR), while the non-linear \(L^2\) approach (7) has clearly the best principal eigenvector angle reconstruction apart from the regression, whose score is not entirely reliable given that our pseudo-ground-truth is itself regression-based. The constraints-based approach (11) with 95 % confidence intervals is, however, not far behind in terms of numbers. A more detailed study of the *corpus callosum* in Fig. 8 (small picture in picture) and Fig. 7, however, indicates a better reconstruction of this important region by the non-linear approach. The constrained approach has some very short vectors there in the white region. Naturally, these results on the in vivo data should be taken with a grain of salt, as only a somewhat unreliable pseudo-ground-truth is available for comparison.

### 5.4 Conclusions from the Numerical Experiments

Our conclusion is that the error bounds-based approach is a feasible alternative to standard modelling with incorrect Gaussian assumptions. It can produce good reconstructions, although the non-linear \(L^2\) approach of [55] is possibly slightly more reliable. The latter does, however, in principle depend on a good initialisation of the optimisation method, unlike the convex bounds-based approach.

Further theoretical work will be undertaken to extend the partial-order-based approach to modelling errors in linear operators to the non-lattice case of the semidefinite partial order for symmetric matrices, which will allow us to consider problems of diffusion MRI with errors in the forward operator.

It also needs to be investigated whether the error bounds approach should be combined with an alternative, novel regulariser that would ameliorate the fractional anisotropy errors that the approach exhibits. It is important to note, however, that from the practical point of view of using the reconstructed tensor field for basic tractography methods based solely on principal eigenvectors, these errors are not that critical. As pointed out by one of the reviewers, the situation could differ with more recent geodesic tractography methods [20, 21, 25] employing the full tensor. We provide basic principal eigenvector tractography results for reference in Fig. 9, without attempting to interpret them extensively. It suffices to say that the results look comparable. With this in mind, the error bounds approach produces a very good reconstruction of the direction of the principal eigenvectors, although we saw some problems with their magnitude within the *corpus callosum*.

## Footnotes

1. Recall that for a random variable *X* with a continuous cumulative distribution function *F*, the quantile function \(F^{-1}\) returns a number \(x_\theta =F^{-1}(\theta )\) such that \(P(X \leqslant x_\theta ) = \theta \).
2. Recall that an indexed subset \(\{ x_\tau :\tau \in \{\tau \} \}\) of an ordered vector space *X* is called directed upwards if for any pair \(\tau _1,\,\tau _2 \in \{\tau \}\) there exists \(\tau _3 \in \{\tau \}\) such that \(x_{\tau _3} \geqslant x_{\tau _1}\) and \(x_{\tau _3} \geqslant x_{\tau _2}\).
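
Both footnote definitions can be checked numerically. The sketch below uses hypothetical helper names. It approximates the quantile \(x_\theta \) of a continuous distribution by bisection on its CDF, using the standard normal distribution from Python's `statistics.NormalDist` as a test case. It also tests the directed-upwards property for a finite family of vectors in \(\mathbb {R}^n\) under the componentwise order.

```python
from itertools import combinations
from statistics import NormalDist


def quantile(cdf, theta, lo, hi, iters=200):
    """Approximate min{x : cdf(x) >= theta} by bisection on [lo, hi].

    Assumes cdf is non-decreasing with cdf(lo) < theta <= cdf(hi).
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cdf(mid) >= theta:
            hi = mid
        else:
            lo = mid
    return hi


def is_directed_upwards(family):
    """Footnote 2 for a finite family in R^n with the componentwise order."""
    def geq(x, y):
        return all(a >= b for a, b in zip(x, y))
    # Every pair needs some member of the family lying above both.
    return all(any(geq(z, x) and geq(z, y) for z in family)
               for x, y in combinations(family, 2))


x = quantile(NormalDist().cdf, 0.975, -10.0, 10.0)
print(round(x, 4), round(NormalDist().cdf(x), 4))  # 1.96 0.975
print(is_directed_upwards([(0, 1), (1, 0), (1, 1)]))  # True
print(is_directed_upwards([(0, 1), (1, 0)]))  # False
```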

## Notes

### Acknowledgments

While at the Center for Mathematical Modelling of the Escuela Politécnica Nacional in Quito, Ecuador, T. Valkonen has been supported by a Prometeo scholarship of the Senescyt (Ecuadorian Ministry of Science, Technology, Education, and Innovation). In Cambridge, T. Valkonen has been supported by the EPSRC Grants No. EP/J009539/1 “Sparse & Higher-order Image Restoration” and No. EP/M00483X/1 “Efficient computational tools for inverse imaging problems”. A. Gorokh and Y. Korolev are grateful to the RFBR (Russian Foundation for Basic Research) for partial financial support (Projects 14-01-31173 and 14-01-91151). The authors would also like to thank Karl Koschutnig for the in vivo dataset, Kristian Bredies for scripts used to generate the tractography images and Florian Knoll for many inspiring discussions.

## References

1. Aksoy, M., Forman, C., Straka, M., Skare, S., Holdsworth, S., Hornegger, J., Bammer, R.: Real-time optical motion correction for diffusion tensor imaging. Magn. Reson. Med. **66**(2), 366–378 (2011). doi: 10.1002/mrm.22787
2. Ambrosio, L., Fusco, N., Pallara, D.: Functions of Bounded Variation and Free Discontinuity Problems. Oxford University Press, Oxford (2000)
3. Arsigny, V., Fillard, P., Pennec, X., Ayache, N.: Log-Euclidean metrics for fast and simple calculus on diffusion tensors. Magn. Reson. Med. **56**(2), 411–421 (2006)
4. Basser, P.J., Jones, D.K.: Diffusion-tensor MRI: theory, experimental design and data analysis—a technical review. NMR Biomed. **15**(7–8), 456–467 (2002). doi: 10.1002/nbm.783
5. Basu, S., Fletcher, T., Whitaker, R.: Rician noise removal in diffusion tensor MRI. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) Medical Image Computing and Computer-Assisted Intervention – MICCAI 2006, Lecture Notes in Computer Science, vol. 4190, pp. 117–125. Springer, Berlin (2006). doi: 10.1007/11866565_15
6. Bačák, M., Bergmann, R., Steidl, G., Weinmann, A.: A second order non-smooth variational model for restoring manifold-valued images (2015)
7. Benning, M., Gladden, L., Holland, D., Schönlieb, C.B., Valkonen, T.: Phase reconstruction from velocity-encoded MRI measurements—a survey of sparsity-promoting variational approaches. J. Magn. Reson. **238**, 26–43 (2014). doi: 10.1016/j.jmr.2013.10.003
8. Bredies, K.: Symmetric tensor fields of bounded deformation. Annali di Matematica Pura ed Applicata **192**(5), 815–851 (2013). doi: 10.1007/s10231-011-0248-4
9. Bredies, K., Kunisch, K., Pock, T.: Total generalized variation. SIAM J. Imaging Sci. **3**, 492–526 (2011). doi: 10.1137/090769521
10. Bredies, K., Kunisch, K., Valkonen, T.: Properties of \(L^1\)-\(\text{TGV}^2\): the one-dimensional case. J. Math. Anal. Appl. **398**, 438–454 (2013). doi: 10.1016/j.jmaa.2012.08.053
11. Bredies, K., Valkonen, T.: Inverse problems with second-order total generalized variation constraints. In: Proceedings of the 9th International Conference on Sampling Theory and Applications (SampTA) 2011, Singapore (2011)
12. Burger, M., Lucka, F.: Maximum a posteriori estimates in linear inverse problems with log-concave priors are proper Bayes estimators. Inverse Prob. **30**(11), 114004 (2014)
13. Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. **40**, 120–145 (2011). doi: 10.1007/s10851-010-0251-1
14. Chefd’hotel, C., Tschumperlé, D., Deriche, R., Faugeras, O.: Constrained flows of matrix-valued functions: Application to diffusion tensor regularization. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) Computer Vision–ECCV 2002, Lecture Notes in Computer Science, vol. 2350, pp. 251–265. Springer, Berlin (2002). doi: 10.1007/3-540-47969-4_17
15. Cox, D., Hinkley, D.: Theoretical Statistics. Taylor & Francis, London (1979)
16. Dunford, N., Schwartz, J.T.: Linear Operators, Part I General Theory. Interscience Publishers, Hoboken (1958)
17. Dvoretzky, A., Kiefer, J., Wolfowitz, J.: Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. Ann. Math. Stat. **27**(3), 642–669 (1956). doi: 10.1214/aoms/1177728174
18. Esser, E., Zhang, X., Chan, T.F.: A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science. SIAM J. Imaging Sci. **3**(4), 1015–1046 (2010). doi: 10.1137/09076934X
19. Federer, H.: Geometric Measure Theory. Springer, New York (1969)
20. Fuster, A., Dela Haije, T., Tristán-Vega, A., Plantinga, B., Westin, C.F., Florack, L.: Adjugate diffusion tensors for geodesic tractography in white matter. J. Math. Imaging Vis. **54**(1), 1–14 (2016). doi: 10.1007/s10851-015-0586-8
21. Fuster, A., Tristan-Vega, A., Haije, T., Westin, C.F., Florack, L.: A novel Riemannian metric for geodesic tractography in DTI. In: Schultz, T., Nedjati-Gilani, G., Venkataraman, A., O’Donnell, L., Panagiotaki, E. (eds.) Computational Diffusion MRI and Brain Connectivity, Mathematics and Visualization, pp. 97–104. Springer, New York (2014). doi: 10.1007/978-3-319-02475-2_9
22. Getreuer, P., Tong, M., Vese, L.A.: A variational model for the restoration of MR images corrupted by blur and Rician noise. In: Advances in Visual Computing, Lecture Notes in Computer Science, vol. 6938, pp. 686–698. Springer, Berlin (2011). doi: 10.1007/978-3-642-24028-7_63
23. Grasmair, M., Haltmeier, M., Scherzer, O.: The residual method for regularizing ill-posed problems. Appl. Math. Comp. **218**(6), 2693–2710 (2011). doi: 10.1016/j.amc.2011.08.009
24. Gudbjartsson, H., Patz, S.: The Rician distribution of noisy MRI data. Magn. Reson. Med. **34**(6), 910–914 (1995)
25. Hao, X., Whitaker, R., Fletcher, P.: Adaptive Riemannian metrics for improved geodesic tracking of white matter. In: Székely, G., Hahn, H.K. (eds.) Information Processing in Medical Imaging, Lecture Notes in Computer Science, vol. 6801, pp. 13–24. Springer, Berlin (2011). doi: 10.1007/978-3-642-22092-0_2
26. He, B., Yuan, X.: Convergence analysis of primal-dual algorithms for a saddle-point problem: From contraction perspective. SIAM J. Imaging Sci. **5**(1), 119–149 (2012). doi: 10.1137/100814494
27. Herbst, M., Maclaren, J., Weigel, M., Korvink, J., Hennig, J., Zaitsev, M.: Prospective motion correction with continuous gradient updates in diffusion weighted imaging. Magn. Reson. Med. (2011). doi: 10.1002/mrm.23230
28. Hohage, T., Homann, C.: A generalization of the Chambolle-Pock algorithm to Banach spaces with applications to inverse problems (2014)
29. Kingsley, P.: Introduction to diffusion tensor imaging mathematics: Parts I-III. Concepts Magn. Reson. A **28**(2), 101–179 (2006). doi: 10.1002/cmr.a.20048
30. Knoll, F., Clason, C., Bredies, K., Uecker, M., Stollberger, R.: Parallel imaging with nonlinear reconstruction using variational penalties. Magn. Reson. Med. **67**(1), 34–41 (2012)
31. Knoll, F., Raya, J.G., Halloran, R.O., Baete, S., Sigmund, E., Bammer, R., Block, T., Otazo, R., Sodickson, D.K.: A model-based reconstruction for undersampled radial spin-echo DTI with variational penalties on the diffusion tensor. NMR Biomed. **28**(3), 353–366 (2015). doi: 10.1002/nbm.3258
32. Korolev, Y.: Making use of a partial order in solving inverse problems: II. Inverse Prob. **30**(8), 085003 (2014)
33. Korolev, Y., Yagola, A.: On inverse problems in partially ordered spaces with a priori information. J. Inverse Ill-Posed Prob. **20**(4), 567–573 (2012)
34. Korolev, Y., Yagola, A.: Making use of a partial order in solving inverse problems. Inverse Prob. **29**(9), 095012 (2013)
35. Lassas, M., Saksman, E., Siltanen, S.: Discretization-invariant Bayesian inversion and Besov space priors. Inverse Prob. Imaging **3**(1), 87–122 (2009). doi: 10.3934/ipi.2009.3.87
36. Lassas, M., Siltanen, S.: Can one use total variation prior for edge-preserving Bayesian inversion? Inverse Prob. **20**(5), 1537 (2004). doi: 10.1088/0266-5611/20/5/013
37. Lehmann, E., Romano, J.: Testing Statistical Hypotheses. Springer Texts in Statistics. Springer, New York (2008)
38. de Los Reyes, J.C., Schönlieb, C.B., Valkonen, T.: Bilevel parameter learning for higher-order total variation regularisation models. arXiv:1508.07243 (2015)
39. Luxemburg, W., Zaanen, A.: Riesz Spaces. North-Holland Publishing Company, Amsterdam (1971)
40. Martín, A., Schiavi, E.: Automatic total generalized variation-based DTI Rician denoising. In: Image Analysis and Recognition, Lecture Notes in Computer Science, vol. 7950, pp. 581–588. Springer, Berlin (2013). doi: 10.1007/978-3-642-39094-4_66
41. Massart, P.: The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality. Ann. Prob. **18**(3), 1269–1283 (1990). doi: 10.1214/aop/1176990746
42. Meyer, Y.: Oscillating Patterns in Image Processing and Nonlinear Evolution Equations. American Mathematical Society, Boston (2001)
43. Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(O(1/k^2)\). Sov. Math. Dokl. **27**(2), 372–376 (1983)
44. Pan, X., Sidky, E.Y., Vannier, M.: Why do commercial CT scanners still employ traditional, filtered back-projection for image reconstruction? Inverse Prob. **25**(12), 123009 (2009). doi: 10.1088/0266-5611/25/12/123009
45. Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Phys. D **60**, 259–268 (1992)
46. Schaefer, H.: Banach Lattices and Positive Operators. Springer, New York (1974)
47. Setzer, S.: Operator splittings, Bregman methods and frame shrinkage in image processing. Int. J. Comput. Vis. **92**(3), 265–280 (2011). doi: 10.1007/s11263-010-0357-3
48. Setzer, S., Steidl, G., Popilka, B., Burgeth, B.: Variational methods for denoising matrix fields. In: Weickert, J., Hagen, H. (eds.) Visualization and Processing of Tensor Fields, pp. 341–360. Springer, New York (2009)
49. Shiryaev, A.N.: Probability. Graduate Texts in Mathematics. Springer, New York (1996)
50. Smith, S.M., Jenkinson, M., Woolrich, M.W., Beckmann, C.F., Behrens, T.E.J., Johansen-Berg, H., Bannister, P.R., Luca, M.D., Drobnjak, I., Flitney, D.E., Niazy, R.K., Saunders, J., Vickers, J., Zhang, Y., Stefano, N.D., Brady, J.M., Matthews, P.M.: Advances in functional and structural MR image analysis and implementation as FSL. Neuroimage **23**(Suppl 1), S208–S219 (2004). doi: 10.1016/j.neuroimage.2004.07.051
51. Temam, R.: Mathematical problems in plasticity. Gauthier-Villars, Paris (1985)
52. Tikhonov, A.N., Goncharsky, A.V., Stepanov, V.V., Yagola, A.G.: Numerical Methods for the Solution of Ill-Posed Problems. Kluwer, Dordrecht (1995)
53. Tournier, J.D., Mori, S., Leemans, A.: Diffusion tensor imaging and beyond. Magn. Reson. Med. **65**(6), 1532–1556 (2011). doi: 10.1002/mrm.22924
54. Tschumperlé, D., Deriche, R.: Diffusion tensor regularization with constraints preservation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 948–953 (2001)
55. Valkonen, T.: A primal-dual hybrid gradient method for non-linear operators with applications to MRI. Inverse Prob. **30**(5), 055012 (2014). doi: 10.1088/0266-5611/30/5/055012
56. Valkonen, T.: Big images. In: Emrouznejad, A. (ed.) Big Data Optimization: Recent Developments and Challenges, Studies in Big Data. Springer, New York (2015). Accepted
57. Valkonen, T., Bredies, K., Knoll, F.: Total generalised variation in diffusion tensor imaging. SIAM J. Imaging Sci. **6**(1), 487–525 (2013). doi: 10.1137/120867172
58. Valkonen, T., Knoll, F., Bredies, K.: TGV for diffusion tensors: a comparison of fidelity functions. J. Inverse Ill-Posed Prob. **21**, 355–377 (2013). doi: 10.1515/jip-2013-0005. Special issue for IP:M&S 2012, Antalya, Turkey
59. Valkonen, T., Liebmann, M.: GPU-accelerated regularisation of large diffusion tensor volumes. Computing **95**, 771–784 (2013). doi: 10.1007/s00607-012-0277-x. Special issue for ESCO2012, Pilsen, Czech Republic
60. Weickert, J.: Anisotropic Diffusion in Image Processing, vol. 1. Teubner, Stuttgart (1998)

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.