1 Introduction

Diffusion MRI is a unique in-vivo and non-invasive imaging technique that probes the microstructure of tissues by tracking water diffusion [5]. The diffusion process is characterized by a probability distribution that describes the random three-dimensional displacements of water molecules due to thermal agitation. The signal recovered in diffusion MRI is the Fourier transform of the diffusion density evaluated at the magnetic field spatial gradient to which the imaged brain was subjected.

A practical approach to accessing the diffusion density is to resort to parametric models that lead to an analytic expression of the observed diffusion signals. The first parametric model, introduced two decades ago in [2], is the single tensor (ST) model. The ST model essentially assumes that the diffusion in the voxel is well characterized by a zero-mean Gaussian distribution with covariance matrix proportional to the so-called diffusion tensor. It has become widely used in clinical practice, which has raised the demand for a solid mathematical framework for processing tensor images (estimation, smoothing, registration, atlasing, statistical analysis). In response, several frameworks have been proposed, including a Riemannian affine-invariant framework [6, 9] and the popular log-Euclidean space for tensors, which embeds them in a Lie group structure [1].

However, because of the low spatial resolution in diffusion MRI, it has been shown that the ST model is an average of several diffusion processes arising from multiple populations of tissues in the voxel, ultimately leading to an inaccurate description of the microstructure in most parts of the brain white matter. In addition, even within a single homogeneous tissue population, non-Gaussian diffusion has been observed [13]. This has led to the extension of the ST model to mixture models, often called multi-compartment models (MCM), where the voxelwise diffusion signal is modeled as a linear combination of compartmental diffusion signals arising from underlying homogeneous diffusion processes [8]. To the best of our knowledge, the only mathematical framework available for multi-compartment image processing [12] is however limited to multi-tensor images (mixture of single-tensor signals, see [10]) as it relies on log-Euclidean geometry.

In this work, we present an alternative to the log-Euclidean space on tensors, called the Bayes Hilbert space [3], for processing compartmental diffusion signals instead of tensors. It provides a unified framework that can accommodate analytic compartment diffusion signals of any type (Gaussian and non-Gaussian). Section 2 gives a general introduction to Bayes Hilbert spaces, with motivation and setup in diffusion MRI. Section 3 describes a simulation study and a tractography application on real data to compare Euclidean, log-Euclidean and Bayes interpolation. Results discussed in Sect. 4 show that Bayes interpolation is more robust to noise and, when combined with MCM-based deterministic tractography, yields better pyramidal tract reconstructions by accounting for streamline atlas priors where diffusion models cannot be locally trusted.

2 Theory

2.1 Bayes Hilbert Spaces and Compartmental Diffusion

The signal in diffusion MRI, hereafter called the diffusion signal, observed after the application of a diffusion gradient \(\mathbf {q}\), where \(\Vert \mathbf {q}\Vert ^2 = b\) is the well-known b-value, undergoes a decay starting from a baseline signal. The form of this decay depends on how water molecules diffuse in tissues in the vicinity of the spatial location where the signal is observed. In general, in a voxel composed of K different tissue structures, the compound diffusion signal reads:

$$\begin{aligned} S(\mathbf {q}) = \sum _{j=1}^K S_j(\mathbf {q}) = \sum _{j=1}^K S_{0j} A_j(\mathbf {q}), \quad S_{0j} > 0, \quad A_j \in [0,1], \end{aligned}$$
(1)

in which all the information pertaining to the microstructure (set of surrounding tissue structures) is captured by the signal attenuations \(A_j\)’s. This is the signal decomposition at the foundation of diffusion compartment imaging [8]. Hence, up to some multiplicative constant, the compartmental diffusion signal \(S_j\) can be interpreted as the density of a probability measure \(\nu _j\) on the space of diffusion gradients, which is absolutely continuous w.r.t. the Lebesgue measure \(\lambda \), such that \(\frac{d\nu _j}{d\lambda }\) (Radon–Nikodym derivative w.r.t. \(\lambda \)) is proportional to \(S_j\). Furthermore, this multiplicative constant is related to the \(T_2\) relaxation time of tissue j and thus not relevant for microstructure mapping.
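As a concrete illustration of the decomposition in Eq. (1), the short sketch below evaluates a two-compartment signal, assuming Gaussian compartments of the form introduced later in Eq. (3); the tensors, gradient and signal fractions are purely illustrative.

```python
import numpy as np

def gaussian_attenuation(q, D):
    """Attenuation of a Gaussian compartment, A(q) = exp(-q^T D q), in [0, 1]."""
    return np.exp(-q @ D @ q)

def compound_signal(q, s0s, tensors):
    """Compound diffusion signal of Eq. (1): sum_j S_{0j} * A_j(q)."""
    return sum(s0 * gaussian_attenuation(q, D) for s0, D in zip(s0s, tensors))

# Two crossing Gaussian compartments probed at b = 1000 s/mm^2 along x.
b = 1000.0
q = np.sqrt(b) * np.array([1.0, 0.0, 0.0])       # ||q||^2 = b
D1 = np.diag([1.7e-3, 0.3e-3, 0.1e-3])           # fiber along x (mm^2/s)
D2 = np.diag([0.3e-3, 1.7e-3, 0.1e-3])           # fiber along y (mm^2/s)
S = compound_signal(q, s0s=[0.5, 0.5], tensors=[D1, D2])
```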

To account for this, we propose to embed the probability measures induced by compartmental diffusion signals into the Bayes space \(B^2(P)\) of (equivalence classes of) \(\sigma \)-finite measures on the Borel space \((\mathbb {R}^3, \mathcal {B}(\mathbb {R}^3))\), with square-integrable log-densities w.r.t. a reference measure P [3], which reads:

$$\begin{aligned} B^2(P) = \left\{ \nu : \int _{\mathbb {R}^3} \log ^2 \left( \frac{d\nu }{dP} \right) dP < \infty , \nu > 0 \right\} . \end{aligned}$$
(2)

In this view, if a diffusion signal \(S_{j1}\) (or, equivalently, the probability measure \(\nu _1\) it induces) carries a piece of information about the structure of tissue j, then a diffusion signal \(S_{j2}\) proportional to \(S_{j1}\) does not carry additional information about the structure of tissue j. Hence, \(S_{j1}\) and \(S_{j2}\) are regarded as equivalent for microstructure mapping. This key property, known as scale invariance, is accounted for in the Bayes space \(B^2(P)\) by the induced equivalence classes. Not every Bayes space \(B^2(P)\), however, can accommodate compartmental diffusion signals because, depending on the choice of the reference measure P, the logarithm of compartmental diffusion signals might not be square-integrable. The choice of the reference measure of the Bayes space is thus critical. Provided that an appropriate reference measure P exists, embedding diffusion signals in \(B^2(P)\) provides a unified framework for processing any type of analytic microstructure compartment model [8], at the cost of numerical integrations.

The space \(B^2(P)\) is a vector space when endowed with the perturbation and powering operations \((\oplus , \odot )\), defined for \(\nu _1, \nu _2 \in B^2(P)\), \(A\in \mathcal {B}(\mathbb {R}^3)\), \(\alpha \in \mathbb {R}\) as:

$$\begin{aligned} (\nu _1 \oplus \nu _2)(A) =_{B(P)} \int _A \frac{d\nu _1}{dP} \frac{d\nu _2}{dP} dP, \quad (\alpha \odot \nu _1)(A) =_{B(P)} \int _A \left( \frac{d\nu _1}{dP}\right) ^\alpha dP, \end{aligned}$$

and becomes a separable Hilbert space when equipped with a proper inner product [4]. In the following subsection, we focus on the special case of Gaussian compartmental diffusion for which analytic expressions can be obtained for the perturbation and powering operations as well as for the distance.
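For arbitrary (possibly non-Gaussian) compartment models, these operations can be handled numerically by storing the log-density of each signal w.r.t. P on a common grid of gradients. The sketch below is a minimal illustration of this idea; it assumes the centred-log-ratio inner product commonly used for Bayes spaces [4] and leaves the choice of the grid and of the quadrature weights of P to the user.

```python
import numpy as np

# A measure nu in B^2(P) is represented by log(dnu/dP) sampled on a fixed grid
# of gradients, together with quadrature weights for the reference measure P.
# Because of scale invariance, log-densities are only defined up to an
# additive constant, which the inner product removes by centring.

def perturb(log_f1, log_f2):
    """Log-density of nu1 (+) nu2 w.r.t. P (up to an additive constant)."""
    return log_f1 + log_f2

def power(alpha, log_f):
    """Log-density of alpha (.) nu w.r.t. P (up to an additive constant)."""
    return alpha * log_f

def bayes_inner(log_f1, log_f2, p_weights):
    """B^2(P) inner product via quadrature; p_weights are quadrature weights
    of the (probability) reference measure P and should sum to one."""
    c1 = log_f1 - np.sum(log_f1 * p_weights)
    c2 = log_f2 - np.sum(log_f2 * p_weights)
    return float(np.sum(c1 * c2 * p_weights))
```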

2.2 Gaussian Compartmental Diffusion

The diffusion signal arising from Gaussian compartmental diffusion reads [2]:

$$\begin{aligned} S(\mathbf {q}) = S_0 e^{-\mathbf {q}^\top D \mathbf {q}}, \end{aligned}$$
(3)

where D is the so-called diffusion tensor, i.e. a \(3\times 3\) symmetric positive definite (SPD) matrix. Hence, the probability measure \(\nu \) induced by S is a Gaussian distribution with mean \(\mathbf {0}\) and covariance matrix \(D^{-1}/2\) and we can write, for the purpose of notation, \(S \sim \mathcal {N}(\mathbf {0},D^{-1} / 2)\). As a result, Gaussian diffusion signals cannot be embedded in \(B^2(\lambda )\) because choosing the Lebesgue measure as reference makes the integral of the squared logarithm of S diverge. However, since all moments of the Gaussian distribution are finite, if we define the reference measure P as the probability measure induced by a reference Gaussian diffusion signal with SPD tensor \(D_\mathrm {ref}\), i.e. \(P \sim \mathcal {N}(\mathbf {0}, D_\mathrm {ref}^{-1}/2)\), then Gaussian diffusion signals can be embedded in \(B^2(P)\). Furthermore, it is possible to obtain analytic expressions for the operations \(\oplus \) and \(\odot \). The perturbation of measure \(\nu _1\) by measure \(\nu _2\) reads:

$$\begin{aligned} (\nu _1 \oplus \nu _2)(A) =_B \int _A \frac{d\nu _1}{d\lambda } \frac{d\nu _2}{d\lambda } \frac{d\lambda }{dP} d\lambda =_B \int _A e^{-\mathbf {q}^\top ( D_1 + D_2 - D_\mathrm {ref} ) \mathbf {q}} d\lambda (\mathbf {q}). \end{aligned}$$

Hence, \((\nu _1 \oplus \nu _2) \sim \mathcal {N}(\mathbf {0}, D_\mathrm {ref} + (D_1 - D_\mathrm {ref}) + (D_2 - D_\mathrm {ref}))\). The result is another Gaussian distribution whose covariance matrix is centered on that of the reference Gaussian distribution but perturbed by the covariance information in \(D_1\) and \(D_2\) that is not already present in the reference covariance structure. The multiplication of measure \(\nu \) by a scalar \(\alpha \in \mathbb {R}\) reads:

$$\begin{aligned} (\alpha \odot \nu )(A) =_B \int _A \left( \frac{d\nu }{d\lambda } \right) ^\alpha \left( \frac{d\lambda }{dP} \right) ^{\alpha -1} d\lambda =_B \int _A e^{-\mathbf {q}^\top ( \alpha D + (1-\alpha ) D_\mathrm {ref} ) \mathbf {q}} d\lambda (\mathbf {q}). \end{aligned}$$

Hence, \((\alpha \odot \nu ) \sim \mathcal {N}(\mathbf {0}, \alpha D + (1-\alpha ) D_\mathrm {ref})\). The result is another Gaussian distribution whose covariance matrix is a linear combination of the covariances of \(\nu \) and P. This offers the possibility of giving more or less weight to the information content of a diffusion signal in terms of microstructure. For \(\alpha \in [0,1]\), \(\alpha \odot \nu \) puts less weight on the information content of \(\nu \) (which could come from a poor model estimate, low SNR, etc.) in favor of the information content of the reference P. In practice, it shrinks the diffusion tensor of the signal towards that of the reference.
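A minimal sketch of these two Gaussian-case operations at the tensor level, following the analytic expressions above; the isotropic reference tensor is an arbitrary example, not a recommended choice.

```python
import numpy as np

def perturb(D1, D2, D_ref):
    """Tensor of nu1 (+) nu2 for Gaussian signals: D_ref + (D1 - D_ref) + (D2 - D_ref)."""
    return D1 + D2 - D_ref

def power(alpha, D, D_ref):
    """Tensor of alpha (.) nu for a Gaussian signal: shrinkage towards D_ref."""
    return alpha * D + (1.0 - alpha) * D_ref

D_ref = 1.0e-3 * np.eye(3)                 # illustrative isotropic reference tensor
D = np.diag([1.7e-3, 0.3e-3, 0.1e-3])      # data tensor (mm^2/s)
D_low_trust = power(0.3, D, D_ref)         # alpha < 1: shrink towards the reference
```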

Note that writing the results of the operations \(\oplus \) and \(\odot \) as Gaussian distributions is an abuse of notation. Indeed, the space of diffusion signals is a subspace of \(B^2(P)\) that is not closed under the Bayes geometry, because operations on diffusion signals in \(B^2(P)\) might yield non-positive “covariance” matrices. However, most applications in which analytic diffusion model computing is required involve weighted average operations, i.e. linear combinations with positive weights whose sum is at most one, which always produce SPD tensors in \(B^2(P)\). Indeed, it is easy to show that:

$$\begin{aligned} \bigoplus _{i=1}^N w_i \odot \nu _i \sim \mathcal {N} \left( \mathbf {0}, \sum _{i=1}^N w_i D_i + \left( 1 - \sum _{i=1}^N w_i \right) D_\mathrm {ref} \right) . \end{aligned}$$
(4)
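Eq. (4) is the operation actually used for interpolation in the experiments below. A possible implementation for Gaussian signals, assuming non-negative spatial weights summing to at most one so that the leftover mass goes to the reference tensor:

```python
import numpy as np

def bayes_weighted_average(tensors, weights, D_ref):
    """Eq. (4): weighted average of Gaussian diffusion signals in B^2(P).

    The result is a convex combination of the neighboring tensors and of the
    reference tensor, hence SPD whenever the inputs are SPD."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or weights.sum() > 1.0 + 1e-12:
        raise ValueError("weights must be non-negative and sum to at most one")
    D = sum(w * D_i for w, D_i in zip(weights, tensors))
    return D + (1.0 - weights.sum()) * D_ref
```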

In the rare event that operations in Bayes space produce a non-positive matrix, one can perform an orthogonal projection of the non-positive matrix back into the space of diffusion signals by solving the following minimization problem:

$$\begin{aligned} \min _{D \in \mathcal {B}(P) \text { s.t. } D > 0} a d_{\mathcal {B}(P)}^2(M, D) + (1-a) d_{\mathcal {B}(P)}^2(D_\mathrm {ref}, D), \quad a \in [0,1], \end{aligned}$$
(5)

where M is the symmetric matrix assumed to have some negative eigenvalues and \(d_{\mathcal {B}(P)}\) is the distance on \(B^2(P)\) given by:

$$\begin{aligned} \Vert \nu _1 \ominus \nu _2 \Vert _{\mathcal {B}(P)}^2 := \mathrm {Tr} \left( (D_1 - D_2) D_\mathrm {ref}^{-1} (D_1 - D_2) D_\mathrm {ref}^{-1} \right) = \Vert D_\mathrm {ref}^{-1/2} (D_1 - D_2) D_\mathrm {ref}^{-1/2} \Vert _F^2. \end{aligned}$$
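For completeness, a short sketch of this distance, computed as the Frobenius norm of the tensor difference whitened by the reference tensor; the eigendecomposition is only one way of obtaining \(D_\mathrm {ref}^{-1/2}\).

```python
import numpy as np

def bayes_distance(D1, D2, D_ref):
    """Distance between two Gaussian diffusion signals in B^2(P):
    || D_ref^{-1/2} (D1 - D2) D_ref^{-1/2} ||_F."""
    vals, vecs = np.linalg.eigh(D_ref)                # D_ref is SPD
    W = vecs @ np.diag(vals ** -0.5) @ vecs.T         # D_ref^{-1/2}
    return float(np.linalg.norm(W @ (D1 - D2) @ W, ord="fro"))
```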

3 Experimental Setup

3.1 Simulations: Robustness to Noise

The goal of the simulated data is to assess the robustness of Bayes space interpolation to MRI-induced noise, compared to more traditional approaches that focus directly on the tensor and embed it either in Euclidean or log-Euclidean space. In Bayes space, as shown in Eq. (4), the tensor associated with the interpolated signal is a convex combination of the tensor interpolated in Euclidean space and the tensor of the reference signal, where the weight \(w_\mathrm {data}\) associated with the tensor interpolated in Euclidean space indicates how much we are willing to trust its information content. For this purpose, we set \(w_\mathrm {data} := 1 - e^{-\mathrm {SNR}/ \beta }\) and include in the comparison three Bayes spaces (Bayes10, Bayes20, Bayes30), using \(\beta =10,20,30\) respectively. We generate (i) a set of 31 noise-free diffusion signals according to Eq. (3) with a baseline signal \(S_0=1\) and 31 diffusion gradients \(\mathbf {q}\) uniformly distributed on the hemisphere of radius \(\sqrt{b}\), with \(b=1000\) s/mm\(^2\), and (ii) a set of \(n=8\) normalized weights defining the spatial weights of the 8 neighbors. Next, for a given SNR and a given method, we obtain a Monte-Carlo estimate of the mean squared error (MSE) of the interpolated tensor by averaging the squared distances between the ground-truth tensor and \(R=1000\) replicates of interpolated tensors obtained from noisy neighboring tensors produced as follows:

  (a) Add Rician noise to the noise-free signals using \(\sigma = S_0 / \mathrm {SNR}\);

  (b) Get a realistic noisy tensor field by estimating a diffusion tensor in each neighboring voxel via maximum likelihood estimation [11];

  (c) Interpolate the tensor in the central voxel from the tensors in the 8 neighboring voxels, to which we associate the initially simulated spatial weights;

  (d) Compute the distance between the interpolated and ground-truth tensors.

We used eight values of \(\mathrm {SNR} \in [6.25, 50]\) and three metrics to compare interpolations with the ground truth: the angle between principal orientations (direction distance), the Euclidean distance between axial diffusivities (diffusivity distance) and the Euclidean distance between radial diffusivities (radius distance, as it is often used as a proxy for axon radii). We chose these metrics because they focus on microstructural parameters of direct clinical relevance and are independent of the compared spaces. For interpolation in Bayes space, for each SNR, we set the reference tensor to a noisy version of the ground-truth tensor obtained with the same procedure as above at \(\mathrm {SNR}\sqrt{N}\). This is because, in practice, the data available as a reference often comes from atlases generated on similar data but averaged over N subjects. Hence, there is still uncertainty in the reference tensor but its variance is likely to be divided by N. In this simulation, we set \(N=20\).
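Two ingredients of this simulation are sketched below, under our reading of the protocol: Rician corruption of the noise-free signals and the SNR-driven data weight \(w_\mathrm {data} = 1 - e^{-\mathrm {SNR}/\beta }\). Tensor estimation [11] and the interpolation itself (Eq. (4)) are omitted, and the random seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_rician_noise(signals, snr, s0=1.0):
    """Corrupt noise-free magnitude signals with Rician noise, sigma = S0 / SNR."""
    sigma = s0 / snr
    real = signals + rng.normal(scale=sigma, size=signals.shape)
    imag = rng.normal(scale=sigma, size=signals.shape)
    return np.sqrt(real ** 2 + imag ** 2)

def data_weight(snr, beta):
    """Trust put on the tensor interpolated in Euclidean space."""
    return 1.0 - np.exp(-snr / beta)

noisy = add_rician_noise(np.ones(31) * 0.5, snr=12.5)   # 31 measurements, SNR = 12.5
w_data = data_weight(12.5, beta=20.0)                   # Bayes20: ~0.46
```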

3.2 Case Study: Tractography of the Pyramidal Tract

The pyramidal tract (PT) is of primary importance as it handles voluntary motion. Its fibers originate in the primary motor cortex (R0), pass successively through the corona radiata (R1), the genu and posterior limb of the internal capsule (R2) and the cerebral peduncles (R3), and eventually enter the spinal cord (R4). The PT is difficult to reconstruct from tractography due to the wide spreading of its fibers over the cortex. We aim at showing that interpolation in Bayes space makes deterministic FACT tractography [7] feasible for PT reconstruction. Specifically, we compare four approaches obtained by combining single- or multi-tensor FACT tractography (ST or MT) with log-Euclidean or Bayes interpolation (ST-LogEuclidean, ST-Bayes, MT-LogEuclidean and MT-Bayes).

We scanned a healthy subject on a 3T Siemens Verio magnet to acquire a \(T_1\) MPRAGE image at 1 mm\(^3\) resolution and diffusion data at 2 mm\(^3\) resolution using the same gradient table as in the simulations. We used an available PT atlas in MNI space based on 20 healthy subjects who underwent the same acquisition protocol. We estimated ST and MT models from the diffusion data [11] and brought the resulting images into MNI space, where FACT tractography was performed. In the case of MT-FACT, at a given point, a single tensor was picked from each neighboring mixture according to the highest orientation similarity w.r.t. the arrival direction, so that we could proceed as in ST-FACT. We performed tractography of the PT following regions of interest (ROIs). We used R0 as the seeding mask, stopped streamline propagation when FA fell below 0.1 (for ST-FACT) or when the linearly interpolated fraction of free water exceeded 0.8 (for MT-FACT), and filtered the resulting streamlines progressively through R1, R2, R3 and R4. We defined the reference weight at position \(\mathbf {x}\) for Bayes interpolation as:

$$\begin{aligned} w_\mathrm {ref}(\mathbf {x}, \mathrm {FA}, \mathrm {SNR}, \mathbf {x}_\mathrm {ref}; \beta , \delta ) := \left( 1 - \mathrm {FA} (1 - e^{-\mathrm {SNR} / \beta }) \right) e^{-\Vert \mathbf {x} - \mathbf {x}_\mathrm {ref} \Vert / \delta }, \end{aligned}$$

where \(\mathrm {FA}\) is the fractional anisotropy of the tensor interpolated in Euclidean space, \(\mathbf {x}_\mathrm {ref}\) is the position of the closest point on the PT atlas and \((\beta , \delta )\) are user-defined parameters that control how fast the SNR and distance weights decay. In essence, we put more trust in the data when both the SNR and the orientation coherence between neighbors are high, or when the reference is too far away.
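The sketch below illustrates two pieces of this pipeline as we understand them: the reference weight defined above and the selection of the most collinear compartment used in MT-FACT. Function names and the handling of the eigenvector sign ambiguity are our own choices.

```python
import numpy as np

def reference_weight(x, fa, snr, x_ref, beta, delta):
    """Atlas weight of the equation above: it grows when FA or SNR is low,
    or when the current position x is close to the atlas point x_ref."""
    snr_term = 1.0 - fa * (1.0 - np.exp(-snr / beta))
    dist_term = np.exp(-np.linalg.norm(np.asarray(x) - np.asarray(x_ref)) / delta)
    return snr_term * dist_term

def pick_compartment(tensors, arrival_direction):
    """MT-FACT: keep the compartment whose principal eigenvector is the most
    collinear with the current arrival direction."""
    best, best_score = None, -1.0
    for D in tensors:
        vals, vecs = np.linalg.eigh(D)
        e1 = vecs[:, np.argmax(vals)]                        # principal orientation
        score = abs(float(np.dot(e1, arrival_direction)))    # |cos|, sign-invariant
        if score > best_score:
            best, best_score = D, score
    return best
```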

Fig. 1. MSE of interpolators as a function of noise. Axial diffusivity recovery (left), principal eigenvector recovery (middle), radial diffusivity recovery (right). Ground-truth tensor: orientation \((\sqrt{2}/2,\sqrt{2}/2,0)^\top \); diffusivities \(10^{-3}(1.71,0.3,0.1)^\top \) mm\(^2\)/s.

Fig. 2. Coronal view of reconstructed pyramidal tracts (with overlaid PT atlas). Columns correspond to an increasing number of filtering ROIs: R1 (1st column), R1 + R2 (2nd column), R1 + R2 + R3 (3rd column), R1 + R2 + R3 + R4 (4th column). Rows correspond to the methods (from top to bottom): single-tensor FACT with log-Euclidean and Bayes interpolation, then multi-tensor FACT with log-Euclidean and Bayes interpolation.

4 Results and Discussion

Simulations: Robustness to Noise. Figure 1 shows the MSE between the ground-truth tensor in the central voxel and the tensor interpolated from noisy neighbors, as the amount of noise increases. For all metrics, interpolation in Bayes space is uniformly more robust to MRI-induced noise for recovering microstructural parameters. The MSE curves for the three Bayes spaces that give increasing importance to the reference measure (from Bayes10 to Bayes30) reveal that, even when we mostly trust the data (Bayes10), interpolation in Bayes space is preferable.

Case Study: Tractography of the Pyramidal Tract. Figure 2 shows PT reconstructions with an increasing number of filtering ROIs (from left to right) for all four methods. First, observe that only the methods based on Bayes interpolation (rows 2 and 4) manage to reconstruct streamlines that go through all four ROIs and are thus more likely to belong to the PT. Also, in general, deterministic FACT tractography fails to recover PT streamlines with the traditional log-Euclidean interpolation (rows 1 and 3). This is well documented in the literature on ST-FACT. We hypothesize that MT-FACT has much more directional information for tracking but, without prior anatomical knowledge, the FACT algorithm (which stepwise follows the most collinear direction) has no mechanism to channel streamlines into following the PT shape. This explanation is supported by the sequential filtering, which shows high streamline variability when filtering only by the corona radiata (1st column, 3rd row) and almost no remaining streamlines at the end of the entire filtering process (4th column, 3rd row). The ST-Bayes method seems to mainly follow the shape of the atlas streamlines. This is due to an inherent model mis-specification in most parts of the white matter, where the ST model provides an insufficient description of the microstructure and therefore presents an artificially low FA, which uniformly inflates the weight of the reference measure. Conversely, the MT-Bayes version nicely preserves the PT streamlines after complete filtering, without heavy influence of the atlas, since the MT model provides an accurate description of the microstructure and thus the interpolated tensor has low FA only when neighboring tensors have heterogeneous orientations.