1 Introduction

Diffusion magnetic resonance imaging (dMRI) is a 6D neuroimaging modality that produces 3D q-space signals at every voxel of a 3D brain MRI volume and can be used to estimate the orientation and integrity of neuronal fiber bundles in vivo. Diffusion tensor imaging (DTI), which models the probability of water diffusion in each voxel with a 3D Gaussian distribution, requires a relatively low number of q-space signal measurements. While this makes DTI well-suited for clinical research, the diffusion tensor cannot model multiple fiber orientations crossing in a single voxel.

High angular resolution diffusion imaging (HARDI) can provide more accurate estimations of anatomical fiber networks than DTI by estimating a higher-order probability distribution function. However, HARDI reconstruction algorithms require a larger number of q-space directions, which leads to a significantly longer patient scan time in comparison to DTI. As a consequence, HARDI has not been broadly accepted as a clinically viable dMRI protocol. Therefore, reducing HARDI scan times to the rate of DTI while maintaining accurate orientation estimation is an important open question in HARDI research.

Compressed sensing (CS) is a natural tool for problems of this type and hinges on finding a sparse representation of a signal with respect to some basis. If such a sparse representation exists, CS can recover the signal nearly perfectly from sub-Nyquist sampling, offering the potential to reduce signal acquisition time. Many approaches [9] have applied CS to dMRI protocols like multi-shell/single-shell HARDI and DSI to sparsely reconstruct signals, estimate orientation distribution functions (ODFs), fiber orientation distribution functions (FODs) [4], and ensemble average propagators (EAPs) [7], using dictionary learning or fixed sparsifying bases such as spherical ridgelets/wavelets [15], spherical polar Fourier bases, spherical Fourier-Bessel bases, directional radial bases, higher-order tensors, and many more. These approaches reduce the number of q-space measurements needed from hundreds to tens by representing a q-space signal with as few q-space basis atoms as possible. However, since these methods compute one sparse representation for every voxel, the sparsest representation they can possibly achieve is a single atom per voxel. Since spatial correlations between atoms are not exploited, these methods are still likely to represent a brain volume with millions of possibly redundant parameters.

In practice, measurements from neighboring voxels share much of the same information, so further redundancies may exist in the spatial domain when signals are modeled sparsely over the angular domain alone. Therefore, to reduce the number of measurements further, recent work [3, 8, 10, 11] aims to apply CS to DSI/HARDI in the joint (k,q)-space. These methods apply joint (k,q) undersampling but still only apply sparse coding in the angular domain. Some works add sparse spatial regularization such as total variation, yet with disjoint spatial and angular terms, global sparsity is still limited by the size of the data and (k,q)-CS may not be fully exploited.

In this paper, we propose to model HARDI signals using a global spatial-angular basis. However, because of the large size and complexity of HARDI data, optimizing over an entire HARDI volume is a computationally challenging problem. Our main contribution is an efficient joint spatial-angular sparse coding algorithm that exploits a separable model of the spatial and angular domains of HARDI data. With this proposed framework we aim to efficiently find a significantly sparser HARDI representation than state-of-the-art voxel-based methods can theoretically allow. In future work, joint spatial-angular sparse coding will allow us to more naturally apply (k,q)-CS with joint undersampling [8, 11] to further reduce the total number of HARDI measurements.

2 HARDI Data Representation

Angular (Voxel-Based) HARDI Representation. For each voxel v in a HARDI brain volume \(\varOmega \subset \mathbb {R}^3\), q-space measurements are acquired at gradient directions \(\vec {g} \in \mathbb {S}^2\) and are modeled by an angular (spherical) basis, \(\{\gamma _i(\vec {g})\}_{i=1}^{N_\varGamma }\), with \(N_\varGamma \) atoms such that \(s_v(\vec {g}) = \sum _{i=1}^{N_\varGamma } a_i \gamma _i(\vec {g})\) where \(s_v(\vec {g})\) denotes the HARDI signal at voxel v measured at gradient direction \(\vec {g}\). Classical HARDI reconstruction methods model the q-space signals from each voxel separately and add a regularization term \(\mathcal {R}\) to enforce desirable properties such as spatial coherence, ODF non-negativity, or sparsity. Some recent methods [1, 5, 12, 13] have considered simultaneous voxel-based reconstruction over an entire volume by solving

$$\begin{aligned} A^* =\mathop {{{\mathrm{arg\,min}}}}\limits _A ||S - \varGamma A||_F^2 + \lambda \mathcal {R} (A), \end{aligned}$$
(1)

where \(S = [s_1 \dots s_V] \in \mathbb {R}^{G \times V}\) is the concatenation of signals \(s_v\in \mathbb {R}^{G}\) sampled at G gradient directions over V voxels, \(A = [a_1 \dots a_V] \in \mathbb {R}^{N_\varGamma \times V}\) is the concatenation of coefficients and \(\varGamma \in \mathbb {R}^{G\times N_\varGamma }\) is the discretization of the basis \(\gamma \). While these methods attempt to reduce redundancies by adding spatial regularization, signal reconstruction still only operates on angular basis \(\varGamma \) at every voxel.
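To make the voxel-based baseline concrete, the following sketch solves (1) with \(\mathcal {R}(A) = \Vert A\Vert _1\) (and a 1/2 normalization of the data term) by plain ISTA; the dimensions, random bases, and step-size choice are our illustrative assumptions, not the implementations of [1, 5, 12, 13]:

```python
import numpy as np

def voxelwise_sparse_code(S, Gamma, lam, n_iter=300):
    """ISTA sketch for (1) with R(A) = ||A||_1 and a 1/2-scaled data term:
    min_A 0.5 ||S - Gamma A||_F^2 + lam ||A||_1.
    Columns of A (voxels) interact only through the shared basis Gamma."""
    A = np.zeros((Gamma.shape[1], S.shape[1]))
    t = 1.0 / np.linalg.norm(Gamma, 2) ** 2   # step size 1/L, L = ||Gamma||_2^2
    for _ in range(n_iter):
        grad = Gamma.T @ (Gamma @ A - S)      # gradient of the data term
        Z = A - t * grad
        A = np.sign(Z) * np.maximum(np.abs(Z) - t * lam, 0.0)  # soft-threshold
    return A
```

Because the update acts column-wise, each voxel obtains its own sparse code, so the sparsest attainable representation is still bounded below by one atom per voxel.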

Spatial-Angular HARDI Representation. In this work, we propose to model the HARDI signal \(\mathcal {S}(v,\vec {g})\) with a single global basis, say \(\varphi (v,\vec {g})\), to explicitly reduce redundancies in both the spatial and angular domains. However, a typical HARDI volume contains on the order of \(V \approx 100^3\) voxels, each with \(G \approx 100\) q-space measurements, for a total of \(100^4 = 100\) million signal measurements. Since sparse coding applications often use over-redundant bases, this leads to a massive matrix \(\varPhi \) of size greater than \(100^4 \times 100^4\), so efficiently optimizing over a global basis is a very difficult problem. To overcome this challenge, we introduce additional structure on the dictionary atoms by considering separable functions over \(\varOmega \) and \(\mathbb {S}^2\), namely a set of atoms of the form \(\varphi _{i,j}(v,\vec {g}) = \psi _j(v)\gamma _i(\vec {g})\), where \(\psi (v)\) is a spatial basis for \(\varOmega \) with \(N_\varPsi \) atoms. The HARDI signal may then be decomposed as:

$$\begin{aligned} \mathcal {S}(v,\vec {g}) = \sum _{i=1}^{N_\varGamma } \sum _{j=1}^{N_\varPsi } c_{i,j} \psi _j(v)\gamma _i(\vec {g}) = \sum _{k=1}^{N_\varPsi N_\varGamma } c_k \varphi _k(v,\vec {g}), \end{aligned}$$
(2)

where \(c = [c_k] \in \mathbb {R}^{N_{\varPsi } N_{\varGamma }}\) is the vectorization of \(C = [c_{i,j}] \in \mathbb {R}^{N_{\varGamma } \times N_{\varPsi }}\). In discretized form, our global basis \(\varphi \) is the separable Kronecker product matrix \(\varPhi \triangleq \varPsi ~\otimes ~\varGamma ~\in ~\mathbb {R}^{VG \times N_{\varPsi } N_{\varGamma }}\) with \(\varPsi \in \mathbb {R}^{V \times N_\varPsi }\) and \(\varGamma \in \mathbb {R}^{G \times N_\varGamma }\) such that

$$\begin{aligned} s = \left( \begin{array}{c} s_1 \\ s_2 \\ \vdots \\ s_V \end{array} \right) = \left( \begin{array}{cccc} \varPsi _{1,1}\varGamma &{} \varPsi _{1,2} \varGamma &{} \cdots &{} \varPsi _{1,N_\varPsi } \varGamma \\ \varPsi _{2,1}\varGamma &{} \varPsi _{2,2} \varGamma &{} \cdots &{} \varPsi _{2,N_\varPsi } \varGamma \\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ \varPsi _{V,1}\varGamma &{} \varPsi _{V,2} \varGamma &{} \cdots &{} \varPsi _{V,N_\varPsi } \varGamma \end{array} \right) \left( \begin{array}{c} c_1 \\ c_2 \\ \vdots \\ c_{N_\varPsi N_\varGamma } \end{array} \right) = \varPhi c . \end{aligned}$$
(3)

Alternatively, in matrix form, (3) can be written compactly as \(S = \varGamma C \varPsi ^\top \). In the special case \(\varPsi = I_V\), the identity, this reduces to the state-of-the-art voxel-based formulation (1) with \(C\equiv A\).
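The equivalence between the vectorized form (3) and the matrix form \(S = \varGamma C \varPsi ^\top \) can be checked numerically with toy dimensions (a quick sketch; the sizes and random bases are arbitrary):

```python
import numpy as np

# Toy sizes: V voxels, G gradients, N_psi spatial and N_gamma angular atoms.
V, G, N_psi, N_gamma = 6, 4, 5, 3
rng = np.random.default_rng(0)
Psi = rng.standard_normal((V, N_psi))       # spatial basis Psi
Gamma = rng.standard_normal((G, N_gamma))   # angular basis Gamma
C = rng.standard_normal((N_gamma, N_psi))   # coefficient matrix

# Global separable basis Phi = Psi (x) Gamma, size (V G) x (N_psi N_gamma).
Phi = np.kron(Psi, Gamma)

# Column-stacking vectorization: s = Phi vec(C)  <=>  S = Gamma C Psi^T.
c = C.flatten(order="F")
s = Phi @ c
S = Gamma @ C @ Psi.T                       # G x V, columns are the s_v
assert np.allclose(s, S.flatten(order="F"))
```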

For HARDI, ODFs, p, can be estimated globally for all voxels with a single equation \(p(v,\vec {x}; c) = \frac{1}{4\pi } + \tilde{\varPhi }(v,\vec {x})c\), where \(\tilde{\varPhi }(v,\vec {x}) \triangleq \varPsi (v) \otimes \tilde{\varGamma }(\vec {x})\) and \(\tilde{\varGamma }(\vec {x})\) is the angular basis transformed into the space of ODFs for \(\vec {x}\in \mathbb {S}^2\). This global formulation provides a natural framework for HARDI applications such as enforcing global non-negativity, extracting global features, or global fiber segmentation. Though the current spatial-angular formulation is specific to HARDI, our framework generalizes to any dMRI protocol, such as DSI or multi-shell methods, by choosing an appropriate \(\varGamma \) to represent these data in q-space.

3 Efficient Globally Sparse HARDI Reconstruction

With our proposed spatial-angular basis representation of HARDI, the goal of this paper is to accurately reconstruct an entire HARDI volume with fewer atoms than the state-of-the-art voxel-based methods can theoretically allow. To find a globally sparse representation c, we aim to solve the \(l_1\) minimization problem:

$$\begin{aligned} c^* = \mathop {{{\mathrm{arg\,min}}}}\limits _c \frac{1}{2}|| s - \varPhi c ||_2^2 + \lambda ||c||_1, \end{aligned}$$
(4)

where \(\lambda >0\) is the sparsity trade-off parameter. The Alternating Direction Method of Multipliers (ADMM) [2] is a popular method for solving (4), however, its application in the case of a large dictionary \(\varPhi \) remains prohibitive. To reduce computation, we first note that when using over-redundant dictionaries (i.e., \(V\!\!<\!N_{\varPsi },G\!<\!N_{\varGamma }\)), it is more efficient to apply ADMM to the dual of (4) so that we can switch from calculating \(\varPhi ^\top \varPhi \) of size \(N_\varPsi N_\varGamma \!\times \!N_\varPsi N_\varGamma \) to \(\varPhi \varPhi ^\top \) of size \(VG\!\times \!VG\). The dual of (4) is:

$$\begin{aligned} \max _\alpha -\frac{1}{2}||\alpha ||^2_2 + \alpha ^\top s \ \ s.t. \ \ ||\varPhi ^\top \alpha ||_\infty \le \lambda , \end{aligned}$$
(5)

and update equations of the Dual ADMM (DADMM) [6] are given by:

$$\begin{aligned} \alpha _{k+1}&= (I + \eta \varPhi \varPhi ^\top )^{-1}(s - \varPhi (c_k - \eta \nu _k)) \end{aligned}$$
(6)
$$\begin{aligned} \nu _{k+1}&= P^\infty _\lambda (\frac{c_k}{\eta } + \varPhi ^\top \alpha _{k+1}) \end{aligned}$$
(7)
$$\begin{aligned} c_{k+1}&= {{\mathrm{shrink}}}_{\lambda \eta }(c_k + \eta \varPhi ^\top \alpha _{k+1}), \end{aligned}$$
(8)

where \(P^\infty _\lambda (x)\) is an element-wise projection onto \([-\lambda ,\lambda ]\), \({{\mathrm{shrink}}}_\rho (x)\) is the soft-thresholding operator and \(\eta > 0\) is an optimization parameter. The globally sparse output vector \(c^*\) minimizes the primal problem (4). DADMM reduces inner product computations to matrices of smaller size \(G\!\times \!V\) instead of \(N_{\varGamma }\!\times \!N_{\varPsi }\) and therefore depends not on the size of the basis but only on the size of the data. Furthermore, soft-thresholding is now done directly on c, which reduces the number of iterations needed to reach a given sparsity level by building up from 0 atoms instead of descending down from the total \(N_\varPsi N_\varGamma \) atoms. However, this naïve formulation with large \(\varPhi \) still has complexity \(O(GVN_\varGamma N_\varPsi )\) per iteration; even storing \(\varPhi \) in memory may be an issue, as is the expensive cost of the inverse. We address this by exploiting the separability of \(\varPhi \) to perform all computations with the much smaller \(\varPsi \) and \(\varGamma \). Our proposed method, called Kronecker DADMM, exploits the Kronecker product in two ways:

1. Kronecker SVD. Computing \((I + \eta \varPhi \varPhi ^\top )^{-1}\) for large \(\varPhi \) is challenging, and even taking an SVD to reduce the inverse to a diagonal of singular values is \(O((GV)^2 N_\varGamma N_\varPsi )\). Instead, we can exploit the Kronecker product and compute separate SVDs of the smaller \(\varPsi \varPsi ^\top \) and \(\varGamma \varGamma ^\top .\) Let \(\varPsi \varPsi ^\top = U_{\varPsi }\varSigma _\varPsi U_\varPsi ^\top \) and \(\varGamma \varGamma ^\top = U_\varGamma \varSigma _\varGamma U_\varGamma ^\top \); then \((I + \eta \varPhi \varPhi ^\top )^{-1} = (U_{\varPsi } \otimes U_\varGamma )(I+ \eta ( \varSigma _\varPsi \otimes \varSigma _\varGamma ))^{-1}(U_\varPsi \otimes U_\varGamma )^\top \), where the inverse is now simply taken over a diagonal matrix. Computing the two SVDs has complexity \(O(V^2 N_\varPsi + G^2 N_\varGamma ).\) In the case of tight frames, such as wavelets, where \(\varPsi \varPsi ^\top = I\), we can significantly reduce computations to only involve \(\varGamma \). We then simplify computations for (6) by pre-multiplication, setting \(\alpha ' \triangleq (U_{\varPsi } \otimes U_\varGamma )^\top \alpha \), \(s' \triangleq (U_\varPsi \otimes U_\varGamma )^\top s\) and \(\varPhi ' \triangleq (U_\varPsi \otimes U_\varGamma )^\top \varPhi = (U_\varPsi \otimes U_\varGamma )^\top (\varPsi \otimes \varGamma ) = (U_\varPsi ^\top \varPsi ) \otimes (U_\varGamma ^\top \varGamma ) \triangleq \varPsi ' \otimes \varGamma '\). It is shown in [6] that we can replace our original variables with \(\alpha '\), \(s'\), and \(\varPhi '\) without changing the primal and dual optimization problems.

2. Kronecker Matrix Formulation. We can further reduce the size of our problem and avoid any computations with \(\varPhi '\) by using the Kronecker product matrix formulation of (3) to write \(\varPhi ' (c_k - \eta \nu _k) = \varGamma ' (C_k - \eta \mathcal {V}_k)\varPsi '^\top \), where C and \(\mathcal {V}\) are the \(N_\varGamma \!\times \!N_\varPsi \) matrix forms of c and \(\nu \). Likewise, \(\varPhi '^\top \alpha ' = \varGamma '^\top A' \varPsi '\), where \(A'\) is the \(G\!\times \!V\) matrix form of \(\alpha '\) and we can pre-compute \(S' = U_\varGamma ^\top S U_\varPsi \), where S is the \(G\!\times \!V\) matrix form of s. To fit the \(G\!\times \!V\) matrix dimensions, we simplify the diagonal \((I+ \eta ( \varSigma _\varPsi \otimes \varSigma _\varGamma ))^{-1}\) to \(\varSigma ^{-1}_\eta \!\triangleq \!1/(1+\eta \varSigma )\), where \(\varSigma \!\triangleq \!d_\varGamma d_\varPsi ^\top \) and \(d_\varPsi \) and \(d_\varGamma \) are the diagonals of \(\varSigma _\varPsi \) and \(\varSigma _\varGamma \). This Kronecker matrix formulation has a significantly reduced complexity of \(O(GVN_\varPsi )\) per iteration compared with \(O(GVN_\varGamma N_\varPsi )\) for the naïve approach. Furthermore, in the case where \(\varPsi \) is a tight frame transformation, such as wavelets, for which \(\varPsi '=\varPsi \), fast decomposition and reconstruction algorithms can replace multiplication by \(\varPsi \) and \(\varPsi ^\top \), reducing complexity to \(O(GV\log _2(N_\varPsi ))\).
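Combining the two ingredients, the sketch below implements the transformed updates (6)-(8) entirely in matrix form, never building \(\varPhi \); the toy dimensions, random dense bases, fixed \(\eta \), and fixed iteration count are illustrative assumptions (Algorithm 1 instead adapts \(\eta \) and stops on the duality gap):

```python
import numpy as np

def kron_dadmm(S, Psi, Gamma, lam, eta=1.0, n_iter=500):
    """Sketch of Kron-DADMM for min_C 0.5||S - Gamma C Psi^T||_F^2 + lam||C||_1.
    Never forms Phi = kron(Psi, Gamma); all work is on G x V and
    N_Gamma x N_Psi matrices."""
    # Kronecker SVD: eigendecompositions of the small Gram matrices,
    # Psi Psi^T = U_psi diag(d_psi) U_psi^T (and likewise for Gamma).
    d_psi, U_psi = np.linalg.eigh(Psi @ Psi.T)
    d_gam, U_gam = np.linalg.eigh(Gamma @ Gamma.T)
    # Pre-multiplied ("primed") quantities of Sec. 3.
    S_p = U_gam.T @ S @ U_psi            # S' = U_Gamma^T S U_Psi
    Psi_p = U_psi.T @ Psi                # Psi'
    Gam_p = U_gam.T @ Gamma              # Gamma'
    # Diagonal inverse (I + eta Sigma_Psi (x) Sigma_Gamma)^{-1} as G x V array.
    Sig_inv = 1.0 / (1.0 + eta * np.outer(d_gam, d_psi))
    C = np.zeros((Gamma.shape[1], Psi.shape[1]))
    Nu = np.zeros_like(C)
    for _ in range(n_iter):
        # (6): alpha' update; the matrix inverse is an element-wise scaling.
        A = Sig_inv * (S_p - Gam_p @ (C - eta * Nu) @ Psi_p.T)
        B = Gam_p.T @ A @ Psi_p          # Phi'^T alpha' in matrix form
        # (7): element-wise projection onto [-lam, lam].
        Nu = np.clip(C / eta + B, -lam, lam)
        # (8): soft-thresholding builds sparsity up from 0 atoms.
        Z = C + eta * B
        C = np.sign(Z) * np.maximum(np.abs(Z) - lam * eta, 0.0)
    return C
```

Replacing the dense multiplications by \(\varPsi '\) with a fast wavelet transform gives the \(O(GV\log _2(N_\varPsi ))\) variant mentioned above.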

Algorithm 1: Kron-DADMM

Our proposed algorithm for globally sparse HARDI reconstruction (Kron-DADMM) is presented in Algorithm 1. The symbol "\(\circ \)" in Step 1 denotes element-wise matrix multiplication. We follow [6] to update the penalty parameter \(\eta \) and stop when the duality gap is sufficiently small.

4 Experiments

Data Sets. We provide experiments on the ISBI 2013 HARDI Reconstruction Challenge Phantom dataset, a \(50 \times 50 \times 50\) volume consisting of 20 phantom fibers crossing intricately within an inscribed sphere, measured with \(G = 64\) gradient directions (SNR \(=30\)). Figures 1 and 2 show quantitative signal error vs. sparsity and qualitative ODF estimations at specific sparsity levels, respectively. We also experimented on a real \(128 \times 128 \times 26\) HARDI brain volume with \(G = 384\) from the Hippocampal Connectivity Project (FOV: 192, resolution: 1.5 mm isotropic, b-value: 1400 s/mm\(^2\), TR/TE: 3500/86). Figure 3 shows sparse reconstruction using Kron-DADMM with Haar-SR compared to full voxel-based SH reconstruction. We can achieve a good reconstruction with \(\sim \)2 atoms per voxel in about 2.5 h.

Choice of Spatial and Angular Bases. Spatial (\(\varPsi \)): A popular choice of sparsifying basis for MRI is a wavelet basis. For our experiments we compared Haar and Daubechies wavelets to get an indication of which can more sparsely represent the spatial organization of HARDI. Importantly, we also compared to state-of-the-art voxel-wise methods by simply choosing the identity I as the spatial basis. We compare these basis choices in terms of sparsity and reconstruction error in Fig. 1 (left). Angular (\(\varGamma \)): The over-complete spherical ridgelet/wavelet (SR/SW) basis pair [15] has been shown to sparsely model HARDI signals/ODFs. We also compare this to the popular SH basis, though for a given order L the SH basis is a low-pass truncation and does not yield sparse representations. With order \(L\!=\!4\), SH and SR have \(N_\varGamma \!=\!15\) and \(N_\varGamma \!=\!1169\) atoms, respectively. We compare these angular choices in Fig. 1 (right). As a note, these basis choices are preliminary and future work will explore more advanced basis options to increase sparsity.

Fig. 1. Phantom Data. Left: Comparison of spatial basis choices using Kron-DADMM paired with the SR angular basis. Haar-SR achieves the lowest residual (\(\sim \)0.074) with the fewest coefficients (\(\sim \)0.5 atoms/voxel). The black line is voxel-based angular reconstruction with the identity I as the spatial basis; it cannot achieve sparsity below 1 atom/voxel without forcing some voxels to 0 atoms. Right: Comparison between Kron-OMP and Kron-DADMM. Kron-OMP takes 40 h to reach 1 atom/voxel, while Kron-DADMM takes 40 min and provides better accuracy. SR outperforms the SH basis, as expected.

Fig. 2. Phantom Data. Comparison of Kron-DADMM with Haar-SR (middle) and voxel-based I-SR (right) against a full voxel-based least squares reconstruction with SH (left). The global reconstruction provides a more accurate signal with less than 1 atom per voxel, while the voxel-based sparse reconstruction has difficulty estimating crossing fibers at this sparsity level and is forced to model isotropic ODFs with 0 atoms.

Fig. 3. Real Data. Global sparse reconstruction via Kron-DADMM with the Haar-SR basis (right) compared to full voxel-based least squares reconstruction with the SH basis (left). We achieve an accurate reconstruction with only 2.23 atoms/voxel.

Comparison with a Baseline Algorithm. Orthogonal Matching Pursuit (OMP) is a widely used algorithm that approximates a solution to the \(l_0\) problem by greedily selecting and orthogonalizing the K basis atoms that are most correlated with the signal s. The Kronecker OMP (Kron-OMP) proposed in [14] exploits bases with separable structure, but the method still needs to orthogonalize a \(K\!\times \!K\) matrix, where the sparsity level K empirically approaches the number of voxels \(V \approx 100^3\). Because of this, our implementation takes on the order of 40 h to optimize over the phantom dataset, compared to 40 min for Kron-DADMM. The results are presented in Fig. 1 (right).
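For reference, plain (non-Kronecker) OMP can be sketched in a few lines; this is our illustration of the greedy select-and-orthogonalize principle, not the Kron-OMP implementation of [14]:

```python
import numpy as np

def omp(Phi, s, K):
    """Basic OMP sketch: greedily select the atom most correlated with the
    residual, then re-fit all selected atoms by least squares.
    (Illustrative only; not the Kron-OMP of [14].)"""
    idx, r = [], s.copy()
    x = np.zeros(0)
    for _ in range(K):
        idx.append(int(np.argmax(np.abs(Phi.T @ r))))        # best-matching atom
        x, *_ = np.linalg.lstsq(Phi[:, idx], s, rcond=None)  # orthogonalize
        r = s - Phi[:, idx] @ x                              # update residual
    c = np.zeros(Phi.shape[1])
    c[idx] = x
    return c
```

The growing least-squares refit over the selected atoms is the step whose cost scales with K, which illustrates why greedy pursuit becomes slow once K approaches the number of voxels.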

5 Conclusion

We have presented a new efficient algorithm for globally sparse reconstruction of HARDI using a global basis representation that exploits spatial-angular separability to significantly reduce computational complexity. Our experiments show that spatial-angular bases can achieve sparser representations than voxel-based approaches. So far, our experiments serve as a proof of concept using dictionaries built from very simple spatial wavelet bases such as Haar, but the versatility of the algorithm enables the use of possibly more suitable directional wavelets (e.g. shearlets, curvelets) or dictionary learning strategies, both important directions for future work. Our next step is to develop spatial-angular sensing matrices to jointly subsample (k,q)-space using a form of Kronecker CS. Finally, a globally sparse representation can be utilized in many other areas of HARDI application, including fiber segmentation, global tractography, global feature extraction, and sparse disease classification.