1 Introduction

Diffusion imaging is a non-invasive technique that provides information on the diffusion of water molecules in biological tissues, which can be used to infer the underlying tissue microstructure [1]. However, analyzing high-dimensional diffusion data requires complex image processing techniques, which can be time-consuming and computationally demanding. Therefore, the use of deep learning (DL) methods has gained attention in recent years to accelerate the analysis of diffusion data and improve the accuracy of the reconstruction [2,3,4]. This project aimed to develop an innovative and disruptive DL pipeline for diffusion imaging that can handle complex spherical signals and achieve optimal reconstruction quality. To achieve this goal, four fundamental research questions were identified and addressed in this project:

  1. Lack of Transferability: The transferability of DL models between clinical sites has been a significant challenge due to the variability in image acquisition protocols, scanners, and patient populations. A model trained on a specific dataset may not perform well on new datasets with different acquisition protocols. This problem is particularly challenging in diffusion imaging, where the diffusion properties vary depending on the tissue type and pathology. To address this problem, the project explored supervised, semi-supervised, and unsupervised harmonization approaches to transfer datasets between different scanners or sites. The results showed that using harmonization approaches could improve the transferability of the DL model and reduce the reliance on large datasets for single clinical sites.

  2. Lack of Training and Label Data: The availability of training and label data is a significant challenge in diffusion imaging, particularly in small-scale studies or rare diseases. The lack of ground-truth data makes it challenging to train DL models and evaluate their accuracy. In this project, the use of synthetic data based on novel diffusion models was explored to generate training and label data for the DL pipeline. The results showed that synthetic data could provide accurate ground-truth data for training and evaluating DL models, particularly in small-scale studies or rare diseases.

  3. Potential of Complex Diffusion Data: The diffusion signal in magnetic resonance imaging (MRI) is a complex signal consisting of amplitude and phase information. The phase information contains valuable information about tissue microstructure, but its use in diffusion imaging is challenging due to its sensitivity to motion and susceptibility artifacts. This project explored integrating phase information in the DL model, which may improve the reconstruction quality of complex diffusion data.

  4. Spherical Signals in Neural Networks: Diffusion data is a spherical signal, and the signal measured along a gradient direction is highly correlated with the signals of neighboring gradient directions. Therefore, the spherical character of the diffusion data should be preserved in the DL model to improve the accuracy of the reconstruction. In this project, the use of spherical harmonics was explored to preserve the spherical character of the diffusion data in the DL model.

The above research questions were addressed in a research project funded by the German Research Foundation (DFG) under grant number 417063796. In this project report, all five work packages of the project are presented; for each work package, a summary of the methodological approaches and results, which have previously been presented in several publications [5,6,7,8,9,10], is provided. In addition, an overall discussion and conclusion are presented.

2 Work Packages—Methods and Results

WP1—Signal Harmonization

Diffusion MRI (dMRI) data from different sites exhibit considerable variability due to various factors, including hardware, acquisition settings, and reconstruction algorithms. This variability can be in the same range as biological variability, which complicates clinical studies; reliable harmonization methods are therefore needed for multicenter studies. Two approaches to diffusion harmonization are presented here: an unsupervised algorithm that can perform harmonization without paired data and an approach that requires paired data and incorporates a diffusion tensor imaging (DTI) loss for harmonization. The unsupervised algorithm is based on a cyclic neural network architecture; cyclic networks enable unsupervised learning because of their capacity to translate images without the need for paired data. Depending on the amount of paired data available, training can take place in unsupervised, semi-supervised, or supervised settings. The methods and results were previously published in [5, 6].

In the first step, the unsupervised harmonization algorithm was developed and evaluated on the Human Connectome Project (HCP) dataset. A test group of 86 subjects who were not part of the training procedure was used to evaluate the trained generators. The mean squared error (MSE) of the dMRI raw data, the mean diffusivity (MD), and the fractional anisotropy (FA) were chosen for evaluation. The proposed cyclic network was trained in a completely unsupervised setting (without paired images), a fully supervised setting (using paired images acquired in WP3), and a nested combination of the unsupervised and supervised training methods. Furthermore, a baseline was included for comparison. The baseline data underwent the same preprocessing procedures across all subjects, including registration of paired data and resampling of diffusion directions. These procedures introduced some degree of smoothing, thereby reducing the MSE between the two datasets. Table 1 shows the baseline and all outcomes.
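To make the cyclic training idea concrete, the following is a minimal sketch (not the architecture from [5, 6]) of the cycle-consistency objective that enables training without paired data; the 3-D convolutional generators, patch sizes, and the assumption of 90 diffusion directions per volume are placeholders, and the adversarial discriminators of a full cyclic network are omitted for brevity.

```python
import torch
import torch.nn as nn

# Two placeholder generators map dMRI patches between site A and site B.
# Cycle consistency requires each patch to survive the round trip
# A -> B -> A (and B -> A -> B), which is what allows unpaired training.
def make_generator(channels: int) -> nn.Module:
    return nn.Sequential(
        nn.Conv3d(channels, 32, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv3d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv3d(32, channels, kernel_size=3, padding=1),
    )

g_ab = make_generator(channels=90)   # site A -> site B (90 diffusion directions assumed)
g_ba = make_generator(channels=90)   # site B -> site A
l1 = nn.L1Loss()

def cycle_loss(x_a: torch.Tensor, x_b: torch.Tensor, lam: float = 10.0) -> torch.Tensor:
    """Unsupervised part of the objective: only unpaired patches are needed."""
    rec_a = g_ba(g_ab(x_a))          # round trip A -> B -> A
    rec_b = g_ab(g_ba(x_b))          # round trip B -> A -> B
    return lam * (l1(rec_a, x_a) + l1(rec_b, x_b))

# When paired data are available (WP3), a supervised term such as
# l1(g_ab(x_a), x_b_paired) can be added, yielding the semi-supervised
# (nested) and fully supervised settings mentioned above.
x_a = torch.rand(2, 90, 16, 16, 16)  # toy unpaired patches from site A
x_b = torch.rand(2, 90, 16, 16, 16)  # toy unpaired patches from site B
cycle_loss(x_a, x_b).backward()
```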

Table 1 Mean squared error between target and harmonized data for supervised, unsupervised, and mixed training in the 3T range (see also Table 1 in [6])

The algorithm was then further developed and evaluated on the data collected in WP3. The proposed approach was compared to SHResNet [11], the standalone MICA method [12], and the baseline between the two images to evaluate its performance. The 24 individuals were randomly split into groups of four subjects for a six-fold cross-validation. The cross-validation split was defined before applying the complete pipeline and was maintained for every processing step. Separate but identical harmonization pipelines were employed for each of the three diffusion shells (b = 1000 s/mm\(^2\), b = 2000 s/mm\(^2\), b = 3000 s/mm\(^2\)) for MICA and the proposed method; the final result is obtained by merging the three harmonized shells. The study comprised three distinct evaluations: (1) a comparison of the single-shell metrics used for training the proposed neural network, including the raw diffusion attenuation and the DTI metrics FA and MD, (2) an analysis of the differences in multi-shell microstructure modeling using NODDI [13], and (3) an assessment of the perception-based similarity using the LPIPS metric [14]. The Wilcoxon signed-rank test was used to determine whether the harmonization results across subjects were significantly better than those of the previously available methods; this was done for all evaluations in comparison to the best baseline result or the second-best approach.

First, the raw diffusion attenuation harmonization performance and the metrics derived from DTI were compared for each shell. The MSE of the raw diffusion attenuation signal was assessed independently for each of the three b-value shells, and individual FA and MD maps were generated for each diffusion shell. The findings are presented in Table 2. On the three derived metrics, our proposed method outperforms all other approaches, and it is the only examined method that performs better than the baseline on all measures. The SHResNet approach, on the other hand, exhibited the best results in terms of raw diffusion attenuation error.

Second, we utilized NODDI [13] to measure inter-head-coil effects and harmonization performance on microstructural estimations by fitting it to the whole multi-shell data. The MSE of the neurite orientation dispersion index (ODI) and the neurite density index (NDI) was calculated for each method. In addition, we compared the effect on fiber direction by calculating the mean orientation of the Watson distribution modeling the neurite compartment. Our algorithm yields statistically significant gains in NDI and ODI image similarity (p < 0.005), with NDI showing the greater of the two effects [5]. The fiber orientation similarity between the two acquisitions is also improved, but not to a statistically significant degree.
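As an illustration of this evaluation scheme, the snippet below computes a per-subject, brain-masked MSE for one metric and one shell and applies the one-sided Wilcoxon signed-rank test; the array shapes and the toy data are stand-ins for the actual FA maps, and the full evaluation in [5] covers all metrics and shells.

```python
import numpy as np
from scipy.stats import wilcoxon

def masked_mse(a, b, mask):
    """Mean squared error restricted to brain (or white-matter) voxels."""
    return float(np.mean((a[mask] - b[mask]) ** 2))

rng = np.random.default_rng(0)
n_subjects = 24                      # as in the six-fold cross-validation
shape = (16, 16, 16)                 # toy volume size; real FA maps are larger
mask = np.ones(shape, dtype=bool)    # stand-in brain mask

mse_proposed, mse_reference = [], []
for _ in range(n_subjects):
    fa_target = rng.random(shape)                                  # target-coil FA map (toy)
    fa_proposed = fa_target + 0.01 * rng.standard_normal(shape)    # proposed harmonization
    fa_reference = fa_target + 0.02 * rng.standard_normal(shape)   # second-best method
    mse_proposed.append(masked_mse(fa_proposed, fa_target, mask))
    mse_reference.append(masked_mse(fa_reference, fa_target, mask))

# One-sided Wilcoxon signed-rank test over subjects: is the proposed method's
# per-subject error significantly lower than that of the comparison method?
stat, p_value = wilcoxon(mse_proposed, mse_reference, alternative='less')
print(f"Wilcoxon signed-rank p = {p_value:.4g}")
```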

Table 2 Difference between the two head coil acquisitions before and after harmonization for the three b-value shells, according to [5], Table 2

Third, since basic signal smoothing alone has been shown to improve the majority of signal comparison assessments [9], we additionally assessed the differences in the raw diffusion-weighted images using a perception-based metric. The LPIPS metric was selected because it closely replicates human perception [14]. The calculation was performed per subject across all diffusion-weighted images and subsequently averaged across both images and subjects. The proposed approach outperforms both the baseline and the two comparison algorithms across all three b-value shells (see Table 3 in [5]). It is worth mentioning that the MSE decreases with increasing b-value for the raw data, FA, and MD owing to the lower signal intensity (see Table 2). At the same time, the LPIPS scores increase with the b-value, indicating greater dissimilarity between images with stronger diffusion weighting. The results of the three distinct harmonization approaches are shown in Figure 3 in [5]. The proposed method and MICA significantly reduce signal distortion compared to SHResNet. The MICA technique is incapable of signal smoothing since it does not intrinsically consider location-dependent effects, whereas our approach takes the local context into account; an example of this effect is highlighted by the green circles in Figure 3 in [5].
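The per-image LPIPS computation could look roughly as follows, using the publicly available `lpips` package that implements [14]; the slice-wise normalization, the channel replication for the single-channel dMRI data, and the AlexNet backbone are assumptions of this sketch rather than details taken from [5].

```python
import torch
import lpips  # pip install lpips; reference implementation of the LPIPS metric [14]

loss_fn = lpips.LPIPS(net='alex')    # backbone choice is an assumption here

def dwi_slice_lpips(slice_a: torch.Tensor, slice_b: torch.Tensor) -> float:
    """LPIPS distance between two 2-D diffusion-weighted slices (H x W)."""
    def prep(s):
        s = (s - s.min()) / (s.max() - s.min() + 1e-8)   # scale to [0, 1]
        s = s * 2.0 - 1.0                                 # LPIPS expects [-1, 1]
        return s[None, None].repeat(1, 3, 1, 1)           # replicate to 3 channels
    with torch.no_grad():
        return loss_fn(prep(slice_a), prep(slice_b)).item()

# Toy example; in the evaluation the distance is computed per subject across
# all diffusion-weighted images and then averaged over images and subjects.
a, b = torch.rand(96, 96), torch.rand(96, 96)
print(dwi_slice_lpips(a, b))
```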

WP2—Development of Methods for Data Synthesis

The data synthesis method relies on generating training data from voxels obtained from each subject individually. In this study, cerebrospinal fluid (CSF) and gray matter (GM) voxels were extracted from a registered T1 image using an eroded FAST segmentation mask [15]. Diffusion tensors are fitted to these voxels separately for each b-value shell, and the mean diffusivities of CSF and GM are determined from them; these values are subsequently used to produce artificial training data. A single diffusion tensor may be inadequate to represent the white matter (WM) diffusion signal throughout the brain, given that multiple microstructural compartments may contribute to the diffusion within a voxel. Analogous to the estimation of a single-fiber response function in constrained spherical deconvolution (CSD) [16], single-fiber WM voxels are extracted from the corpus callosum by selecting voxels with an FA above a pre-defined threshold (FA > 0.7). In contrast to CSD, our method preserves all single-fiber voxels extracted from the white matter. Then, for each voxel and shell, we fit prolate diffusion tensors, yielding three diffusion tensors for a three-shell acquisition such as the diffusion sequence used in the HCP. Importantly, these tensors are estimated independently of one another. This leads to a range of diffusion tensors that represent various white matter microstructures. The method relies on the assumptions that GM and CSF diffusion are isotropic and that, for a single b-value, WM diffusion can be described by single fibers with prolate tensors. Because the diffusion characteristics are estimated separately for each shell, b-value-dependent effects (such as kurtosis) are incorporated in the synthetic data.

To assess the quality of the synthetic diffusion data, an autoencoder [17], a network architecture originally created for denoising purposes, is used. When synthetic data are used for training and real data for evaluation, the reconstruction error serves as a metric for the quality of the synthetic data. If the generative model of the synthetic data covers only a subset of the features present in the in-vivo data, the trained autoencoder may fail to generalize across the entire spectrum of in-vivo data; conversely, if the generative model induces excessive diversity, the autoencoder's performance may be compromised because the neural network loses specificity. Consequently, more appropriate synthetic data will result in better reconstruction performance on in-vivo data [8]. The autoencoder proposed in this study is designed to be self-adaptive to the shape of the input in order to optimize the balance between generality and specificity, regardless of the type of diffusion acquisition; consequently, the dimensions of the architecture are dynamically adapted to the number of diffusion directions. Four distinct synthetic diffusion data models were trained and evaluated on single-voxel data in order to assess their respective quality. We evaluate the autoencoder's reconstruction performance on the various synthetic datasets using the diffusion-weighted raw signal and metrics derived from fitted diffusion tensors.
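As a concrete illustration of the synthesis step described at the beginning of this work package, the sketch below generates single-voxel multi-shell signals from an isotropic tensor (CSF, GM) and a prolate single-fiber tensor (WM) using the standard tensor signal model \(S(b, \mathbf{g}) = S_0 \exp(-b\,\mathbf{g}^\top \mathbf{D}\,\mathbf{g})\); the diffusivity values and gradient scheme are placeholders, whereas in WP2 they are estimated per subject and per shell from the segmented voxels.

```python
import numpy as np

def tensor_signal(bval, bvecs, D, s0=1.0):
    """Single-tensor dMRI signal S = S0 * exp(-b g^T D g) for unit gradients g."""
    return s0 * np.exp(-bval * np.einsum('ij,jk,ik->i', bvecs, D, bvecs))

def prolate_tensor(lambda_par, lambda_perp, direction):
    """Axially symmetric (prolate) tensor with principal axis 'direction'."""
    d = np.asarray(direction, dtype=float)
    d /= np.linalg.norm(d)
    return lambda_perp * np.eye(3) + (lambda_par - lambda_perp) * np.outer(d, d)

rng = np.random.default_rng(0)
bvecs = rng.standard_normal((90, 3))
bvecs /= np.linalg.norm(bvecs, axis=1, keepdims=True)       # 90 unit gradient directions

# Placeholder diffusivities in mm^2/s; in WP2 these are derived from the
# eroded CSF/GM masks and the single-fiber corpus callosum voxels.
D_csf = 3.0e-3 * np.eye(3)                                   # isotropic CSF tensor
D_gm = 0.8e-3 * np.eye(3)                                    # isotropic GM tensor
D_wm = prolate_tensor(1.7e-3, 0.3e-3, direction=[1, 0, 0])   # single-fiber WM tensor

for b in (1000.0, 2000.0, 3000.0):                           # HCP-like three-shell scheme
    s_wm, s_csf = tensor_signal(b, bvecs, D_wm), tensor_signal(b, bvecs, D_csf)
    print(f"b={b:.0f}: mean WM attenuation {s_wm.mean():.3f}, CSF {s_csf.mean():.3f}")
```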
To evaluate the reconstruction performance, the diffusion-weighted signal is considered first; this is the diffusion signal divided by the b\(_0\) signal. A detailed evaluation is provided in [8], where Figure 1 depicts the standard deviation of the reconstructed signal for dMRI images of the brain. The reconstruction performance for Syn, Sample & Mean, and Sample is almost identical and shows no statistically significant differences in the HCP and study data, whereas the RandomWM mode performs far worse. The reconstruction performance differs between the HCP acquisitions and the study data. The voxel sizes employed in the HCP study (1.25 mm) and the local study (2.4 mm) provide an explanation for this difference: one voxel in the local study data corresponds to \((2.4/1.25)^3 \approx 7\) voxels in the HCP data. Without taking into account variations caused by other acquisition parameters and the pre-processing pipeline, an SNR difference between the two datasets of \(\sqrt{(2.4/1.25)^3} \approx 2.66\) is therefore expected. This is closely reflected by the mean absolute errors obtained for the three multi-tensor simulations, which are 0.014 for the local study data and 0.037 for the HCP data. Subsequently, the reconstruction performance was evaluated using metrics derived from the fitted diffusion tensors. The variation in FA among the three multi-tensor modes, namely Syn, Sample & Mean, and Sample, was insignificant, with a mean absolute error of 0.0075 in the local study data and 0.0104 in the HCP data. In contrast, the RandomWM model performed considerably worse, with a mean absolute deviation of 0.17 in the local study data and 0.22 in the HCP data. Figure 2 in [8] illustrates notable differences among the multi-tensor modes in their ability to reconstruct the main fiber direction. Significant differences are apparent between the Syn and WM sampling models, whereas averaging or sampling the GM and CSF diffusion attenuation has no comparable impact on the outcomes. The RandomWM method performs notably worse than the other methods (42.4 ± 3.2 in the local study data and 42.7 ± 2.8 in the HCP data) and has therefore been omitted from the figure for clarity.
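The self-adaptive quality check described above can be sketched as follows; this toy autoencoder merely illustrates how the layer widths can be tied to the number of diffusion directions, and it does not reproduce the actual architecture used in [8, 17].

```python
import torch
import torch.nn as nn

class AdaptiveVoxelAutoencoder(nn.Module):
    """Toy single-voxel autoencoder whose layer widths scale with the number
    of diffusion directions, so the same code fits HCP and local study data."""

    def __init__(self, n_directions: int, compression: float = 0.5):
        super().__init__()
        hidden = max(8, int(n_directions * compression))
        bottleneck = max(4, hidden // 2)
        self.encoder = nn.Sequential(
            nn.Linear(n_directions, hidden), nn.ReLU(),
            nn.Linear(hidden, bottleneck), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck, hidden), nn.ReLU(),
            nn.Linear(hidden, n_directions),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

# Train on synthetic single-voxel signals (e.g., generated as in the synthesis
# sketch above) and evaluate the reconstruction error on in-vivo voxels: this
# error then serves as a proxy for how well the synthetic model covers the
# in-vivo signal distribution.
model = AdaptiveVoxelAutoencoder(n_directions=90)
synthetic_batch = torch.rand(256, 90)
loss = nn.functional.mse_loss(model(synthetic_batch), synthetic_batch)
loss.backward()
```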

This approach for generating synthetic single-voxel data was successfully applied for training free-water correction approaches in diffusion imaging data [18, 19].

Fig. 1

Upper row: the phase in diffusion-weighted images. Middle row: the phase in diffusion tensor maps. Bottom row: in comparison, the corresponding representations of the amplitude images

WP3—Collection of Diffusion Data for the Evaluation of the Project

Thirty healthy subjects were scanned with high-resolution diffusion sequences on two different 3T MRI systems at the University Hospital Aachen (a Siemens Prisma scanner and a Siemens Prisma Fit scanner with 20- and 64-channel head coils, respectively). The Siemens Prisma Fit scanner is identical in design to the Siemens Prisma scanner. The 30 healthy male subjects were recruited through lectures and social media and were divided into two age groups (21–25 and 26–30 years). Identical MRI measurements of the head were performed on both MRI systems. An anatomical scan of the brain was acquired at the beginning of each session, which was required to examine the different brain areas. Then, a diffusion imaging sequence based on the Human Connectome Project sequence was used. This very high-resolution sequence allows for optimal evaluation of newly developed methods for reconstructing nerve pathways. Furthermore, a clinical acquisition can be simulated from this recording, so it did not need to be acquired separately. Measurements on both MRI systems were conducted on the same day where possible. If this was not possible, the measurements were performed on two consecutive days, as close as possible to the same time of day, to exclude confounding effects such as diurnal variation or cyclic hormonal effects. In a separate Zoom meeting before the first scan, the subjects were comprehensively informed about the study and its risks. The first measurement was then taken on the first MRI system and the second measurement at the second appointment. Each of the two measurements took 1 h and 30 min, and participation in the experiment (4 h in total) was compensated with 50 €. Written informed consent was obtained from all subjects, and the study was conducted in accordance with the Declaration of Helsinki. Further information on the acquisition sequence and study participants can be found in [20], Chapter 6.

The data preprocessing was implemented as a Nipype pipeline [21]. In the first step, the DICOM raw images were converted to the NIfTI format using dcm2niix [22]. Skull stripping and segmentation are performed with ANTs [23] on the two T1 recordings, one from the 64-channel head coil session and one from the 20-channel head coil session. These steps are carried out jointly with the "CorticalThickness" tool, which uses established criteria for brain extraction and tissue segmentation to perform anatomical T1 brain processing. The brain is segmented into cerebrospinal fluid, cortical gray matter, deep gray matter, and white matter. FSL [24] tools are used to process the dMRI scans: the anterior-posterior scans are corrected for susceptibility-induced distortions using the posterior-anterior data with FSL Topup [25], the brain is extracted using BET [26], and eddy-current and motion artifacts are corrected using FSL Eddy [27]. The brain tissue segmentation map is transformed into diffusion space. Dipy [28] is used to fit diffusion tensors to the dMRI data and derive fractional anisotropy (FA) maps, which are then affinely registered to the T1 scan from the same scanner. All registrations are performed with the ANTs registration toolbox [29]. Finally, the four separate dMRI recordings of each subject are merged in order to analyze differences in scanner setup and the subsequent harmonization.
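For illustration, the tensor fitting and FA derivation step with Dipy [28] could look roughly like the following; the file names are placeholders, and the data are assumed to have already been corrected with FSL Topup/Eddy as described above.

```python
import numpy as np
from dipy.io.image import load_nifti, save_nifti
from dipy.io.gradients import read_bvals_bvecs
from dipy.core.gradients import gradient_table
from dipy.reconst.dti import TensorModel

# Placeholder file names; the input is the preprocessed (Topup/Eddy-corrected) dMRI.
data, affine = load_nifti('dwi_corrected.nii.gz')
mask, _ = load_nifti('brain_mask.nii.gz')
bvals, bvecs = read_bvals_bvecs('dwi.bval', 'dwi.bvec')
gtab = gradient_table(bvals, bvecs)

# Fit diffusion tensors voxel-wise and derive the FA map that is subsequently
# registered affinely to the T1 image of the same scanner.
tenfit = TensorModel(gtab).fit(data, mask=mask.astype(bool))
save_nifti('fa.nii.gz', np.clip(tenfit.fa, 0, 1).astype(np.float32), affine)
```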

WP4—Complex Signals (Amplitude and Phase)

The B\(_0\) field strength shows small local changes depending on the magnetic susceptibility of the tissue. For example, blood (which is rich in iron) has a different susceptibility than the surrounding tissue. In T2*-weighted images, the phase image provides a clear contrast because T2*-weighted images are sensitive to susceptibility effects and, therefore, to phase; this is mainly due to the gradient-echo sequence used for T2*-weighted imaging. To suppress these T2* effects, spin-echo sequences, which refocus the dephasing, are used for T1- and T2-weighted images. The remaining phase is therefore very noisy because it is close to zero. In diffusion-weighted EPI sequences, the excitation follows a spin-echo scheme, while the readout uses a gradient-echo (EPI) train, and the spin echo occurs during the readout (Fig. 1). For this reason, only a noisy phase close to zero can be measured at TE. Although the phase-to-noise ratio is better in the higher-frequency parts of k-space, these parts carry only little of the image content. Thus, a classical DWI sequence in its current form is unsuitable for obtaining phase information. Changing the weighting and acquisition sequence could reduce the noise in the phase data; however, a research scanner would be necessary for this purpose. Therefore, whether phase information provides added value cannot be definitively determined. It is known that susceptibility-induced field offsets scale with the applied B\(_0\) field. However, the B\(_0\) field is almost unchanged during a DWI sequence because the fields generated by the diffusion gradients are roughly two orders of magnitude smaller than the applied B\(_0\) field.
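A rough back-of-the-envelope check of the last statement, assuming a maximum diffusion gradient amplitude of about 80 mT/m (Prisma-class hardware) and a head-sized field of view of about 20 cm, gives

\[
\Delta B_{\max} \approx G_{\max}\,\frac{\mathrm{FOV}}{2} \approx 80\,\tfrac{\mathrm{mT}}{\mathrm{m}} \cdot 0.1\,\mathrm{m} = 8\,\mathrm{mT},
\qquad
\frac{\Delta B_{\max}}{B_0} \approx \frac{8\,\mathrm{mT}}{3\,\mathrm{T}} \approx 2.7 \times 10^{-3},
\]

i.e., the field offset induced by the diffusion gradients at the edge of the field of view is a factor of several hundred smaller than B\(_0\), in line with the statement above.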

WP5—Spherical Signals

Recent results in DL suggest that incorporating additional information improves the results of the respective DL approach [30, 31]. Although novel network architectures such as the UNet [32], ResNet [33], or DenseNet [34] can improve performance, it is commonly agreed that it remains crucial to develop sensible training strategies, individualized loss functions, and specialized layers [35]. In order to integrate the spherical character of a measured signal into a DL network, novel spherical DL layers were developed in this work package and analyzed with respect to the diffusion signal to ensure its optimal processing in a deep neural network.

With respect to diffusion imaging data, the effect of different activation functions on the diffusion signal (Fig. 2), as well as on the Fourier representation of a diffusion signal (Fig. 3), was analyzed. This is particularly important since the spherical character of the signal has to be maintained within a DL network. The first activation function (Fig. 2, left) is the Rectified Linear Unit (ReLU). It leaves the input signal unmodified, as a ReLU only sets negative values to zero; since the diffusion signal has positive values only, this type of activation has no effect. This changes when a sigmoid or a hyperbolic tangent (TanH) function is chosen as the activation function: a slight contrast reduction (sigmoid) as well as a contrast enhancement (TanH) can be observed after applying these functions.

A slightly different situation arises when the activation function is applied in the Fourier space. A regular diffusion signal in the Fourier space usually has many values close to zero. Only the DC component of the signal has larger absolute values. A comparison of the different activation functions applied in the Fourier space can be seen in Fig. 3.

As illustrated in Fig. 3, a ReLU activation leads to reduced contrast as well as a drastic change in the signal, as any negative coefficients are removed. In the case of the sigmoid function, negative values are converted to positive values, which destroys parts of the diffusion signal and therefore has a substantially stronger effect on the signal. In contrast, the TanH function achieves a contrast-enhancing effect by altering small values only minimally but large values significantly. This mainly affects the DC component of the signal, which is reduced by the function.
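This behavior can be reproduced with a few lines of code; here a discrete cosine transform serves as a simple stand-in for the spherical Fourier (spherical-harmonic) representation of the diffusion signal, so the numbers are purely illustrative.

```python
import numpy as np
from scipy.fft import dct, idct

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy positive "diffusion" signal sampled along 90 directions.
signal = 0.2 + 0.8 * np.abs(np.sin(np.linspace(0, np.pi, 90)))

coeffs = dct(signal, norm='ortho')   # stand-in for the spherical Fourier coefficients
print("dominant DC component:", round(float(coeffs[0]), 2))

for name, act in [("relu", lambda c: np.maximum(c, 0.0)),
                  ("sigmoid", sigmoid),
                  ("tanh", np.tanh)]:
    reconstructed = idct(act(coeffs), norm='ortho')
    err = float(np.mean(np.abs(reconstructed - signal)))
    print(f"{name:8s} applied in the transform domain -> mean abs. change {err:.3f}")

# ReLU removes all negative coefficients, the sigmoid maps them to positive
# values, and TanH mainly compresses the large DC term -- qualitatively
# mirroring the effects described for Fig. 3.
```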

Fig. 2

Visualization of the diffusion signal for three different activation functions (from left to right: ReLU activation, sigmoid activation, and TanH activation), applied in the signal space. (Images courtesy Simon Koppers, Aachen)

Fig. 3

Visualization of the diffusion signal for three different activation functions (from left to right: ReLU activation, sigmoid activation, and TanH activation), applied in the Fourier space. (Images courtesy Simon Koppers, Aachen)

To evaluate the reconstruction accuracy of different spherical layers, a denoising autoencoder [36] was employed. Diffusion signals were artificially corrupted with noise, and the different layer-specific autoencoders were then used to remove this noise. Different combinations of spherical layers and activation functions were used to evaluate the individual autoencoders. Subsequently, the performance was assessed in three distinct evaluations: runtime, signal denoising performance, and the impact on reconstruction accuracy in state-of-the-art reconstruction methods. Twenty subjects from the Human Connectome Project were used for evaluation purposes. This database provides highly resolved diffusion images of healthy subjects; each diffusion image has an isotropic voxel size of 1.5 mm, three different b-values with 90 gradient directions each, and an additional set of 18 non-diffusion-weighted images. The different methods were evaluated only on the white matter region of the individual brains. All results, as well as a detailed description of the experiments, were published in [11]. In general, the spherical layers lead to a significant gain of information within a DL network.
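A minimal sketch of the core building block of such a spherical processing step is the projection of the per-voxel signal onto a real, even-order spherical-harmonic basis and back, as shown below; the sign convention of the real basis and the toy signal are assumptions, and the learnable spherical layers evaluated in WP5 build on such a representation but are not reproduced here.

```python
import numpy as np
from scipy.special import sph_harm

def real_sh_basis(sh_order, theta, phi):
    """Real, even-order spherical-harmonic design matrix (n_dirs x n_coeffs).
    theta: polar angle in [0, pi], phi: azimuth in [0, 2*pi)."""
    cols = []
    for l in range(0, sh_order + 1, 2):
        for m in range(-l, l + 1):
            y = sph_harm(abs(m), l, phi, theta)   # scipy order: (m, l, azimuth, polar)
            if m < 0:
                cols.append(np.sqrt(2) * y.imag)
            elif m == 0:
                cols.append(y.real)
            else:
                cols.append(np.sqrt(2) * y.real)
    return np.stack(cols, axis=1)

rng = np.random.default_rng(0)
dirs = rng.standard_normal((90, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)          # 90 gradient directions
theta = np.arccos(np.clip(dirs[:, 2], -1, 1))
phi = np.arctan2(dirs[:, 1], dirs[:, 0]) % (2 * np.pi)

B = real_sh_basis(8, theta, phi)                             # order 8 -> 45 coefficients
signal = np.exp(-((dirs @ np.array([1.0, 0.0, 0.0])) ** 2))  # toy anisotropic signal

coeffs, *_ = np.linalg.lstsq(B, signal, rcond=None)          # forward spherical projection
reconstructed = B @ coeffs                                   # backward projection
print("max reconstruction error:", float(np.max(np.abs(reconstructed - signal))))

# A spherical layer operates on 'coeffs' (e.g., with learnable order-dependent
# weights) rather than on the raw per-direction samples, so that neighboring
# gradient directions remain coupled through the smooth SH representation.
```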

3 Discussion

The development of a DL analysis pipeline for diffusion imaging data is an exciting and innovative approach that has the potential to significantly improve the efficiency and accuracy of clinical brain research. The project consists of four methodological work packages (WP1, WP2, WP4, and WP5), each addressing a fundamental research question related to the application of DL to diffusion imaging data, plus one work package (WP3) for diffusion data acquisition.

The first work package focuses on the lack of transferability of diffusion MRI data between clinical sites. This is a major barrier to the widespread adoption of diffusion MRI in clinical practice, as different MRI systems can produce significantly different results. The development of an optimal method for harmonizing MRI signals is essential to ensure that the DL pipeline can be used effectively across different clinical sites.

The second work package addresses the lack of training and label data for DL. This is a common problem in medical imaging, as ground-truth data is often difficult to obtain. The proposed framework for synthesizing diffusion data based on important diffusion characteristics and statistics is a promising solution to this problem. This approach allows for the creation of large single-voxel datasets with corresponding ground truth that can be used to train DL pipelines for diffusion imaging data analysis.

The fourth work package explores the potential of complex diffusion data. While the phase information is often discarded during acquisition, complex MRI signals comprising amplitude and phase may carry important tissue information that could potentially be used to improve the accuracy of reconstruction. However, our results were not yet conclusive, and further exploration would require sequence development on a research MRI scanner.

The fifth work package focuses on the integration of spherical signals in neural networks to improve accuracy by explicitly considering the spherical character of the signals. Previous DL methods have not been able to incorporate the angle-dependent diffusion signal per voxel, which is why new methods are needed to adapt existing DL methods to spherical signals. This approach allows for the inclusion of neighboring information within a signal as well as between signals to ensure optimal reconstruction.

Overall, the proposed DL approaches for diffusion imaging data have the potential to further support clinical brain research by improving the accuracy and efficiency of diffusion MRI analyses. The presented work packages address fundamental research questions related to the application of DL in diffusion imaging, and the results of this project could have a significant impact on the future application of diffusion imaging data for clinical purposes.

4 Conclusion

In conclusion, the development of an innovative and disruptive DL pipeline for diffusion imaging data is a significant step towards improving clinical practice in brain research. The identified research questions, including the lack of transferability of diffusion MRI data between clinical sites, the lack of training and label data, the potential of complex diffusion data, and the integration of spherical signals in neural networks, have been effectively addressed in this project. The results show that the developed pipeline can handle harmonization issues, synthesize single-voxel diffusion imaging data, and integrate spherical signals into DL models to improve the reconstruction quality. Future research may further refine and expand upon these developments to enhance the capabilities of DL in the field of dMRI for clinical application.