1 Introduction

Computational neuroimaging methods have aroused interest in identifying group differences in white matter (WM) via diffusion magnetic resonance imaging (dMRI). Traditional methods rely on techniques such as voxel-based morphometry (VBM) [2] and tract-based spatial statistics (TBSS) [24] for voxel-based analyses. Tractography-based analysis, on the other hand, has enabled measurement of macrostructural WM properties of specific subpopulations of fibers [5]. In this work, we focus on identifying WM group differences using whole brain tractography.

A standard tractography-based group difference analysis first selects a tract of interest (e.g. the corpus callosum) and then compares groups to find statistical differences in WM diffusion features (e.g. anisotropy), using either feature mean values [1] or along-tract measures [6, 7, 9]. These studies suggest that fibers with similar WM anatomy (fiber geometric trajectory) in general share similar diffusion properties. However, they are generally limited to a small number of selected tracts. Another strategy, which can efficiently identify multiple tracts, is to establish the tract correspondence directly based on fiber geometry [17]. One recent work has applied this strategy to identify 30 tracts (a subset of the WM, e.g. only the corticospinal tract but not the full corona radiata) and performed statistical analysis in the whole brain [28]. Our method is based on a study-specific whole brain WM parcellation into more regions (a total of 1416 WM parcels from all input tractography) and hence can allow identification of potential group differences more specific to local WM anatomy.

For group comparison, a hypothesis test (e.g. Student’s t-test) is normally used to identify group differences, followed by multiple comparison correction (e.g. false discovery rate (FDR) [4] and Bonferroni [10] methods) for corrected statistical significance. Since these commonly used correction methods can be less sensitive in finding significance, voxel-based multiple comparison correction has been conducted in a cluster-thresholding manner that utilizes spatial neighborhoods to boost belief in extended cluster areas [12]. Studies using voxel-cluster-thresholding methods have found WM group differences [19, 23]. One work also applied the cluster-thresholding method to identify group differences in fiber segments [26]. However, to our knowledge, no related work has applied cluster thresholding to identify tractography-based group differences in a whole brain analysis.

In light of the above, we propose a supra-threshold fiber cluster (STFC) method to identify WM group differences using whole brain tractography. The novelty is that, for the first time, the proposed method leverages the whole brain fiber geometry during the statistical analysis of tractography. Specifically, we define a WM parcel neighborhood according to the WM anatomy and we propose a novel method that uses these WM neighborhoods for determining statistical significance. The method uses a study-specific data-driven WM parcellation for parcel neighborhood construction (Sect. 2.2). The STFC test then leverages the neighborhoods to identify fiber clusters of multiple WM tract parcels for statistical significance with multiple comparison correction (Sect. 2.4). We demonstrate our method by application to a multi-shell dMRI dataset from attention deficit hyperactivity disorder (ADHD) patients and healthy controls (HCs).

2 Methods

2.1 Dataset

We used a multi-shell (b values of 1000/2000/3000 s/mm\(^2\)) diffusion weighted imaging (DWI) dataset from 59 individuals (30 ADHD, 7 females and 23 males, age: 10.6 ± 1.7 years; 29 HC, 10 females and 19 males, age: 10.7 ± 1.7 years). The two groups were matched for age and socioeconomic status (SES). DWI data were acquired using a multi-slice acquisition (x2) at a spatial resolution of \(2 \times 2 \times 2\) mm\(^3\) with 70 gradient directions, and then processed using semi-automated quality control to remove gradients with signal drop and to correct head motion.

We conducted whole brain tractography using the unscented Kalman filter method [21, 22]. Tractography was seeded 5 times per voxel, and the return-to-the-origin probability (RTOP) was measured with a two-tensor biexponential model [20] at each point while tracking. The RTOP reflects the probability that water molecules undergo no net displacement between the application of two diffusion gradients, and it is known to be sensitive to the anisotropy of WM tissue [13]. We chose RTOP because it may increase pathophysiological specificity compared to traditional diffusion anisotropy measures, e.g. fractional anisotropy (FA) [3, 18]. Therefore, we used this measurement to explore potential changes in WM anisotropy in ADHD. We performed diffusion MRI tractography visualization in 3D Slicer (www.slicer.org) via the SlicerDMRI project (dmri.slicer.org).

2.2 Data-Driven WM Parcellation and WM Parcel Neighborhood

Whole brain WM parcellation was conducted using a data-driven pipeline according to the common WM anatomy from the whole population. In brief, the parcellation started with a simultaneous joint alignment of tractography across all subjects (using affine then b-spline transforms) (Fig. 1a) [15]. Next, we learned a study-specific data-driven groupwise WM parcellation (atlas) using a spectral clustering of pairwise fiber trajectory distances across all subjects (Fig. 1b) [16]. Then, we applied the study-specific parcellation to each individual subject (Fig. 1c) [16]. Valid parcels were identified as those that passed a nonparametric one-tailed sign test in the population, based on fiber numbers in each parcel, as in [8]. We obtained a total of 1416 valid hemispheric and commissural parcels.

Parcel neighborhoods (Fig. 1d) were constructed according to the mean of the pairwise fiber distances between parcels in the atlas, i.e. \(D_p=\frac{1}{IJ}\sum _{i=1}^{I}\sum _{j=1}^{J} d_{ij}\), where \(d_{ij}\) is the distance between fiber i of one parcel and fiber j of the other, and I and J are the total numbers of fibers in the two parcels. Two parcels with \(D_p\) smaller than a user-given distance threshold \(T_d\) were considered to be neighbors. We applied the mean closest point fiber distance [16] to measure \(d_{ij}\), the same distance used in the spectral clustering. In this way, the neighborhood could capture the anatomical similarity between the parcels.
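
The neighborhood construction can be illustrated with a minimal sketch. It assumes that each fiber is stored as a list of 3D points and that the mean closest point distance is symmetrized by averaging the two directed distances; the exact definition is the one given in [16]. The function names and the toy straight fibers are hypothetical and are not taken from the original pipeline.

```python
import math

def mean_closest_point_distance(fiber_a, fiber_b):
    """Symmetrized mean closest point distance between two fibers.

    Each fiber is a sequence of (x, y, z) points. Averaging the two directed
    distances is an assumption of this sketch, not a detail from the paper.
    """
    def directed(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return 0.5 * (directed(fiber_a, fiber_b) + directed(fiber_b, fiber_a))

def parcel_distance(parcel_a, parcel_b):
    """D_p: mean of d_ij over all I * J fiber pairs of two parcels."""
    total = sum(mean_closest_point_distance(fa, fb)
                for fa in parcel_a for fb in parcel_b)
    return total / (len(parcel_a) * len(parcel_b))

def build_neighborhoods(parcels, t_d):
    """Map each parcel index to the set of parcel indices with D_p < T_d."""
    neighbors = {i: set() for i in range(len(parcels))}
    for i in range(len(parcels)):
        for j in range(i + 1, len(parcels)):
            if parcel_distance(parcels[i], parcels[j]) < t_d:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return neighbors

def straight_fiber(y):
    """Toy fiber: three points along the x axis at height y (in mm)."""
    return [(float(x), float(y), 0.0) for x in range(3)]

# Three toy parcels; parcels 0 and 1 are close, parcel 2 is far away.
parcels = [[straight_fiber(0), straight_fiber(1)],
           [straight_fiber(3)],
           [straight_fiber(10)]]
neighbors = build_neighborhoods(parcels, t_d=5.0)
print(neighbors)
```

The quadratic loop over parcel pairs is kept for readability; for 1416 parcels a practical implementation would likely subsample fibers or vectorize the distance computation.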

Fig. 1. Study-specific data-driven WM parcellation. (a) Tractography alignment using groupwise tractography registration, where color indicates subject. (b) Study-specific groupwise WM parcellation (atlas) (left) and example subject-specific tractography parcellation (right). Fibers from different parcels are colored differently, where similar colors represent white-matter-anatomy-similar fibers. (c) Example atlas WM tract parcels (left) and the corresponding subject-specific WM tract parcels (right). (d) Example neighbors of the red tract parcel include the yellow and the blue parcels but not the more distant green one.

2.3 Group Difference at Individual Parcel Level

We performed a null hypothesis test for each individual parcel to find the parcel-level WM difference. Specifically, we measured the median of the RTOP values of all points in each parcel, i.e. \(M_{RTOP}\). A one-tailed Student’s t-test was then performed under the null hypothesis \(H_0: \mu _{HC}(M_{RTOP}) \le \mu _{ADHD}(M_{RTOP})\), as studies in ADHD widely suggest decreased diffusion anisotropy [11]. We considered the parcels with p-value < 0.05 as the ones with parcel-level differences.
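
As an illustration of the parcel-level test, the sketch below computes \(M_{RTOP}\) per subject and a one-tailed pooled-variance Student’s t-test using only the Python standard library. The upper-tail probability is obtained by numerically integrating the t density; in practice a library routine such as `scipy.stats.ttest_ind` with `alternative='greater'` would normally be used instead. All function names and the RTOP samples are hypothetical.

```python
import math
from statistics import mean, median, variance

def student_t_sf(t, df, steps=2000):
    """P(T > t) for Student's t with df degrees of freedom (Simpson's rule)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    h = upper / steps                     # steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(upper)
    for k in range(1, steps):
        area += (4 if k % 2 else 2) * pdf(k * h)
    area *= h / 3                         # integral of the density over [0, |t|]
    return 0.5 - area if t >= 0 else 0.5 + area

def parcel_level_test(hc, adhd):
    """One-tailed test of H0: mean(hc) <= mean(adhd); returns (t, p)."""
    n1, n2 = len(hc), len(adhd)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(hc) + (n2 - 1) * variance(adhd)) / df
    t = (mean(hc) - mean(adhd)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, student_t_sf(t, df)

# Hypothetical RTOP samples along one parcel, one inner list per subject.
hc_points = [[0.50, 0.55, 0.60], [0.52, 0.58, 0.70],
             [0.49, 0.57, 0.61], [0.51, 0.60, 0.66]]
adhd_points = [[0.40, 0.45, 0.52], [0.41, 0.47, 0.50],
               [0.38, 0.44, 0.55], [0.42, 0.48, 0.51]]
hc = [median(points) for points in hc_points]        # M_RTOP per HC subject
adhd = [median(points) for points in adhd_points]    # M_RTOP per ADHD subject
t, p = parcel_level_test(hc, adhd)
print(f"t = {t:.2f}, one-tailed p = {p:.4f}, parcel-level difference: {p < 0.05}")
```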

2.4 Supra-Threshold Fiber Cluster Test

The parcel-level differences were then tested for significance with multiple comparison correction using a permutation-based STFC test. This is similar to the process in a voxel-image-based supra-threshold cluster test [12]. However, our method leverages the fiber spatial relationships from the whole brain fiber geometry to build the parcel neighborhoods, while a voxel-image-based method relies on voxel spatial neighborhoods. In detail, an STFC was defined as a cluster of multiple parcels with parcel-level differences, where each parcel neighbored at least one other parcel in the cluster (under the distance threshold \(T_d\)). The STFC test performed a nonparametric permutation test using a summary statistic of maximal STFC size (maxSTFCS). N = 10000 permutations were performed in all experiments, resulting in a distribution of maxSTFCS (the null distribution) that enabled computation of corrected STFC significance. Algorithm 1 shows the pseudocode of the method. Specifically, we first computed a histogram of \( maxSTFCS \) across multiple permutation runs (lines 1 to 5). Then, for each STFC from the correctly labeled groups, its corrected significance value was computed by comparing its STFC size (\( STFCS \)) to the \( maxSTFCS \) histogram (lines 6 to 11).

Algorithm 1. Pseudocode of the STFC test.
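
The permutation procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: STFCs are taken to be connected components (of size at least two) of supra-threshold parcels in the neighborhood graph, the parcel-level threshold is expressed as a critical t value `t_crit` (for fixed degrees of freedom this is equivalent to thresholding the one-tailed p-value), and the corrected value follows the (count + 1)/(N + 1) convention. All names and the toy data are hypothetical.

```python
import math
import random
from statistics import mean, variance

def t_stat(a, b):
    """Pooled-variance t statistic for mean(a) - mean(b)."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / n1 + 1 / n2))

def supra_parcels(values, labels, t_crit):
    """Parcels whose one-tailed t statistic exceeds t_crit.

    values[p][s] is the feature of parcel p in subject s;
    labels[s] is True for the first group (e.g. HC).
    """
    out = set()
    for p, row in enumerate(values):
        g1 = [v for v, is_g1 in zip(row, labels) if is_g1]
        g2 = [v for v, is_g1 in zip(row, labels) if not is_g1]
        if t_stat(g1, g2) > t_crit:
            out.add(p)
    return out

def find_stfcs(supra, neighbors):
    """Connected components (size >= 2) of supra-threshold parcels."""
    seen, clusters = set(), []
    for start in supra:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            p = stack.pop()
            component.add(p)
            for q in neighbors[p]:
                if q in supra and q not in seen:
                    seen.add(q)
                    stack.append(q)
        if len(component) > 1:
            clusters.append(component)
    return clusters

def stfc_test(values, labels, neighbors, t_crit, n_perm=10000, seed=0):
    """Return [(cluster, corrected p)] for the correctly labeled groups."""
    rng = random.Random(seed)
    shuffled = list(labels)
    null_max = []                                   # null distribution of maxSTFCS
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        clusters = find_stfcs(supra_parcels(values, shuffled, t_crit), neighbors)
        null_max.append(max((len(c) for c in clusters), default=0))
    observed = find_stfcs(supra_parcels(values, labels, t_crit), neighbors)
    return [(c, (1 + sum(m >= len(c) for m in null_max)) / (n_perm + 1))
            for c in observed]

# Toy data: parcels 0-2 are mutual neighbors and differ between groups;
# parcel 3 is isolated and shows no difference.
offsets = [0.0, 0.1, 0.2, 0.3, 0.4]
values = [[10 + o + 0.01 * p for o in offsets] + offsets for p in range(3)]
values.append(offsets + offsets)
labels = [True] * 5 + [False] * 5
neighbors = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: set()}
results = stfc_test(values, labels, neighbors, t_crit=1.860, n_perm=200, seed=1)
for cluster, p_corr in results:
    print(sorted(cluster), p_corr)
```

With so few parcels and strongly correlated toy features, the corrected value in this example is not expected to be small; the sketch only shows the mechanics of building the maxSTFCS null distribution and comparing each observed STFC size against it.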

3 Experimental Results

3.1 Synthetic Data

We first illustrate our method on synthetic data. To simplify the assessment and tractography visualization, we created a realistic synthetic dataset with true group difference in the corpus callosum (CC), generated as follows. We identified a total of 34 CC parcels from the whole brain parcellation, as shown in Fig. 2a. For each CC parcel, we added white Gaussian noise (signal-to-noise ratio at 1 [25]) to the actual measured features of all HC subjects. Repeating this process twice generated two synthetic groups, G1 and G2, each with 29 subjects. We then modified 15 CC parcels-of-interest to have true group difference by adding synthetic feature changes to the G2 subjects. For each of the 15 parcels, we decreased its group mean \(M_{RTOP}\) values in G2 (as a percent of its original feature mean) for a null hypothesis test: \(H_0:\mu _{G1}(M_{RTOP}) \le \mu _{G2}(M_{RTOP})\). These 15 parcels were selected to form 3 different synthetic ground truth clusters (with sizes of 4, 5 and 6, as shown in Fig. 2a2–a4). Larger synthetic feature changes led to the parcels-of-interest showing more significant group differences. The evaluation goal was then to test whether a method could correctly identify the parcels with true significance, even when the added change was small.
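
The data generation for a single parcel can be sketched as below. Two points are assumptions of this sketch and are not stated in the text: the signal-to-noise ratio is interpreted as the ratio of the feature's standard deviation across subjects to the noise standard deviation, and the synthetic change is applied as a uniform shift of every G2 subject by a percentage of the group mean. The feature values are hypothetical.

```python
import random
from statistics import mean, pstdev

def add_white_noise(values, snr, rng):
    """Add zero-mean Gaussian noise with std = std(values) / snr (assumed SNR definition)."""
    sigma = pstdev(values) / snr
    return [v + rng.gauss(0.0, sigma) for v in values]

def decrease_group_mean(values, percent):
    """Shift every subject down by `percent` of the group's mean feature value."""
    shift = mean(values) * percent / 100.0
    return [v - shift for v in values]

rng = random.Random(0)
hc_feature = [0.55, 0.58, 0.57, 0.60, 0.53, 0.59]    # hypothetical M_RTOP of one CC parcel
g1 = add_white_noise(hc_feature, snr=1.0, rng=rng)   # first synthetic group
g2 = add_white_noise(hc_feature, snr=1.0, rng=rng)   # second synthetic group
g2_changed = decrease_group_mean(g2, percent=10)     # applied to parcels-of-interest only
print(mean(g1), mean(g2), mean(g2_changed))
```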

Fig. 2. Synthetic data experiment: (a) All CC parcels (a1, size 34) and 15 CC parcels-of-interest (a2, a3 and a4 cluster, with sizes of 4, 5 and 6 respectively) with synthetic group differences. (b) Number of correctly identified CC parcels, versus the level of synthetic group difference. (c) Mean number of correctly identified parcels across the different synthetic changes (as in b), versus the distance threshold \(T_d\). \(T_d\) lower than 44 mm could not identify all ground truth parcels, while \(T_d\) larger than 47 mm grouped inter-cluster parcels together. (Note that \(T_d\) should be higher than the within-parcel mean fiber distance, which was 27.9 ± 5.3 mm.) (d) Distributions of the parcel pair distance \(D_p\) from intra-cluster (blue) and inter-cluster (orange) among the ground truth STFCs (a2–a4).

Comparisons were conducted among the uncorrected t-test, the proposed STFC method, a standard permutation test (Perm-T, N = 10000) that used the minimal t-test-based p-value of all parcels for the summary statistic (as applied in [28]), and the two traditional multiple comparison correction methods, FDR and Bonferroni. The same significance level of \(\alpha \,=\,0.05\) was used for all compared methods. Figure 2b displays the number of significantly different parcels that were correctly identified by each method. For the STFC method, we display the result with \(T_d\) = 44 mm. As shown in Fig. 2c, the reasonable range for \(T_d\) in the synthetic data testing was 44 to 47 mm. This range corresponded to the settings that could identify the 3 ground truth clusters, as illustrated in Fig. 2d. We note that in the STFC experiment we considered a parcel as misidentified if it belonged to an STFC including parcels from more than one ground truth cluster.
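
For reference, the two traditional corrections can be sketched as follows. The sketch assumes that the FDR method [4] is the Benjamini-Hochberg step-up procedure, which the text does not state explicitly; the p-values are hypothetical.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject parcel i if p_i <= alpha / m, where m is the number of parcels."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]

def benjamini_hochberg(pvals, alpha=0.05):
    """Step-up FDR: reject the k smallest p-values, k = max rank with p_(k) <= alpha * k / m."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            k_max = rank
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject

pvals = [0.01, 0.02, 0.03, 0.5]            # hypothetical parcel-level p-values
print(bonferroni(pvals))                   # [True, False, False, False]
print(benjamini_hochberg(pvals))           # [True, True, True, False]
```

Neither procedure uses the parcel neighborhoods: each parcel is judged on its own p-value and the total number of tests, which is the property the STFC test is designed to improve on.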

3.2 Real Data

Next, we show experimental results on the real data for the corpus callosum and the whole brain. A one-tailed t-test was first performed with the null hypothesis \(H_0: \mu _{HC}(M_{RTOP}) \le \mu _{ADHD}(M_{RTOP})\). Multiple comparison correction was then conducted using the STFC, the Perm-T, the FDR and the Bonferroni methods respectively, at a significance level of \(\alpha \) = 0.05. For the STFC method, the results were reported at the smallest \(T_d\) where we could identify significances.

Corpus Callosum: 8 of the 34 CC parcels passed the initial t-test. While none survived the Perm-T, the FDR or the Bonferroni methods, an STFC of 4 parcels (Fig. 3a) was identified using the proposed method when \(T_d=32\) mm. The corrected significance value of the identified STFC was 0.0271, as illustrated in Fig. 3b, and the group mean feature values of the parcels are shown in Fig. 3c.
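
The corrected value can be reproduced from the permutation counts reported in Fig. 3b, assuming the common convention in which the correctly labeled data count as one additional arrangement. The symbol \(p_{corr}\) and the permutation index k are introduced here only for illustration:

\[ p_{corr} = \frac{1 + \#\{k : maxSTFCS_k \ge STFCS\}}{N + 1} = \frac{1 + 270}{10000 + 1} = \frac{271}{10001} \approx 0.0271 \]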

Fig. 3. Real data experiment for the corpus callosum: (a) The STFC consisting of 4 CC parcels identified with significant group difference. (b) 270 permutation tests had \(maxSTFCS \ge 4\), leading to the corrected significance value of 271/10001 = 0.0271. (c) Comparison of \(M_{RTOP}\) values per identified parcel, plotted in sorted order.

Fig. 4. Real data experiment for the whole brain WM: (a) The STFC consisting of 15 parcels identified with significant group difference. (b) 455 permutation tests had \(maxSTFCS \ge 15\), leading to the corrected significance value of 456/10001 = 0.0456 for the identified STFC. (c) Comparison of \(M_{RTOP}\) values per identified parcel, plotted in sorted order.

Whole Brain: For the whole brain parcellation of a total of 1416 parcels, 654 had a p-value smaller than 0.05 in the initial parcel-level t-test. The FDR and the Bonferroni methods did not find any significantly different parcels between the groups. The Perm-T method identified one individual significant parcel (corrected p-value 0.0272) connecting the middle precentral gyrus to the supra-marginal and the superior-parietal gyri in the right hemisphere. Our method identified one significant STFC of 15 WM parcels when \(T_d=24\) mm, which was located in the temporal and occipital lobes (Fig. 4a). The corrected significance value of the identified STFC was 0.0456, as illustrated in Fig. 4b. A comparison of the group mean feature values is given in Fig. 4c.

4 Discussion and Conclusion

We tested our method on a synthetic dataset with known group differences. The results showed that the STFC method was more sensitive in detecting the ground truth group differences, while the Perm-T, FDR and Bonferroni methods could only find significance when there were large differences (over 15% change of the group mean feature value). As for the experiments on the real data, in the CC we found a significantly different STFC of 4 WM parcels connecting to the superior-parietal gyri and the precuneus, with decreased median RTOP values in the ADHD group. For the whole brain WM analysis, our method identified 15 parcels from an STFC with significant group difference in the right temporal and occipital lobes, which have been previously reported to be affected in ADHD [14, 27]. These parcels had lower median RTOP values in the ADHD group when compared to the HC group, suggesting potentially reduced WM anisotropy in ADHD. We did not find any significance using the FDR or Bonferroni methods in the two real data tests. One parcel that potentially belonged to the anterior segment of the right arcuate fasciculus was identified from the whole brain using the Perm-T method, but no parcel survived in the CC Perm-T analysis.

The STFC method has one parameter, the distance threshold \(T_d\), which is used to form the WM parcel neighborhood. Our experiments using the synthetic data showed that too large or too small \(T_d\) values found fewer true significant parcels (Fig. 2c). Given the synthetic ground truth clusters, small \(T_d\) values (e.g. \(T_d<25\) mm, Fig. 2d) were not able to form any neighborhoods within a cluster, while larger values tended to group ground truth clusters together (e.g. \(T_d>65\) mm, Fig. 2d). For our real data tests, we applied the minimal setting of \(T_d\) that could find a significance. This allowed us to identify the parcels that were most similar to each other in terms of their WM anatomy. On the other hand, increasing \(T_d\) could help to find more WM structures. For example, in the whole brain analysis given \(T_d\) = 25 mm, we identified a larger significant STFC of 24 parcels that included the parcels from \(T_d\) = 24 mm and extended to the inferior parietal lobe.

In this paper, we have presented a novel STFC analysis to identify WM group differences using whole brain tractography. Experimental results suggest that our method in general is more sensitive for identifying WM group differences when compared to several traditional multiple comparison correction methods. Similar to voxel-cluster-thresholding analyses, the proposed method aims to find large clusters of WM parcels with significance; thus it could potentially miss some significantly different parcels located in small neighborhoods, e.g. the one individual significant parcel identified in the standard permutation test.