Introduction

Graph-based models have become mainstream tools for the visualization and characterization of relationship structures within complex systems in machine learning, computer vision, and bioinformatics. Unfortunately, their application is not without challenges—particularly when dealing with representations at higher resolutions, as their computational and storage demands can exceed the capacity of most computing resources. Data reduction is indispensable when the feasibility of graph-based models is impacted. Such reductions can be achieved through statistical techniques (e.g., principal component, factor analyses), data downsampling approaches (e.g., local averaging based on spatial or topographic structure), or both. Here, using graph-based models in neuroimaging as an example, we demonstrate the need for careful consideration of the order in which these two reduction strategies are applied, suggesting an optimized workflow when both are pursued.

In neuroimaging, graph-based representations are increasingly used to characterize topographic patterns of gradual transitions and boundaries between systems—i.e., connectivity gradients1,2,3. In contrast to typical functional connectivity analysis, which focuses on examining relationships between brain regions4,5, connectivity gradients capture patterns of systematic spatial variation in the functional connectivity structure1,6. They leverage an affinity matrix, which encodes the spatial similarities between region-wise connectivity profiles, to uncover the underlying low-dimensional principles of functional brain organization within the high-dimensional connectivity data. While functional connectivity focuses on the connections between specific brain regions, connectivity gradients provide a more global and continuous representation of functional brain organization1,7. But maintaining a full spatial representation of the connectome is challenging. The number of connections scales quadratically with the number of nodes, and a common spatial resolution of the brain comprising 60,000 cortical nodes (vertices) constitutes 1,799,970,000 pairwise connections (edges). This presents a substantial challenge to characterize the full-scale connectivity structure on consumer-grade hardware and without access to dedicated computational resources. A common practice to overcome this challenge is to reduce the spatial resolution of the data, typically by averaging across vertices within spatial regions of interest (ROIs) defined using a parcellation template8. However, the loss of individual-specific detail in such parcellation approaches may result in the loss of meaningful information; there is also little agreement on the appropriate choice of parcellation templates in connectome studies9,10. Moreover, the capability to maintain a fine-grained spatial resolution allows researchers to fully characterize the individualized spatial layout of functional regions, an important and often overlooked aspect of modeling brain connectivity11,12,13, which could improve brain-behavior associations. The emphasis on spatial specificity aligns with the core principle of connectivity gradients—capturing axes of systematic connectivity change which delineate gradual transitions across brain areas7. Discrete parcellation schemes might obscure the topography of such smooth transitions and unclear border regions, failing to fully capture the continuous and gradual nature of connectivity gradients1.

Here, to facilitate connectivity gradients at a fine-grained spatial resolution, we propose Fast Connectivity Gradient Approximation (FCGA). Instead of using a full-scale vertex-by-vertex connectivity matrix, at its core, FCGA leverages a set of landmarks to approximate the underlying connectivity structure at full spatial resolution, with an efficiency that makes it possible to run on common computer hardware. These landmarks are flexible, and can, for example, be based on individual vertices or predefined ROIs.

Results and discussion

A schematic workflow of our proposed approach is displayed in Fig. 1a. After (i) establishing a set of k landmarks, the (ii) functional connectivity between all n vertices (or voxels) and the k landmarks is calculated, resulting in a n-by-k connectivity matrix (CMn-by-k) between all pairs of vertices and landmarks. Following current practices when calculating connectivity gradients1,14, we quantify the spatial similarity of vertex-wise connectivity profiles. To do so, (iii) a k-by-k (landmark-by-landmark) connectivity matrix (CMk-by-k) that characterizes the functional connectivity between all pairs of landmarks is established. Then, row-wise thresholding, retaining the 10% strongest positive connections, is applied to both CMn-by-k and CMk-by-k. Subsequently, (iv) the row-wise cosine similarity between all pairs of vertices in CMn-by-k and CMk-by-k is calculated. This results in an n-by-k affinity matrix (Wn-by-k) that characterizes the spatial similarities between the voxel-wise connectivity profiles with the landmark regions. Finally, (v) we use Principal Component Analysis (PCA) for dimensionality reduction of Wn-by-k, establishing the approximated connectivity gradients.

Fig. 1: Gradient approximation facilitates spatially fine-grained connectivity gradients while reducing computational costs.
figure 1

a A schematic workflow of the proposed Fast Connectivity Gradient Approximation (FCGA) approach. b FCGA provides high fidelity to the full connectivity structure with only a fraction of landmarks. Thin line plots denote the spatial similarity of the first 25 gradients, and thick line plots denote the respective average. c FCGA facilitates a high spatial similarity with reduced computational time and memory requirements. d The core topography is similar across a different number of landmarks and sampling choices.

We evaluated the validity of FCGA on the group and individual level, using high-resolution fMRI data (59,412 cortical vertices) from the Human Connectome Project (HCP)15 and the Nathan Kline Institute-Rockland Sample (NKI-RS)16. We parametrically varied the amount of downsampling (number of landmarks) across sampling choices (vertex- and ROI-level).

At the group level, we used Spearman rank correlation to compare the spatial similarity between the approximated connectivity gradients (GFCGA) and the connectivity gradients based on the full 59,412-by-59,412 connectivity matrix (Gfull_fc). The spatial similarity between GFCGA and Gfull_fc increased with the number of landmarks used to calculate the approximated gradients (Fig. 1b). Remarkably, using 1000 ROIs as landmarks (~1.7% of the full connectivity matrix), the average spatial similarity across 25 connectivity gradients reached  = 0.86. Using 3000 (~5%) uniformly distributed vertices, a spatial similarity of  = 0.98 was achieved with <10% of the computational time and memory usage as compared to the calculation of Gfull_fc (Fig. 1c). This was replicated on the individual level, where across 100 individuals, GFCGA with landmarks based on ~5% uniformly distributed vertices yielded an average spatial similarity of  = 0.9 (Supplementary Fig. S1). The spatial topography of the top connectivity gradients, shown in Fig. 1d, further illustrates the high spatial similarity between GFCGA and Gfull_fc across a number of sampling choices. Vertex-wise similarities between the gradient profiles of GFCGA and Gfull_fc further emphasized the accuracy of the gradient approximation (Supplementary Fig. S2).

At the individual level, reliability and discriminability analysis confirmed the repeatability and the preservation of individual features with GFCGA. To quantify the agreement between GFCGA and Gfull_fc, we calculated the intraclass correlation coefficient (ICC) and discriminability for the same acquisition. We observed an average vertex-wise ICC of >0.9 from 1000 (~1.7%) landmarks onwards, and discriminability was already nearly perfect with 300 (<1%) landmarks (Fig. 2a). Next, we compared the ICC and the discriminability of GFCGA and Gfull_fc across sessions (Fig. 2b). We observed that landmarks based on single vertices (uniformly or randomly distributed) yielded slightly lower ICC and discriminability for GFCGA than for Gfull_fc. However, landmarks based on predefined parcels on the group or individual level yielded a slightly better discriminability and a similar ICC for GFCGA when compared to Gfull_fc (Fig. 2b).

Fig. 2: Gradient approximation preserves informative individual features and improves brain-behavior predictions.
figure 2

Intraclass correlation and discriminability analysis confirms the reliability, repeatability, and the preservation of individual features (a) within and (b) across sessions. Boxplots show intraclass correlation and discriminability for the first 25 gradients, and colored boxes indicate interquartile range (iqr) with whiskers spanning 1.5*iqr. Markers and line plots denote respective means. c Parcel-averaging of spatially fine-grained gradients outperforms gradient calculation on a parcel level for both age and FSIQ prediction. Fast connectivity gradient approximation (FCGA) with landmarks based on the group-average Schaefer parcellation (n = 1000) was used to establish vertex-level gradients before parcel averaging. The first five gradients were used as features. Boxplots show 100 runs of tenfold cross-validation, and colored boxes indicate interquartile range (iqr) with whiskers spanning 1.5*iqr. Predictions that are not significantly better than prediction with randomly shuffled labels are denoted as n.s.

Finally, we evaluated the practical implications of our FCGA approach on brain-wide association studies by predicting age and full-scale intelligence (FSIQ) across the lifespan using the first five gradients. Specifically, we compared the predictive performance of coarse-grained gradients calculated from parcellated time-series to FCGA constructed parcel-wise summarized fine-grained gradients (Fig. 2c). Importantly, we observed that fine-grained gradients (i.e., GFCGA) subjected to parcel-averaging outperformed coarse-grained gradients for predicting both age and FSIQ across all tested parcellations (pBonferroni < 0.05 in 32/32 comparisons) (Fig. 2c). Notably, for FSIQ we observed that gradients calculated on a low number of parcels did not perform better than chance. The improved prediction for parcel averaging of spatially fine-grained gradients was replicated in the HCP sample across age-adjusted composite cognition scores and an estimate for fluid intelligence (Supplementary Fig. S3). This indicates that constructing connectivity gradients from parcellated data might lose meaningful information in the process that is otherwise preserved in the fine-grained gradient topography.

The proposed approximation framework is not limited to the landmarks we used in the current study and is flexible to incorporate study-specific landmarks if needed. For example, landmarks based on individualized parcellations might yield higher discriminability across sessions, indicating potential benefits of optimized landmarks for capturing individual differences. While identifying optimal regional homologies across development or aging requires further research, we demonstrated that the use of uniformly or randomly distributed vertices as landmarks already offers a highly accurate approximation on the group and individual levels.

It is also important to note that a vertex-wise representation is not necessarily superior to parcellation-based representations when the quality of the data is suboptimal17,18. Region-wise averaging helps to increase the signal-to-noise ratio, and spatially fine-grained data can entail computational challenges for further analyses. Nonetheless, our findings indicate that summarizing the gradient coefficients of a vertex-level representation is more informative than calculating the gradients directly on parcellated (downsampled) data. While future research is necessary to evaluate the potential applications for connectivity gradients14,19, an intermediate vertex-level representation might benefit further parcellation-based gradient analysis.

Taken together, our results demonstrate the feasibility of establishing spatially fine-grained connectivity gradients while mitigating the computational burden of vertex-level functional connectivity data by reducing computational time and memory usage. The ability to fully appreciate the spatial layout of functional regions in gradient representations at full resolution could improve associations between functional imaging data and cognitive traits11, and inform structure-function relationships20. Importantly, the advantage of preserving informative individual features in gradients of spatially fine-grained connectivity data was emphasized by the improved prediction of age and intelligence over gradients based on coarse-grained connectivity data. Overall, FCGA strengthens the core concept of connectivity gradients by maintaining the full spatial resolution to study spatial transitions of brain organization. Finally, its efficiency also removes computational barriers and can enable the widespread application of connectivity gradients to capture signatures of the connectome.

Methods

Data

To evaluate the validity of FCGA on the group and individual levels, we used resting-state fMRI data from the HCP15. The HCP data was acquired at Washington University at St. Louis on a customized Siemens 3T Connectome Skyra scanner. Institutional Review Board approval was obtained at the Washington University in St Louis, and written informed consent was obtained for all study participants. All ethical regulations relevant to human research participants were followed. Resting-state fMRI was acquired with a multiband factor of 8, 2 mm isotropic resolution, and a repetition time of 0.72 s for a duration of 14.4 min, resulting in 1200 volumes per acquisition. Participants were asked to relax, keep eyes open, and fixated on a crosshair, and not to fall asleep. Four resting-state runs were collected in different sessions across two days (REST1 and REST2), where each session comprised two runs with different phase encoding directions (LR and RL). We used the minimally preprocessed fMRI data provided by the HCP21. Briefly, the resting-state data has been motion corrected, minimally spatially smoothed (2 mm), high-pass filtered (2000s cutoff), denoised for motion-related confounds and artifacts using independent component analysis22, and spatially aligned to the 2 mm standard CIFTI grayordinates space21,23.

At the group level, we used the dense group-average functional connectome provided as part of the HCP S1200 data release. In brief, this connectivity matrix was constructed from the minimal preprocessed data across 1003 individuals that each had ~1 h (4 × 14.4 min) of resting-state fMRI acquisitions. The dense connectivity matrix of size 91,282 × 91,282 grayordinates (59,412 cortical vertices and 31,870 subcortical voxels) was calculated based on components of an incremental group-PCA. For the group-average connectome of the HCP, we averaged the precomputed functional connectivity measures for each landmark instead of recalculating landmark-based connectivity. At individual level, we used the 100 unrelated subjects cohort from the HCP (54F/46M, age = 29 ± 3.7 years) and two repeated resting-state fMRI runs (REST1_LR and REST2_LR) for each individual. We used the minimally preprocessed data as provided by the HCP. For both the group and individual levels, we focused our analysis on the 59,412 cortical vertices.

We used a cross-sectional lifespan sample with 313 healthy participants (214 female, age: 6–85 years, 42.2 ± 22.4 years) to evaluate the practical implications of our approach on brain-wide association studies. Participants were selected from the NKI-RS16, who have no diagnosis of any mental or neurological disorders and passed quality control of a head motion criteria (mean framewise displacement < 0.25 mm). The NKI-RS data was acquired at the Nathan Kline Institute on a Siemens TrioTim 3 Tesla scanner. Institutional review board approval was obtained at the Nathan Kline Institute, and written informed consent was obtained for all study participants. All ethical regulations relevant to human research participants were followed. Resting-state fMRI was acquired with a multiband factor of 4, 3 mm isotropic resolution, and a repetition time of 0.645 s for a duration of 9.7 min, which resulted in 900 volumes per run. Preprocessing was performed with the Connectome Computational System24 and included discarding the first five time points, compressing temporal spikes, slice timing correction, motion correction, 4D global mean intensity normalization, nuisance regression (Friston’s 24 model, cerebrospinal fluid and white matter), linear and quadratic detrending, band-pass filtering (0.01–0.1 Hz), as well as global signal regression. The preprocessed data were then projected on the 32k fsLR surface template with 32,492 vertices per hemisphere.

Connectivity landmarks

We used a varying number of landmarks across different sampling strategies. We utilized randomly and uniformly distributed vertices across the cortex, where the selected vertices were consistent across individuals. On the ROI-level, connectivity landmarks were defined by the average time series based on group parcellations25,26,27 and individualized parcellations for the HCP sample28.

Comparing gradients

For the group-level analysis and comparisons, no alignment or reordering of the gradients (PCA components) was performed. For the analysis on the individual level, we used one set of reference gradients based on the HCP dense connectome for both the approximated and full-scale gradients. For each individual, sampling, and gradient construction approaches, the gradients were aligned to this reference with orthogonal Procrustes alignment29. Orthogonal Procrustes finds the optimal linear transformation so that two sets of gradients have a matched component order and coefficient signs.

Intraclass correlation and discriminability

We compared the reliability and discriminability of the approximated gradients (GFCGA) and the full-scale gradients (Gfull_fc) within and across sessions, using the two repeated resting-state acquisitions from the 100 unrelated individual samples of the HCP. Within-session analysis treated GFCGA and Gfull_fc of the same acquisition as repeated measures. Across-session analysis evaluated differences in the reliability and reproducibility between GFCGA and Gfull_fc. Discriminability30 was used to measure the similarity of connectivity gradients. It is a nonparametric multivariate statistic that quantifies the degree to which repeated measurements (e.g., gradients of different approaches or sessions) are relatively similar to each other. At the vertex level, we quantified the ICC, a univariate measure of the degree of absolute agreement31,32.

Prediction of individual-specific measures

We used the NKI-RS lifespan sample to evaluate implications of our proposed approach for the prediction of individual-specific measures such as age and FSIQ, which were only weakly correlated (r = 0.13). As described above, connectivity gradients are often calculated on a reduced data representation on a parcel level. In this study, we compared the predictive performance of connectivity gradients calculated on parcellated time series (parcellation-to-gradients) to a parcel-wise averaging of fine-scale gradients constructed with FCGA (gradients-to-parcellation). Connectivity gradients of the parcellated data are established as in prior work1, using all parcels as landmarks instead of a subset. We evaluated distinct, commonly used parcellations25,26,27, and focused on the first five gradients as features to avoid excessively increasing the feature space. For prediction, we used ridge regression with a L2 regularization as implemented with Glmnet33, and a nested tenfold cross-validation scheme for hyperparameter (lambda) selection. The predictions of each fold were aggregated, and performance was measured with mean absolute error and correlation to the true age and FSIQ values. We repeated the tenfold cross-validation runs 100 times, with random splits for each fold. Additionally, we tested the prediction results against a baseline of 100 prediction runs with randomly shuffled labels to evaluate if the predictive performance is greater than chance.

Statistics and reproducibility

Spatial similarity between the approximated connectivity gradients (GFCGA) and the connectivity gradients based on the full connectivity matrix (Gfull_fc) was quantified using Spearman correlation at both group- and individual levels. A tenfold cross-validation scheme was used for the predictive modeling, utilizing a nested cross-validation within the training set for hyperparameter selection. The tenfold cross-validation was repeated 100 times with random splits. Predictive modeling was performed on the NKI-RS lifespan sample and replicated with the HCP sample.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.