Introduction

Graph theory analyses of structural brain connectivity have been vital to providing breakthroughs in our understanding of how the underlying structure of the brain can influence the patterns of coordinated functional activity (see Avena-Koenigsberger et al. 2018 for a review; see also Goñi et al. 2014; Neudorf et al. 2020a, b, 2022). Defining this relationship between structural and functional connectivity using advanced techniques including graph theory has recently been highlighted as an important frontier in neuroscience (Suárez et al. 2020). When it comes to choosing graph theory measures of connectivity, important assumptions must be made about how information is transferred through the structural network, and the effectiveness of these measures for predicting functional connectivity is dependent on the accuracy of these assumptions about the human brain. Two primary graph theory models of information transfer in the brain include shortest path routing and diffusion. The shortest path routing model relies on the calculation of the shortest path to the destination region. This model is straightforward to calculate and underlies many useful graph theory measures that have been helpful in describing brain networks and networks in general (e.g., characteristic path length, Watts and Strogatz 1998; global efficiency as an indicator of small-worldness, Latora and Marchiori 2001; nodal and local efficiency, Latora and Marchiori 2001; van den Heuvel and Sporns 2013; etc.). One problem with the shortest path routing model when it comes to brain networks is that it assumes each region has whole-brain level knowledge about the most efficient path to use (Avena-Koenigsberger et al. 2019; Seguin et al. 2018, 2022; Zamani Esfahlani et al. 2022).

An alternative graph theory model has been proposed that does not assume whole-brain knowledge about the shortest path but instead assumes that information diffuses along random paths in the network influenced by the relative weighting of each path. Under these model assumptions, information propagates through the network as a “random walker” that is constrained by the structural architecture. Furthermore, information can be transferred in parallel, whereas shortest path routing describes information traveling along a single path to the destination (Fornito et al. 2016a).

Novel graph theory metrics combining both diffusion and shortest path routing models have been developed for use in brain research and applied to the task of predicting functional connectivity from the underlying structural connectivity (Goñi et al. 2014). Search information was developed as a measure of how many distractor paths may lead a random walker away from the shortest path, while path transitivity measures how likely a random walker on a detour will end up back on the shortest path. While these measures were more successful than the shortest path length alone at predicting functional connectivity from structural connectivity, there are other graph theory measures of connectivity that consider the full range of possible paths based on a diffusion model, rather than hinging on what happens around the shortest path during information transfer. One such measure of diffusion efficiency is the mean first passage time (Wang and Pei 2008), which calculates the number of steps it takes a random walker on average to travel from region A to region B. This measure has been used to show that biological brain networks typically display a balance between diffusion efficiency and global efficiency (sensitive to shortest path length; Goñi et al. 2013). Another measure relying on the diffusion model of information transfer is communicability (Estrada and Hatano 2008), which takes into consideration all possible walks from region A to region B. Walks with fewer edges are weighted much more heavily than those with more, with each walk of n edges weighted by the factor 1/n!. Communicability is described as reflecting the capacity for a network to transfer information in parallel assuming a diffusion model of information transfer (Fornito et al. 2016a, b). This measure has been useful in distinguishing patients from controls in conditions including stroke (Crofts et al. 2011) and multiple sclerosis (Li et al. 2013).

Considering past success with the hybrid measures combining the diffusion and shortest path routing models of information transfer (Goñi et al. 2014), this research will apply the exclusively diffusion-based measures of mean first passage time and communicability, as well as the shortest path routing measure of shortest path length, to structural connectivity. The results will be used to predict functional connectivity in order to determine to what extent these measures are able to account for variance in functional connectivity. Crucially, this research will extend past research that has examined the ability of multiple graph theory communication measures to predict functional connectivity from structural connectivity (Betzel et al. 2022; Vázquez-Rodríguez et al. 2019; Zamani Esfahlani et al. 2022) and has benchmarked the ability of different communication measures to predict functional connectivity (Seguin et al. 2018, 2020, 2022), by directly comparing two commonly used models (diffusion and shortest path routing) using multiple linear regression analyses, partial least squares regression, and principal components analysis to determine which graph theory model is most important in this relationship. Research suggests that brain networks (at both the macroscale and microscale) typically demonstrate a balance of diffusion efficiency and global efficiency (Goñi et al. 2013), while also suggesting that this balance may lean more towards dominance of diffusion efficiency in human brains, in which case we expect that the diffusion measures examined here will be more relevant than shortest path length to the structure–function relationship in the brain.

Methods

Dataset

MRI data for 998 subjects from the Human Connectome Project (HCP; Van Essen et al. 2013) were used, including diffusion tensor imaging (DTI) and resting state functional magnetic resonance imaging (rsfMRI). We used the preprocessed version of the rsfMRI data, which had been preprocessed using FSL FIX (Salimi-Khorshidi et al. 2014). The DTI data used were also preprocessed. The HCP pipelines for preprocessing are described by Glasser et al. (2013). The Automated Anatomical Labelling 90 region atlas (AAL; Tzourio-Mazoyer et al. 2002) was used as well as the Brainnetome 246 region atlas (Fan et al. 2016). Activation at each rsfMRI acquisition was used to calculate the mean activation for the atlas regions. The rsfMRI sessions were standardized using a z-score for the regions for each session separately. The activation in these regions was then submitted to bandpass filtering (separately for each session), allowing only frequencies between 0.01 Hz and 0.1 Hz (see Hallquist et al. 2013).

Connectivity measures

To calculate the functional connectivity measures for each combination of regions, we calculated the Pearson correlation coefficient using all of the 4800 acquisitions. To calculate the structural connectivity measures, DSI Studio (http://dsi-studio.labsolver.org) was used with quantitative anisotropy (Yeh et al. 2013) as the termination index to calculate the streamline count. Generalized q-sampling (Yeh et al. 2010) was used, and tracking used 1 million fibers, 75° maximum angular deviation, and a 20 mm minimum and 500 mm maximum fiber length. To calculate the structural connectivity matrix containing the number of streamlines for each cell, a whole brain seed was used. The connectivity values for structural and functional connectivity were averaged using the mean for all subjects. The weighted structural connectivity density (the sum of connection weights divided by the total possible connection weights, where each weight has a maximum of 1.0) was 0.016 for the AAL atlas and 0.004 for the Brainnetome atlas.
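As a rough illustration of the functional connectivity step, the sketch below computes pairwise Pearson correlations from regional time series and averages matrices across subjects. The function names, the use of NumPy, the zeroed diagonal, and the toy data are assumptions made for illustration; this is not the pipeline used in the study.

```python
import numpy as np

def functional_connectivity(timeseries):
    """Pearson correlation between every pair of regions.

    timeseries: array of shape (n_regions, n_acquisitions).
    Returns a symmetric (n_regions, n_regions) matrix.
    """
    fc = np.corrcoef(np.asarray(timeseries, dtype=float))
    # Assumption: self-connections are not analysed, so zero the diagonal.
    np.fill_diagonal(fc, 0.0)
    return fc

def group_mean(matrices):
    """Average per-subject connectivity matrices using the mean."""
    return np.mean(np.stack(matrices), axis=0)

# Toy example: three regions, four acquisitions.
ts = np.array([[1.0, 2.0, 3.0, 4.0],
               [2.0, 4.0, 6.0, 8.0],
               [4.0, 3.0, 2.0, 1.0]])
fc = functional_connectivity(ts)
```

In this toy example regions 0 and 1 rise together and region 2 falls, so the first pair is perfectly correlated and the pairs involving region 2 are perfectly anti-correlated.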

Graph theory structural connectivity measures of mean first passage time (Wang and Pei 2008) and communicability (Estrada and Hatano 2008) were calculated as diffusion model measures (also discussed in Fornito et al. 2016a, b). Mean first passage time was calculated as

$$MFPT_{ij} = \sum_{n = 1}^{N} \left[ \left( I - U_{j} \right)^{-1} \right]_{ni},$$
(1)

where I is the identity matrix, i is the starting node, j is the destination node, and N is the number of regions in the network, and

$$U_{j} = WS^{-1},$$
(2)

but with the jth row set to zero so that a random walker is unable to enter j. W is the weighted structural connectivity adjacency matrix, and

$$S = \left[ \begin{array}{ccc} s_{1} & 0 & 0 \\ 0 & \ddots & \vdots \\ 0 & \ldots & s_{N} \end{array} \right],$$
(3)

with \({s}_{n}\) representing the strength (weighted number of connections) of region n. Communicability was calculated as

$$Com_{ij} = \left[ e^{S^{-1/2} WS^{-1/2}} \right]_{ij},$$
(4)

where \({e}^{{S}^{-1/2}W{S}^{-1/2}}\) is the matrix exponential of \({S}^{-1/2}W{S}^{-1/2}\), the reduced structural connectivity adjacency matrix (see Crofts et al. 2011). Long walks are weighted more weakly (by a factor of 1/n!, where n is the number of steps) in this formula, as the series expansion equates to

$$Com_{ij} = \sum_{n = 0}^{\infty} \frac{\left[ \left( S^{-1/2} WS^{-1/2} \right)^{n} \right]_{ij}}{n!}.$$
(5)
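Equations 1 through 5 can be sketched in a few lines of Python. The version below is a minimal illustration using NumPy rather than the code used in this study; the function names are hypothetical, and it assumes a symmetric, connected weighted network with no isolated regions (so that every strength is non-zero).

```python
import numpy as np

def mean_first_passage_time(W):
    """mfpt[i, j]: expected steps for a random walker to first reach j from i.

    Only off-diagonal entries (i != j) are meaningful as passage times.
    """
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    strength = W.sum(axis=0)              # s_n, the diagonal of S (Eq. 3)
    U = W / strength                      # W S^{-1}: column n divided by s_n (Eq. 2)
    mfpt = np.zeros((n, n))
    for j in range(n):
        U_j = U.copy()
        U_j[j, :] = 0.0                   # the walker is unable to enter j
        fundamental = np.linalg.inv(np.eye(n) - U_j)
        mfpt[:, j] = fundamental.sum(axis=0)   # sum over n of [.]_{ni} (Eq. 1)
    return mfpt

def communicability(W):
    """Matrix exponential of S^{-1/2} W S^{-1/2} (Eq. 4).

    Assumption: W is symmetric, so the exponential can be taken through an
    eigendecomposition instead of a dedicated matrix-exponential routine.
    """
    W = np.asarray(W, dtype=float)
    d = 1.0 / np.sqrt(W.sum(axis=0))
    A = W * d[:, None] * d[None, :]       # S^{-1/2} W S^{-1/2}
    eigvals, eigvecs = np.linalg.eigh(A)
    return (eigvecs * np.exp(eigvals)) @ eigvecs.T

# Toy example: a three-node path network 0 - 1 - 2.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
mfpt = mean_first_passage_time(path)
com = communicability(path)
```

For the path network, a walker starting at one end needs four steps on average to reach the other end, while a walker starting in the middle needs three, which matches what the first-step equations for this small network give by hand.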

Shortest path length was calculated as the shortest path routing model measure using the NetworkX Python library (Hagberg et al. 2008). The shortest_path_length function was used with the Dijkstra algorithm (Dijkstra 1959), and each edge was given the inverse of its structural connectivity value so that the edges represent resistance in the network.
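To make the inverse-weight convention concrete, the following sketch implements Dijkstra's algorithm directly with the Python standard library. It is a stand-in for illustration only, not the NetworkX call used in the study, and the function name and toy matrix are hypothetical.

```python
import heapq

def shortest_path_lengths(W, source):
    """Dijkstra distances from `source`, using 1/weight as edge resistance."""
    n = len(W)
    dist = [float("inf")] * n
    dist[source] = 0.0
    queue = [(0.0, source)]
    while queue:
        d, u = heapq.heappop(queue)
        if d > dist[u]:
            continue                      # stale queue entry, already improved
        for v in range(n):
            if v != u and W[u][v] > 0:
                candidate = d + 1.0 / W[u][v]   # strong connection = low resistance
                if candidate < dist[v]:
                    dist[v] = candidate
                    heapq.heappush(queue, (candidate, v))
    return dist

# Toy network: the direct 0-2 connection is weak, so the route via 1 is shorter.
W = [[0.0, 0.5, 0.1],
     [0.5, 0.0, 0.5],
     [0.1, 0.5, 0.0]]
dist_from_0 = shortest_path_lengths(W, 0)
```

Here the direct edge from region 0 to region 2 has resistance 1/0.1 = 10, whereas the two-step route through region 1 has total resistance 2 + 2 = 4, so the indirect route defines the shortest path length.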

Permutation testing was performed using 500 null models created from the structural connectivity following the generalized Maslov–Sneppen (Maslov and Sneppen 2002) rewiring algorithm developed by Rubinov and Sporns (2011) for use with weighted networks to control for node strength and degree while randomizing the connection weights. Permutation p-values were calculated as the number of null models resulting in the same or better variance accounted for in the models as a proportion of the total number of null models.
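The permutation p-value described above reduces to a simple proportion. The sketch below is a minimal illustration with a hypothetical function name and made-up variance values; generating the rewired null networks themselves is not shown.

```python
def permutation_p_value(empirical_r2, null_r2):
    """Proportion of null models whose variance accounted for is the same or better."""
    n_as_good = sum(1 for r2 in null_r2 if r2 >= empirical_r2)
    return n_as_good / len(null_r2)

# Toy example with four null models (values are illustrative only).
null_r2 = [0.10, 0.20, 0.30, 0.40]
p = permutation_p_value(0.30, null_r2)    # two of the four nulls are as good or better
```

With 500 null models the smallest non-zero proportion is 1/500 = 0.002, which is presumably why results for which no null model matched the empirical value are reported as p < 0.002.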

Results

AAL

Linear regression models

Linear regression models were computed for each log-transformed independent variable (mean first passage time, communicability, and shortest path length) with functional connectivity as the dependent variable using the lm function from the core stats library in R (R Core Team 2018). Mean first passage time demonstrated an inverse relationship with functional connectivity, whereby a high mean first passage time was associated with poorer functional connectivity, as expected, R(4003) = −0.376, p < 0.001 (null model permutation p = 0.044; see Fig. 1A). Communicability demonstrated a positive relationship with functional connectivity, whereby high communicability was associated with better functional connectivity, as expected, R(4003) = 0.316, p < 0.001 (null model permutation p = 0.014; see Fig. 1B). Shortest path length demonstrated an inverse relationship with functional connectivity, whereby a high shortest path length was associated with poorer functional connectivity, as expected, R(4003) = −0.379, p < 0.001 (null model permutation p < 0.002; see Fig. 1C). The magnitude of the structure–function relationship for each of these measures was relatively comparable, so additional multiple linear regression approaches were also taken to determine which measures are primarily driving the relationship between structural and functional connectivity.
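The degrees of freedom reported above follow from the number of unique region pairs in the atlas, and each single-predictor model amounts to a correlation between a log-transformed structural measure and functional connectivity. The sketch below illustrates both points with the Python standard library; the function names are hypothetical and the analyses in this study were run in R.

```python
import math

def n_connections(n_regions):
    """Number of unique region pairs (upper triangle of the connectivity matrix)."""
    return n_regions * (n_regions - 1) // 2

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def log_predictor_correlation(predictor, fc):
    """Correlation between a log-transformed structural measure and FC."""
    return pearson_r([math.log(v) for v in predictor], fc)

# 90 AAL regions give 4005 unique connections, hence 4005 - 2 = 4003 degrees of freedom.
df_aal = n_connections(90) - 2
```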

Fig. 1

AAL linear regression models with functional connectivity as the dependent variable and independent variables of: A mean first passage time (MFPT; log transformed), Radj = −0.376; B communicability (COM; log transformed), Radj = 0.316; and C shortest path length (SPL; log transformed), Radj = −0.379

Multiple linear regression models

Multiple linear regression models were then investigated starting with mean first passage time and shortest path length included in the model as independent variables, with functional connectivity as the dependent variable. These models were again calculated in R using the lm function, as well as spcor from the ppcor library to calculate the semi-partial correlation (Kim 2015) and vif from the car library to calculate the variance inflation factor (Fox and Weisberg 2019). Prior to this, the correlation matrix of these measures was examined, which indicated that there were no extreme correlations (e.g., greater than 0.9) between the independent variables, with the highest value being R = 0.841 between mean first passage time and shortest path length (see Table 1). This correlation is theoretically interesting though, as it indicates there is a high level of redundancy between mean first passage time and shortest path length, suggesting that information in the diffusion model naturally follows paths that are similarly efficient when compared to the shortest path. This potential for decentralized information transfer strategies to take advantage of the shortest paths in the network has been noted in past research (Avena-Koenigsberger et al. 2017; Goñi et al. 2014; Seguin et al. 2018; Vázquez-Rodríguez et al. 2020). As seen in Table 2, mean first passage time and shortest path length both produced significant effects, with the shortest path length having a slightly larger semi-partial correlation (see Fig. 2A for predicted vs. empirical functional connectivity). However, when adding communicability to the model as seen in Table 3, the overall variance accounted for increased, and the semi-partial correlation of shortest path length was greatly reduced (though still significant), while the diffusion-based measures of mean first passage time and communicability had a much larger combined magnitude of semi-partial correlation.
This model accounted for more variance than any of the measures independently (R2 = 0.165; see Fig. 2B for predicted vs. empirical functional connectivity). It should be noted that the variance inflation factor (VIF) for the shortest path length in model 2 was greater than 5 (VIF = 5.567), indicating that multicollinearity between the independent variables may have affected the variance of the shortest path length coefficient. To address this, we also examined these variables using partial least squares regression, which is robust against multicollinearity.
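The two diagnostics used here can be sketched as follows: the semi-partial correlation is the correlation between the dependent variable and the part of a predictor that the other predictors cannot explain, and the variance inflation factor is 1/(1 − R²) from regressing that predictor on the others. This is a minimal NumPy illustration with hypothetical function names, not the spcor or vif implementations used in the study.

```python
import numpy as np

def _residuals(target, predictors):
    """Residuals from an ordinary least squares fit with an intercept."""
    X = np.column_stack([np.ones(len(target)), predictors])
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return target - X @ beta

def semipartial_and_vif(X, y):
    """Return a list of (semi-partial correlation, VIF), one pair per predictor column."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    results = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        unique_part = _residuals(X[:, j], others)     # predictor j with the others removed
        sr = np.corrcoef(y, unique_part)[0, 1]
        r2_j = 1.0 - unique_part.var() / X[:, j].var()
        results.append((sr, 1.0 / (1.0 - r2_j)))
    return results

# Toy example with two uncorrelated predictors (values are illustrative only).
X_toy = np.array([[1, 1], [-1, 1], [1, -1], [-1, -1]])
y_toy = np.array([4.0, -2.0, 0.0, -2.0])
diagnostics = semipartial_and_vif(X_toy, y_toy)
```

Because the toy predictors are uncorrelated, each VIF is 1 and each semi-partial correlation equals the ordinary correlation with the dependent variable; as predictors become more redundant the VIF grows and the semi-partial correlations shrink.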

Table 1 AAL independent variable correlation matrix
Table 2 AAL multiple linear model 1, with dependent variable functional connectivity. R2 = 0.157, Radj2 = 0.157 (null model permutation p = 0.012)
Fig. 2

AAL multiple linear regression models with empirical functional connectivity as a function of the predicted functional connectivity, for A model 1 and B model 2

Table 3 AAL multiple linear model 2, with dependent variable functional connectivity. R2 = 0.165, Radj2 = 0.165 (null model permutation p = 0.016)

Partial least squares regression

A partial least squares regression analysis was conducted with a dependent variable of functional connectivity and independent variables of mean first passage time, communicability, and shortest path length, using the plsr function from the pls library in R (Mevik and Wehrens 2007). The independent variables were log transformed and standardized to have a mean of 0 and a standard deviation of 1. To validate the model and check for overfitting, a k-fold cross-validation scheme was used with 10 folds. The number of components to include was chosen as the point at which additional components no longer substantially decreased the root mean squared error of prediction. With 2 components included, the root mean squared error of prediction reached its minimum of 0.161, so 2 components were used. Cross-validation determined that the model was able to account for 16.3% (R2 = 0.163, Radj2 = 0.162) of the variance in functional connectivity of novel validation samples, while the model accounted for 16.5% (R2 = 0.165, Radj2 = 0.164) of the variance in functional connectivity when predicting the data for all connections (null model permutation p = 0.012). These cross-validation results indicate that overfitting is minimal. Finally, by investigating the coefficients for each of the independent variables, the pattern of results seen in the multiple linear regression model can be confirmed. For the diffusion measures, mean first passage time had a coefficient of −0.037 and communicability had a coefficient of 0.018, whereas shortest path length had a coefficient of −0.025. Note that the negative relationship between structure and function for mean first passage time and shortest path length was expected, as a higher value for these structural measures indicates weaker connectivity, while a positive relationship was expected for communicability as higher values indicate stronger connectivity.
These coefficients support what was observed for the multiple linear regression, indicating that the effects of the diffusion model-based measures were greater in combined magnitude than that of the shortest path length.
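For readers unfamiliar with the procedure, the sketch below shows single-response partial least squares regression with k-fold cross-validation of the root mean squared error of prediction. It uses the NIPALS algorithm in NumPy as an assumed stand-in for the plsr function; the function names are hypothetical, and predictors are assumed to be already log transformed and standardized.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit single-response PLS by NIPALS; returns (coefficients, intercept)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean            # centred working copies
    weights, loadings, y_loadings = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                          # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xr @ w                             # component scores
        tt = t @ t
        p = Xr.T @ t / tt                      # predictor loadings
        q = (yr @ t) / tt                      # response loading
        Xr = Xr - np.outer(t, p)               # deflate predictors
        yr = yr - q * t                        # deflate response
        weights.append(w)
        loadings.append(p)
        y_loadings.append(q)
    W, P = np.array(weights).T, np.array(loadings).T
    coef = W @ np.linalg.solve(P.T @ W, np.array(y_loadings))
    return coef, y_mean - x_mean @ coef

def kfold_rmsep(X, y, n_components, k=10, seed=0):
    """Root mean squared error of prediction over k held-out folds."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.random.default_rng(seed).permutation(len(y))
    squared_error = 0.0
    for test_idx in np.array_split(order, k):
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        coef, intercept = pls1_fit(X[train_idx], y[train_idx], n_components)
        predicted = X[test_idx] @ coef + intercept
        squared_error += np.sum((y[test_idx] - predicted) ** 2)
    return np.sqrt(squared_error / len(y))
```

Calling kfold_rmsep for an increasing number of components and stopping when the error no longer drops appreciably mirrors the selection rule described above. When the number of components equals the number of predictors, the fit coincides with ordinary least squares.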

Principal components analysis

The principal components analysis for functional connectivity, mean first passage time, communicability, and shortest path length shown in Fig. 3 demonstrates the unique component space occupied by each measure. This analysis was conducted using the prcomp function of the core stats library in R. To aid the interpretation of the principal component loadings of each variable, mean first passage time and shortest path length were multiplied by −1 so that larger values indicate better connectivity for all measures. In particular, Principal Component 1 seems to be sensitive to the variance in common between functional connectivity and the graph theory measures, as these all have loadings in the same direction (Fig. 3A and Table 4). Conversely, functional connectivity loads strongly onto Principal Component 2, while the graph theory measures load weakly and in the opposite direction, suggesting that this component identifies variance in functional connectivity that is not well accounted for by the graph theory measures (Fig. 3A and Table 4). Finally, functional connectivity and shortest path length load very weakly onto Principal Component 3, while the loadings for mean first passage time and communicability are strong and in opposite directions, suggesting that this component speaks to the unique position in the component space of these diffusion model measures (Fig. 3B and Table 4).
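The sign flip and the extraction of loadings can be sketched as follows. This is a NumPy illustration via the singular value decomposition rather than the prcomp call itself; the function names are hypothetical, and scaling each variable to unit variance is an assumption, since the settings used are not stated here.

```python
import numpy as np

def build_pca_input(fc, log_mfpt, log_com, log_spl):
    """Stack the four measures as columns, flipping MFPT and SPL so that
    larger values indicate better connectivity for every measure."""
    return np.column_stack([fc, -np.asarray(log_mfpt), log_com, -np.asarray(log_spl)])

def pca_loadings(data):
    """Return (loadings, explained variance ratio).

    data: array of shape (n_connections, n_variables).
    loadings[:, k] holds the loading of each variable on component k + 1.
    """
    Z = np.asarray(data, dtype=float)
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)   # assumption: standardized variables
    _, singular_values, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    return Vt.T, explained

# Toy example: two perfectly correlated variables share a single component.
toy_loadings, toy_explained = pca_loadings([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
```

The sign of each component is arbitrary, so only the relative directions of the loadings within a component (same or opposite) carry meaning, which is how the loadings are interpreted above.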

Fig. 3

AAL principal components analysis with data points and variable loadings as vectors. Variables considered were functional connectivity (FC), mean first passage time (MFPT; log transformed and multiplied by −1 so that more positive values indicate better connectivity), communicability (COM; log transformed), and shortest path length (SPL; log transformed and multiplied by −1 as with MFPT)

Table 4 AAL principal components analysis loadings for all 3 principal components

Brainnetome

Linear regression models

Using the Brainnetome atlas, there was again an inverse relationship between mean first passage time and functional connectivity, a positive relationship between communicability and functional connectivity, and an inverse relationship between shortest path length and functional connectivity. Again, the magnitude of the structure–function relationship for each of these measures was relatively comparable (see Table 5 and Fig. 4).

Table 5 Brainnetome linear regression analyses
Fig. 4

Brainnetome linear regression models with functional connectivity as the dependent variable and independent variables of: A mean first passage time (MFPT; log transformed), Radj = −0.231; B communicability (COM; log transformed), Radj = 0.270; and C shortest path length (SPL; log transformed), Radj = −0.250. With outlier clusters removed from A by excluding cases with log(MFPT) > 8, and from C by excluding cases with log(SPL) > 5.33, the correlation remains significant, with R = −0.207 for log(MFPT) and R = −0.224 for log(SPL). These outlier clusters are due to more isolated regions of the atlas that take more steps to reach than most other regions

Multiple linear regression models

Multiple linear regression models were also investigated for the Brainnetome atlas, demonstrating that the semi-partial correlation for the shortest path length was much less than the combined magnitude for the diffusion-based measures of mean first passage time and communicability (see Tables 6, 7 and 8 and Fig. 5).

Table 6 Brainnetome independent variable correlation matrix
Table 7 Brainnetome multiple linear model 1, with dependent variable functional connectivity. R2 = 0.067, Radj2 = 0.067 (null model permutation p = 0.002)
Table 8 Brainnetome multiple linear model 2, with dependent variable functional connectivity. R2 = 0.090, Radj2 = 0.090 (null model permutation p < 0.002)
Fig. 5

Brainnetome multiple linear regression models with empirical functional connectivity as a function of the predicted functional connectivity, for A model 1 and B model 2

Partial least squares regression

The partial least squares regression again demonstrated that the combined magnitude of the coefficients for the diffusion model measures of mean first passage time and communicability was much greater than that of the coefficient for the shortest path routing model measure of shortest path length (see Table 9).

Table 9 Brainnetome partial least squares (PLS) regression

Principal components analysis

The principal components analysis was again conducted, but this time for the Brainnetome atlas. This analysis replicated the pattern of results seen for the AAL atlas (see Fig. 6 and Table 10).

Fig. 6

Brainnetome principal components analysis with data points and variable loadings as vectors. Variables considered were functional connectivity (FC), mean first passage time (MFPT; log transformed and multiplied by −1 so that more positive values indicate better connectivity), communicability (COM; log transformed), and shortest path length (SPL; log transformed and multiplied by −1 as with MFPT)

Table 10 Brainnetome principal components analysis loadings for all 3 principal components

Supplementary analyses

Individual-level analyses

Individual-level analyses were performed that mirrored the mean-level analyses, demonstrating that the variance accounted for was reduced but significant in all cases, with the same pattern of results as in the mean-level analyses (see Supplementary Tables 1 through 10). These analyses utilized the upper triangle of each connectivity matrix, reshaped to a single dimensional array for each individual and then concatenated across all individuals into a single large array for each of the mean first passage time, communicability, shortest path length, and functional connectivity measures. These arrays were used as variables in the statistical analyses in the same way as for the mean-level data.
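The reshaping step described above can be sketched in NumPy as follows; the function names are hypothetical and the matrices are toy values.

```python
import numpy as np

def vectorise_upper_triangle(matrix):
    """Return the entries above the diagonal as a one-dimensional array."""
    m = np.asarray(matrix)
    rows, cols = np.triu_indices(m.shape[0], k=1)   # k=1 excludes the diagonal
    return m[rows, cols]

def concatenate_subjects(matrices):
    """Concatenate the upper triangles of all subjects into one long array."""
    return np.concatenate([vectorise_upper_triangle(m) for m in matrices])

# Toy example: two subjects, three regions each.
subject_a = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
subject_b = [[0, 4, 5], [4, 0, 6], [5, 6, 0]]
stacked = concatenate_subjects([subject_a, subject_b])
```

Applying the same function to each measure keeps the connections in the same order, so the resulting arrays line up entry by entry when used as variables in a regression.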

Split-half analyses

Half of the data was used as a training set to train the models and the other half was used as a test set to test the models on novel data to demonstrate the predictability of out-of-sample data. In all cases, the test sets were predicted with comparable accuracy (see Supplementary Tables 11 through 18).

PCA null models

Null models were used to produce a null distribution for loadings of each of the 4 variables on the 3 Principal Components, and demonstrated that there was a significant distance between the null and empirical PC loadings (see Supplementary Tables 19 and 20).

Discussion

This investigation and comparison of graph theory structural connectivity measures based on two different theories of how information passes from one region to another in the brain has highlighted the importance of the diffusion model relative to the more straightforward but less biologically plausible shortest path routing model. Diffusion measures of mean first passage time and communicability as well as the shortest path routing measure of shortest path length were calculated from the brain structural connectivity, and these measures were related to functional connectivity. In isolation, each of these measures was comparable in the level to which it was related to functional connectivity. When analysed as hybrid models with each of these measures included, more variance was accounted for than by any measure on its own, supporting past research suggesting that a hybrid or balance of both the diffusion and shortest path routing models is appropriate for describing the structure–function relationship (Goñi et al. 2013, 2014). However, when analysed together in multiple linear regression, partial least squares regression, and principal components analysis, the diffusion measures captured more of the variance in functional connectivity than shortest path length, suggesting that the diffusion model may be closer to describing how information travels from one region to another in the structural connectivity network to produce the observed patterns of functional connectivity. These findings suggest that shortest path routing may be somewhat redundant when information under a diffusion model is able to travel along similarly efficient routes in the brain, and that diffusion models are able to additionally tap into aspects of functional connectivity that are not considered by shortest path routing.
These results are not surprising, given that diffusion models are more biologically plausible than shortest path routing as there is no evidence to suggest that regions have global network knowledge about what path would be the shortest (Avena-Koenigsberger et al. 2019; Seguin et al. 2018, 2022; Zamani Esfahlani et al. 2022). Past research has demonstrated that both diffusion models and shortest path routing models are important frameworks for understanding the architecture of brain networks (Goñi et al. 2013, 2014), but this work has taken an important next step in distinguishing the greater relative ability for the diffusion measures to accurately predict function from the underlying structural connectivity.

Limitations and future directions

While diffusion and shortest path routing models are the most commonly discussed in network neuroscience, other models have been suggested that may add a unique perspective on how information travels in the brain. One such proposed model of information transfer is greedy navigation, in which the Euclidean (three-dimensional) or geodesic (two-dimensional flattened surface of cortex) distance between regions is used to travel through the network, moving at each step to whichever connected region is closest to the target region. This model has been investigated in simulated networks, which have shown that greedy navigation is able to successfully send information between regions without getting stuck (i.e., ending up at a node that is closer than all neighbours to the destination but unconnected to the destination) when there is a balance between clustered connections of proximal nodes and a power-law like distribution whereby clusters are connected by a small number of highly connected hub nodes (Boguñá et al. 2009). The pattern of navigation under these conditions is such that the path tends to travel to a nearby hub node, travel a long distance to another hub node, then travel to a low-degree node in a cluster close to the destination. This pattern has also been demonstrated in the macaque brain network (Harriger et al. 2012) and in C. elegans (Towlson et al. 2013). Research using this theory in the human brain has demonstrated the prediction of functional connectivity from structural connectivity (Seguin et al. 2018, 2020). Future research should implement a model of greedy navigation to replicate the finding that the path length of a greedy navigator predicts functional connectivity, and additionally investigate whether the pattern of paths resembles that seen in simulated models and animal models.
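The greedy navigation rule can be sketched in a few lines. The version below is a simplified illustration using Euclidean distance and the Python standard library; the function name, the stopping rule (give up when no neighbour is closer to the target than the current node), and the toy coordinates are assumptions rather than a published implementation.

```python
import math

def greedy_navigation(adjacency, coords, source, target):
    """Return the list of nodes visited from source to target,
    or None if the navigator gets stuck before reaching the target."""
    path = [source]
    current = source
    while current != target:
        neighbours = [v for v, w in enumerate(adjacency[current])
                      if w > 0 and v != current]
        if not neighbours:
            return None                       # isolated node
        # Move to the connected node that is spatially closest to the target.
        best = min(neighbours, key=lambda v: math.dist(coords[v], coords[target]))
        if math.dist(coords[best], coords[target]) >= math.dist(coords[current], coords[target]):
            return None                       # stuck: no neighbour is closer to the target
        path.append(best)
        current = best
    return path

# Toy example: four nodes on a line, connected as a chain.
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
positions = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
route = greedy_navigation(chain, positions, 0, 3)
```

The number of hops in the returned path (or the summed distance along it) could then serve as the navigation path length to be related to functional connectivity.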

Internet and computer analogies have also been proposed for application to brain networks, with encouraging results (Graham and Rockmore 2011; Mišić et al. 2014). Information flow in computer networks (such as the internet) is limited by bandwidth, which represents the amount of information that can travel in a certain amount of time (e.g., bits/s). Information can be transmitted via packet switching, which breaks messages into packets that are labeled with the intended destination. These packets traverse the network efficiently by utilizing connections at separate times and buffering in a node if a connection is full, with the downside that if the node buffer is full then the information is lost. The structural connectivity of the macaque has been used to simulate a message-switched variant of this model of information flow, showing that compared to other networks there was more message loss and lower throughput, but faster transit times (Mišić et al. 2014). This suggests that under the model assumptions speed would be optimized in the macaque brain over maintaining the integrity of each individual signal. Future research should also apply a packet switching simulation model to the human brain to uncover whether similar signatures of information flow are seen between human and macaque networks, and whether the time taken for information to travel between regions is predictive of functional connectivity measures.

As another example of further work to be done in the field, brain network analysis of the cat brain has found that regions with similar connectivity profiles (connect to the same or similar regions) tended to correspond with groups of regions performing similar tasks (Zamora-López et al. 2010). One measure of the similarity between connectivity profiles of regions is cosine similarity, which has been used to demonstrate that functionally similar regions in the macaque brain also had high cosine similarity, and that regions with high cosine similarity were also likely to be connected and located close together (Song et al. 2014). This measure has recently been investigated along with many other measures in the context of the human brain (Zamani Esfahlani et al. 2022), and more research should be done to investigate whether human brain regions with similar connectivity profiles are more likely to be connected, to what extent this pattern differs between functionally similar clusters and core hub regions, and to what extent the structural cosine similarity is able to predict the functional connectivity and functional cosine similarity.
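Cosine similarity between two connectivity profiles is the dot product of the profiles divided by the product of their lengths. A minimal sketch with the Python standard library follows; the function name and profiles are illustrative only.

```python
import math

def cosine_similarity(profile_a, profile_b):
    """Cosine of the angle between two connectivity profiles (rows of W)."""
    dot = sum(a * b for a, b in zip(profile_a, profile_b))
    norm_a = math.sqrt(sum(a * a for a in profile_a))
    norm_b = math.sqrt(sum(b * b for b in profile_b))
    return dot / (norm_a * norm_b)

# Toy profiles: regions x and y connect to the same targets, region z to different ones.
region_x = [1.0, 0.0, 1.0, 0.0]
region_y = [2.0, 0.0, 2.0, 0.0]
region_z = [0.0, 3.0, 0.0, 1.0]
similar = cosine_similarity(region_x, region_y)
dissimilar = cosine_similarity(region_x, region_z)
```

Because the measure depends only on the direction of each profile, regions x and y are maximally similar despite differing in overall connection strength, while x and z share no targets and have a similarity of zero.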

Finally, a recent line of research has investigated the potential of hybrid models of information transfer that allow for regional heterogeneity in the navigation strategy applied. This is an extension of research that has shown differences in structure–function coupling when looking at unimodal vs. transmodal regions, and the development of these patterns with age (Baum et al. 2020). These models allow for different regions to apply unique strategies of information transfer, and have produced encouraging findings (Avena-Koenigsberger et al. 2019; Vázquez-Rodríguez et al. 2019; Zamani Esfahlani et al. 2022). We expect that this line of research will continue improving our understanding of how information is propagated throughout the structural connectivity network of the brain.

Conclusion

The models described here do not claim to be a fully accurate and comprehensive description of how information is transferred in the brain, especially considering that the nature of the structure–function relationship will inevitably vary as the scale of interest goes from the macroscale to the microscale. Nevertheless, by examining the diffusion and shortest path routing models together, this research has demonstrated that diffusion models are better suited to describing the relationship between structural and functional connectivity at the macroscale. In the future, alternative models discussed here could be examined together, to contribute to a fuller picture of which aspects of these theoretical models are able to best approximate the ground truth of information transfer in the human brain network.