
Multivariate information-theoretic measures reveal directed information structure and task relevant changes in fMRI connectivity


The human brain undertakes highly sophisticated information processing facilitated by the interaction between its sub-regions. We present a novel method for interregional connectivity analysis, using multivariate extensions to the mutual information and transfer entropy. The method allows us to identify the underlying directed information structure between brain regions, and how that structure changes according to behavioral conditions. This method is distinguished in using asymmetric, multivariate, information-theoretical analysis, which captures not only directional and non-linear relationships, but also collective interactions. Importantly, the method is able to estimate multivariate information measures from relatively little data. We apply the method to functional magnetic resonance imaging time series to establish the directed information structure between brain regions involved in a visuo-motor tracking task. This results in a tiered structure, with known movement planning regions driving visual and motor control regions. We also examine the changes in this structure as the difficulty of the tracking task is increased. We find that task difficulty modulates the coupling strength between regions of a cortical network involved in movement planning, and between motor cortex and the cerebellum, which is involved in the fine-tuning of motor control. It is likely these methods will find utility in identifying interregional structure (and experimentally induced changes in this structure) in other cognitive tasks and data modalities.




  1. The TE can be formed as \(T_{k,l}(Y \rightarrow X)\), where l past states of Y are considered as the information source \(y_n^{(l)}=\{ y_n, y_{n-1}, \ldots ,y_{n-l+1} \}\).

  2. Note that the TE is equivalent to the directed transinformation (DTI) measure under certain parameter settings for the DTI (specifically M = 1 and N = 0) as per Hinrichs et al. (2006). Also, note that the TE is equivalent to the specific formulation of the DTI used in Saito and Harashima (1981) if the TE parameter l (discussed in footnote 1) is set equal to k.

  3. Note the TE could be computed in the style of Kraskov et al. (2004) and Kraskov (2004) but with a direct conditional MI calculation as per Frenzel and Pompe (2007).

  4. For example, fMRI regions contain potentially hundreds of voxels.

  5. The following explanation assumes that only one previous state \(y_n\) of the source is used in the computation of \(T_k(Y \rightarrow X)\); i.e. the parameter l = 1 (see Schreiber 2000).

  6. We use z-tests in our experiments in Section 4 because we are comparing to very low α values after making Bonferroni corrections (see Section 2.2.2), which would render direct counting quite sensitive to statistical fluctuations.

  7. We analyze the MI with separate matrices.

  8. Note that testing against a binomial distribution is a conservative choice here, because it is less likely to get 6 significant results (5 with positive mean and 1 with negative mean) than to get 4 positive ones only. However, when tested over the group we consider the threshold according to the latter, which is truly binomial.

  9. See Chapter 5 of the PhD thesis which can be downloaded from the German National Library:

  10. We explain in Appendix B how the number of joint voxels v = 3 was selected to balance the ability to capture multivariate interactions with the limitations of the number of available observations. Also in that appendix, we explore the effect of altering v (including conducting univariate analysis with v = 1). Furthermore, the appendix explores the effect of altering the number of subset pairs S and surrogate measurements P.

  11. As described in Appendix A.3, this simple test does not mean that the right SC → right Cerebellum link is a false positive; it simply does not add evidence against the false positive.

  12. Our use of 140 time steps for each C and χ combination matches the length of fMRI time series analyzed in Section 4.

  13. The minimum strengths required for detection here may seem large at first glance; however, one must bear in mind the specific difficulties built into this data set: the non-linear coupling, the small number of samples, and the relatively low influence of Y on X (low \(\chi/\epsilon_x\)). Our correction for a large number of comparisons is also a factor here. That said, correcting for multiple comparisons provides important protection against false positives, so it must be maintained when investigating all values of C here.

  14. High memory in the source Z is required for the values \(z_n\) (considered by the interregional TE) to contain some information about the previous values \(z_{n-1}\), which had an indirect effect on \(x_{n+1}\) via \(y_n\).

  15. We expected that high memory in the destinations Y and X and in the common source Z would help preserve information in Y about the source Z which would be helpful to predicting X.

  16. Note that the combination of undersampling and memory in our variables provides a smoothing-type effect on the data. As such, these results imply some level of robustness for the technique against temporal smoothing in the underlying data.

  17. Similarly, only two interregional links were inferred at the group level by the interregional TE with univariate analysis (v = 1) and S = 3,000, P = 300.


  1. Bassett, D. S., & Bullmore, E. T. (2009). Human brain networks in health and disease. Current Opinion in Neurology, 22(4), 340–347.


  2. Bettencourt, L. M. A., Stephens, G. J., Ham, M. I., & Gross, G. W. (2007). Functional structure of cortical neuronal networks grown in vitro. Physical Review E, 75(2), 021915.


  3. Bode, S., & Haynes, J. D. (2009). Decoding sequential stages of task preparation in the human brain. NeuroImage, 45(2), 606–613.


  4. Bressler, S. L., Tang, W., Sylvester, C. M., Shulman, G. L., & Corbetta, M. (2008). Top-down control of human visual cortex by frontal and parietal cortex in anticipatory visual spatial attention. Journal of Neuroscience, 28(40), 10056–10061.


  5. Büchel, C., & Friston, K. J. (1997). Modulation of connectivity in visual pathways by attention: cortical interactions evaluated with structural equation modelling and fMRI. Cerebral Cortex, 7(8), 768–778.


  6. Bullier, J. (2001). Integrated model of visual processing. Brain Research Reviews, 36, 96–107.


  7. Chai, B., Walther, D. B., Beck, D. M., & Fei-Fei, L. (2009). Exploring functional connectivity of the human brain using multivariate information analysis. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, & A. Culotta (Eds.), Advances in neural information processing systems (Vol. 22, pp. 270–278). NIPS Foundation.

  8. Chávez, M., Martinerie, J., & Le Van Quyen, M. (2003). Statistical assessment of nonlinear causality: Application to epileptic EEG signals. Journal of Neuroscience Methods, 124(2), 113–128.


  9. Frenzel, S., & Pompe, B. (2007). Partial mutual information for coupling analysis of multivariate time series. Physical Review Letters, 99(20), 204101.


  10. Friston, K. (2002). Beyond phrenology: What can neuroimaging tell us about distributed circuitry? Annual Review of Neuroscience, 25, 221–250.


  11. Friston, K., Ashburner, J., Kiebel, S., Nichols, T., & Penny, W. (2006). Statistical parametric mapping: The analysis of functional brain images. Elsevier, London.


  12. Friston, K. J. (1994). Functional and effective connectivity in neuroimaging: A synthesis. Human Brain Mapping, 2, 56–78.


  13. Friston, K. J., & Büchel, C. (2000). Attentional modulation of effective connectivity from V2 to V5/MT in humans. Proceedings of the National Academy of Sciences of the USA, 97(13), 7591–7596.


  14. Friston, K. J., Harrison, L., & Penny, W. (2003). Dynamic causal modelling. Neuroimage, 19(4), 1273–1302.


  15. Gong, P., & van Leeuwen, C. (2009). Distributed dynamical computation in neural circuits with propagating coherent activity patterns. PLoS Computational Biology, 5(12), e1000611.


  16. Grosse-Wentrup, M. (2008). Understanding brain connectivity patterns during motor imagery for brain-computer interfacing. In D. Koller, D. Schuurmans, Y. Bengio, & L. Bottou (Eds.), Advances in neural information processing systems (Vol. 21, pp. 561–568). Curran Associates, Inc.

  17. Handwerker, D. A., Ollinger, J. M., & D’Esposito, M. (2004). Variation of BOLD hemodynamic responses across subjects and brain regions and their effects on statistical analyses. Neuroimage, 21(4), 1639–1651.


  18. Haynes, J. D., & Rees, G. (2006). Decoding mental states from brain activity in humans. Nature Reviews Neuroscience, 7(7), 523–534.


  19. Haynes, J. D., Tregellas, J., & Rees, G. (2005). Attentional integration between anatomically distinct stimulus representations in early visual cortex. Proceedings of the National Academy of Sciences of the USA, 102(41), 14925–14930.


  20. Hinrichs, H., Heinze, H. J., & Schoenfeld, M. A. (2006). Causal visual interactions as revealed by an information theoretic measure and fMRI. NeuroImage, 31(3), 1051–1060.


  21. Honey, C. J., Kotter, R., Breakspear, M., & Sporns, O. (2007). Network structure of cerebral cortex shapes functional connectivity on multiple time scales. Proceedings of the National Academy of Sciences, 104(24), 10240–10245.


  22. Horstmann, A. (2008). Sensorimotor integration in human eye-hand coordination: Neuronal correlates and characteristics of the system. Ph.D. thesis, Ruhr-Universität Bochum.

  23. Johansen-Berg, H., Behrens, T. E., Robson, M. D., Drobnjak, I., Rushworth, M. F., Brady, J. M., et al. (2004). Changes in connectivity profiles define functionally distinct regions in human medial frontal cortex. Proceedings of the National Academy of Sciences of the USA, 101(36), 13335–13340.


  24. Kantz, H., & Schreiber, T. (1997). Nonlinear time series analysis. Cambridge: Cambridge University Press.


  25. Kraskov, A. (2004). Synchronization and interdependence measures and their applications to the electroencephalogram of epilepsy patients and clustering of data. In Publication series of the John von Neumann Institute for computing (Vol. 24). Ph.D. thesis, John von Neumann Institute for Computing, Jülich, Germany.

  26. Kraskov, A., Stögbauer, H., & Grassberger, P. (2004). Estimating mutual information. Physical Review E, 69(6), 066138.


  27. Liang, H., Ding, M., & Bressler, S. L. (2001). Temporal dynamics of information flow in the cerebral cortex. Neurocomputing, 38–40, 1429–1435.


  28. Lizier, J. T., & Prokopenko, M. (2010). Differentiating information transfer and causal effect. European Physical Journal B, 73(4), 605–615.


  29. Lizier, J. T., Prokopenko, M., & Zomaya, A. Y. (2008). Local information transfer as a spatiotemporal filter for complex systems. Physical Review E, 77(2), 026110.


  30. Logothetis, N., Pauls, J., Augath, M., Trinath, T., & Oeltermann, A. (2001). Neurophysiological investigation of the basis of the fMRI signal. Nature, 412, 150–157.


  31. Lunenburger, L., Kleiser, R., Stuphorn, V., Miller, L. E., & Hoffmann, K. P. (2001). A possible role of the superior colliculus in eye-hand coordination. Progress in Brain Research, 134, 109–125.

  32. Lungarella, M., Pegors, T., Bulwinkle, D., & Sporns, O. (2005). Methods for quantifying the informational structure of sensory and motor data. Neuroinformatics, 3(3), 243–262.


  33. MacKay, D. J. (2003). Information theory, inference, and learning algorithms. Cambridge: Cambridge University Press.


  34. Norman, K. A., Polyn, S. M., Detre, G. J., & Haxby, J. V. (2006). Beyond mind-reading: Multi-voxel pattern analysis of fMRI data. Trends in Cognitive Sciences, 10(9), 424–430.


  35. Penhune, V. B., & Doyon, J. (2005). Cerebellum and M1 interaction during early learning of timed motor sequences. Neuroimage, 26(3), 801–812.


  36. Ramsey, J., Hanson, S., Hanson, C., Halchenko, Y., Poldrack, R., & Glymour, C. (2010). Six problems for causal inference from fMRI. NeuroImage, 49(2), 1545–1558.


  37. Rubinov, M., Knock, S. A., Stam, C. J., Micheloyannis, S., Harris, A. W. F., Williams, L. M., et al. (2009). Small-world properties of nonlinear brain activity in schizophrenia. Human Brain Mapping, 30, 403–416.


  38. Saito, Y., & Harashima, H. (1981). Tracking of information within multichannel EEG record - causal analysis in EEG. In N. Yamaguchi & K. Fujisawa (Eds.), Recent advances in EEG and EMG data processing (pp. 133–146). Amsterdam: Elsevier/North Holland Biomedical Press.


  39. Schreiber, T. (2000). Measuring information transfer. Physical Review Letters, 85(2), 461–464.


  40. Soon, C. S., Brass, M., Heinze, H. J., & Haynes, J. D. (2008). Unconscious determinants of free decisions in the human brain. Nature Neuroscience, 11(5), 543–545.


  41. Tanaka, Y., Fujimura, N., Tsuji, T., Maruishi, M., Muranaka, H., & Kasai, T. (2009). Functional interactions between the cerebellum and the premotor cortex for error correction during the slow rate force production task: An fMRI study. Experimental Brain Research, 193(1), 143–150.


  42. Tung, T. Q., Ryu, T., Lee, K. H., & Lee, D. (2007). Inferring gene regulatory networks from microarray time series data using transfer entropy. In P. Kokol, V. Podgorelec, D. Mičetič-Turk, M. Zorman, & M. Verlič (Eds.), Proceedings of the twentieth IEEE international symposium on computer-based medical systems (CBMS ’07), Maribor, Slovenia (pp. 383–388). Los Alamitos: IEEE.


  43. Verdes, P. F. (2005). Assessing causality from multivariate time series. Physical Review E, 72(2), 026222–026229.




JL and JH thank Thorsten Kahnt for discussions on the statistical analysis. JL thanks Mikail Rubinov for helpful suggestions. JL thanks the Australian Research Council Complex Open Systems Research Network (COSNet) for a travel grant that partially supported this work. JDH thanks the Max Planck Society, the Bernstein Computational Neuroscience Program of the German Federal Ministry of Education and Research (BMBF Grant 01GQ0411) and the Excellence Initiative of the German Federal Ministry of Education and Research (DFG Grant GSC86/1-2009). MP is grateful for a 2009 Research Grant from The Max Planck Institute for Mathematics in the Sciences (Leipzig, Germany) on Information-driven Self-Organization and Complexity Measures.

Author contributions: J.-D.H., J.H. and A.H. conceived the fMRI experiment. A.H. performed the fMRI experimental work. J.H. and A.H. pre-processed the data. J.L. and M.P. conceived the information-theoretical analysis. J.L. performed the information-theoretical analysis. J.H. performed the statistical analysis. J.L. and J.H. wrote the paper.

Author information



Corresponding author

Correspondence to Joseph T. Lizier.

Additional information

First two authors contributed equally to this work.

Action Editor: Jonathan David Victor


Appendix A: Application to numerical data sets

In order to explore the properties of the technique presented in Section 2, we apply it to a number of artificial data sets in this section. In particular, we demonstrate: the efficacy of the technique when applied to small data sets with a small amount of non-linear, collective coupling; how to use the statistical significance to guide selection of the number of joint voxels v under analysis; and some robustness to undersampling and to the inference of directed links where only a logical overlap exists.

A.1 Collective, non-linear interregional coupling

The primary test of the technique involves two multivariate “regions” of 10 variables, \(\mathbf{X}=\left\{ X_1,\ldots,X_{10}\right\}\) and \(\mathbf{Y}=\left\{ Y_1,\ldots,Y_{10}\right\}\), in which the variables of Y influence X in a collective, non-linear fashion under a range of coupling strengths. The coupling strength is described by the number of variables C in X which are influenced by those of Y, and the level χ to which those elements in X are determined from Y. For a given C and χ, the value \(x_{i,n+1}\) of variable \(X_i\) at time step \(n+1 \in \left\{ 2, \ldots, 140 \right\}\) is determined as:

$$ x_{i,n+1} = \begin{cases} \epsilon_x x_{i,n} + \chi y_{j,n} y_{l,n} + (1 - \epsilon_x - \chi) g & \text{for } i \leq C \\ \epsilon_x x_{i,n} + (1 - \epsilon_x) g & \text{for } i > C \end{cases} \tag{12} $$

where g is a zero mean white noise process with σ = 1, and j and l are indices of variables \(Y_j\) and \(Y_l\) in Y randomly selected to provide a joint input to \(X_i\) for the duration of the time series. The initial values \(x_{i,n=1}\) are determined by the zero mean white noise process g, and we have:

$$ y_{j,n+1} = \epsilon_y y_{j,n} + (1 - \epsilon_y) g. \tag{13} $$

Our test data sets thus involve one-way coupling Y → X, where the coupling is determined in a non-linear manner from multiple values within the source region. For our first experiment here, we generate time-series sets X and Y for all combinations of \(C=\left\{ 1,\ldots,10\right\}\) and \(\chi=\left\{ 0.00,0.05,\ldots,0.30\right\}\) with \(\epsilon_x = 0.7\) and \(\epsilon_y = 0.0\). With the additional factors of the relatively low influence of Y on X (low \(\chi/\epsilon_x\)) and a small number of observations (see footnote 12), this has been specifically designed to be a particularly difficult data set from which to correctly detect a directed interregional link.
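The generative process in Eqs. (12) and (13) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function name `generate_coupled_regions` and its argument names are hypothetical, and it assumes that the two source indices j and l are distinct, that the initial values of Y are also drawn from g, and that each variable receives an independent noise draw at each time step.

```python
import random


def generate_coupled_regions(C, chi, eps_x=0.7, eps_y=0.0,
                             n_vars=10, n_steps=140, seed=None):
    """Simulate one-way coupling Y -> X following Eqs. (12) and (13).

    Returns (x, y): lists of n_steps rows, each row holding n_vars values.
    """
    rng = random.Random(seed)

    def g():
        # Zero-mean, unit-variance white noise. Drawing it independently for
        # every variable and time step is an assumption of this sketch.
        return rng.gauss(0.0, 1.0)

    # For each destination X_i, fix a pair of source indices (j, l) for the
    # whole time series. Requiring j != l is an assumption.
    pairs = [rng.sample(range(n_vars), 2) for _ in range(n_vars)]

    x = [[g() for _ in range(n_vars)]]  # initial values of X drawn from g
    y = [[g() for _ in range(n_vars)]]  # assumed: Y is initialised the same way

    for _ in range(n_steps - 1):
        x_prev, y_prev = x[-1], y[-1]
        x_next = []
        for i in range(n_vars):
            if i < C:  # 0-based i < C corresponds to i <= C in Eq. (12)
                j, l = pairs[i]
                value = (eps_x * x_prev[i]
                         + chi * y_prev[j] * y_prev[l]
                         + (1.0 - eps_x - chi) * g())
            else:
                value = eps_x * x_prev[i] + (1.0 - eps_x) * g()
            x_next.append(value)
        y_next = [eps_y * y_prev[j] + (1.0 - eps_y) * g()
                  for j in range(n_vars)]
        x.append(x_next)
        y.append(y_next)
    return x, y


# One of the (C, chi) combinations described in the text.
x, y = generate_coupled_regions(C=8, chi=0.25, seed=1)
print(len(x), len(x[0]))
```

Looping this function over C = 1, ..., 10 and χ = 0.00, 0.05, ..., 0.30 would give one data set per (C, χ) combination, each 140 time steps long.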

We measured the interregional TEs \(T_{k,v}(\mathbf{X} \rightarrow \mathbf{Y})\) and \(T_{k,v}(\mathbf{Y} \rightarrow \mathbf{X})\) and the interregional MI \(I_v(\mathbf{X}; \mathbf{Y})\) with v = 2 and k = 1, using Kraskov estimators with a window size of the two closest observations. We then computed their statistical significance using the techniques we presented in Section 2 with S = 2025 and P = 100. We correct for multiple comparisons across the many combinations of C and χ in each direction.
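For readers unfamiliar with Kraskov-style estimation, the following is a minimal sketch of a nearest-neighbour MI estimator in the spirit of Kraskov et al. (2004), using the two closest observations (`k=2`). It makes several assumptions that the text does not confirm: it follows the first of the two algorithms in that paper, uses the maximum norm, and runs a brute-force neighbour search. It estimates plain MI only; the TE would additionally need a conditional MI (see footnote 3). All function names are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer n: psi(n) = -gamma + sum_{m=1}^{n-1} 1/m."""
    return -EULER_GAMMA + sum(1.0 / m for m in range(1, n))


def chebyshev(a, b):
    """Maximum-norm distance between two equal-length vectors."""
    return max(abs(p - q) for p, q in zip(a, b))


def ksg_mutual_information(xs, ys, k=2):
    """Nearest-neighbour MI estimate in nats between paired samples xs and ys.

    xs and ys are lists of vectors (one vector per observation).
    """
    n = len(xs)
    psi_sum = 0.0
    for i in range(n):
        dx = [chebyshev(xs[i], xs[j]) for j in range(n)]
        dy = [chebyshev(ys[i], ys[j]) for j in range(n)]
        # Distance to the k-th nearest neighbour in the joint space.
        joint = sorted(max(dx[j], dy[j]) for j in range(n) if j != i)
        eps = joint[k - 1]
        # Count neighbours strictly closer than eps in each marginal space.
        n_x = sum(1 for j in range(n) if j != i and dx[j] < eps)
        n_y = sum(1 for j in range(n) if j != i and dy[j] < eps)
        psi_sum += digamma_int(n_x + 1) + digamma_int(n_y + 1)
    return digamma_int(k) + digamma_int(n) - psi_sum / n


# Toy check with two-dimensional variables (placeholder data, not from the paper).
rng = random.Random(0)
a = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
b = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]  # independent of a
c = [[p + 0.1 * rng.gauss(0, 1) for p in row] for row in a]   # noisy copy of a
mi_independent = ksg_mutual_information(a, b)
mi_dependent = ksg_mutual_information(a, c)
print(round(mi_independent, 2), round(mi_dependent, 2))
```

The estimate for the independent pair should sit near zero, while the noisy copy should give a clearly positive value. With only a few hundred samples the estimate for strongly dependent variables is biased low, which is one face of the sample-size limits discussed in Appendix A.2.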

Figure 7(a) demonstrates that the interregional TE detects the interregional link Y → X fairly consistently for the data sets with larger numbers of coupled variables C and coupling strengths χ. No false positives are returned in the situation of zero coupling (χ = 0.00) or in the reverse direction X → Y (not shown).

Fig. 7

p-values from z-tests against surrogates for the Y → X relationship in Appendix A.1 for each data set (C, χ) for: (a) the interregional TE \(T_{k,v}(\mathbf{Y} \rightarrow \mathbf{X})\); and (b) the interregional MI \(I_v(\mathbf{X}(n+1); \mathbf{Y}(n))\) with time difference of one step from Y to X. Using correction for multiple comparisons in both Y → X and X → Y, the cutoff for statistical significance for a desired α = 0.05 becomes \(\alpha_c=3.57 \times 10^{-4}\). “X” marks data sets for which the relevant measure infers a statistically significant link Y → X. The TE infers a statistically significant link for 24 of the largest combinations of (C, χ), while the MI with time difference does so for 9 combinations. The actual minimum p-values in each case are several orders of magnitude smaller than \(10^{-6}\), corresponding to z-scores of 11.1 for the TE and 5.27 for the MI with time difference
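The cutoff quoted in the caption is consistent with a Bonferroni correction (footnote 6) over the 10 values of C, the 7 values of χ and the 2 directions, i.e. 140 comparisons. The sketch below reproduces that arithmetic and converts a z-score into a p-value. The function names are hypothetical, and using a one-sided (upper-tail) normal test is an assumption, since the details of the z-test are given in Section 2 rather than here.

```python
import math


def bonferroni_cutoff(alpha, n_comparisons):
    """Per-comparison significance cutoff under a Bonferroni correction."""
    return alpha / n_comparisons


def one_sided_p_from_z(z):
    """Upper-tail p-value of a standard normal distribution at z-score z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


n_C = 10          # C = 1, ..., 10
n_chi = 7         # chi = 0.00, 0.05, ..., 0.30
n_directions = 2  # Y -> X and X -> Y

alpha_c = bonferroni_cutoff(0.05, n_C * n_chi * n_directions)
print(f"{alpha_c:.2e}")  # 3.57e-04

# A z-score is declared significant when its p-value falls below the cutoff.
for z in (11.1, 5.27):
    print(z, one_sided_p_from_z(z) < alpha_c)
```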

The interregional MI does not detect the directed link at any coupling strength (results not shown). This is because the simultaneous values of X(n) and Y(n) at a given time point n are unrelated (Y(n) influences X(n + 1), but has no relationship to X(n) in this data set). Importantly, it does not produce any false positives here. We also measured the interregional MI with a 1-step time difference between X and Y; this breaks the symmetry of the measure, and makes detection of the influence of Y(n) on X(n + 1) possible. As shown in Fig. 7(b), the statistical significance of this measure detects the influence Y → X at some of the strongest couplings, and returns no false positives for X → Y. However, the measure is not as effective as the TE: it correctly infers the influence for a smaller number of data sets (C, χ), and with generally larger and less consistent p-values for these larger (C, χ). This is perhaps because it ignores the mixing of the coupling from Y with the influence of the past of each \(X_i\) via the \(\epsilon_x x_{i,n}\) terms in Eq. (12). The TE (which accounts for the past of the destination) is more sensitive to this mixing.

The success of our statistical inference with the interregional TE in this particularly difficult example is an important result. This type of non-linear coupling cannot be detected by linear methods (e.g. Granger causality), nor with the non-directional MI. Even when the MI has a directionality induced in it, it is not as sensitive as the TE here. Similarly, we verified that single-variate analysis (with v = 1 voxel) was much less effective: the TE could only detect the regional link at the very largest (C, χ) combination (see Appendix A.2). Finally, we note that a minimum coupling strength is required before detection by our method to ensure statistical significance, which is an important property to protect against false positives (see footnote 13).

A.2 Effect of multivariate analysis

Continuing with the same time series sets X and Y for various (C, χ) from Appendix A.1, we investigate the effect of altering the number of joint variables v included in the measure \(T_{k,v}(\mathbf{Y} \rightarrow \mathbf{X})\).

Figure 8(a) shows that inference of the directed link Y → X at the larger (C, χ) combinations is stable for v between 2 and 6, with the correct inference made for roughly the same number of (C, χ) data sets here. As a more focused example, Fig. 8(b) shows the relevant p-values versus v for the particular data set (C = 8, χ = 0.25), demonstrating that inference of the directed link could be made here for v between 2 and 7.

Fig. 8

Analysis of the effect of increasing the number of joint variables v when using the interregional TE, \(T_{k,v}(\mathbf{Y} \rightarrow \mathbf{X})\), for inference of the Y → X link in Appendix A.1. (a) the number of data sets (C, χ) for which the link Y → X was successfully inferred. (b) the p-values from z-tests against surrogates for one particular data set (C = 8, χ = 0.25). The horizontal line marks the cutoff for statistical significance (corrected for multiple comparisons) at \(\alpha_c=3.57 \times 10^{-4}\)

Certainly, one would like to maximize the number of joint variables v when using \(T_{k,v}(\mathbf{Y} \rightarrow \mathbf{X})\), since this provides more scope for capturing multivariate interactions. Also, increasing v even above the number of variables involved in interactions in the data can be advantageous. This is because it raises the proportion of sample sets \(R_{x,i}\) of v variables in the source which include a full set of variables that interact to produce an outcome in the sample destination set \(R_{y,j}\). For example, increasing v above 2 here raises the proportion of our S sample sets which include both source variables \(Y_j\) and \(Y_l\) that causally affect one of the selected destination variables \(X_i\) (see Eq. (12)).
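The combinatorial effect described above can be made concrete with a small calculation. As a hedged illustration, assume each source subset of v variables is drawn uniformly at random, without replacement, from the 10 source variables (the exact sampling scheme is defined in Section 2, so this is an assumption). The chance that a subset contains both \(Y_j\) and \(Y_l\) for a given destination variable is then \(\binom{8}{v-2} / \binom{10}{v} = v(v-1)/90\). The function name below is hypothetical.

```python
from math import comb


def fraction_with_both_sources(v, n_vars=10, n_required=2):
    """Probability that a random v-subset of n_vars variables contains a
    given set of n_required variables (uniform sampling without replacement)."""
    if v < n_required:
        return 0.0
    return comb(n_vars - n_required, v - n_required) / comb(n_vars, v)


for v in range(1, 8):
    print(v, round(fraction_with_both_sources(v), 3))
```

Under this assumption the fraction rises from 1/45 at v = 2 to 1/3 at v = 6, which illustrates why raising v above the interaction order can help before the sample-size limits take over.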

However, increasing v brings us closer to the limits imposed by the number of observations available to us. This is the case for whichever estimator we choose to use. For example, the Kraskov estimators in use here are known to have their error in measurement increase with the number of joint variables considered for a fixed number of observations (see Fig. 15 in Kraskov et al. 2004). Similarly, spurious relationships can appear more easily in the low-sample limit, making the distribution of measures on the surrogate data sets more spread out, and therefore raising the relevant p-value.

These plots demonstrate that the number of joint variables v can only be increased to a certain level before being limited by the number of available samples. The p-values with respect to v explicitly show where these limits are.

A.3 Overlapping data sets without direct relationships

We then investigate a number of instances where interregional data sets logically overlap in some way without having a direct relationship. These instances are known to present difficulty for inference techniques, which may infer a directed interregional link when only an indirect relationship is present (as described in “problem 4” in Ramsey et al. (2010)). We explore the conditions under which our technique may be susceptible to making these inferences.

First, we explore the pathway structure Z → Y → X. We generate data sets where the individual relationships between the directed pairs Z → Y and Y → X are each described by Eqs. (12) and (13), with C = 10, \(\epsilon_x, \epsilon_y = 0.7\) and variable \((\chi, \epsilon_z)\).

The p-values from our analysis are displayed in Fig. 9. As expected, inference of the actual directed link Z → Y depends on the coupling strength (as per Appendix A.1). It does not seem to have a particular dependence on the self-connection or memory \(\epsilon_z\) in the source (not investigated in Appendix A.1).

Fig. 9

p-values from analysis of direct and indirect interregional relationships for the pathway structure Z → Y → X in Appendix A.3. The analysis is performed using the interregional transfer entropy with v = 2. “X” marks data sets for which a statistically significant directed link is inferred. Using correction for multiple comparisons in all 6 possible directed relationships, the cutoff for statistical significance for a desired α = 0.05 becomes \(\alpha_c=1.49 \times 10^{-4}\). The actual minimum p-values in each case are several orders of magnitude smaller than \(10^{-6}\), corresponding to z-scores of 14.6 and 6.02 respectively

Figure 9(b) shows that it is possible for our technique to infer a directed link Z → X where the real underlying relationship Z → Y → X is in fact an indirect pathway through Y. We find that inference of the indirect relationship is much less sensitive than for the direct relationship, occurring only where there is both high coupling χ and high memory \(\epsilon_z\) in the source (see footnote 14). Importantly, we found that p-values for Z → X were always higher (i.e. weaker) than Y → X when both links were inferred.

If Y is not available, then inference of the indirect relationship may be desirable, since it still reveals structure in the available data. Where Y is available though, ideally only Z → Y and Y → X should be inferred. We suggest that extension of the complete transfer entropy (Lizier et al. 2008) to a similar interregional measure (and with similar statistical significance testing) could usefully address this issue. The complete TE conditions out the influence of other possible sources, e.g. \(T_k(Y \rightarrow X \mid Z) = I(Y;X' \mid X^{(k)},Z)\). Extending the measure should still infer Y → X (since Y adds information not contained in Z) but not Z → X (since Z does not add any information not contained in Y). We leave extension of this measure and testing of the technique to future work.

Next, we explore the common cause structure with Z → Y and Z → X but no direct relationship between Y and X. We generate data sets where the individual relationships between the directed pairs Z → Y and Z → X are described by Eqs. (12) and (13) with C = 10, and with: \(\epsilon_x, \epsilon_y = 0.7\) and variable \((\chi, \epsilon_z)\) in Fig. 10(a); and alternatively \(\epsilon_z = 0.7\) and variable \((\chi, \epsilon_x, \epsilon_y)\) in Fig. 10(b).

Fig. 10

p-values from analysis of the common cause structure: Z → Y and Z → X in Appendix A.3. The analysis is performed using the interregional transfer entropy with v = 2 for the pair without a directed relationship Y → X. “X” marks data sets for which a statistically significant directed link is inferred. Using correction for multiple comparisons in all 6 possible directed relationships, the cutoff for statistical significance for a desired α = 0.05 becomes \(\alpha_c=1.49 \times 10^{-4}\). The actual minimum p-values in each case are several orders of magnitude smaller than \(10^{-6}\), corresponding to z-scores of 6.30 in each case

Figure 10 shows that it is possible for our technique to infer a directed link Y → X (with similar results for X → Y of course) where Y and X are only related by a common cause. We find that the inference only occurs under high coupling χ and high memory \(\epsilon_x, \epsilon_y\) in the destinations of the common cause (Fig. 10(b)), with a possible but less clear dependence on high memory \(\epsilon_z\) in the common cause (Fig. 10(a)) (see footnote 15). Crucially though, this inference is much less sensitive than for the relevant direct cause from Z. (The Z → Y relationship for Fig. 10(a) is the same as for the pathway structure; see Fig. 9(a) for results on the direct cause to compare to Fig. 10(a).) As expected, the interregional MI revealed a very strong relationship between Y and X (not shown), e.g. inferring a relationship for all χ > 0 for the data sets in Fig. 10(a).

Similar to our argument regarding the pathway structure, if Z is not available then inference of Y → X and X → Y may be useful in revealing structure in the available data. When the common cause Z is available, this is undesirable though. In this case, we again suggest that extension of the complete transfer entropy to an interregional measure could be expected to eliminate inference of spurious relationships due to a common cause.

Without such an extension in place though, the fact that the false positive links here are much weaker than the relevant actual direct links suggests the use of comparisons amongst connected triplets. That is, where one finds Z → Y, Y → X and Z → X, then:

  1. if Z → X is stronger than Z → Y or Y → X then it is unlikely that Z → X is a pathway type false positive;

  2. if Y → X is stronger than Z → X then it is unlikely that Y → X is a common cause type false positive.

Such comparisons cannot definitively rule out the relevant false positive situation, but can add evidence against the presence of these types of false positives.
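The two triplet comparisons can be written as a small helper. This is a sketch only: the function name, the dictionary keys and the notion of "strength" (here any score where larger means stronger, such as a z-score) are assumptions for illustration, and the example values are hypothetical rather than results from the paper.

```python
def triplet_evidence(strength_zy, strength_yx, strength_zx):
    """Apply the two triplet comparisons to link strengths for
    Z -> Y, Y -> X and Z -> X (larger value = stronger link)."""
    return {
        # Rule 1: Z -> X stronger than Z -> Y or Y -> X argues against
        # Z -> X being a pathway type false positive.
        "zx_unlikely_pathway_false_positive":
            strength_zx > strength_zy or strength_zx > strength_yx,
        # Rule 2: Y -> X stronger than Z -> X argues against
        # Y -> X being a common cause type false positive.
        "yx_unlikely_common_cause_false_positive":
            strength_yx > strength_zx,
    }


# Hypothetical strengths for one connected triplet.
result = triplet_evidence(strength_zy=9.0, strength_yx=8.0, strength_zx=5.0)
print(result)
```

As the text notes, a `False` entry does not mark a link as a false positive; it only means this comparison adds no evidence against one.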

A.4 Undersampling

We also test the technique against data sets which have been undersampled from the raw underlying data. Using the same relationship Y → X defined in Eqs. (12) and (13), we define \(\mathbf{X}^s = \{ X_1^s, \ldots, X_{10}^s \}\) where the constituent time series are undersampled by a factor of s as \(X_i^s = \{ x_{i,1}, x_{i,1+s}, x_{i,1+2s}, \ldots \}\). \(\mathbf{Y}^s\) is similarly defined, and the technique is then applied to the data sets \(\mathbf{X}^s\) and \(\mathbf{Y}^s\). For comparability, we generate 140 samples in the undersampled data sets. We use C = 10, χ = 0.3, and \(\epsilon_x = 0.7\).
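Undersampling as defined above amounts to keeping every s-th observation, starting from the first. A minimal sketch follows, with hypothetical function names; it also shows how long the raw series must be to leave 140 undersampled observations.

```python
def raw_length_needed(s, n_samples=140):
    """Raw time steps needed so that undersampling by s leaves n_samples."""
    return 1 + (n_samples - 1) * s


def undersample(series, s, n_samples=140):
    """Keep observations 1, 1+s, 1+2s, ... (1-based) of a single time series."""
    if len(series) < raw_length_needed(s, n_samples):
        raise ValueError("raw series too short for the requested undersampling")
    return series[::s][:n_samples]


# Stand-in raw series whose values are just the 1-based time indices.
s = 3
raw = list(range(1, raw_length_needed(s) + 1))
under = undersample(raw, s)
print(len(raw), len(under), under[:4])
```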

As shown in Fig. 11, for these parameter values we find that there is some robustness in correct inference of the interregional relationship \(\mathbf{Y}^s \rightarrow \mathbf{X}^s\) using the interregional TE up to an undersampling factor of s = 3. (Again, no significant link was inferred in the reverse direction \(\mathbf{X}^s \rightarrow \mathbf{Y}^s\).) The correct relationship is detected more reliably with: higher source-destination coupling χ (not shown), a smaller undersampling factor s, and higher source memory \(\epsilon_y\) (see Fig. 11 and footnote 16). The undersampled source values including \(y_{j,n}\) under consideration can have a causal effect on the destination \(x_{i,n+s}\) by either: influencing \(x_{i,n+m}\), or influencing \(y_{j,n+m}\) for \(1 \leq m < s\); and in both cases consequently influencing \(x_{i,n+s}\). Higher χ and smaller s increase both effects, while \(\epsilon_y\) increases the latter. Higher memory in the destination \(\epsilon_x\) (with respect to noise g, given χ) should similarly increase the influence of the source under consideration and therefore the reliability of detection in the undersampled data.

Fig. 11

p-values from analysis of the undersampled data sets \(\mathbf{Y}^s \rightarrow \mathbf{X}^s\) described in Appendix A.4, with respect to undersampling factor s and source memory \(\epsilon_y\). The analysis is performed using the interregional transfer entropy with v = 2. “X” marks data sets \((s, \epsilon_y)\) for which a statistically significant directed link is inferred. Using correction for multiple comparisons for both directions for all \((s, \epsilon_y)\) sets, the cutoff for statistical significance for a desired α = 0.05 becomes \(\alpha_c=3.91 \times 10^{-4}\). The actual minimum p-values are several orders of magnitude smaller than \(10^{-6}\), with the largest z-score at 15.2

Appendix B: Effect of multivariate analysis

We have reported in the main paper results from an analysis that considered interactions between sets of v = 3 voxels. We have compared the distribution of the calculated measure MI or TE from S = 3,000 samples against the mean of P = 300 surrogate measurements for each subset sample. The use of v = 3 was motivated by the trend of p-values versus v in the simulation in Section A.2 and Fig. 8(b), where we found that the p-values for our technique were minimized for v = 2 to 4. We confirmed the selection of v = 3 with S = 3,000, P = 300 by investigating the trend of p-values versus v for selected region pairs, finding that the p-values for our technique were typically minimized at the lower end of the v = 3 to 5 range (results not shown). This means that v = 3 balanced our desire to capture multivariate interactions with the need to remain within the limitations of the available number of samples.

At this point, two important questions have to be raised. First, how strongly does the resulting structure depend on the choice of parameters? Second, what is the effect of including more than one voxel (v > 1) in the subsets, and thus taking into account multivariate interactions?

To answer these questions we have run several additional analyses with the following parameters. First, we computed the MI and TE measures as described in the main text again but for v = 1 and v = 5, leaving S = 3,000 and P = 300 unchanged. We also calculated the same MI and TE measures for the three sizes (v = 1, 3 and 5) but using only S = 1,000 samples and P = 100 permutations. Second, we calculated the MI, TE and MI modulation structures based on the average activations across all voxels in each ROI. Note that there is only one possible sample for this average analysis and thus S = 1. We use P = 300 and the standard TE (MI) significance tests in Section 2.2.1 coupled with the group level analysis described in Section 2.2.3. The average analysis is similar to standard functional connectivity studies in fMRI that look at correlations between ROIs (Friston 1994). It is important to note that with the TE this average ROI analysis could not infer any interregional links at the group level (see Footnote 17). We then compared the results of all 7 analyses, including the main analysis presented in the paper and the average ROI analysis, by calculating the correlation coefficients between the corresponding resulting MI, TE and MI-modulation structures. Importantly, we did not compare the number of significant subjects, but directly looked at the correlation coefficients between the mean values for MI, TE and MI-modulation within each subject. In Fig. 12 we show the similarity between the information structures obtained by the different analyses.

Fig. 12

Comparison of univariate and multivariate analyses. Top: Correlation coefficient between observed information structure for different sizes of multivariate interaction: v = 1 (univariate), v = 3 and v = 5 and for the average signal within ROIs (avg). For every v the structures were calculated and compared for a large (l) and a small (s) sample set (S = 3,000, P = 300, and S = 1,000, P = 100 respectively). Note that the avg analysis does not include any sampling and thus has only one sampling size and was performed using P = 300. Gray scale indicates average correlation coefficient between information structures resulting from the respective analyses according to the color bar on the right. Averages are taken across subjects and are based on Fisher z-transformed correlations and then transformed back. Dashed lines indicate the border between univariate and multivariate information analysis. Bottom: In order to test whether the multivariate measures are more similar to each other than they are to univariate measures, we compared the z-transformed correlation coefficients of all entries in the top panels by means of a paired t-test across subjects (n = 8). We thus tested whether some combinations of structures are more similar than other combinations. Gray scale indicates the p-values of this test (see color bar on the right). P-values are thresholded at α = 0.05/N, where N = 210 is the number of t-tests made. White entries are above α and thus not significant. The comparisons have been ordered to highlight differences between correlations that include the average ROI analysis (avg), correlations that include univariate analyses (m–u), but not avg, and, finally, correlations that include multivariate analyses only (m–m)
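The averaging of correlation coefficients across subjects via the Fisher z-transform, and the corrected threshold α = 0.05/N, can be sketched as below. The function names and the per-subject correlation values are hypothetical. Only n = 8, α = 0.05 and N = 210 come from the caption. The remark that 210 equals the number of pairs among the 21 pairwise comparisons of 7 analyses is an observation about the arithmetic, not a statement taken from the text.

```python
import math


def fisher_mean(correlations):
    """Average correlation coefficients on the Fisher z scale.

    Each r is mapped to z = atanh(r), the z values are averaged, and the
    mean is mapped back with tanh. Requires |r| < 1 for every entry.
    """
    z_values = [math.atanh(r) for r in correlations]
    return math.tanh(sum(z_values) / len(z_values))


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold for n_tests comparisons."""
    return alpha / n_tests


# Hypothetical correlations for one pair of analyses, one value per subject (n = 8).
per_subject_r = [0.62, 0.71, 0.55, 0.80, 0.66, 0.59, 0.74, 0.68]
mean_r = fisher_mean(per_subject_r)

# Threshold quoted in the caption: alpha = 0.05 / N with N = 210.
alpha_corrected = bonferroni_threshold(0.05, 210)

# Observation only: 7 analyses give C(7, 2) = 21 pairs, and C(21, 2) = 210.
n_pairs_of_pairs = math.comb(math.comb(7, 2), 2)
```

Averaging on the z scale rather than on raw coefficients reduces the bias that arises because the sampling distribution of r is skewed near ±1.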

The results can be summarized by two main statements. First, although univariate structures are correlated with the multivariate structures, the multivariate structures are correlated more strongly amongst each other than they are with the univariate structures. This indicates that the multivariate analysis captures some structure that is not present in the univariate analysis. Second, these multivariate interactions are captured by 3-voxel as well as 5-voxel interactions in a very similar way. Hence, the multivariate nature of the interaction does not seem to include very high dimensional interactions that cannot be captured by 3-voxel interactions but are present in 5-voxel interactions.

In a second step, we compared the statistical results obtained from the seven analyses. To do this we counted the percentage \(p_s\) of stable significant connections that remain unchanged between two types of analysis. We defined \(p_{s,ij}=\frac{2N_{s,ij}}{N_{i}+N_{j}}\), where \(N_{s,ij}\) is the number of links that are significant in both analyses i and j, thus called stable, and \(N_i\) and \(N_j\) are the numbers of significant links in each analysis, respectively. If \(p_{s,ij} = 1\), all significant connections are the same in both analyses, and if \(p_{s,ij} = 0\), there is no significant connection that shows up in both analyses. \(p_{s,ij}\) was calculated for all possible pairs of analyses i and j. We summarize the results by averaging over the three main types of analyses defined in Fig. 12. All results are given in percent as Mean ± SD. In the TE structure, there are no stable connections in the average over ROI analysis (avg in Fig. 12) compared to any other analysis. The percentage of stable significant connections is \(33\% \pm 14\%\) for comparisons of a univariate to multivariate analysis (u–m in Fig. 12) and \(80\% \pm 5\%\) for comparisons between multivariate analyses (m–m in Fig. 12). For the MI modulation, the corresponding percentages are: \(2\% \pm 2\%\) (avg), \(46\% \pm 17\%\) (u–m) and \(85\% \pm 3\%\) (m–m). Again, these numbers show that the multivariate information measures yield stable results and that the results are clearly different from the two kinds of univariate analysis we have compared them to.
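The stability measure \(p_{s,ij}\) can be computed directly from the two sets of significant links. The sketch below is illustrative only: the function name, the representation of a directed link as a (source, destination) tuple and the region labels are assumptions. Returning NaN when neither analysis has any significant link is also an assumption, since the definition is undefined in that case.

```python
def stable_fraction(links_i, links_j):
    """Compute p_s = 2 * N_s / (N_i + N_j) for two analyses.

    links_i and links_j are collections of significant directed links,
    each link given as a (source, destination) tuple. N_s counts the
    links that are significant in both analyses.
    """
    set_i, set_j = set(links_i), set(links_j)
    total = len(set_i) + len(set_j)
    if total == 0:
        return float("nan")  # assumption: undefined when both sets are empty
    return 2 * len(set_i & set_j) / total


# Hypothetical link sets using placeholder region labels A, B and C.
analysis_i = {("A", "B"), ("B", "C")}
analysis_j = {("A", "B"), ("C", "A")}
p_s = stable_fraction(analysis_i, analysis_j)  # one shared link out of 2 + 2
```

Because links are directed, ("A", "B") and ("B", "A") count as different links. If one analysis yields no significant links at all, as reported for the average ROI analysis with the TE, the measure is 0 against any analysis that does find links.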

Cite this article

Lizier, J.T., Heinzle, J., Horstmann, A. et al. Multivariate information-theoretic measures reveal directed information structure and task relevant changes in fMRI connectivity. J Comput Neurosci 30, 85–107 (2011).


  • fMRI
  • Visual cortex
  • Motor cortex
  • Movement planning
  • Information transfer
  • Transfer entropy
  • Information structure
  • Neural computation