Journal of Computational Neuroscience, Volume 30, Issue 1, pp 85–107

Multivariate information-theoretic measures reveal directed information structure and task relevant changes in fMRI connectivity

  • Joseph T. Lizier
  • Jakob Heinzle
  • Annette Horstmann
  • John-Dylan Haynes
  • Mikhail Prokopenko


The human brain undertakes highly sophisticated information processing facilitated by the interaction between its sub-regions. We present a novel method for interregional connectivity analysis, using multivariate extensions to the mutual information and transfer entropy. The method allows us to identify the underlying directed information structure between brain regions, and how that structure changes according to behavioral conditions. This method is distinguished in using asymmetric, multivariate, information-theoretical analysis, which captures not only directional and non-linear relationships, but also collective interactions. Importantly, the method is able to estimate multivariate information measures with only relatively little data. We demonstrate the method by analyzing functional magnetic resonance imaging time series to establish the directed information structure between brain regions involved in a visuo-motor tracking task. Importantly, this results in a tiered structure, with known movement planning regions driving visual and motor control regions. Also, we examine the changes in this structure as the difficulty of the tracking task is increased. We find that task difficulty modulates the coupling strength between regions of a cortical network involved in movement planning and between motor cortex and the cerebellum, which is involved in the fine-tuning of motor control. It is likely these methods will find utility in identifying interregional structure (and experimentally induced changes in this structure) in other cognitive tasks and data modalities.


Keywords: fMRI · Visual cortex · Motor cortex · Movement planning · Information transfer · Transfer entropy · Information structure · Neural computation

1 Introduction

Distributed computation in the brain is a complex process, involving interactions between many regions in order to achieve a particular task. Thus, in order to understand neural processing it is of particular importance to understand how different brain regions interact (Friston 2002). Our interest lies in establishing directed, interregional, functional information structure in the brain based on particular cognitive tasks. Several studies have considered similar goals. By observing common information in different brain regions at different times, a flow of information was inferred between different regions (Soon et al. 2008; Bode and Haynes 2009). Functional brain networks were inferred with a measure of information transfer in a model of the macaque cortex (Honey et al. 2007). Also, single-variate information transfer was studied at the regional level in fMRI measurements in the human visual cortex (Hinrichs et al. 2006). Undirected structure was studied with multivariate analysis at the regional level in fMRI measurements during visual processing (Chai et al. 2009). Other information-theoretical measures were used to study information transfer between brain areas of macaques (Liang et al. 2001). A different approach often used in fMRI is dynamic causal modeling, a model-based approach that compares a set of a priori defined neural models and tests how well they explain the experimental data (Friston et al. 2003).

In establishing such information structure, we are particularly interested in capturing nonlinear, directional, collective interactions between different subregions of one brain region facilitating outcomes in another region. Some of the above studies use linear methods only (e.g., Granger causality (Bressler et al. 2008) and linear approximations (Liang et al. 2001)), others do not capture whether an effect is due to a collective interaction from two or more driving elements (e.g. Hinrichs et al. 2006 examines only average values over all voxels in a region), or do not specifically examine the regional level of interactions. Also, some are not direct measures of information transfer, rather inferences of (undirected) common information (e.g. Chai et al. 2009). Others are reliant on the assumption of an underlying neural model (e.g. Friston et al. 2003).

Here, we present a new approach to detecting directed information structure between brain regions in cognitive tasks. Our approach examines the statistical significance of an ensemble of information transfer measurements between pairs of brain regions, each of which examines multiple variables (voxels for fMRI) in each brain region. This asymmetric, multivariate, information-theoretical analysis captures not only non-linear relationships, but also collective interactions arising from small groups of variables in each region, and the direction of these relationships. We present a detailed description of the mathematical foundations of the approach in Section 2, and a detailed study of the efficacy of the technique when applied to artificial data sets in Appendix A. Of particular importance is that this approach considers the collective interactions resulting from the combined activity of multiple driving elements. An example collective interaction is the Exclusive-OR (XOR) logical operation, where analyzing single inputs in isolation reveals nothing about the outcome, since it is determined collectively by the multiple input variables. From a practical perspective, multi-voxel analysis is necessary in order to detect complete spatiotemporal patterns of activity (Norman et al. 2006; Haynes and Rees 2006), which are known to be more informative about experimental conditions than single voxel activity. Furthermore, the information-theoretical basis of our approach distinguishes it in allowing direct measurement of asymmetric information transfer, and makes it model-free. Also, the particular combination of information theory and multi-voxel analysis here is novel, as is the ability of the technique to provide insights from relatively small data sets. Finally, note that the interregional information networks inferred here are functional, directed networks rather than structural networks (Friston 1994; Honey et al. 2007). 
Functional networks provide insight into the logical structure of the network and how this structure changes as a function of network activity (regardless of whether the underlying anatomy is known).

We are also interested in determining the changes in such directed information structure as the cognitive task changes. Changes in connectivity in a visual attention task have been illustrated with many techniques, e.g. dynamic causal modeling (Friston and Büchel 2000), (pairwise) Granger causality (Bressler et al. 2008) or psycho-physiological interaction (Büchel and Friston 1997). Visual integration across the visual field also changes the connectivity (Haynes et al. 2005). In addition, changes in connectivity pattern between brain regions have been associated with diseases (Bassett and Bullmore 2009; Rubinov et al. 2009). We describe a statistical analysis that can be used with our approach in order to infer differences in the directed interregional structure as the cognitive task changes.

The method was tested on fMRI data from a study where subjects undertake a visuo-motor tracking task under various task difficulties. In Section 3 we present the materials and methods for this fMRI data set. We demonstrate the method in Section 4 by applying it to study the connectivity during this tracking task. Our analysis of the fMRI data here yields a distinct, tiered, directed interaction structure which connects cortical movement planning regions (as information sources) to subcortical motor control regions (as information destinations). The correlations of the strength of the interregional relationships with task difficulty are then analyzed to determine which pairs of regions have: a. more in common, or b. a more pronounced directional relationship as the task difficulty increases. Most significantly, we identify an increased coupling between regions involved in movement planning and execution with task difficulty. The method is thus demonstrated to be useful for investigating interregional structure in fMRI studies, but it can certainly also be applied to other modalities such as electrophysiological multi-electrode array recordings.

2 Methods: interregional information structure analysis technique

In this section, we present our method for detecting directed information structure between brain regions during cognitive tasks. The natural domain for measuring information transfer is information theory (e.g. see MacKay 2003), which provides a model-free, non-linear platform for quantifying the information content of individual variables, variable collections or exchanges between variables.

We first describe the underlying information-theoretical measures in Section 2.1, then outline how to extend them to measure multivariate and interregional information transfer in Sections 2.1.3 and 2.1.4. Subsequently, we describe how to measure the statistical significance of the interregional information transfer in order to infer directed information links in Section 2.2. Finally, we describe how to detect changes in the directed structure as the cognitive task changes in Section 2.3.

2.1 Information-theoretical measures

The fundamental quantity in information theory (MacKay 2003) is the Shannon entropy, which represents the uncertainty associated with any measurement x of a random variable X (logarithms are in base 2, giving units in bits): \(H(X) = -\sum_{x} p(x) \log_2 p(x)\). The joint entropy of two random variables X and Y is a generalization to quantify the uncertainty of their joint distribution: \(H(X,Y) = -\sum_{x,y} p(x,y) \log_2 p(x,y)\). The conditional entropy of X given Y is the average uncertainty that remains about x when y is known: \(H(X|Y) = -\sum_{x,y} p(x,y) \log_2 p(x|y)\). While the standard definition of these values considers discrete variables, continuous variables can be considered using techniques such as kernel estimation (Kantz and Schreiber 1997).
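As an illustration only, the three entropies can be sketched for discrete observations with a plug-in (frequency-count) estimator. This is an assumption made for brevity, not the kernel technique cited above, and the function names are hypothetical. The conditional entropy is obtained here from the chain rule H(X|Y) = H(X,Y) − H(Y), which is equivalent to the definition given.

```python
import math
from collections import Counter

def entropy(samples):
    """Plug-in Shannon entropy in bits of a sequence of hashable observations."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())

def joint_entropy(xs, ys):
    """H(X,Y): entropy of the paired observations."""
    return entropy(list(zip(xs, ys)))

def conditional_entropy(xs, ys):
    """H(X|Y) via the chain rule H(X,Y) - H(Y)."""
    return joint_entropy(xs, ys) - entropy(ys)

# Two independent, uniformly distributed binary variables.
xs = [0, 0, 1, 1]
ys = [0, 1, 0, 1]
h_x = entropy(xs)                          # one fair bit
h_xy = joint_entropy(xs, ys)               # two fair bits
h_x_given_y = conditional_entropy(xs, ys)  # knowing y leaves x fully uncertain
```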

In this work, we use two particular information-theoretical measures to study the information transfer between pairs of variables X and Y: the mutual information and the transfer entropy. Here, we describe them and the manner in which they are extended to provide a measure of transfer between two regions or sets of variables, which is directional, nonlinear and incorporates collective interactions.

2.1.1 Mutual information as a symmetric measure of common information

The mutual information (MI) between X and Y measures the average reduction in uncertainty about x (or entropy H of x) that results from knowing the value of y, or vice versa:
$$ I(X;Y) = \sum\limits_{x,y} p(x,y) \log_2{\frac{p(x,y)}{p(x)p(y)}} \label{eq:mi}. $$
In this way, I(X;Y) is a symmetric measure of the common information between X and Y. Though it has been previously used to measure directed information transfer from one variable to another, this is not valid: it is a static, symmetric measure of shared information. This is useful in its own right: it may be considered as a result, but not a direct measure, of information transfer. The MI has been used for example to study functional structure of cortical neural networks via analysis of neuronal spike trains (Bettencourt et al. 2007).
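The symmetry of the MI can be checked numerically. The following is a minimal sketch of Eq. (1) for discrete observations, again assuming a plug-in estimator rather than the estimator used in this study; `mutual_information` is a hypothetical name.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in I(X;Y) in bits, summing p(x,y) log2[p(x,y) / (p(x) p(y))]."""
    n = len(xs)
    c_xy, c_x, c_y = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    # p(x,y) / (p(x) p(y)) = c_xy * n / (c_x * c_y) in terms of raw counts.
    return sum((c / n) * math.log2(c * n / (c_x[x] * c_y[y]))
               for (x, y), c in c_xy.items())

xs = [0, 0, 1, 1]
mi_copy = mutual_information(xs, xs)                   # Y is a copy of X
mi_independent = mutual_information(xs, [0, 1, 0, 1])  # Y independent of X
```

Swapping the two arguments leaves the result unchanged, which is why the measure on its own cannot indicate a direction of transfer.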

Here, the MI is estimated for continuous-valued variables using the techniques of Kraskov et al. (2004) and Kraskov (2004). This technique uses kernel estimation with enhancements designed specifically to handle data with a small number of observations. We use a window size of the two closest observations for the Kraskov-estimators here.

Also, note that the conditional mutual information between X and Y given Z is the MI between X and Y when Z is known:
$$\begin{array}{rll} I(X;Y|Z) &=& H(X|Z)-H(X|Y,Z)\\ &=& H(Y|Z)-H(Y|X,Z) \label{eq:condMi}. \end{array} $$

2.1.2 Transfer entropy as a directed measure of information transfer

The transfer entropy (TE) (Schreiber 2000) was introduced as a directed measure of dynamic information transfer from one variable to another. It quantifies the information provided by a source node about a destination’s next state that was not contained in the past of the destination. Specifically, the transfer entropy from a source node Y to a destination X is the average mutual information between the previous state of the source \(y_n\) and the next state of the destination \(x_{n+1}\) at time n + 1, conditioned on the past k states of the destination \(x_n^{(k)}=\left\{ x_n, x_{n-1}, \ldots , x_{n-k+1} \right\}\):
$$\begin{array}{rll} T_k(Y \rightarrow X) &=& \sum\limits_{x_{n+1},x^{(k)}_{n},y_{n}} p\left(x_{n+1},x^{(k)}_{n},y_{n}\right)\\ && \times \log_2{ \frac{ p\left(x_{n+1}|x^{(k)}_{n},y_{n}\right)}{p\left(x_{n+1}|x^{(k)}_{n}\right)}} \label{eq:te}. \end{array} $$
Equivalently, we can express this directly as a conditional MI, or as the sum of two MI quantities:
$$ T_k(Y \rightarrow X) = I\left(Y;X' | X^{(k)}\right) \label{eq:teConditional}, $$
$$ T_k(Y \rightarrow X) = I\left( Y; \left(X', X^{(k)}\right) \right) - I\left( Y; X^{(k)} \right) \label{eq:te2MIs}, $$
where \(X'\) refers to the next state of the destination and \(X^{(k)}\) refers to its past k states.
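For completeness, the step from the conditional MI form to the two-MI form can be written out. It relies only on the chain rule for mutual information, and the entropy expansion in the last line follows the conditional MI definition in Eq. (2):

$$\begin{array}{rll} I\left( Y; \left(X', X^{(k)}\right) \right) &=& I\left( Y; X^{(k)} \right) + I\left(Y;X' | X^{(k)}\right)\\ \Rightarrow \quad T_k(Y \rightarrow X) &=& I\left( Y; \left(X', X^{(k)}\right) \right) - I\left( Y; X^{(k)} \right)\\ &=& H\left(X' | X^{(k)}\right) - H\left(X' | X^{(k)}, Y\right). \end{array} $$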

The TE may be measured for any two time series X and Y and is always a valid measure of the predictive gain from the source, but only represents physical information transfer when measured on a causal link (Lizier and Prokopenko 2010), e.g. directed anatomical connections. Here, we will use a history length k = 1 due to limitations of the number of observations. While the TE is not a direct measure of causal effect, the use of this short history length alters the character of the measure towards inferring causal effect (Lizier and Prokopenko 2010).

As per Eq. (5), the TE is computed here using two MI estimators in the style of Kraskov et al. (2004) and Kraskov (2004). Again, we use a window size of the two closest observations for the Kraskov-estimators here.
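To make the definition in Eq. (3) concrete, the following sketch evaluates it directly for discrete sequences with history length k = 1. It assumes a plug-in estimator in place of the Kraskov-style estimators used in this study, and `transfer_entropy` is a hypothetical name. The toy destination simply copies the source with a one-step delay.

```python
import math
import random
from collections import Counter

def transfer_entropy(source, dest):
    """Plug-in T_1(Y -> X) in bits for discrete sequences (history length k = 1)."""
    # Each triple is (x_{n+1}, x_n, y_n).
    triples = list(zip(dest[1:], dest[:-1], source[:-1]))
    n = len(triples)
    c_full = Counter(triples)
    c_past_src = Counter((x_past, y) for _, x_past, y in triples)
    c_next_past = Counter((x_next, x_past) for x_next, x_past, _ in triples)
    c_past = Counter(x_past for _, x_past, _ in triples)
    te = 0.0
    for (x_next, x_past, y), c in c_full.items():
        p_given_past_and_source = c / c_past_src[(x_past, y)]
        p_given_past = c_next_past[(x_next, x_past)] / c_past[x_past]
        te += (c / n) * math.log2(p_given_past_and_source / p_given_past)
    return te

rng = random.Random(0)
y = [rng.randint(0, 1) for _ in range(2000)]  # random binary source
x = [0] + y[:-1]                              # destination: x_{n+1} = y_n
te_forward = transfer_entropy(y, x)           # close to 1 bit
te_backward = transfer_entropy(x, y)          # close to 0 bits
```

Unlike the MI, the two directions give clearly different values, reflecting the asymmetry of the measure.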

The TE is then a non-linear, directional measure for the information transfer between two variables. It has been used for example to analyze cortical interactions in simulated data (Honey et al. 2007), EEG data (Chávez et al. 2003; Grosse-Wentrup 2008), and fMRI data (Hinrichs et al. 2006), while a related measure was used in the investigations in Liang et al. (2001). It is important to note however that it does not detect the information transfer due to the interaction between two or more source variables. To capture this aspect, we need to extend it to consider multivariate sources and destinations.

2.1.3 Extending information transfer to multivariate source and destination

Each of these measures of information transfer may be trivially extended to consider joint variables as the source and destination, i.e. we have \(I(\mathbf{X};\mathbf{Y})\) and \(T_k(\mathbf{Y} \rightarrow \mathbf{X}) = I(\mathbf{Y};\mathbf{X}' | \mathbf{X}^{(k)})\) where \(\mathbf{X}\) and \(\mathbf{Y}\) are joint variables.

The multivariate mutual information \(I(\mathbf{X};\mathbf{Y})\) measures the amount of information shared between a set of source variables \(\mathbf{Y}\) and a set of destination variables \(\mathbf{X}\). It has been studied for example in fMRI data in Chai et al. (2009).

The multivariate transfer entropy \(T_k(\mathbf{Y} \rightarrow \mathbf{X})\) (see Fig. 1) measures the amount of information that a set of source variables \(\mathbf{Y}\) provides about a set of destination variables \(\mathbf{X}\), that was not contained in the past of the destination set. While the multivariate TE has been considered elsewhere (e.g. in genetic microarray data sets (Tung et al. 2007)), we are not aware of its application to fMRI data as yet. This is an important extension, because multivariate voxel patterns are known to be more informative about experimental conditions than single voxel activity (Norman et al. 2006; Haynes and Rees 2006). In particular, \(T_k(\mathbf{Y} \rightarrow \mathbf{X})\) will capture the information transfer that occurs due to the interaction between a set of source variables (e.g. an exclusive-OR type interaction); univariate analysis cannot capture this. As such, in the multivariate TE we now have a measure for information transfer that is non-linear, directional and captures collective interactions.
Fig. 1

Multivariate transfer entropy \(T_k(\mathbf{Y} \rightarrow \mathbf{X}) = I(\mathbf{Y};\mathbf{X}' | \mathbf{X}^{(k)})\) from the set of source variables Y to the set of destination variables X
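The exclusive-OR case can be reproduced in a short sketch. It assumes discrete data, history length k = 1 and a plug-in estimator (not the Kraskov-style estimators used in this study); the names are hypothetical. Joint variables are represented as tuples, so the same function serves for univariate and multivariate sources.

```python
import math
import random
from collections import Counter

def entropy(samples):
    """Plug-in Shannon entropy in bits; samples may be scalars or tuples."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())

def transfer_entropy(source, dest):
    """T_1(Y -> X) = H(X'|X) - H(X'|X,Y), expanded into four joint entropies."""
    nxt, past, src = dest[1:], dest[:-1], source[:-1]
    return (entropy(list(zip(nxt, past))) - entropy(past)
            - entropy(list(zip(nxt, past, src))) + entropy(list(zip(past, src))))

rng = random.Random(1)
y1 = [rng.randint(0, 1) for _ in range(4000)]
y2 = [rng.randint(0, 1) for _ in range(4000)]
# Destination is the XOR of the two sources at the previous time step.
x = [0] + [a ^ b for a, b in zip(y1[:-1], y2[:-1])]

te_single = transfer_entropy(y1, x)                 # one source alone: near 0 bits
te_joint = transfer_entropy(list(zip(y1, y2)), x)   # both sources jointly: near 1 bit
```

Either source examined in isolation appears to transfer nothing, while the joint source accounts for essentially the full bit of the destination's next state.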

Note however that the number of available observations limits the number of joint variables that may be analyzed in this manner (e.g. see Lungarella et al. 2005; Kraskov et al. 2004, and Section 2.2.2 and Appendix A.2 later). This is because spurious correlations are more frequently “detected” as this limit is approached, increasing the error in measurement.

2.1.4 Measuring information transfer between two regions of variables

Though theoretically appealing, it is generally impractical for us to compute the interregional information transfer as the multivariate transfer entropy \(T_k(\mathbf{R}_a \rightarrow \mathbf{R}_b)\) between the complete sets of variables in two regions \(\mathbf{R}_a\) and \(\mathbf{R}_b\). This is because a “region” in most neural applications contains many variables. Complete sampling of these massively multivariate spaces would require many orders of magnitude more observations in time than could be practically obtained (see Lungarella et al. 2005; Kraskov et al. 2004). As such, the most practical way to retain the benefits of this measure is to compute the interregional information transfer from region \(\mathbf{R}_a\) to region \(\mathbf{R}_b\) as the multivariate TE averaged over a large number S of samples of pairs of subsets of v variables in each region:
$$ T_{k,v}\big(\mathbf{R}_a \rightarrow \mathbf{R}_b \big) = \left\langle T_k\big( \mathbf{R}_{a,i} \rightarrow \mathbf{R}_{b,j} \big) \right\rangle_{i,j} \label{eq:teInterregional}. $$
Here i and j label the sample subsets \(\mathbf{R}_{a,i}\) and \(\mathbf{R}_{b,j}\) of size v in each region. Again, while it is desirable to average over all pairs of subsets from each region, where:
$$ S = {\left| \mathbf{R}_a \right| \choose v} {\left| \mathbf{R}_b \right| \choose v} \label{eq:combosOfSubsets}, $$
this is impractical for large region sizes \(\left| \mathbf{R}_a \right|\) and \(\left| \mathbf{R}_b \right|\). In practical situations, S must be limited with the sample subsets selected randomly. Thus, \(T_{k,v}(\mathbf{R}_a \rightarrow \mathbf{R}_b)\) provides a measure of interregional information transfer that is non-linear, directional, captures collective interactions and can be realistically implemented.
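The subset-sampling step can be sketched as follows. This is an illustration under stated assumptions: each region is stored as a list of time points, each holding one value per variable; `measure` is a hypothetical placeholder for any multivariate estimator of the TE between two subsets; and all function names are invented for the example.

```python
import math
import random

def n_subset_pairs(size_a, size_b, v):
    """Number of distinct subset pairs, as in the expression for S above."""
    return math.comb(size_a, v) * math.comb(size_b, v)

def sample_subset_pairs(size_a, size_b, v, n_samples, seed=0):
    """Draw n_samples random pairs of variable-index subsets, each of size v."""
    rng = random.Random(seed)
    return [(tuple(sorted(rng.sample(range(size_a), v))),
             tuple(sorted(rng.sample(range(size_b), v))))
            for _ in range(n_samples)]

def interregional_average(region_a, region_b, pairs, measure):
    """Average measure(subset of region_a, subset of region_b) over the pairs."""
    total = 0.0
    for idx_a, idx_b in pairs:
        sub_a = [tuple(row[i] for i in idx_a) for row in region_a]
        sub_b = [tuple(row[j] for j in idx_b) for row in region_b]
        total += measure(sub_a, sub_b)
    return total / len(pairs)

# Illustrative sizes only: two regions of 100 variables each, subsets of v = 3.
total_pairs = n_subset_pairs(100, 100, 3)
```

Even for these modest, purely illustrative sizes the full number of subset pairs exceeds 10^10, which is why S is restricted to a random sample.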
The same technique of averaging over subsets of size v can be used with the multivariate MI to produce a similar but non-directional measure, the interregional mutual information:
$$ I_v\big(\mathbf{R}_a; \mathbf{R}_b\big) = \left\langle I\big( \mathbf{R}_{a,i} ; \mathbf{R}_{b,j} \big) \right\rangle_{i,j} \label{eq:miInterregional}. $$
Indeed, this is done in Chai et al. (2009), where the authors additionally condition on experimental conditions.
Finally, we note that both measures may be averaged over subjects s in order to assess group effects. This gives the following expressions:
$$ T^g_{k,v}\big(\mathbf{R}_a \rightarrow \mathbf{R}_b \big) = \left\langle T_{k,v}\big(\mathbf{R}_a \rightarrow \mathbf{R}_b \big) \right\rangle_s \label{eq:teInterregionalGroup}, $$
$$ I^g_v \big(\mathbf{R}_a; \mathbf{R}_b\big) = \left\langle I_v \big(\mathbf{R}_a; \mathbf{R}_b\big) \right\rangle_s \label{eq:miInterregionalGroup}. $$

2.2 Significance testing of information based connectivity measures

The measures described above quantify the information transfer between two single variables or sets of variables. However, it is important to realize that since they are computed from a finite number of observations, these measurements are actually random variables.

Hence, the presence or absence of a connection (MI or TE) needs to be assessed using statistical tests. In the following we explain the statistical procedures used to infer interregional links in this study. We use the TE to illustrate the procedures; note that they are also applicable to the MI.

2.2.1 Significance testing of the transfer entropy

First, we consider statistical significance testing for the standard univariate TE measurement \(T_k(Y \rightarrow X)\), as previously described in Verdes (2005) and Chávez et al. (2003). The null hypothesis H0 of the test is that the state changes \(x_n^{(k)} \rightarrow x_{n+1}\) of the destination X have no temporal dependence on the source Y.

Assuming H0 true, we need to determine the distribution of TE measurements \(T_k(Y^p \rightarrow X)\) under this condition. This is done by (see Fig. 2):
  1. generating many surrogate time-series (say P of them) by permuting the elements \(y_n\) of the source time series Y to obtain each surrogate \(Y^p\), then
  2. using each \(Y^p\) to compute a surrogate TE to X: \(T_k(Y^p \rightarrow X)\).

Importantly, these surrogates are computed from the same number of observations, and the same distributions \(p(y_n)\) and \(p(x_{n+1} | x_n^{(k)})\); the only difference is that the temporal dependence \(p(x_{n+1} | x_n^{(k)}, y_n)\) of the state changes of the destination on the source has been destroyed. Thus the distribution of the surrogates \(T_k(Y^p \rightarrow X)\) describes our expectation for \(T_k(Y \rightarrow X)\) under H0. We can then:
  3. determine a one-sided p-value of the likelihood of our observation of \(T_k(Y \rightarrow X)\) being distributed as \(T_k(Y^p \rightarrow X)\); i.e. the probability of observing a greater \(T_k(Y \rightarrow X)\) than that actually measured, assuming H0. This can be done either by directly counting the proportion of surrogates where \(T_k(Y^p \rightarrow X) \geq T_k(Y \rightarrow X)\), or assuming a normal distribution of \(T_k(Y^p \rightarrow X)\) and computing the p-value under a z-test.
  4. For a given α value, we reject H0 when p < α, concluding then that a significant directed temporal relationship between the source and destination does exist.

This method provides an objectively determined threshold for any TE measurement, and the given α allows comparison to other pairs of variables.
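The four steps above can be sketched as follows, assuming discrete data, history length k = 1 and a plug-in TE estimator standing in for the estimator used in this study. The function names and the default of 200 surrogates are hypothetical choices for the example. Both ways of obtaining the p-value (direct counting and a z-test) are returned.

```python
import math
import random
import statistics
from collections import Counter

def entropy(samples):
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())

def transfer_entropy(source, dest):
    """Plug-in T_1(Y -> X) in bits, written as a sum of four joint entropies."""
    nxt, past, src = dest[1:], dest[:-1], source[:-1]
    return (entropy(list(zip(nxt, past))) - entropy(past)
            - entropy(list(zip(nxt, past, src))) + entropy(list(zip(past, src))))

def permutation_test(source, dest, n_surrogates=200, seed=0):
    """Return the measured TE and one-sided p-values by counting and by z-test."""
    rng = random.Random(seed)
    actual = transfer_entropy(source, dest)
    surrogates = []
    for _ in range(n_surrogates):
        # Step 1: permute the source elements y_n that enter the measurement,
        # which keeps p(y_n) intact while destroying the temporal dependence.
        head = list(source[:-1])
        rng.shuffle(head)
        # Step 2: surrogate TE from the permuted source to the same destination.
        surrogates.append(transfer_entropy(head + [source[-1]], dest))
    # Step 3a: proportion of surrogates at least as large as the measurement.
    p_count = sum(s >= actual for s in surrogates) / n_surrogates
    # Step 3b: upper-tail p-value assuming normally distributed surrogates.
    z = (actual - statistics.mean(surrogates)) / statistics.stdev(surrogates)
    p_z = 0.5 * math.erfc(z / math.sqrt(2))
    return actual, p_count, p_z

rng = random.Random(0)
y = [rng.randint(0, 1) for _ in range(300)]
x = [0] + y[:-1]                              # destination copies the delayed source
actual, p_count, p_z = permutation_test(y, x)
alpha = 0.05                                  # illustrative threshold
reject_h0 = p_count < alpha                   # Step 4
```

Note that the counting p-value cannot resolve values below 1/P, which is one practical reason to consider the z-test when small thresholds are required.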
Fig. 2

Significance testing of the transfer entropy measurement \(T_k(Y \rightarrow X)\) is performed by (Section 2.2.1): (1) generating a number of surrogate source time-series \(Y^p\) by permuting the elements \(y_n\) of Y; (2) measuring \(T_k(Y^p \rightarrow X)\) for each surrogate \(Y^p\); (3, 4) determining the likelihood of \(T_k(Y \rightarrow X)\) being distributed as \(T_k(Y^p \rightarrow X)\). The method can be extended to multivariate \(\mathbf{Y}\) and \(\mathbf{X}\) by generating corresponding surrogate source time-series \(\mathbf{Y}^p\) by permuting the elements \(\mathbf{y}_n\) of \(\mathbf{Y}\) (Section 2.2.2). Significance testing of the interregional measure \(T_{k,v}(\mathbf{R}_a \rightarrow \mathbf{R}_b)\) is similar in principle, with small differences in execution described in Section 2.2.2

2.2.2 Significance testing of interregional measures

Significance testing of the multivariate measure \(T_k(\mathbf{Y} \rightarrow \mathbf{X})\) is a straightforward extension. Most importantly, in generating the surrogates \(\mathbf{Y}^p\) we do not permute the component time series \(Y_1, Y_2, \ldots\) of \(\mathbf{Y}\) individually but permute the vectors \(\mathbf{y}_n\) at each time point n in \(\mathbf{Y}\) as a whole. This ensures that the only difference in making the surrogate measurements \(T_k(\mathbf{Y}^p \rightarrow \mathbf{X})\) is the temporal relationship \(p(\mathbf{x}_{n+1} | \mathbf{x}_n^{(k)}, \mathbf{y}_n)\).
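The distinction between the two permutation schemes is easy to see in a small array sketch. It assumes the multivariate source is stored as a NumPy array with one row per time point and one column per component series; the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
Y = np.arange(12).reshape(6, 2)   # 6 time points, 2 component series

# Scheme used here: reorder whole rows, so each vector y_n stays intact.
order = rng.permutation(len(Y))
Y_joint = Y[order]

# Scheme to avoid: shuffle each column separately, which also destroys the
# relationships between the components within each time point.
Y_indep = np.column_stack([rng.permutation(Y[:, c]) for c in range(Y.shape[1])])
```

Both schemes preserve the marginal distribution of every component, but only the first preserves the joint distribution of the source vectors.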

Significance testing can similarly be extended to the interregional measure \(T_{k,v}(\mathbf{R}_a \rightarrow \mathbf{R}_b)\), though there are a number of subtle differences. The process begins by:
  1. generating P surrogate regional time-series \(\mathbf{R}_a^p\), by permuting the elements \(\mathbf{r}_{a,n}\) of \(\mathbf{R}_a\), then
  2. using each \(\mathbf{R}_a^p\) to compute a surrogate interregional TE to \(\mathbf{R}_b\): \(T_{k,v}(\mathbf{R}_a^p \rightarrow \mathbf{R}_b )\).

A crucial difference to significance testing for the univariate TE is that these surrogate interregional TE measurements are means of S surrogate multivariate TE measurements \(T_k( \mathbf{R}_{a,i}^p \rightarrow \mathbf{R}_{b,j} )\) for paired subsets i and j of v variables. Importantly, the p-th permutation is applied to the whole source region as \(\mathbf{R}_a^p\) before the S subsets i of v variables for each \(\mathbf{R}_{a,i}^p\) are selected from it. Also, for each of the P permutations \(\mathbf{R}_a^p\) the same subsets i and j must be selected as for the actual measurement \(T_{k,v}(\mathbf{R}_a \rightarrow \mathbf{R}_b)\) in Eq. (6). Together, these constraints ensure that the only difference in making the surrogate measurements \(T_k( \mathbf{R}_{a,i}^p \rightarrow \mathbf{R}_{b,j} )\) is the temporal relationship \(p(\mathbf{r}_{b,n+1} | \mathbf{r}_{b,n}^{(k)}, \mathbf{r}_{a,n})\).
We can then compute a p-value and draw a conclusion on the significance of a directed interregional link by comparing \(T_{k,v}(\mathbf{R}_a \rightarrow \mathbf{R}_b)\) with the P surrogate values \(T_{k,v}(\mathbf{R}_a^p \rightarrow \mathbf{R}_b )\) as per steps 3 and 4 in Section 2.2.1 above. Note that when using a z-test to compute the p-value, we write the mean over all P surrogate measures as:
$$ \overline{T}_{k,v}^P\big(\mathbf{R}_a \rightarrow \mathbf{R}_b \big) = \left\langle T_{k,v}\big(\mathbf{R}_a^p \rightarrow \mathbf{R}_b \big) \right\rangle_{p} \label{eq:teInterregionalSurrogates}, $$
while the equivalent for the MI is labeled \(\overline{I}_{v}^P\).
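A sketch of the two constraints described above (permute the whole source region first, then reuse the same subset pairs) is given below. It assumes each region is a NumPy array with one row per time point and one column per variable. `measure` is a hypothetical placeholder for a multivariate TE estimator that handles the time lag itself, and the function name is invented for the example.

```python
import numpy as np

def interregional_surrogates(region_a, region_b, pairs, measure, n_surrogates, seed=0):
    """Return the actual subset-averaged measure, its surrogates and a z-score."""
    rng = np.random.default_rng(seed)

    def averaged(a, b):
        # The same subset pairs (i, j) are used for every evaluation.
        return float(np.mean([measure(a[:, list(i)], b[:, list(j)]) for i, j in pairs]))

    actual = averaged(region_a, region_b)
    surrogates = np.array([
        # Permute the time points of the whole source region, then subset it.
        averaged(region_a[rng.permutation(len(region_a))], region_b)
        for _ in range(n_surrogates)
    ])
    # z-score of the measurement against the surrogate mean and spread.
    z = (actual - surrogates.mean()) / surrogates.std(ddof=1)
    return actual, surrogates, z
```

Because the permutation is drawn once per surrogate and applied before subsetting, every subset within one surrogate shares the same reordering of time points.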

It is important to note that the method does not exclude bidirectional relationships (i.e. it does not just look for which direction was strongest).

Finally, we note that in assessing connectivity between pairs within R regions, we should correct our statistical threshold of α for multiple comparisons. We use the common Bonferroni correction and thus αc = α/N, where N is the number of connections tested. For undirected links (e.g. with MI) we have N = R(R − 1)/2; for directed links (e.g. with TE) we have N = R(R − 1). Note that except for the number of connections, which is doubled for the directed TE measure, there is no difference between the two measures from a statistical point of view.
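As a worked example of the correction, the sketch below uses R = 11 regions (the number of ROIs in the data set described in Section 3) and an illustrative α = 0.05, which is an assumption made here rather than a value stated in this section. The function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_regions, directed):
    """Return (corrected threshold alpha_c, number of tested links N)."""
    n_links = n_regions * (n_regions - 1)   # ordered pairs: directed links
    if not directed:
        n_links //= 2                       # unordered pairs: undirected links
    return alpha / n_links, n_links

alpha_c_te, n_te = bonferroni_threshold(0.05, 11, directed=True)   # TE
alpha_c_mi, n_mi = bonferroni_threshold(0.05, 11, directed=False)  # MI
```

With these inputs the TE is tested on 110 links and the MI on 55, so the corrected threshold for the TE is half that for the MI.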

In Appendix A we demonstrate the ability of the technique to correctly infer interregional links in a numerical data set with relatively low, non-linear coupling, and a small number of observations. We also demonstrate some level of robustness of the technique to undersampling in the data sets, and to inference where data sets are logically related without being directly linked. Also, we demonstrate how the statistical significance of the interregional measure can be used to guide the selection of the number of joint variables v. In particular, we quantitatively demonstrate that if v is too large given the number of observations, the measure is more susceptible to “detecting” spurious correlations amongst the variables, introducing large variance into the surrogates \(T_{k,v}(\mathbf{R}_a^p \rightarrow \mathbf{R}_b )\) and thereby increasing the p-value. Using the raw measurement alone will not discern this situation.

2.2.3 Significance testing across a group of subjects

This method of identifying significant interregional links outlined above is suitable for application on the individual level for a given p = αc level. Here we describe how to summarize the structure of connectivity for a set of regions across a group of subjects.

We can summarize the results for each subject s in a matrix \(M_s(a,b)\) representing the connectivity from \(\mathbf{R}_a\) to \(\mathbf{R}_b\) in the following way. For a significantly positive interregional TE measurement between regions \(\mathbf{R}_a\) and \(\mathbf{R}_b\) a 1 is entered at the corresponding position in the matrix. A significantly negative TE is entered as −1, while all other entries are set to zero. This matrix summarizes the significance (non-zero values) and direction of the effect (larger or smaller than chance, 1 or −1 respectively).

To summarize the structure of the connectivity across subjects, the significance matrices of the n subjects are added: \(M(a,b) = \sum_s M_s(a,b)\). The values of M(a,b) range from −n to n. A value of 0 indicates that the number of positive and negative significant results was equal across subjects (including the case where both values were 0). A positive number m indicates that m more subjects had a significant effect with a positive mean than with a negative mean (e.g. 4 positive means and 0 negative means, but also 6 positive means and 2 negative means). The binomial distribution B(n,p = α), with p = α being the statistical threshold for the single subjects, approximates our expectation of m from chance alone. As such, we compare m to B(n,p = α) to determine whether the interregional relationship is significant across the group.
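The group summary can be sketched as follows. The text does not spell out how m is compared to B(n, p = α), so the upper-tail probability used here is one plausible reading and should be treated as an assumption. The toy matrices, the value α = 0.05 and the function names are likewise illustrative.

```python
import math
import numpy as np

def group_matrix(subject_matrices):
    """Sum the per-subject matrices M_s(a,b), with entries in {-1, 0, 1}."""
    return np.sum(subject_matrices, axis=0)

def binomial_upper_tail(m, n, alpha):
    """P(K >= m) for K ~ B(n, alpha): chance of m or more significant subjects."""
    return sum(math.comb(n, k) * alpha**k * (1 - alpha)**(n - k)
               for k in range(m, n + 1))

# Toy example: n = 8 subjects and 3 regions.
M_s = np.zeros((8, 3, 3), dtype=int)
M_s[:4, 0, 1] = 1    # four subjects show a significantly positive link 0 -> 1
M_s[4, 1, 2] = 1     # one positive and one negative result for the link 1 -> 2
M_s[5, 1, 2] = -1
M = group_matrix(M_s)
p_group = binomial_upper_tail(int(M[0, 1]), n=8, alpha=0.05)
```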

2.3 Significance testing for modulation of connectivity measures

In order to verify whether the connectivity between brain regions is modulated by an experimental condition, we performed a second analysis. Since the experimental condition in our test data set (in Section 3) varied parametrically, we also assessed the modulation of MI and TE by linear dependence of MI and TE on experimental conditions. Note however, that other modulation types (e.g. the difference between two conditions) could be used in exactly the same way. We fitted the MI or TE measures (mean over all samples) with a linear regression against the experimental parameter of interest. Again this resulted in a true measured slope of the regression and a distribution of P slopes from the randomly permuted analyses. The statistical testing was done as described in Section 2.2.3 by assessing the probability that the true value emerges from a Gaussian distribution characterized by the mean and variance of the random permutations.
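The slope test can be sketched as below. The sketch assumes that the measured values per condition and the P surrogate values per condition have already been computed; the numbers used are synthetic and purely illustrative, and the function names are hypothetical. Positive and negative modulation are tested separately with one-sided p-values.

```python
import math
import numpy as np

def slope(levels, values):
    """Least-squares slope of values against the experimental parameter."""
    return np.polyfit(levels, values, 1)[0]

def slope_z_test(levels, actual, surrogates):
    """Compare the measured slope with the slopes of the P surrogate analyses."""
    true_slope = slope(levels, actual)
    null_slopes = np.array([slope(levels, s) for s in surrogates])
    z = (true_slope - null_slopes.mean()) / null_slopes.std(ddof=1)
    p_positive = 0.5 * math.erfc(z / math.sqrt(2))    # larger than chance
    p_negative = 0.5 * math.erfc(-z / math.sqrt(2))   # smaller than chance
    return true_slope, z, p_positive, p_negative

levels = np.array([1.0, 2.0, 3.0, 4.0])        # four difficulty levels
actual = np.array([0.10, 0.20, 0.30, 0.40])    # synthetic measurements
rng = np.random.default_rng(0)
surrogates = rng.normal(0.0, 0.01, size=(200, 4))   # synthetic surrogate values
true_slope, z, p_positive, p_negative = slope_z_test(levels, actual, surrogates)
```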

With the TE measurements, these tests determine changes in directed information transfer for a given region pair as a function of experimental condition. With the MI measurements, the tests determine whether the regions have more or less in common as a function of the experimental condition.

Again we considered positive and negative slopes separately and then merged them in the matrix \(M_s(a,b)\). The significance levels αc are the same as for the connectivity measures. We examined group effects by comparing \(M(a,b) = \sum_s M_s(a,b)\) to binomial distributions, as for the interregional measures themselves.

3 Material: fMRI data set for manual tracking task

The method described above was applied to a data set acquired to investigate the influence of predictability in a visuo-motor coordination task. We provide here an overview of the experimental design and preprocessing necessary to understand the information-theoretic analysis presented. A full description of all methodological details is given in Horstmann (2008).

3.1 Subjects and experimental design

We analyzed the data of eight subjects (4 female, mean age: 27.5 years, SD 3.8) who participated in an fMRI experiment. All experiments were performed at the Max Planck Institute for Brain and Cognitive Sciences in Leipzig. During scanning, subjects were asked to track with their right index finger a dot that was moving along a circular outline not visible to the subject (see Fig. 3).
Fig. 3

Experimental design and regions of interest (a) Layout of the screen. Dotted line: illustration of target path (not visible to subjects), white circle: target, smaller red circle: cursor, gray cross: fixation cross, small gray arrow: overall movement direction, larger turquoise and red arrows: movement sequence, illustrating successive cw- and ccw-components of the stimulus. (b) Velocity over time. Upper panel: high predictability. Lower panel: low predictability. cv constant velocity, cw clockwise, ccw counterclockwise, st sinusoidal transient. (c) Trial time course. Time elapses from top to bottom. Successive periods of one trial are given. Dotted line: path on which the target moves. Cross: fixation cross, red dot: cursor, white dot: target, arrows indicate direction of overall movement. (d) The 11 regions of interest from which data was extracted for the information-theoretic analysis. See Table 1 for full ROI names. The ROI in the basal ganglia is not shown here

Continuous visual feedback informed the subjects about the position of the target and their fingertip. The target followed an alternating clockwise (cw) and counterclockwise (ccw) trajectory. The difficulty of the tracking task was altered between four different levels in blocks of 16.8 s by changing the regularity of the cw vs. ccw stretches (see Fig. 3(a) to (c)). Twenty such blocks were sampled for each difficulty. The visuo-motor tracking task required the integration of visual input, motor control and proprioceptive feedback, and the difficulty levels were used to manipulate the predictability of the movement that had to be tracked. We were interested in possible changes in the interaction between brain areas depending on the predictability of the target. Functional magnetic resonance imaging (fMRI) gradient echo EPI data (Siemens TRIO 3T, TR = 2.8 s, TE = 30 ms, matrix: 64 × 64, FOV = 19.2 cm, resolution: 3 × 3 × 3 mm, 42 slices) was acquired during the tracking task from each subject.

3.2 Preprocessing and extraction of regions of interest

We first performed standard preprocessing including motion correction and spatial normalization on the data (Friston et al. 2006). In order to extract potential regions of interest (ROI) the preprocessed data was analyzed with a general linear model (GLM). We then looked for a general effect of interest for tracking (F-test) and extracted regions based on a significance level of p < 0.05 (family-wise error corrected). This was used to find regions that were activated during the task. In addition to the regions activated by the task, two anatomical masks of the left and right superior colliculus (SC) were included, because we were particularly interested in a possible involvement of the SC in the task (Lunenburger et al. 2001). For a detailed description of this analysis see Horstmann (2008). Overall, this procedure yielded 11 ROIs of different sizes (see Fig. 3 and Table 1). These served as ROIs for the information-based connectivity analysis. From each ROI, we extracted the multivariate fMRI data during the tracking periods (20 tracking periods for each of the 4 difficulty levels, with 7 functional images per tracking period). In this way, we obtained for each ROI and difficulty level a data matrix X of dimension \(N_I \times N_V\) for each subject, where \(N_I = 140\) is the number of functional images per difficulty level and \(N_V\) is the number of voxels in the region as given in Table 1. In order to avoid spurious results due to motion, scanner drifts or global brain signal fluctuations, we corrected the temporal signal of each voxel for motion and global (average over the whole brain) fluctuations. For this, the average signal over the whole brain was computed and then, together with the 6 motion parameter estimates from the SPM preprocessing, regressed out of all voxel time courses, i.e. the voxel time courses were orthogonalized to these seven time courses (global mean plus 6 motion parameters).
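The nuisance regression described above can be illustrated with a minimal sketch. It assumes an ordinary least-squares fit without an intercept, which matches the description of orthogonalizing each voxel time course to the seven nuisance time courses; the function name `regress_out_nuisance` and the synthetic data are hypothetical and are not taken from the original analysis pipeline.

```python
import numpy as np

def regress_out_nuisance(X, nuisance):
    """Orthogonalize voxel time courses to nuisance regressors.

    X        : (N_I, N_V) array, one column per voxel time course.
    nuisance : (N_I, 7) array, global mean signal plus 6 motion parameters.
    Returns the residuals of a least-squares fit of the nuisance
    regressors to every voxel time course.
    """
    beta, *_ = np.linalg.lstsq(nuisance, X, rcond=None)
    return X - nuisance @ beta

# Synthetic illustration: N_I = 140 images as in the text, 5 voxels (arbitrary).
rng = np.random.default_rng(0)
n_images, n_voxels = 140, 5
nuisance = rng.standard_normal((n_images, 7))
mixing = rng.standard_normal((7, n_voxels))
X = rng.standard_normal((n_images, n_voxels)) + nuisance @ mixing
X_clean = regress_out_nuisance(X, nuisance)
```

After this step every column of `X_clean` has zero inner product with each of the seven nuisance time courses, up to numerical precision.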
Table 1

Table of regions of interest analyzed in the visuo-motor tracking task

Name of region | Number of voxels NV
Left superior colliculus |
Right superior colliculus |
Right cerebellum |
Right basal ganglia |
Primary visual cortex |
Left primary motor cortex |
Left supplementary motor area |
Left dorsal premotor cortex |
Right dorsal premotor cortex |
Left superior parietal lobule |
Right superior parietal lobule |
4 Results: application to fMRI experimental data

In this section, we describe the application of the above methods for determining interregional structure from an fMRI data set of a visuo-motor tracking task. We also examine how this structure changes as the difficulty of the visuo-motor task is altered.

Using the method described in Section 2.1 we obtained for each subject interregional MI and TE measures for the data set consisting of multivariate data from the 11 ROIs (see Section 3). We averaged over S = 3,000 sampled subset pairs of v = 3 voxels for each interregional measure.10 In addition, the data was sorted into 4 difficulty levels. The connectivity measure between each pair of ROIs was then defined for each difficulty as the mean of the S = 3,000 samples of connectivity.
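The averaging over sampled voxel subsets can be sketched as follows. This is an illustration only: the paper uses Kraskov-type estimators, whereas the sketch swaps in a simple joint-Gaussian MI estimator (`gaussian_mi`) as a stand-in so that the example stays short. The function names, the sampling of voxels without replacement within each subset, and the synthetic regions are all assumptions.

```python
import numpy as np

def gaussian_mi(A, B):
    """MI in nats under a joint-Gaussian assumption (stand-in estimator,
    not the Kraskov-type estimator used in the paper)."""
    va = A.shape[1]
    C = np.cov(np.hstack([A, B]), rowvar=False)
    _, logdet_a = np.linalg.slogdet(C[:va, :va])
    _, logdet_b = np.linalg.slogdet(C[va:, va:])
    _, logdet_ab = np.linalg.slogdet(C)
    return 0.5 * (logdet_a + logdet_b - logdet_ab)

def interregional_measure(Xa, Xb, v=3, S=3000, estimator=gaussian_mi, seed=0):
    """Average `estimator` over S randomly sampled pairs of v-voxel subsets."""
    rng = np.random.default_rng(seed)
    values = np.empty(S)
    for i in range(S):
        cols_a = rng.choice(Xa.shape[1], size=v, replace=False)
        cols_b = rng.choice(Xb.shape[1], size=v, replace=False)
        values[i] = estimator(Xa[:, cols_a], Xb[:, cols_b])
    return values.mean()

# Synthetic regions with 140 observations each: Xa and Xb share a common
# signal, Xc is independent of Xa.
rng = np.random.default_rng(1)
shared = rng.standard_normal((140, 1))
Xa = shared + rng.standard_normal((140, 20))
Xb = shared + rng.standard_normal((140, 30))
Xc = rng.standard_normal((140, 30))
mi_coupled = interregional_measure(Xa, Xb, S=200)  # smaller S for a quick demo
mi_indep = interregional_measure(Xa, Xc, S=200)
```

Passing the estimator as an argument keeps the subset-sampling logic separate from the choice of information measure, so a transfer entropy estimator could be substituted without changing the averaging step.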

To test whether the connection between two ROIs is significantly different from chance (following Section 2.2.2), we averaged across the 4 difficulty levels (and as indicated above across samples). The resulting value was compared to the distribution obtained from P = 300 surrogate measurements, for which we calculated exactly the same value. We used a one-sided z-test (see Section 2.2.3) with significance level of α = 0.05, which was corrected for multiple comparisons as αc = 0.05/55 for the MI structure and αc = 0.05/110 for the TE structure. In comparing across the group (as per Section 2.2.3), we would expect M(a,b) to follow a binomial distribution B(n = 8,p = 0.05) by chance alone and note that a significance level of pg = 0.05/N (corrected for multiple comparisons) is reached if at least 3 subjects individually show a significant effect.
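As a rough illustration of the single-subject test, the sketch below standardizes an observed measure against its surrogate distribution and compares the one-sided p-value with a Bonferroni-corrected threshold. The exact form of the z-test in Section 2.2.3 is not reproduced here, so the normal approximation, the function name `surrogate_z_test` and the synthetic numbers are assumptions.

```python
import math
import numpy as np

def surrogate_z_test(observed, surrogates, alpha=0.05, n_comparisons=1):
    """One-sided z-test of an observed value against surrogate values."""
    surrogates = np.asarray(surrogates, dtype=float)
    z = (observed - surrogates.mean()) / surrogates.std(ddof=1)
    p = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail normal probability
    alpha_c = alpha / n_comparisons          # Bonferroni correction
    return z, p, p < alpha_c

# Synthetic surrogate distribution with P = 300 values (illustrative numbers).
rng = np.random.default_rng(0)
surrogates = rng.normal(loc=0.010, scale=0.002, size=300)

# A clearly elevated observation, tested with the TE correction (110 links).
z_strong, p_strong, sig_strong = surrogate_z_test(0.030, surrogates, n_comparisons=110)

# An observation equal to the surrogate mean gives z = 0 and p = 0.5.
z_null, p_null, sig_null = surrogate_z_test(surrogates.mean(), surrogates, n_comparisons=110)
```

With `n_comparisons=55` the same function gives the threshold used for the MI structure.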

Traditionally, the 140 time observations for each difficulty here would be seen as too small a data set for the application of information-theoretical measures. However, our harnessing of underlying Kraskov-type estimators (Kraskov et al. 2004; Kraskov 2004) and our focus on statistical significance allow our technique not only to gain insights from this small data set, but also to do so in a multivariate fashion. The numerical surrogate data set analyzed in Appendix A.1 demonstrated the ability of our technique to correctly infer interregional links in data sets of this size, with relatively low, non-linear coupling.

In the next sections, we describe several aspects of the interregional connectivity structure observed during this visuo-motor tracking task. We show that two subnetworks can be isolated based on the average MI between brain regions. We then suggest based on TE measures that the network is organized in a 3-tier directed structure, with the top two tiers corresponding to the subnetworks identified with MI. Finally, we report changes in the connectivity structure that depend on the difficulty of the visuo-motor tracking task. The main focus in the following is to present the information measure methods and give some interpretations of their results. A more detailed account of the regional activation patterns, their dependence on difficulty and an in-depth physiological interpretation will be published elsewhere.

4.1 Subdivision of visuo-motor network based on MI

The interregional MI Iv(Ra; Rb) was significantly higher in the true data than in the random permutations for all possible pairs of regions across the group. That is, for all 55 undirected connections a significance level of ps < 0.05 (Bonferroni corrected) was reached in at least 3 subjects, which is significant across the group as explained above. On one hand this is insightful, as it indicates that all of the ROIs have much in common as the subjects undertake this visuo-motor task. However, we seek more detailed insights into the information structure in place here.

Thus, we examine the average interregional MIs across all subjects for our set of ROIs, with the symmetric matrix \(G_{a,b}=I^g_v(\mathbf{R}_a; \mathbf{R}_b)-\overline{I}^{g,P}_v(\mathbf{R}_a; \mathbf{R}_b)\) (see Eqs. (10) and (11)). Note that average MI was defined relative to the value averaged across all subjects expected by chance: \(\overline{I}^{g,P}_v(\mathbf{R}_a; \mathbf{R}_b) = \big\langle \overline{I}^{P}_v(\mathbf{R}_a; \mathbf{R}_b) \big\rangle_s \). We analyzed this matrix of MI values by applying spectral reordering, an algorithm that reorders the rows and columns of the matrix to concentrate the mass of the matrix as close as possible to the diagonal (Johansen-Berg et al. 2004). Spectral reordering can be used to extract clusters in a connectivity structure that is represented by a symmetric matrix. The reordered average MI is shown in Fig. 4(a). The spectral reordering clearly reveals two main structures: a premotor-motor cortical subnetwork consisting of areas M1, lSMA, lPMd, rPMd, lSPL and rSPL, and a smaller cluster consisting only of the two superior colliculi. The three other ROIs (V1, rCer and rBG) are only weakly connected to these two structures. Figure 4 summarizes the results from this first part of the analysis.
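One common way to implement spectral reordering is to sort the nodes by the Fiedler vector of the graph Laplacian of the symmetric matrix. The sketch below follows that formulation; it is an assumption that this matches the variant used for the analysis above, and the function name `spectral_reorder` and the toy matrix are hypothetical.

```python
import numpy as np

def spectral_reorder(G):
    """Reorder a symmetric non-negative matrix by its Fiedler vector.

    Returns the node ordering and the reordered matrix. Uses the
    unnormalized Laplacian L = D - W (an assumption of this sketch).
    """
    W = np.array(G, dtype=float)
    np.fill_diagonal(W, 0.0)                 # ignore self-connections
    L = np.diag(W.sum(axis=1)) - W
    eigvals, eigvecs = np.linalg.eigh(L)     # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]                  # second-smallest eigenvalue
    order = np.argsort(fiedler)
    return order, W[np.ix_(order, order)]

# Toy matrix: nodes {0, 2, 4} form one cluster and {1, 3} another.
G = np.full((5, 5), 0.05)
for cluster in ([0, 2, 4], [1, 3]):
    for i in cluster:
        for j in cluster:
            G[i, j] = 1.0
order, G_reordered = spectral_reorder(G)
```

In the reordered toy matrix the members of each cluster occupy adjacent rows and columns, so the two blocks appear along the diagonal.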
Fig. 4

Mutual information structure (a) The average interregional MI (see scalebar on the right) across the group is plotted for all connections. White dashed lines are inserted to indicate the borders between the clusters suggested by spectral reordering. All estimated MIs were significantly higher than chance in at least 3 of the subjects, and thus significant at the group level. Here, we show the average value MI over all 8 subjects. Note that the matrix is symmetric and that there are no entries on the diagonal (indicated by the bold white line and letters S and C to illustrate the two subnetworks). For illustration reasons the maximum gray scale value was chosen to be Iv = 0.065. Only the MI between the two SC is higher (Iv(lSC,rSC) = 0.164). (b) Summary of the subdivision of the 11 ROIs into two large clusters suggested by spectral reordering. All 11 regions that entered the analysis are drawn as circles and arranged to the left and right according to their anatomical location. Cortical areas are indicated by gray circles, subcortical regions by white circles. The two clusters illustrated in subfigure (a) are illustrated by the dashed rectangles. Cortical cluster (C, gray circles with solid outline), SC cluster (S, white circles with solid outlines). The three remaining ROIs are also indicated (V1, gray with dashed outline, rBG and rCer, white with dashed outline). ROIs not belonging to any cluster are drawn with dashed outlines. Note that V1 is placed on the left rather than in the middle for illustrative purposes. All abbreviations for brain regions are the same as defined in Table 1

4.2 Directed information structure

The interregional TE was significantly higher for the measured data (compared to the surrogate data) for a number of directed links at the group level (i.e. where Tk,v was significant for 3 or more subjects as described earlier). Figure 5 summarizes the directed information structure determined here.
Fig. 5

Directed information structure (a) Connection matrix plotting the number of subjects for which each interregional link is found to be significant with the interregional TE. Entries in the connections matrix are discrete, indicating the number of subjects for which the corresponding connection was significant. A link is significant at the group level if significant for 3 or more subjects here (see Section 4). The gray scale values show the number of subjects for which a link is significant (see legend on the right). The names of the regions are the same as given in Table 1. (b) Representation of the directed interregional structure as a graph. The same layout of cortical areas as in Fig. 4 is presented. The thickness and style of lines indicates the number of subjects which had a significant TE connection between the corresponding regions. See legend in subfigure (a) for a description of line styles

This reveals a distinct three-tier directed information transfer structure, which is quite illustrative about the information structure during the visuo-motor task. The top two tiers correspond to the premotor-motor cortical subnetwork and the SC cluster identified using the average interregional MIs in Fig. 4. The top tier provides directed information to the middle tier, and both top and middle provide directed information inputs to the bottom tier. This bottom tier contains the right cerebellum, which is involved in motor control for the right hand, which was used during the experiment. The basal ganglia region remains uninvolved, but the primary visual cortex V1 appears to receive some directed input from the premotor-motor cortical subnetwork (rSPL).

This tiered structure is in line with the tasks involved in the experiment. The tracking task requires visuo-motor integration and precise computation of the motor output. The premotor-motor cortical network, which is involved in such tasks, should send the relevant information to the superior colliculi and the cerebellum, where basic movement control mechanisms will be computed. Also, the superior colliculi are involved in guiding eye position and attention, which is crucial in the task requiring visuo-motor control.

4.3 Changes in information structure

Significance testing of the task (difficulty) dependency of interregional information transfer was performed for each ROI pair, using both TE and MI. To assess the changing nature of the directed and undirected relationships respectively, we computed a linear fit of task difficulty for the TE and MI (average over S = 3,000 samples) respectively, and compared resulting value to the null hypothesis of it having been drawn from the same distribution as the P = 300 permutations as outlined in Section 2.3.
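The slope-based test can be sketched as follows: fit a line to the measure across the four difficulty levels, then standardize the resulting slope against the slopes obtained from the surrogate measurements. Coding the difficulty levels as 1 to 4, the z-score comparison, and all names and numbers below are assumptions made for illustration; the details of Section 2.3 are not reproduced here.

```python
import numpy as np

def difficulty_slope(values, levels=(1, 2, 3, 4)):
    """Slope of a least-squares line through measure values per difficulty."""
    slope, _intercept = np.polyfit(levels, values, deg=1)
    return slope

# Observed measure at the four difficulty levels (illustrative numbers).
observed = np.array([0.020, 0.024, 0.029, 0.033])
slope_obs = difficulty_slope(observed)

# P = 300 surrogate measurements per level, with no dependence on difficulty.
rng = np.random.default_rng(0)
surrogate_values = rng.normal(loc=0.020, scale=0.002, size=(300, 4))
surrogate_slopes = np.array([difficulty_slope(row) for row in surrogate_values])

# Standardized slope; positive and negative slopes would be tested separately.
z_slope = (slope_obs - surrogate_slopes.mean()) / surrogate_slopes.std(ddof=1)
```

A large positive `z_slope` would indicate an increase of the measure with task difficulty beyond what the surrogates produce by chance.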

The MI between several pairs of regions is modulated with task difficulty consistently across subjects (see Fig. 6). As outlined above (Section 2.2.3) the criterion for significance was that at least 3 subjects showed a significant slope (in the same direction). We did not find any significant changes with respect to tracking difficulty for the directed TE measure.
Fig. 6

Task dependent modulation of mutual information (a) The matrix representing the number of significant modulations of MI found across the group is plotted for all connections. Gray values code the number of significant subjects (all of which displayed an increase in MI with task difficulty). Note that the matrix is symmetric and that there are no entries on the diagonal (indicated by the bold white line). White dashed lines are inserted to indicate the borders between the cortical and the superior colliculus clusters found in the MI analysis. (b) Summary of MI modulation with task difficulty. Brain areas are drawn as described in Fig. 4. Changes observed in 3 subjects are drawn as solid lines. See also the legend in subfigure (a). Abbreviations are defined in Table 1

All observed changes are due to an increase in MI between ROIs as a result of increased tracking difficulty. Interestingly, MI was increased within the cortical premotor-motor network (left SMA, left and right PMd and left M1), and in the connections from M1 to the cerebellum.

5 Discussion

We have shown that the method presented here was successfully applied to an fMRI data set from a visuo-motor tracking task. The results suggest a structured network that is in line with what would be expected from the requirements of the task. Importantly, we also detected changes in the network structure based on multivariate MI measures. As fMRI is an indirect measure of neuronal activity, the interpretation of results that are based on interactions between voxels is not always straightforward. In addition, the temporal resolution of fMRI does not allow us to measure data at a rate fast enough to capture direct neuronal interactions.

5.1 Cortico-subcortical structure involved in visuo-motor tracking

The set of regions that entered the information-theoretic analysis was further subdivided by the analysis in two respects. First, MI measures suggest a subdivision of the regions into a large cortical cluster that contains several motor and premotor areas, a small collicular cluster that contains the two superior colliculi, and a third set of regions that contains the ROIs in V1, the Basal Ganglia and the Cerebellum. Second, the TE measure revealed a directed connectivity structure with the following general pattern: Cortical regions are connected via forward links to the superior colliculi, and cortical regions as well as the superior colliculi have directed links to the right Cerebellum. Within the predefined set of regions, the structure of these links suggests that cortical premotor and motor areas influence the superior colliculi and that both of them influence the cerebellum.

While the MI measures simultaneous fluctuations in the fMRI signals, the TE tries to infer a directed influence which is much more susceptible to problems of temporal resolution in the data acquisition. The low temporal resolution of fMRI places some limitations on the interpretation of the TE results. Although we established a directed structure within the voxel space of the fMRI, this does not necessarily mean that the directed links correspond directly to neuronal connections between the respective regions. The temporal resolution of the fMRI experiment (TR = 2.8 s) is much slower than typical time scales of neuronal processing. For example, the delay of the onset of spiking of neurons in visual areas in response to a visual stimulus increases by the order of 10 ms only between one visual area and a second area that is one level higher in the visual hierarchy (Bullier 2001). Therefore, fast neuronal interactions might be too fast to be captured by fMRI as directed links. In our study, one might, e.g., expect to find some level of input from the visual cortex (V1) to the movement planning regions (since planning requires the visual information of where the object is). Such a directed connection was not found (see Fig. 5). That is to say, the raw visual information about where the object is may still influence the activity in the movement planning regions, but in a way that is too fast to be detected directly using fMRI. The directed link from the premotor-motor network to visual cortex, on the other hand, can be interpreted as predictive effects on the visual system by modulating attention in the periphery while maintaining visual fixation at the center of the visual field. This link might be captured because in the visuo-motor tracking paradigm the general task structure was relatively slow, and therefore top-down attentional signals might modulate more slowly than the bottom-up visual input. The top-down effect does not need to be at or slower than the time resolution of fMRI here (TR = 2.8 s) to be detected though, since we have demonstrated some level of robustness of our technique to undersampling in the underlying data in Appendix A.4.

In addition, there might be a confound in TE measures of directed information between signals in the different regions, when it comes to interpreting these links directly on the neuronal level. The hemodynamic delay can differ between different cortical regions up to around 1 s (Handwerker et al. 2004), which might in the worst case invert the temporal succession if one compares the TE between two regions from BOLD (blood oxygen level dependent) data to the real neuronal firing. Finally, it might well be that a bidirectional link on the neuronal level results in an increased MI between two regions, but is not visible in the TE structure, because the neuronal interactions are too fast to be captured by fMRI. These issues do not affect other data modalities (e.g. EEG), but need to be taken into account when interpreting the TE results from fMRI data here.

5.2 Modulation of MI with task difficulty

In applications of fMRI within cognitive neuroscience, it is, generally, of particular interest to detect changes that are caused by some experimental manipulations. In addition to the general structure, we provide evidence that the method presented here is able to find such task related changes in the connectivity structure based on the MI measurement.

Interestingly, two types of connections showed an increase in MI, i.e. functional connectivity, with increased tracking difficulty. First, the result points towards the need of an increased cortico-cortical interaction, if a visuo-motor task becomes less predictable, and thus more difficult. This increased interaction is reflected in the connections that are modulated within the cortical cluster (see Fig. 6). Second, an increase in interactions between motor cortex and the cerebellum is in accordance with the increased need of error correcting movements, when the predictability of the target is low. In accordance, earlier studies have shown that the cerebellum is involved in such on-line adjustments and that M1 and the cerebellum interact during early phases of motor learning (Tanaka et al. 2009; Penhune and Doyon 2005). Based on our statistics we did not find any significant changes that involve the SC, which might have been expected for such a task (Lunenburger et al. 2001).

Interestingly, these changes show up only in the MI and not in the TE. There are two possible interpretations of this. First, the interaction between the regions that show a modulation of MI with task difficulty could be bidirectional. This is likely to be the case for the cortico-cortical connections. In fact, we did not observe a directed link between these regions. Second, it is possible that the changes in MI are the result of relatively rapid fluctuations due to a unidirectional connection between two regions. Such changes might be reflected in the MI structure, but would be missed by the TE measure, because of the slow temporal resolution of fMRI.

5.3 What is the effect of using multivariate interactions?

The main results reported in this paper are based on an analysis that captures information structure between sets of v = 3 voxels in regions of interest defined based on a general activation during the task. Appendix B describes how v = 3 was selected using p-values from our statistical significance technique. An additional question that needs to be addressed then, is whether an observed structure depends on the choice of v. In particular, is there any difference between the multivariate analysis presented here and a univariate analysis that averages over sets of 1 voxel in each region or even considers the average time course within the ROI? If the analysis is multivariate, does the dimensionality of the studied interactions change the structure?

We have addressed these questions in a series of additional analyses, summarized in Appendix B. For all three information structures (TE, MI and modulation of MI) the multivariate analyses are correlated with the univariate analyses suggesting that part of the multivariate structure resides on univariate interactions that are captured by the multivariate analysis as well. However, all structures derived with multivariate analyses are much more similar to each other than they are to the univariate analyses. Hence, there is a consistent structure that is captured by the multivariate, but not by the univariate interactions.

Importantly, Appendix B also shows that analysis using the average time course within the ROIs was not able to infer any interregional links at the group level.

5.4 Methodological considerations

We have shown that despite the low number of samples, multivariate information structure can be estimated from fMRI data. However, there are some methodological questions that need to be discussed. One major problem of measuring neural interactions with fMRI is the fact that BOLD fMRI provides an indirect measure of neuronal activity only (Logothetis et al. 2001). The low temporal sampling rate in the order of seconds but also the low pass filtering effects of the hemodynamic response function prevent us from accessing directly the underlying neuronal connectivity. That said, Appendix A.4 shows that our technique has some theoretical robustness to undersampling and memory in the data (which simulates low-pass filtering). Another issue is that we know from Appendix A.3 that our technique can infer connections where the data sets have a logical relationship but are not directly connected. However, Appendix A.3 also shows the circumstances under which this can occur, highlighting that our technique is much less sensitive to these undesirable inferences than to actual connections, and we suggest extensions using the complete transfer entropy (Lizier et al. 2008) that may improve the technique in these cases. In the absence of such an extension, we perform the simple “comparisons amongst connected triplets” described in Appendix A.3 at the group level for the TE structure, with the results suggesting that the connected triples involving the left SC are unlikely to contain pathway and common cause false positives, but not suggesting the same evidence against the right SC → right Cerebellum link being a common cause false positive.11 Finally, we note that the finding of changes in the MI structure based on changes in the task difficulty also shows that modulations in the connectivity structure can be captured. Although we cannot infer the presence of direct underlying neuronal connections, we can certainly assess whether the coupling between two regions changes in a task dependent manner.

We have applied the method to a data set that used a block design. In this way, the experimental conditions (difficulty) could be varied slowly, even compared to the temporal resolution of fMRI, and thus fluctuations within each experimental condition can be captured. Many fMRI studies, however, use event-related designs, where the brain response to short events is investigated. Although from a technical point of view it is straightforward to apply the method to event-related data, there are certainly limitations that need to be considered. In particular, the locking of the response to short events causes large event-driven responses. These onset responses are not informative of the task itself and thus it is necessary to remove them, which can be done, e.g., by a linear regression model. An additional problem will be the even more limited number of data samples available. Usually, inter-trial intervals in event-related fMRI designs are relatively short, which results in considerable overlap of the BOLD responses from different events. This is not a problem if a standard general linear model is calculated, because the parts can be separated statistically. But it will make it very difficult to estimate any kind of interaction. We thus think that the study of interactions in fMRI is preferably done using a block design experiment.

5.5 Final conclusion

We have presented a novel approach to analyzing functional connectivity of time-series data to establish interregional information structure. Its combined characteristics (being information-theoretical, asymmetric and multivariate) distinguish it in identifying directional, non-linear, and collective interactions between regions in a model-free manner. Furthermore, the specific information-theoretic estimators used and the focus on statistical significance allow the technique to provide insights from comparatively small data sets.

In this paper, the method was applied to fMRI data from a visuo-motor tracking task. Here, it identified an interesting three-tier interregional information structure, with movement planning regions providing input to visual perception and control regions, and both these tiers driving motor execution. The method also identified increased coupling between movement planning and motor execution regions as the tracking task became more difficult.

The presented method has thus been demonstrated to be useful for investigating interregional structure in fMRI studies, but importantly it could also be applied to other modalities such as electrophysiological multi-electrode array recordings, where the measured data has a better temporal resolution and allows one to draw direct conclusions about neuronal interactions.

Future work will include exploring whether the extensions suggested in Appendix A.3 can correct undesirable inference of indirect logical relationships. We also seek to investigate information transfer on shorter time scales as measurement technology improves, including potentially identifying coherent information transfer structures in the cortex (Lizier et al. 2008; Gong and van Leeuwen 2009).


  1. The TE can be formed as \(T_{k,l}(Y \rightarrow X)\), where l past states of Y are considered as the information source \(y_n^{(l)}=\{ y_n, y_{n-1}, \ldots ,y_{n-l+1} \}\).

  2. Note that the TE is equivalent to the directed transinformation (DTI) measure under certain parameter settings for the DTI (specifically M = 1 and N = 0) as per Hinrichs et al. (2006). Also, note that the TE is equivalent to the specific formulation of the DTI used in Saito and Harashima (1981) if the TE parameter l (discussed in footnote 1) is set equal to k.

  3. Note the TE could be computed in the style of Kraskov et al. (2004) and Kraskov (2004) but with a direct conditional MI calculation as per Frenzel and Pompe (2007).

  4. For example, fMRI regions contain potentially hundreds of voxels.

  5. The following explanation assumes that only one previous state \(y_n\) of the source is used in the computation of \(T_k(Y \rightarrow X)\); i.e. the parameter l = 1 (see Schreiber 2000).

  6. We use z-tests in our experiments in Section 4 because we are comparing to very low α values after making Bonferroni corrections (see Section 2.2.2), which would render direct counting quite sensitive to statistical fluctuations.

  7. We analyze the MI with separate matrices.

  8. Note that testing against a binomial distribution is a conservative choice here, because it is less likely to get 6 significant results (5 with positive mean and 1 with negative mean) than to get 4 positive ones only. However, when tested over the group we consider the threshold according to the latter, which is truly binomial.

  9. See Chapter 5 of the PhD thesis which can be downloaded from the German National Library:

  10. We explain in Appendix B how the number of joint voxels v = 3 was selected to balance the ability to capture multivariate interactions with the limitations of the number of available observations. Also in that appendix, we explore the effect of altering v (including conducting univariate analysis with v = 1). Furthermore, the appendix explores the effect of altering the number of subset pairs S and surrogate measurements P.

  11. As described in Appendix A.3, this simple test does not mean that the right SC → right Cerebellum link is a false positive; it simply does not add evidence against the false positive.

  12. Our use of 140 time steps for each C and χ combination matches the length of fMRI time series analyzed in Section 4.

  13. The minimum strengths required for detection here may seem large at first glance; however, one must bear in mind the specific difficulties built into this data set: the non-linear coupling, the small number of samples, and the relatively low influence of Y on X (low \(\chi/\epsilon_x\)). Our correction for a large number of comparisons is also a factor here. That said, correcting for multiple comparisons provides important protection against false positives, so it must be maintained when investigating all values of C here.

  14. High memory in the source Z is required for the values \(z_n\) (considered by the interregional TE) to contain some information about the previous values \(z_{n-1}\) which had an indirect effect on \(x_{n+1}\) via \(y_n\).

  15. We expected that high memory in the destinations Y and X and in the common source Z would help preserve information in Y about the source Z which would be helpful to predicting X.

  16. Note that the combination of undersampling and memory in our variables provides a smoothing-type effect on the data. As such, these results imply some level of robustness for the technique against temporal smoothing in the underlying data.

  17. Similarly, only two interregional links were inferred at the group level by the interregional TE with univariate analysis (v = 1) and S = 3,000, P = 300.



JL and JH thank Thorsten Kahnt for discussions on the statistical analysis. JL thanks Mikail Rubinov for helpful suggestions. JL thanks the Australian Research Council Complex Open Systems Research Network (COSNet) for a travel grant that partially supported this work. JDH thanks the Max Planck Society, the Bernstein Computational Neuroscience Program of the German Federal Ministry of Education and Research (BMBF Grant 01GQ0411) and the Excellence Initiative of the German Federal Ministry of Education and Research (DFG Grant GSC86/1-2009). MP is grateful for a 2009 Research Grant from The Max Planck Institute for Mathematics in the Sciences (Leipzig, Germany) on Information-driven Self-Organization and Complexity Measures.

Author contributions: J.-D.H., J.H. and A.H. conceived the fMRI experiment. A.H. performed the fMRI experimental work. J.H. and A.H. pre-processed the data. J.L. and M.P. conceived the information-theoretical analysis. J.L. performed the information-theoretical analysis. J.H. performed the statistical analysis. J.L. and J.H. wrote the paper.


  1. Bassett, D. S., & Bullmore, E. T. (2009). Human brain networks in health and disease. Current Opinion in Neurology, 22(4), 340–347.
  2. Bettencourt, L. M. A., Stephens, G. J., Ham, M. I., & Gross, G. W. (2007). Functional structure of cortical neuronal networks grown in vitro. Physical Review E, 75(2), 021915.
  3. Bode, S., & Haynes, J. D. (2009). Decoding sequential stages of task preparation in the human brain. NeuroImage, 45(2), 606–613.
  4. Bressler, S. L., Tang, W., Sylvester, C. M., Shulman, G. L., & Corbetta, M. (2008). Top-down control of human visual cortex by frontal and parietal cortex in anticipatory visual spatial attention. Journal of Neuroscience, 28(40), 10056–10061.
  5. Büchel, C., & Friston, K. J. (1997). Modulation of connectivity in visual pathways by attention: cortical interactions evaluated with structural equation modelling and fMRI. Cerebral Cortex, 7(8), 768–778.
  6. Bullier, J. (2001). Integrated model of visual processing. Brain Research Reviews, 36, 96–107.
  7. Chai, B., Walther, D. B., Beck, D. M., & Fei-Fei, L. (2009). Exploring functional connectivity of the human brain using multivariate information analysis. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, & A. Culotta (Eds.), Advances in neural information processing systems (Vol. 22, pp. 270–278). NIPS Foundation.
  8. Chávez, M., Martinerie, J., & Le Van Quyen, M. (2003). Statistical assessment of nonlinear causality: Application to epileptic EEG signals. Journal of Neuroscience Methods, 124(2), 113–128.
  9. Frenzel, S., & Pompe, B. (2007). Partial mutual information for coupling analysis of multivariate time series. Physical Review Letters, 99(20), 204101.
  10. Friston, K. (2002). Beyond phrenology: What can neuroimaging tell us about distributed circuitry? Annual Review of Neuroscience, 25, 221–250.
  11. Friston, K., Ashburner, J., Kiebel, S., Nichols, T., & Penny, W. (2006). Statistical parametric mapping: The analysis of functional brain images. Elsevier, London.
  12. Friston, K. J. (1994). Functional and effective connectivity in neuroimaging: A synthesis. Human Brain Mapping, 2, 56–78.
  13. Friston, K. J., & Büchel, C. (2000). Attentional modulation of effective connectivity from V2 to V5/MT in humans. Proceedings of the National Academy of Sciences of the USA, 97(13), 7591–7596.
  14. Friston, K. J., Harrison, L., & Penny, W. (2003). Dynamic causal modelling. Neuroimage, 19(4), 1273–1302.
  15. Gong, P., & van Leeuwen, C. (2009). Distributed dynamical computation in neural circuits with propagating coherent activity patterns. PLoS Computational Biology, 5(12), e1000611.
  16. Grosse-Wentrup, M. (2008). Understanding brain connectivity patterns during motor imagery for brain-computer interfacing. In D. Koller, D. Schuurmans, Y. Bengio, & L. Bottou (Eds.), Advances in neural information processing systems (Vol. 21, pp. 561–568). Curran Associates, Inc.
  17. Handwerker, D. A., Ollinger, J. M., & D’Esposito, M. (2004). Variation of BOLD hemodynamic responses across subjects and brain regions and their effects on statistical analyses. Neuroimage, 21(4), 1639–1651.
  18. Haynes, J. D., & Rees, G. (2006). Decoding mental states from brain activity in humans. Nature Reviews Neuroscience, 7(7), 523–534.
  19. Haynes, J. D., Tregellas, J., & Rees, G. (2005). Attentional integration between anatomically distinct stimulus representations in early visual cortex. Proceedings of the National Academy of Sciences of the USA, 102(41), 14925–14930.
  20. Hinrichs, H., Heinze, H. J., & Schoenfeld, M. A. (2006). Causal visual interactions as revealed by an information theoretic measure and fMRI. NeuroImage, 31(3), 1051–1060.
  21. Honey, C. J., Kotter, R., Breakspear, M., & Sporns, O. (2007). Network structure of cerebral cortex shapes functional connectivity on multiple time scales. Proceedings of the National Academy of Sciences, 104(24), 10240–10245.
  22. Horstmann, A. (2008). Sensorimotor integration in human eye-hand coordination: Neuronal correlates and characteristics of the system. Ph.D. thesis, Ruhr-Universität Bochum.
  23. Johansen-Berg, H., Behrens, T. E., Robson, M. D., Drobnjak, I., Rushworth, M. F., Brady, J. M., et al. (2004). Changes in connectivity profiles define functionally distinct regions in human medial frontal cortex. Proceedings of the National Academy of Sciences of the USA, 101(36), 13335–13340.
  24. Kantz, H., & Schreiber, T. (1997). Nonlinear time series analysis. Cambridge: Cambridge University Press.
  25. Kraskov, A. (2004). Synchronization and interdependence measures and their applications to the electroencephalogram of epilepsy patients and clustering of data. In Publication series of the John von Neumann Institute for computing (Vol. 24). Ph.D. thesis, John von Neumann Institute for Computing, Jülich, Germany.
  26. Kraskov, A., Stögbauer, H., & Grassberger, P. (2004). Estimating mutual information. Physical Review E, 69(6), 066138.
  27. Liang, H., Ding, M., & Bressler, S. L. (2001). Temporal dynamics of information flow in the cerebral cortex. Neurocomputing, 38–40, 1429–1435.
  28. Lizier, J. T., & Prokopenko, M. (2010). Differentiating information transfer and causal effect. European Physical Journal B, 73(4), 605–615.
  29. Lizier, J. T., Prokopenko, M., & Zomaya, A. Y. (2008). Local information transfer as a spatiotemporal filter for complex systems. Physical Review E, 77(2), 026110.
  30. Logothetis, N., Pauls, J., Augath, M., Trinath, T., & Oeltermann, A. (2001). Neurophysiological investigation of the basis of the fMRI signal. Nature, 412, 150–157.
  31. Lunenburger, L., Kleiser, R., Stuphorn, V., Miller, L. E., & Hoffmann, K. P. (2001). A possible role of the superior colliculus in eye-hand coordination. Progress in Brain Research, 134, 109–125.
  32. Lungarella, M., Pegors, T., Bulwinkle, D., & Sporns, O. (2005). Methods for quantifying the informational structure of sensory and motor data. Neuroinformatics, 3(3), 243–262.
  33. MacKay, D. J. (2003). Information theory, inference, and learning algorithms. Cambridge: Cambridge University Press.
  34. Norman, K. A., Polyn, S. M., Detre, G. J., & Haxby, J. V. (2006). Beyond mind-reading: Multi-voxel pattern analysis of fMRI data. Trends in Cognitive Sciences, 10(9), 424–430.
  35. Penhune, V. B., & Doyon, J. (2005). Cerebellum and M1 interaction during early learning of timed motor sequences. Neuroimage, 26(3), 801–812.
  36. Ramsey, J., Hanson, S., Hanson, C., Halchenko, Y., Poldrack, R., & Glymour, C. (2010). Six problems for causal inference from fMRI. NeuroImage, 49(2), 1545–1558.
  37. Rubinov, M., Knock, S. A., Stam, C. J., Micheloyannis, S., Harris, A. W. F., Williams, L. M., et al. (2009). Small-world properties of nonlinear brain activity in schizophrenia. Human Brain Mapping, 30, 403–416.
  38. Saito, Y., & Harashima, H. (1981). Tracking of information within multichannel EEG record - causal analysis in EEG. In N. Yamaguchi & K. Fujisawa (Eds.), Recent advances in EEG and EMG data processing (pp. 133–146). Amsterdam: Elsevier/North Holland Biomedical Press.
  39. Schreiber, T. (2000). Measuring information transfer. Physical Review Letters, 85(2), 461–464.
  40. Soon, C. S., Brass, M., Heinze, H. J., & Haynes, J. D. (2008). Unconscious determinants of free decisions in the human brain. Nature Neuroscience, 11(5), 543–545.
  41. Tanaka, Y., Fujimura, N., Tsuji, T., Maruishi, M., Muranaka, H., & Kasai, T. (2009). Functional interactions between the cerebellum and the premotor cortex for error correction during the slow rate force production task: An fMRI study. Experimental Brain Research, 193(1), 143–150.
  42. Tung, T. Q., Ryu, T., Lee, K. H., & Lee, D. (2007). Inferring gene regulatory networks from microarray time series data using transfer entropy. In P. Kokol, V. Podgorelec, D. Mičetič-Turk, M. Zorman, & M. Verlič (Eds.), Proceedings of the twentieth IEEE international symposium on computer-based medical systems (CBMS ’07), Maribor, Slovenia (pp. 383–388). Los Alamitos: IEEE.
  43. Verdes, P. F. (2005). Assessing causality from multivariate time series. Physical Review E, 72(2), 026222–026229.

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Joseph T. Lizier (1, 2)
  • Jakob Heinzle (3)
  • Annette Horstmann (4)
  • John-Dylan Haynes (3, 4, 5)
  • Mikhail Prokopenko (2, 6)

  1. School of Information Technologies, The University of Sydney, Sydney, Australia
  2. CSIRO Information and Communications Technology Centre, Epping, Australia
  3. Bernstein Center for Computational Neuroscience, Charité-Universitätsmedizin Berlin, Berlin, Germany
  4. Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
  5. Graduate School of Mind and Brain, Humboldt Universität zu Berlin, Berlin, Germany
  6. Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
