1 Introduction

The determination of functional connectivity in the brain has long been at the center of attention of neurophysiologists, since it is crucial for understanding information processing in the brain. In recent years, interest in the determination of directional connectivity has been increasing, owing to the tremendous growth of experimental data gathered by electrophysiological and fMRI techniques. Understanding mental processes requires information not only on the localization of the activated structures but also on the mutual relations between them. In this context, directed measures of connectivity, which show the direction of information transfer, are of special interest, and this article concentrates on them.

Here, the development of methods for estimating directed connectivity is presented. First, bivariate methods are mentioned and their limitations pointed out. Next, multivariate methods are described, in particular those based on the extension of the Granger causality principle to multichannel systems. Finally, methods for determining the dynamic evolution of the transmission of brain activity are presented.

2 Bivariate measures of connectivity

2.1 Correlation and coherence

For a long time the standard methods of establishing relations between signals have been cross-correlation and coherence (the formulas can be found in any textbook on signal analysis, e.g., [30]). The amplitude of the correlation describes the similarity of the signals and, if the cross-correlation function R_xy has a maximum for a certain delay τ, we can assume that signal Y is delayed by τ with respect to signal X.
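The delay estimate described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `estimate_delay`, the biased normalization by the full signal length, and the simulated white-noise signals are not taken from the cited literature.

```python
import numpy as np

def estimate_delay(x, y, max_lag):
    """Return the lag (in samples) at which R_xy peaks, and R_xy itself.

    R_xy(lag) is estimated as the mean of x(t) * y(t + lag) after
    standardizing both signals; a positive peak lag means y lags behind x.
    """
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    n = len(x)
    lags = np.arange(-max_lag, max_lag + 1)
    r = np.empty(len(lags))
    for idx, lag in enumerate(lags):
        if lag >= 0:
            r[idx] = np.dot(x[:n - lag], y[lag:]) / n
        else:
            r[idx] = np.dot(x[-lag:], y[:n + lag]) / n
    return lags[np.argmax(r)], r

# Toy data (assumption): y is x delayed by 5 samples plus independent noise.
rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
y = np.roll(x, 5) + 0.5 * rng.standard_normal(2000)
best_lag, r_xy = estimate_delay(x, y, max_lag=20)
```

Here `best_lag` recovers the simulated delay. For narrow-band signals the cross-correlation is itself oscillatory, so several maxima of similar height may appear and the estimate becomes ambiguous.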

The cross-spectrum is the counterpart of cross-correlation in the frequency domain; it can be expressed as the Fourier transform of the cross-correlation. Both functions are therefore statistically equivalent; however, they emphasize different features of the signals: cross-correlation in the time domain and cross-spectrum in the frequency domain. Coherence is a normalized form of the cross-spectrum. From the phase of coherence the delay between frequency components of the signals can be calculated, hence we may judge the direction of propagation. However, we must remember that the phase is determined modulo 2π, therefore the direction may sometimes be misjudged.

Another problem with correlation and coherence between two signals is that we do not know whether the dependence between them comes from the influence of a third channel which drives both. Bivariate coherence patterns usually show multiple connections whose strength decreases with distance and are therefore not very informative [21]. For bivariate estimators of direction the situation is even worse, since propagation is detected whenever there is a delay between the signals. The differences between bivariate and multivariate measures will be delineated in Sect. 4.4.

2.2 Non-linear measures of connectivity

Several non-linear measures of connectivity have been introduced, such as non-linear correlation, mutual information, transfer entropy, generalized synchronization, and phase synchronization (a review may be found in [34]). David et al. [11] compared the performance of non-linear measures (mutual information, generalized synchronization, and phase synchronization) with a linear measure, cross-correlation, in estimating connectivity in a modeled non-linear system. The authors found higher sensitivity for the non-linear measures when applied to non-linear systems. However, the influence of noise was not taken into account in that article, and the investigated non-linear system consisted of two channels only. Netoff et al. [32] also compared cross-correlation with non-linear measures: mutual information, mutual information in two dimensions, and phase correlation. Similarly to [11], the authors found that the non-linear methods are more sensitive in detecting non-linear coupling under ideal conditions. However, in the presence of noise cross-correlation was more robust. We read in the discussion of [32]: “We have been as guilty as any of our colleagues in being fascinated by the theory and methods of nonlinear dynamics. Hence we have continually been surprised by the robust capabilities of linear CC” (cross-correlation). It is worth mentioning that cross-correlation is not the best measure of connectivity. Methods based on the multivariate AR model are even more robust with respect to noise, as will be explained below. In addition, the non-linear measures are bivariate, so they suffer from the limitations common to all bivariate estimators. The non-linear methods are also prone to systematic errors, e.g., connected with the choice of the interval in constructing the probability function, and the approaches which rely on the embedding theorem require long stationary data.
On the other hand, it has been demonstrated that linear estimators of connectivity such as the Directed Transfer Function (DTF) are very robust to noise and perform quite well even for non-linear signals [42]. Therefore the non-linear approaches can hardly be recommended, especially for EEG, where a non-linear character of the signals is rather an exception, as was demonstrated by means of surrogate data tests [1, 39] and by comparison of linear and non-linear forecasting of EEG and ECoG time series [6].

3 Granger causality and Granger causality index

The testable definition of causality was introduced in the field of economics by Granger [16]. Granger causality principle states that if some series Y(t) contains information in past terms that helps in the prediction of series X(t), then Y(t) is said to cause X(t). More specifically, if we try to predict a value of X(t) using p previous values of the series X only, we get a prediction error e:

$$ X(t) = \sum\limits_{j = 1}^{p} {A^{\prime}_{11} (j)X(t - j)} + e(t). $$
(1)

If we try to predict a value of X(t) using p previous values of the series X and p previous values of Y, we get another prediction error e_1:

$$ X\left( t \right) = \sum\limits_{j = 1}^{p} {A_{11} } \left( j \right)X\left( {t - j} \right) + \sum\limits_{j = 1}^{p} {A_{12} } \left( j \right)Y\left( {t - j} \right) + e_{1} \left( t \right). $$
(2)

If the variance of e_1 (after including series Y in the prediction) is lower than the variance of e, we say that Y causes X in the sense of Granger causality. Similarly, we can say that X causes Y in the sense of Granger causality when the variance of the prediction error e_2 of series Y is reduced after including series X in the prediction:

$$ Y(t) = \sum\limits_{j = 1}^{p} {A_{22} (j)Y(t - j)} + \sum\limits_{j = 1}^{p} {A_{21} (j)X(t - j)} + e_{2} (t). $$
(3)

The Granger causality index (GCI) is based directly on the definition of causality: it shows whether the information contributed by the second channel improves the prediction of the first channel. The logarithm of the ratio of the residual variances of the one-channel model (Eq. 1) and the two-channel model (Eq. 2) is computed:

$$ {\text{GCI}}_{2 \to 1} = \ln \left( {\frac{{{\text{var}}(e)}}{{{\text{var}}(e_{1} )}}} \right). $$
(4)
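As an illustration, the index can be estimated by fitting the one-channel and two-channel models of Eqs. 1 and 2 with ordinary least squares and comparing the residual variances. The sketch below is hedged: the helper names, the least-squares fitting, the model order, and the simulated signals are assumptions made for the example, not the procedure of any cited study.

```python
import numpy as np

def residual_variance(target, predictors, p):
    """Variance of the residual when target(t) is regressed on the
    p previous values of every series in `predictors` (least squares)."""
    n = len(target)
    # Column for lag j of series s holds s(t - j) for t = p .. n-1.
    cols = [s[p - j:n - j] for s in predictors for j in range(1, p + 1)]
    design = np.column_stack(cols)
    y = target[p:]
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return np.var(y - design @ coef)

def granger_index(x, y, p):
    """Index for the influence of y on x: ln(var(e) / var(e_1))."""
    var_e = residual_variance(x, [x], p)        # Eq. 1: own past only
    var_e1 = residual_variance(x, [x, y], p)    # Eq. 2: own past plus past of y
    return np.log(var_e / var_e1)

# Toy data (assumption): y drives x with a one-sample delay, not vice versa.
rng = np.random.default_rng(1)
n = 3000
y = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + 0.8 * y[t - 1] + 0.3 * rng.standard_normal()

gci_y_to_x = granger_index(x, y, p=2)   # clearly positive
gci_x_to_y = granger_index(y, x, p=2)   # close to zero
```

In practice the significance of a non-zero index has to be assessed statistically, since the larger model never fits worse in-sample.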

This definition may be extended to the multichannel case by considering how the inclusion of given channels changes the residual variance ratios. GCI is an estimator working in the time domain. In many applications the estimators dependent on frequency are more appropriate. Such estimators derived from Granger causality concept, but operating on multichannel data, are: DTF and partial directed coherence (PDC).

4 Multivariate estimators of directedness

4.1 Multivariate autoregressive model

Granger causality was defined for two channels; however, Granger in his later study [17] pointed out that the causality principle holds only if there are no other channels influencing the process. To account for the whole multivariate structure of a process of k channels, the multichannel autoregressive model (MVAR) has to be considered.

For a multivariate k-channel process X(t):

$$ {\bf X}(t) = (X_{1} (t), \, X_{2} (t), \ldots , \, X_{k} (t)). $$
(5)

The model takes the form

$$ {\bf X}(t) = \sum\limits_{j = 1}^{p} {\bf A} (j){\bf X}(t - j) + {\bf E}(t), $$
(6)

where E(t) are vectors of size k and the coefficients A are k × k-sized matrices.

Equation 6 can be easily transformed to describe relations in the frequency domain. After changing the sign of the coefficients A(j), setting A(0) equal to the identity matrix, and applying the Z transform we get:

$$ \begin{aligned} {\bf E}(f) &= {\bf A}(f){\bf X}(f) \\ {\bf X}(f) &= {\bf A}^{ - 1} (f) {\bf E}(f) = {\bf H}(f){\bf E}(f) \\ {\bf H}(f) &= \left( {\sum\limits_{m = 0}^{p} {{\bf A}(m)\exp ( - 2\pi imf\Delta t)} } \right)^{ - 1} . \\ \end{aligned} $$
(7)

From the form of the above equations we see that the model can be considered as a linear filter with white noises E(f) on its input and the signals X(f) on its output. The matrix of filter coefficients H(f) is called the transfer matrix of the system. It contains information about all relations between data channels in the given set including the phase relations between signals. From transfer matrix cross-spectra and partial coherences can be found. Partial coherence is given by the formula:

$$ C_{ij} (f) = {\frac{{{\bf M}_{ij} (f)}}{{\sqrt {{\bf M}_{ii} (f){\bf M}_{jj} (f)} }}}, $$
(8)

where M_ij is a minor of the spectral matrix (the matrix of spectra and cross-spectra) with the i-th row and j-th column removed. Partial coherence is non-zero only when the given relation between channels is direct. If a signal in a given channel can be explained by a linear combination of some other signals of the set, the partial coherence between them will be low.
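The chain from MVAR coefficients to partial coherence can be sketched as follows. The transfer matrix is computed as in Eq. 7, the spectral matrix is obtained from the standard relation S(f) = H(f) V H*(f), where V is the covariance matrix of the noise E, and Eq. 8 is evaluated by brute-force minors. The array layout `A[j-1] = A(j)`, the choice V = I, the function names, and the toy three-channel chain are assumptions made for the illustration.

```python
import numpy as np

def transfer_matrix(A, f, dt):
    """H(f) of Eq. 7 for coefficients A of shape (p, k, k), with A[j-1] = A(j) of Eq. 6."""
    p, k, _ = A.shape
    Af = np.eye(k, dtype=complex)            # A(0) = I after the sign change
    for j in range(1, p + 1):
        Af -= A[j - 1] * np.exp(-2j * np.pi * j * f * dt)
    return np.linalg.inv(Af)

def minor(S, i, j):
    """Determinant of S with row i and column j removed."""
    sub = np.delete(np.delete(S, i, axis=0), j, axis=1)
    return np.linalg.det(sub)

def partial_coherence(S):
    """Eq. 8 evaluated element by element (suitable for small k only)."""
    k = S.shape[0]
    C = np.empty((k, k), dtype=complex)
    for i in range(k):
        for j in range(k):
            # Principal minors of a Hermitian positive definite matrix are real.
            C[i, j] = minor(S, i, j) / np.sqrt(minor(S, i, i).real * minor(S, j, j).real)
    return C

# Toy model (assumption): order p = 1, chain 0 -> 1 -> 2, unit noise covariance.
A = np.array([[[0.5, 0.0, 0.0],
               [0.5, 0.5, 0.0],
               [0.0, 0.5, 0.5]]])
V = np.eye(3)
H = transfer_matrix(A, f=0.0, dt=1.0)
S = H @ V @ H.conj().T                       # spectral matrix at f = 0
C = partial_coherence(S)
```

In this toy chain the partial coherence between channels 0 and 2 vanishes, because their relation is fully explained by channel 1, while the coherence between directly connected channels stays non-zero.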

The estimation of the coefficients of MVAR is based on the calculation of covariance matrix, therefore additional correlations between channels should not be introduced. The signals should be referenced in respect to the channel which is not involved in the model estimation (e.g., “linked ears”). Common average, bipolar derivation or Hjorth transform must not be used, since they disturb the correlation structure between the signals. The mean value of each signal should be subtracted and it is recommended to divide the signal by the square root of its variance.
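The last recommendation, subtracting the mean and dividing by the square root of the variance, amounts to the following small sketch. The array layout (channels in rows) and the function name are assumptions of the example.

```python
import numpy as np

def normalize_channels(X):
    """Subtract the mean of each channel and divide by its standard deviation.

    X has shape (k channels, N samples).
    """
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=1, keepdims=True)
    return X / X.std(axis=1, keepdims=True)

# Toy data (assumption): three channels with an offset and very different amplitudes.
rng = np.random.default_rng(0)
raw = 5.0 + np.array([[1.0], [20.0], [0.1]]) * rng.standard_normal((3, 1000))
Z = normalize_channels(raw)
```

After this step every channel has zero mean and unit variance, so that channels with large amplitudes do not dominate the covariance matrix used in the model fit.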

The MVAR model is a sort of a filter which separates noise from the signal. This property follows directly from Eq. 7. Therefore, MVAR is especially suitable for analysis of noisy data. For the excessively smoothed time series, where the random component is suppressed, the difficulties in fitting the model may occur. The AR spectral estimates have better statistical properties than FFT estimates, which is easy to see comparing smooth AR power spectral estimates to the fluctuating estimates obtained by means of Fourier transform. The measures of connectivity derived from the MVAR in virtue of model properties are also very robust in respect to noise. It was reported in [20] that for 3-channel model the propagations were correctly estimated by means of DTF when the amplitude of noise was 3 times as big as a signal itself. For biomedical time series where the contribution of noise is quite high the estimates of connectivity based on MVAR are recommended.

4.2 Directed transfer function

Based on the properties of the transfer function of MVAR, DTF was introduced [20] in the form:

$$ {\text{DTF}}_{j \to i}^{2} (f) = {\frac{{\left| {H_{ij} (f)} \right|^{2} }}{{\sum\limits_{m = 1}^{k} {\left| {H_{im} (f)} \right|^{2} } }}}. $$
(9)

The DTF describes causal influence of channel j on channel i at frequency f. The above equation defines a normalized version of DTF, which takes values from 0 to 1 producing a ratio between the inflow from channel j to channel i to all the inflows to channel i.

The non-normalized DTF which is directly related to the coupling strength [22] is defined as:

$$ \theta_{ij}^{2} (f) = \left| {H_{ij} (f)} \right|^{2} . $$
(10)
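Given fitted MVAR coefficients, both versions of the DTF follow directly from the transfer matrix. The sketch below is illustrative only: the coefficient array `A` (shape p × k × k, with `A[j-1] = A(j)` of Eq. 6), the function names, and the toy three-channel chain are assumptions, and no model fitting is shown.

```python
import numpy as np

def mvar_frequency_response(A, freqs, dt):
    """A(f) = I - sum_j A(j) exp(-2 pi i j f dt) for every frequency in `freqs`."""
    p, k, _ = A.shape
    lags = np.arange(1, p + 1)
    Af = np.empty((len(freqs), k, k), dtype=complex)
    for n, f in enumerate(freqs):
        phases = np.exp(-2j * np.pi * lags * f * dt)
        Af[n] = np.eye(k) - np.tensordot(phases, A, axes=(0, 0))
    return Af

def dtf(A, freqs, dt):
    """Return (normalized DTF^2 of Eq. 9, non-normalized DTF^2 of Eq. 10).

    Both arrays have shape (n_freqs, k, k); element [n, i, j] describes j -> i.
    """
    H = np.linalg.inv(mvar_frequency_response(A, freqs, dt))
    theta2 = np.abs(H) ** 2                                # Eq. 10
    dtf2 = theta2 / theta2.sum(axis=2, keepdims=True)      # Eq. 9: normalize each row i
    return dtf2, theta2

# Toy model (assumption): order p = 1, chain 0 -> 1 -> 2.
A = np.array([[[0.5, 0.0, 0.0],
               [0.5, 0.5, 0.0],
               [0.0, 0.5, 0.5]]])
freqs = np.linspace(0.0, 0.5, 32)       # in units of the sampling frequency
dtf2, theta2 = dtf(A, freqs, dt=1.0)
```

For each target channel i the normalized values sum to one over all source channels. In this toy chain the entry for 0 → 2 is non-zero although channel 0 acts on channel 2 only through channel 1, while the entries for the reverse directions vanish.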

The DTF has found many applications, e.g., localization of epileptic foci [14], estimation of EEG propagation in different sleep stages and wakefulness [21], determination of transmission between brain structures of an animal during a behavioral test [24], estimation of cortical connectivity [2], [3], and many others.

The DTF shows not only direct, but also cascade flows, namely in case of propagation 1 → 2 → 3 it shows also propagation 1 → 3. In order to distinguish direct from indirect flows direct Directed Transfer Function (dDTF) was introduced [25].

The dDTF is defined as a multiplication of a modified DTF by partial coherence. The modification of DTF concerned normalization of the function in such a way as to make the denominator independent of frequency. The dDTF (χ ij (f)) showing direct propagation from channel j to i is defined as:

$$ \begin{aligned} \chi_{ij}^{2} (f) &= F_{ij}^{2} (f)C_{ij}^{2} (f) \hfill \\ F_{ij}^{2} (f) &= {\frac{{\left| {H_{ij} (f)} \right|^{2} }}{{\sum\limits_{f} {\sum\limits_{m = 1}^{k} {\left| {H_{im} (f)} \right|^{2} } } }}}, \hfill \\ \end{aligned} $$
(11)

where C_ij(f) is partial coherence. The function χ_ij(f) has a non-zero value only when both F_ij^2(f) and C_ij^2(f) are non-zero; in that case there exists a direct causal relation j → i between the channels.
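A possible numerical sketch of Eq. 11 is given below. It relies on two assumptions beyond the text: the spectral matrix is taken as S(f) = H(f) V H*(f) with a known noise covariance V, and the modulus of the partial coherence of Eq. 8 is computed from the inverse spectral matrix G = S⁻¹ as |G_ij| / sqrt(G_ii G_jj), which follows from expressing the elements of the inverse through minors. The names and the toy chain model are hypothetical.

```python
import numpy as np

def mvar_frequency_response(A, freqs, dt):
    """A(f) = I - sum_j A(j) exp(-2 pi i j f dt), shape (n_freqs, k, k)."""
    p, k, _ = A.shape
    lags = np.arange(1, p + 1)
    Af = np.empty((len(freqs), k, k), dtype=complex)
    for n, f in enumerate(freqs):
        phases = np.exp(-2j * np.pi * lags * f * dt)
        Af[n] = np.eye(k) - np.tensordot(phases, A, axes=(0, 0))
    return Af

def ddtf(A, V, freqs, dt):
    """Return (chi^2, F^2, C^2) of Eq. 11, each of shape (n_freqs, k, k)."""
    Af = mvar_frequency_response(A, freqs, dt)
    H = np.linalg.inv(Af)
    theta2 = np.abs(H) ** 2
    # Denominator summed over frequencies and source channels: independent of f.
    F2 = theta2 / theta2.sum(axis=(0, 2), keepdims=True)
    # Inverse spectral matrix: S^-1 = A(f)^H V^-1 A(f).
    G = np.conj(np.swapaxes(Af, 1, 2)) @ np.linalg.inv(V) @ Af
    d = np.real(np.einsum('nii->ni', G))                   # diagonals G_ii(f)
    C2 = np.abs(G) ** 2 / (d[:, :, None] * d[:, None, :])  # squared partial coherence
    return F2 * C2, F2, C2

# Toy model (assumption): order p = 1, chain 0 -> 1 -> 2, unit noise covariance.
A = np.array([[[0.5, 0.0, 0.0],
               [0.5, 0.5, 0.0],
               [0.0, 0.5, 0.5]]])
freqs = np.linspace(0.0, 0.5, 32)
chi2, F2, C2 = ddtf(A, np.eye(3), freqs, dt=1.0)
```

For the indirect path 0 → 2 the factor F² is non-zero but the partial coherence vanishes, so χ² is zero; for the direct connection 0 → 1 both factors, and hence χ², are non-zero.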

Distinguishing direct from indirect transmission is essential for signals from implanted electrodes; for EEG signals recorded by scalp electrodes it is not really important [27].

The DTF and dDTF show propagation only when there is a phase difference between signals from different derivations; otherwise their value is zero. Volume conduction is a zero-phase propagation, so it generates no phase difference between channels and in theory should not have any influence on DTF results. In practice it has some minor influence, e.g., it increases the noise level; however, this influence is not critical and is much less important than for other methods.

In [2] functional connectivity was evaluated by application of DTF to the cortical signals estimated by means of the linear inverse procedure [18]. The procedure returned the amplitude values of EEG on the cortex, however, the phases of the signals were changed by the inverse procedure which influenced the results. They show the causality dependencies between the cortical signals, not exactly the direction of the propagating EEG activity.

4.3 PDC

The PDC was defined by Baccala and Sameshima in 2001 [4] in the following form:

$$ P_{ij} (f) = {\frac{{A_{ij} (f)}}{{\sqrt {{\bf a}_{j}^{*} (f){\bf a}_{j} (f)} }}}. $$
(12)

In the above equation A_ij(f) is an element of A(f), the Fourier transform of the MVAR model coefficients A(t); a_j(f) is the j-th column of A(f), and the asterisk denotes the transpose and complex conjugate operation. Although PDC is a function operating in the frequency domain, the dependence of A(f) on frequency has no direct correspondence to the power spectrum. From the normalization condition it follows that PDC takes values from the interval [0,1]. PDC shows only direct flows between channels. Unlike DTF, PDC is normalized to show the ratio of the outflow from channel j to channel i to all the outflows from the source channel j, so it emphasizes the sinks rather than the sources.
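Eq. 12 can be evaluated directly from the MVAR coefficients, without inverting A(f). The following sketch returns the modulus of PDC; the coefficient layout `A[j-1] = A(j)`, the function names, and the toy chain model are assumptions of the example.

```python
import numpy as np

def mvar_frequency_response(A, freqs, dt):
    """A(f) = I - sum_j A(j) exp(-2 pi i j f dt), shape (n_freqs, k, k)."""
    p, k, _ = A.shape
    lags = np.arange(1, p + 1)
    Af = np.empty((len(freqs), k, k), dtype=complex)
    for n, f in enumerate(freqs):
        phases = np.exp(-2j * np.pi * lags * f * dt)
        Af[n] = np.eye(k) - np.tensordot(phases, A, axes=(0, 0))
    return Af

def pdc(A, freqs, dt):
    """Modulus of PDC (Eq. 12); element [n, i, j] describes j -> i at freqs[n]."""
    Af = mvar_frequency_response(A, freqs, dt)
    # Norm of the j-th column a_j(f): sqrt(a_j^* a_j).
    col_norm = np.sqrt((np.abs(Af) ** 2).sum(axis=1, keepdims=True))
    return np.abs(Af) / col_norm

# Toy model (assumption): order p = 1, chain 0 -> 1 -> 2.
A = np.array([[[0.5, 0.0, 0.0],
               [0.5, 0.5, 0.0],
               [0.0, 0.5, 0.5]]])
freqs = np.linspace(0.0, 0.5, 32)
P = pdc(A, freqs, dt=1.0)
```

The squared values sum to one over the target channels i for every source j, which is the outflow normalization described above. In the toy chain the indirect path 0 → 2 gives exactly zero, in contrast to the DTF.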

In neurophysiological applications rather sources, not the sinks are of primary interest, therefore later on the estimator called Generalized Partial Directed Coherence (GPDC) was proposed [5], where normalization factor in the denominator similar to the one applied in DTF was introduced. GPDC is given by the formula:

$$ {\text{GPDC}}_{j \to i} (f) = {\frac{{A_{ij} (f)}}{{\sum\limits_{i = 1}^{k} {\left| {A_{ij} (f)} \right|}^{2} }}}. $$
(13)

Independently, the normalization similar to the one given by formula (13) was proposed in [38]. It has been pointed out in [38] that not renormalized PDC has several drawbacks, namely: (i) PDC is decreased when multiple signals are emitted from a given source, (ii) PDC is not scale-invariant, since it depends on the units of measurement of the source and target processes, and (iii) PDC does not allow conclusions on the absolute strength of the coupling. These disadvantages are alleviated in case of GPDC. PDC and GPDC, similarly to DTF are insensitive to the volume conduction.

4.4 Bivariate versus multivariate estimators of connectivity

The differences between multivariate and bivariate estimators of connectivity may be illustrated by a simple example. Let us consider the common situation when a signal is measured at different distances from the source. Such a case corresponds to the simulation scheme shown at the top of Fig. 1. The EEG signal from the channel 1 is propagating with different delays D to channels 2, 3, 4, and 5. At each step random white noise is added. For bivariate measure of connectivity (DTF) based on 2 channel AR model the flows of activity are observed in each case when there is a delay between two channels. In case of DTF estimated from MVAR model encompassing all channels only the propagation from channel 1 is observed, in agreement with the simulation scheme. This observation is true for any bivariate measure, no matter how it is calculated [7], [27]. No wonder that for bivariate measures very dense patterns of propagation are found e.g., [11] and it is practically impossible to find the sources of propagation. In contrast, the DTF results usually show a few significant sites from which activity is propagating e.g., [8], [9], [14], [21], [28], [29]. This point is also illustrated in Fig. 2. More examples of comparing different methods of directionality estimation may be found in [27] including application to real experimental data. Namely transmission patterns of EEG are estimated for an awake state, eyes closed. It is known that in this state the activity is propagating from the posterior structures of the brain, with some weaker sources in front. This kind of pattern was found by means of DTF calculated from MVAR. For bivariate measure the pattern is disorganized, even the reversal of propagation was observed [27].
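The effect of a common source can be reproduced with a few lines of code. The sketch below is not the simulation of Fig. 1: it uses a white-noise source instead of EEG, three channels instead of five, and a time-domain Granger-type index (logarithm of a residual variance ratio) instead of DTF. All names and parameter values are assumptions chosen for illustration.

```python
import numpy as np

def residual_variance(target, predictors, p):
    """Residual variance of a least-squares prediction of target(t)
    from the p previous values of every series in `predictors`."""
    n = len(target)
    cols = [s[p - j:n - j] for s in predictors for j in range(1, p + 1)]
    design = np.column_stack(cols)
    y = target[p:]
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return np.var(y - design @ coef)

# Channel 1 is the source; it reaches channel 2 after 1 sample and
# channel 3 after 2 samples. Independent noise is added to each copy.
rng = np.random.default_rng(2)
n, p = 5000, 3
x1 = rng.standard_normal(n)
x2 = np.roll(x1, 1) + 0.5 * rng.standard_normal(n)
x3 = np.roll(x1, 2) + 0.5 * rng.standard_normal(n)

# Bivariate index for 2 -> 3: only channels 2 and 3 are modeled.
bivariate = np.log(residual_variance(x3, [x3], p)
                   / residual_variance(x3, [x3, x2], p))

# Multivariate index for 2 -> 3: the source channel 1 is part of the model.
multivariate = np.log(residual_variance(x3, [x3, x1], p)
                      / residual_variance(x3, [x3, x1, x2], p))
```

The bivariate index reports a clear influence of channel 2 on channel 3, although no such connection exists in the simulation; once the common source is included in the model, the index drops to approximately zero.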

Fig. 1

Comparison of bivariate and multivariate methods of estimation of directed connectivity. Top: simulation scheme (D, delay value; at each step white noise is added). Bottom: connectivity measures, bivariate at the left, multivariate at the right. Propagation is shown from the channel marked above the column to the channel marked at the left. In each box DTF is shown as a function of frequency; power spectra are shown on the diagonal. At the very bottom: the resulting connection schemes.

Fig. 2

Snapshots from the movie showing the propagation in the gamma band (35–41 Hz) during the right finger movement (at the right) and movement imagination (at the left) 0.3 s after the cue (upper panel) and 1.6 s after the cue. Before model fitting, the signals were high-pass filtered above 15 Hz. In the case of the performed movement, after the cue appearance there is a short burst of propagation from electrode C3 overlying the motor cortex of the finger. In the case of imagination, gamma propagation starts later than in the case of movement, and cross-talk between the electrodes overlying sensorimotor areas is observed.

DTF and PDC were first applied in the analysis of EEG and ECoG. More recently, methods based on MVAR have also started to be applied to fMRI signals [36] and to the multimodal integration of EEG and fMRI [3]. However, in the case of fMRI signals, because of poor time resolution, only the main connections acting during a whole task may be identified. The fast dynamic transmissions involved in information processing may be identified by means of time-varying estimators based on MVAR.

5 Estimators of dynamical propagation

5.1 Short-time DTF

In order to grasp the dynamic changes of propagation, the method of adaptive filtering or the method based on a sliding window may be applied. Both methods require multiple repetitions of the experiment to obtain statistically satisfactory results. In the case of a parametric model the number of data points kN_S (k: number of channels, N_S: number of points in the data window) has to be bigger (preferably by an order of magnitude) than the number of parameters, which in the case of MVAR is equal to k^2 p (p: model order). In order to evaluate the dynamics of the process a short data window has to be applied, which requires an increase in the number of data points. This may be achieved by means of repetition of the experiment. When multiple trials are available, it is possible to apply ensemble averaging over realizations. We divide a non-stationary recording into shorter time windows, short enough to treat the data within a window as quasi-stationary. The estimation of MVAR coefficients is based on the calculation of the correlation matrix R_ij of the k signals X_i from the multivariate set [21]. We calculate the correlation matrix between channels for each trial separately. The resulting model coefficients are based on the correlation matrix averaged over the N_T trials. The correlation matrix has the form:

$$ \tilde{R}_{ij} (s) = {\frac{1}{{N_{T} }}}\sum\limits_{r = 1}^{{N_{T} }} {R_{ij}^{(r)} (s) = {\frac{1}{{N_{T} }}}\sum\limits_{r = 1}^{{N_{T} }} {{\frac{1}{{N_{S} }}}\sum\limits_{t = 1}^{{N_{S} }} {X_{i}^{(r)} (t)X_{j}^{(r)} (t + s)} } } . $$
(14)

The averaging concerns correlation matrices (model is fitted independently for each short data window); the data is not averaged in the process. The choice of the window size is always a compromise between quality of the fit and time resolution.
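The ensemble averaging of Eq. 14 for one data window can be sketched as follows. The array layout (trials × channels × samples), the function name, the truncation of the sum at the window edge, and the simulated data are assumptions of the example; the subsequent solution of the MVAR equations from the averaged matrices is not shown.

```python
import numpy as np

def ensemble_correlation(trials, s):
    """Correlation matrix at lag s averaged over trials (Eq. 14).

    `trials` has shape (N_T trials, k channels, N_S samples) and holds one
    short data window. Element [i, j] estimates the mean of X_i(t) X_j(t + s);
    terms with t + s outside the window are dropped.
    """
    n_trials, k, n_samples = trials.shape
    R = np.zeros((k, k))
    for x in trials:                                    # one realization at a time
        R += x[:, :n_samples - s] @ x[:, s:].T / n_samples
    return R / n_trials                                 # average over realizations

# Toy data (assumption): 50 trials, channel 1 is channel 0 delayed by 2 samples.
rng = np.random.default_rng(5)
base = rng.standard_normal((50, 102))
trials = np.stack([base[:, 2:], base[:, :-2]], axis=1)  # shape (50, 2, 100)
R0 = ensemble_correlation(trials, 0)
R2 = ensemble_correlation(trials, 2)
```

Note that the correlation matrices are averaged, not the signals themselves, so activity that is not phase-locked to the stimulus is retained. In a sliding-window analysis the same computation is repeated for every window position.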

The errors of the SDTF may be evaluated by means of the bootstrap method [13]. This procedure corresponds to simulating other realizations of the experiment. The variance of the function value is obtained by repeated calculation of the results for a randomly selected (with repetitions) pool of the original data trials [15], [22].
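The resampling scheme can be sketched generically. In the example below the estimator is a deliberately simple stand-in (a trial-averaged zero-lag product of two channels) rather than the SDTF itself; the function names, the number of bootstrap repetitions, and the simulated data are assumptions.

```python
import numpy as np

def bootstrap_over_trials(trials, estimator, n_boot, rng):
    """Mean and variance of `estimator` over bootstrap pools of trials.

    Each pool has the size of the original set and is drawn with repetitions.
    """
    n_trials = trials.shape[0]
    values = np.array([
        estimator(trials[rng.integers(0, n_trials, size=n_trials)])
        for _ in range(n_boot)
    ])
    return values.mean(axis=0), values.var(axis=0, ddof=1)

def zero_lag_product(trials):
    """Hypothetical stand-in for a connectivity estimator such as SDTF."""
    return np.mean(trials[:, 0, :] * trials[:, 1, :])

# Toy data (assumption): 40 trials, 2 channels with correlation 0.6, 200 samples.
rng = np.random.default_rng(4)
ch0 = rng.standard_normal((40, 200))
ch1 = 0.6 * ch0 + 0.8 * rng.standard_normal((40, 200))
trials = np.stack([ch0, ch1], axis=1)        # shape (trials, channels, samples)
boot_mean, boot_var = bootstrap_over_trials(trials, zero_lag_product,
                                            n_boot=200, rng=rng)
```

Replacing `zero_lag_product` with a function that fits the model and returns the connectivity estimate yields its variance in every time-frequency bin, at the cost of repeating the whole model fit `n_boot` times.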

5.2 Kalman filter

An alternative approach to SDTF is a time-continuous fit of the MVAR model, which may be performed in an adaptive way by means of the Kalman filter. The method assumes that at time t the system is in a hidden state x(t), which means that x(t) cannot be directly observed. The observer's knowledge about the state of the system comes from the measurement y(t), which is distorted by noise w(t). This can be expressed as:

$$ y(t) = {\bf M}(t)x(t) + w(t), $$
(15)

where M(t) is the matrix describing the linear operation of taking the observation. It is assumed that the measurement noise w(t) comes from a zero-mean normal distribution.

The current state of the system x(t) is assumed to depend only on the previous state x(t − 1), on the current value of the control vector u(t), and on the current value of a random perturbation v(t). Formally it can be expressed as:

$$ x(t) = {\bf D}(t)x(t - 1) + {\bf B}(t)u(t) + v(t), $$
(16)

where D(t) is the state transition matrix, B(t) is the matrix transforming the control input, v(t) is the process noise coming from the zero mean normal distribution. The noise vectors at each step are all assumed to be mutually independent.

Kalman filter estimates the state of a system by means of feedback control: the filter estimates the process state at some time and then obtains feedback in the form of noisy measurements. Two phases of the computation may be distinguished: predict and update.

The predict phase uses the state estimate from the previous time step to produce an estimate at the current time step. In the update phase the current a priori prediction is combined with the current observation to obtain a refined a posteriori estimate. An excellent introduction to the Kalman filter may be found in [41].

The adaptation speed of Kalman filter is controlled by the parameter describing the assumed variance of the measurement random component. If we set that parameter too high, the adaptation will proceed slowly, possibly missing important faster phenomena in the investigated signal. For too low value of the adaptation parameter, the filter will follow all fluctuations of the signal resulting in noisy and unstable estimate. Since the method is very sensitive to the adaptation parameter, it has to be chosen carefully depending on the data.
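The two phases can be written compactly for the model of Eqs. 15 and 16. The sketch uses the standard textbook form of the filter; the symbols `Q` and `R` (covariances of the process noise v and the measurement noise w), the covariance matrix `P` of the state estimate, the random-walk state model, and the single-channel AR(1) example are assumptions introduced for illustration and do not reproduce the algorithm of any cited study.

```python
import numpy as np

def kalman_step(x, P, y, M, D, Q, R, B=None, u=None):
    """One predict/update cycle for Eqs. 15 and 16."""
    # Predict: propagate the previous estimate through the state equation.
    x_prior = D @ x
    if B is not None and u is not None:
        x_prior = x_prior + B @ u
    P_prior = D @ P @ D.T + Q
    # Update: correct the prediction with the new noisy measurement y.
    S = M @ P_prior @ M.T + R
    K = P_prior @ M.T @ np.linalg.inv(S)            # Kalman gain
    x_post = x_prior + K @ (y - M @ x_prior)
    P_post = (np.eye(len(x)) - K @ M) @ P_prior
    return x_post, P_post

# Toy use (assumption): track the coefficient of an AR(1) signal.
# Hidden state = AR coefficient, D = 1 (random walk), M(t) = previous sample.
rng = np.random.default_rng(3)
n, a_true = 2000, 0.7
sig = np.zeros(n)
for t in range(1, n):
    sig[t] = a_true * sig[t - 1] + rng.standard_normal()

x, P = np.zeros(1), np.eye(1)
D, Q, R = np.eye(1), 1e-6 * np.eye(1), np.eye(1)
for t in range(1, n):
    M = np.array([[sig[t - 1]]])
    x, P = kalman_step(x, P, np.array([sig[t]]), M, D, Q, R)
a_hat = x[0]
```

In this formulation a larger assumed measurement variance `R` (relative to `Q`) reduces the gain and slows the adaptation, whereas a smaller one makes the estimate follow the fluctuations of the signal, which corresponds to the trade-off described above.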

5.3 Application of time-varying connectivity estimators

In the Kalman filter approach the computation time rises fast with the number of channels and the number of realizations. Taking into account that the estimation of errors is usually based on bootstrap methods, which involve repeating the computations hundreds of times, the computation time of the Kalman filter approach (much longer than for SDTF) can be a serious drawback, which hampers its wide application. In most publications concerning adaptive methods for MVAR only a few channels, usually only two, were considered, e.g., [19, 31, 37].

An example of the application of a time-varying formulation of PDC based on an adaptive MVAR model was a study of a foot movement task [12]. Time-dependent MVAR parameter matrices were estimated by means of the recursive least squares (RLS) algorithm with forgetting factor, as described in [31]. The MVAR model was fitted to the signals representing the current density on the cortex found by solving the linear inverse problem [18]. The dense connections obtained between 16 regions of interest formed a complicated pattern, and graph-theoretical indexes [33] were applied to find the meaningful transmissions.

One of the first applications of SDTF was the determination of the dynamic propagation during performance of finger movement and its imagination [15], [28]. The results were consistent with the known phenomena of event-related synchronization (ERS) and desynchronization (ERD) [35]. Namely, during the movement a decrease of activity in the alpha and beta bands (ERD) is observed in the primary motor area corresponding to the given part of the body, followed later by an increase of beta activity called beta rebound. In the gamma band a brief increase during movement was reported [35]. These findings corresponded very well with the results obtained by means of SDTF, namely the decrease of propagation in the alpha and beta bands during movement (most pronounced at electrode C3 overlying the finger motor area) and the subsequent fast increase of propagation in the beta band. A short burst of gamma propagation from C3 accompanied the finger movement. In the case of movement imagination this propagation started later, and cross-talk between different sites overlying the motor area and the supplementary motor area was observed (Fig. 2) (the dynamics of propagation may be observed in animations available at: http://brain.fuw.edu.pl/~kjbli/DTF_MOV.htm). Transmissions of this kind are compatible with the neurophysiological hypotheses concerning the interactions of brain structures during simple and complex tasks and with modeling studies of the so-called surround effect [40].

Other applications of SDTF concerned the evaluation of transmission during cognitive experiments. The results of the Continuous Attention Test (CAT) [9], [29] confirmed the engagement of pre-frontal and frontal structures in the task and supported the hypothesis of an active inhibition by the pre-supplementary motor area and the right inferior frontal cortex. The results obtained by means of SDTF in experiments involving working memory were compatible with fMRI studies on the localization of the active sites and supplied information concerning the temporal interaction between them [8], [10]. The time-varying transmissions evaluated by means of SDTF in motor task experiments are presented in the form of movies at the website http://brain.fuw.edu.pl/~kjbli/DTF_MOV.html, and the animations of propagation during the CAT test are available at http://brain.fuw.edu.pl/~kjbli/CAT_mov.html.

The SDTF may also be applied to spike trains. By means of a procedure based on low-pass filtering, the point processes may be transformed into continuous signals, which can serve as input to DTF analysis [23].

In the contributions described above we used SDTF, since the experiments involved scalp electrodes. The comparison between dDTF and DTF for scalp electrodes was made in [27]. In the case of dDTF the propagation along anatomical tracts is accentuated; however, for scalp electrodes the pathways of propagation are of little interest. What matters is which structures are communicating. This aspect is better visible with DTF or SDTF than with dDTF, so for scalp derivations there is no need to use dDTF.

The situation is different in the case of implanted or cortical electrodes. The Short-time direct Directed Transfer Function (SdDTF), a combination of SDTF and dDTF, was used in the analysis of ECoG activity during word repetition [26]. Transmission between the brain structures involved in speech understanding and processing of verbal information was found. In the article quoted above, a semiparametric method of evaluating the significance of the results was proposed, which solved the problem of the multiple repetition effect present in time–frequency distributions.

6 Conclusions

We can conclude that multivariate methods based on the multichannel autoregressive model, such as DTF and PDC, are capable of identifying the causal relations between signals and of determining the directed propagation of EEG activity as a function of frequency. The frequency dependence of the estimators is an important feature, since different EEG rhythms play different roles in information processing. DTF and PDC are based on phase differences between channels, hence they are insensitive to volume conduction and very robust with respect to noise. They can be applied to different kinds of data, including point processes. The dynamic propagation may be found by application of ensemble averaging with a sliding window or by adaptive methods. The physiological evidence obtained by means of the parametric, MVAR-based methods demonstrates their usefulness in brain studies.