Time domain measures of inter-channel EEG correlations: a comparison of linear, nonparametric and nonlinear measures

Correlations between ten-channel EEGs obtained from thirteen healthy adult participants were investigated. Signals were obtained in two behavioral states: eyes open no task and eyes closed no task. Four time domain measures were compared: Pearson product moment correlation, Spearman rank order correlation, Kendall rank order correlation and mutual information. The psychophysiological utility of each measure was assessed by determining its ability to discriminate between conditions. The sensitivity to epoch length was assessed by repeating calculations with 1, 2, 3, …, 8 s epochs. The robustness to noise was assessed by performing calculations with noise corrupted versions of the original signals (SNRs of 0, 5 and 10 dB). Three results were obtained in these calculations. First, mutual information effectively discriminated between states with less data. Pearson, Spearman and Kendall failed to discriminate between states with a 1 s epoch, while a statistically significant separation was obtained with mutual information. Second, at all epoch durations tested, the measure of between-state discrimination was greater for mutual information. Third, discrimination based on mutual information was more robust to noise. The limitations of this study are discussed. Further comparisons should be made with frequency domain measures, with measures constructed with embedded data and with the maximal information coefficient.


Introduction
The connectivity of the human central nervous system is its most distinctive feature. Classically, connectivity was investigated anatomically. An alternative view emerged in the twentieth century which emphasized the movement of information. Like many concepts, the seemingly straightforward idea of connectivity was found to be far more complicated than originally anticipated when it was examined with sufficient care. This can be seen in the report of the 2002 Functional Connectivity Workshop (Lee et al. 2003). Three distinct conceptualizations of connectivity have emerged: anatomical, functional and effective. Anatomical connectivity might seem to be the least problematical, and arguably it is, but nonetheless complications present themselves. A complete anatomical description requires not merely knowledge of geometrical proximity but an understanding of receptor subtypes and the availability of neurotransmitters (Lee et al. 2003). Functional connectivity is defined as the ''temporal correlations between spatially remote neurophysiological events'' (Friston et al. 1993a), and effective connectivity is defined as ''the influences that one neural system exerts over another either directly or indirectly'' (Friston et al. 1993b). Horwitz (2003), using the word ''elusive,'' found that all three conceptualizations of connectivity present subtleties of definition and that these problems were compounded when an attempt was made to integrate results obtained from different observational technologies. His analysis led to three conclusions. First, ''we should think of functional (and effective) connectivity not as a single concept or quantity, but rather as forming a class of concepts with multiple members.'' Second, ''functional and effective connectivity must be operationally defined by each investigator who evaluates these quantities.'' Third, ''it is crucial to relate each of the macroscopic definitions to an underlying neural substrate.''
Fingelkurts et al. (2005) concurred in recognizing that theoretical and methodological clarifications are needed to bring precision to the analysis of CNS connectivity. They argue that the time scale of neuroanatomical change is such that an examination of anatomical connectivity cannot provide a basis for a dynamical investigation of perceptual and cognitive processes. They further argue that effective connectivity is identified by first establishing functional connectivity and combining it with a model specifying the causal links between participating units. They therefore conclude that ''functional connectivity is the most central and challenging of the three conceptions of brain connectivity for theories about neural interactions.'' Given the millisecond time scale of dynamical behavior in the central nervous system, Fingelkurts et al. argue for an essential role of EEG and MEG in investigations of functional connectivity. We concur, and the analysis of temporal correlations of EEG signals is the focus of this contribution. Four time domain procedures for quantifying correlations are compared. The adjudicating criterion is physiological: the ability to discriminate between behavioral states. Additional measures that should be incorporated in an expanded study are considered in the ''Discussion'' section of this paper.
When scalp EEG signals are used in the analysis of functional connectivity, an additional question should be considered. Can the analysis be conducted with the original scalp signals, or is it essential to transform these signals to provide an estimate of the current source density? It is not our present purpose to participate in this debate. Conclusions about the comparative effectiveness of different measures for identifying correlations in scalp signals, which is our objective, will be applicable to calculations with current source density estimates. Two additional observations can be made in this regard. First, in practice, calculations should be performed with both the original voltage signals and the transformed signals, and the results should be compared. Second, we should bear in mind Horwitz's very valuable observation that each investigator should state the operational definition of connectivity being implemented.
The earliest example of interregional EEG correlation measurement that has come to our attention is Imahori and Suhara (1949, cited by Gevins 1987), where hand-calculated autocorrelations of short EEG segments were presented. The use of autocorrelations and cross-correlations to study electroencephalograms is reported to have been suggested by Norbert Wiener in 1949 to a group of researchers at the Massachusetts General Hospital (Barlow 1997). Among this group were Mary Brazier and James Casby, who in 1950 started their pioneering work on correlation analysis of the EEG using an electronic digital correlator at the Massachusetts Institute of Technology (Brazier and Casby 1952). An important continuing application of cross-correlation calculations is the correlation of EEGs with templates of averaged event related potentials, where the procedure is used to locate single trial event related potentials (ERPs) in background EEG signals (McGillem and Aunou 1987, reviewed by Spencer 2005). This procedure was introduced by Woody (1967) to detect epileptic spikes. It was first applied to ERP signals by Kutas et al. (1977). This method continues to be applied in the analysis of epileptic seizures (Filligoi et al. 2011) and in the construction of brain computer interfaces (Cabestang et al. 2007).
The study of CNS correlations evolved to include more sophisticated measures. An important step in this evolutionary process was the introduction of mutual information, a nonlinear measure of correlation, to the analysis of EEGs. The earliest application of mutual information in electroencephalography that we have seen is Callaway and Harris (1974) where it was called the coefficient of information transmission. In this application, mutual information was not calculated directly from voltage time series. Digitizing at 250 Hz, each entry was coded for polarity (positive or negative) and derivative (increasing or decreasing). Callaway and Harris showed that a reading task increased occipital to left hemisphere coupling while a visual processing task increased occipital to right hemisphere coupling. In a subsequent publication (Yagi et al. 1976), Callaway and his colleagues investigated the sensitivity of this measure to epoch length and sampling frequency. Mars and Lopes da Silva (1987) showed that mutual information can identify significant correlations that are not detected by linear measures. Other applications of this measure in electroencephalography were published by Xu et al. (1997), Albano et al. (2000) and Chen et al. (2000). A limiting factor in use of mutual information has been data requirements for the estimation, computational times and uncertainty about the accuracy of the estimate. This point is addressed presently.
While being a problem of general interest in CNS physiology, the quantitative characterization of interregional correlations is of particular importance in the study of traumatic brain injury (TBI). The development of current thought about functional connectivity following TBI has many contributors, but two individuals who must appear in any account of this historical process are John Hughlings-Jackson and Kurt Goldstein (1878-1965). Hughlings-Jackson and Goldstein both concluded that the recovery of function, typically partial recovery, following brain injury argued against a strong localization model of CNS organization (Hughlings-Jackson 1874, 1882; Goldstein 1934). In addition to rejecting strong localization, Goldstein's work with CNS injured soldiers following World War I led him to conclude that recovery did not result from repair but rather from adaptation (Zeitlinger 2001). Hughlings-Jackson's and Goldstein's views concerning nonlocalization of deficit are consistent with recent research identifying failures of distributed synchronous networks in the etiology of neuropsychiatric disorders (Herrmann and Demiralp 2005; Schnitzler and Gross 2005; Stam 2005; Uhlhaas and Singer 2006). While Goldstein's views on the failure of repair and his emphasis on adaptation following traumatic brain injury must be reconsidered in the light of the discovery of neurogenesis in the adult mammal, evidence indicates that at least for the immediate present they are still essentially correct. This process of adaptation would, one predicts, result in altered patterns of correlations in the postinjury central nervous system. This expectation has been realized in the recent literature (see Table 1; these are representative examples drawn from a large literature).
In summary, studies of altered functional connectivity following traumatic brain injury utilize three kinds of data: EEG signals, MEG signals and fractional anisotropy measures of axonal tracts characterized by diffusion tensor imaging. This contribution is directed to EEG-based assessments. Three classes of analysis measures are used in these EEG studies: time domain measures, frequency domain measures and measures constructed with embedded data. The focus here is on time domain measures. We explicitly recognize that further comparative studies should include the additional measures described in the ''Discussion'' section of this paper.

Correlation measures assessed
Table 1 (fragment) Alzheimer's disease: Georgopoulos et al. (2007), Güntekin et al. (2008), Locatelli et al. (1998), Rosenbaum et al. (2008), Stam et al. (2006, 2007a, b, 2009), Zhou et al.
Four time domain measures for quantifying relationships between time series are compared in this investigation: Pearson product moment correlation, Spearman rank order correlation, Kendall rank order correlation and mutual information. These measures will be used to quantify between-channel correlations in EEGs recorded from healthy participants in two behavioral conditions: eyes open, no task and eyes closed, no task. The psychophysiological utility of each measure is assessed by determining its ability to discriminate between these conditions. A brief presentation of the mathematical properties of these measures is given in the ''Appendix''. Qualitative descriptions are given here. The Pearson product moment correlation quantifies linear correlations between variables. The Spearman rank order correlation is the product moment correlation of ranks, and the Kendall rank order correlation uses the relative ordering of ranks. The mutual information of two time series is the average number of bits of each that can be predicted by measuring the other. The numerical estimation of mutual information can be computationally demanding, and the accuracy of the estimate can be sensitive to the algorithm used. This was demonstrated by the comparison studies conducted by Quian Quiroga et al. (2002) and by Duckrow and Albano (2003). In a valuable study, Quian Quiroga et al. compared five measures of interhemispheric correlations (nonlinear dependencies, phase synchronization, mutual information, cross correlation and coherence).
Except for mutual information, the measures showed qualitatively similar results, and, importantly the computations identified interhemispheric dependencies that were not apparent on conventional visual examination performed by a Board certified electroencephalographer. Quian Quiroga et al. used a fixed bin-width histogram method for estimating the joint probability distributions. Estimating the joint probability distribution is a critical element in the estimation of mutual information (see the ''Appendix'' for the mathematical details). Using the same data, Duckrow and Albano used the Fraser-Swinney (1986) adaptive partition when estimating joint probability distributions. This computation of mutual information produced results consistent with the other measures. Several methods for estimating mutual information are reviewed in Khan et al. (2007). In the calculations presented here, we used the algorithm constructed in Cellucci et al. (2005). This is a computationally efficient procedure. In test calculations it requires 0.5 % of the computation time required by the Fraser-Swinney algorithm (comparison calculations reported in Cellucci et al. 2005). Also, in contrast with other algorithms, the Cellucci algorithm incorporates an explicit calculation of the probability of the null hypothesis of no predictive relationship between the two variables. This statistical validation is particularly important in calculations with noisy psychophysiological data.
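For reference, the verbal description of mutual information given above corresponds to the standard Shannon expression sketched below. This is the textbook form, written with assumed symbols (P_X and P_Y for the marginal probabilities and P_XY for the joint probability over the elements of the partition); the ''Appendix'' gives the authors' own presentation, which may use different notation.

```latex
I(X,Y) \;=\; \sum_{i}\sum_{j} P_{XY}(x_i, y_j)\,
\log_2 \frac{P_{XY}(x_i, y_j)}{P_X(x_i)\,P_Y(y_j)}
```

The estimate depends on how the (x, y) plane is partitioned to obtain P_XY, which is one reason why a fixed bin-width histogram and an adaptive partition can give different results with the same data.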
An important property of mutual information is identified by examining the computational results presented in Fig. 1 and in Table 2 (modified from Cellucci et al. 2005 following an example in Mars and Lopes da Silva 1987). The first test signal consists of normally distributed random numbers. With each measure, the probability of the null hypothesis is significantly greater than zero. That is, each measure correctly failed to detect a nonrandom relationship between variables X and Y. In the case of linearly correlated signals each measure reports a P NULL that is numerically indistinguishable from zero. Again, this is as it should be. An important distinction between measures is seen when the third signal, which is parabolically correlated, is examined. The Pearson product moment correlation failed to detect a linear correlation, P NULL = 0.9912.
The Spearman and Kendall measures, which can identify monotonic nonlinear relationships, also failed to reject the null hypothesis; P NULL = 0.9928 and P NULL = 0.9989 respectively. In contrast, mutual information identified a nonrandom relationship in the parabolic data. The reported probability of the null hypothesis is indistinguishable from zero.
Fig. 1 [...] Table 2. In all cases x = -3 to +3 in steps of 0.0006. a Normally distributed random numbers with zero mean and unit variance. b y = x + 0.2 × e, where e is the first test signal. c y = x^2 + 0.2 × e. Ten thousand points were used in the calculations. Every tenth point is plotted on the diagram (modified from Cellucci et al. 2005)
An additional lesson can be learned by considering the example shown in Fig. 2. In this system of paired signals X = 0-6 in steps of 0.0006 and where, as before, e is normally distributed with zero mean and unit variance. [...] correlation fail to reject the null hypothesis. For these measures, P NULL is 0.959, 0.964 and 0.944 respectively. Mutual information, however, continues to identify a nonrandom relationship and P NULL remains zero. Thus in the case of the three classical measures of correlation we have the seemingly paradoxical result that evidence for a relationship is lost as more data are available. Two conclusions follow from the examples considered here.
(1) Nonlinear measures should be used in combination with linear and nonparametric measures. (2) Evidence for time domain correlation should be examined as a function of epoch duration.
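The contrast between a linear measure and mutual information on the parabolic test signal can be illustrated with a short Python sketch. It is not the algorithm of Cellucci et al. (2005): as a simplifying assumption it uses a plug-in estimate on a fixed grid of equally populated, rank-based bins, and it does not compute P NULL. The function names, the number of bins and the random seed are hypothetical choices made for illustration only.

```python
import math
import random


def pearson_r(x, y):
    """Pearson product moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_bins(values, n_bins):
    """Map each value to one of n_bins equally populated, rank-based bins."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, index in enumerate(order):
        bins[index] = rank * n_bins // len(values)
    return bins


def mutual_information_bits(x, y, n_bins=16):
    """Plug-in estimate of I(X, Y) in bits from a binned joint distribution."""
    n = len(x)
    joint = {}
    count_x = [0] * n_bins
    count_y = [0] * n_bins
    for i, j in zip(rank_bins(x, n_bins), rank_bins(y, n_bins)):
        joint[(i, j)] = joint.get((i, j), 0) + 1
        count_x[i] += 1
        count_y[j] += 1
    # Sum p(i, j) * log2(p(i, j) / (p(i) * p(j))) over the occupied cells.
    return sum((c / n) * math.log2(c * n / (count_x[i] * count_y[j]))
               for (i, j), c in joint.items())


def make_test_signals(seed=0):
    """x = -3 to +3 in steps of 0.0006; e is normal with zero mean, unit variance."""
    rng = random.Random(seed)
    x = [-3.0 + 0.0006 * k for k in range(10001)]
    e = [rng.gauss(0.0, 1.0) for _ in x]
    y_linear = [a + 0.2 * b for a, b in zip(x, e)]
    y_parabolic = [a * a + 0.2 * b for a, b in zip(x, e)]
    return x, e, y_linear, y_parabolic


if __name__ == "__main__":
    x, e, y_linear, y_parabolic = make_test_signals()
    for name, y in [("random", e), ("linear", y_linear), ("parabolic", y_parabolic)]:
        r = pearson_r(x, y)
        mi = mutual_information_bits(x, y)
        print(f"{name:10s} r = {r:+.3f}   I = {mi:.3f} bits")
```

With these signals the Pearson coefficient of the parabolic pair should be close to zero while the mutual information estimate stays well above its value for the random pair. A plug-in estimate of this kind has a small positive bias even for independent data, which is one motivation for an explicit test of the null hypothesis.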

Electroencephalographic data
The University's Institutional Review Board reviewed and approved all procedures involving human subjects.
Informed consent was obtained from each participant. There were thirteen participants. Participants were healthy adults without a history of head injury or serious psychiatric illness. Multichannel monopolar recordings, referenced to linked earlobes, were obtained from Fz, Cz, Pz, Oz, F3, F4, C3, C4, P3 and P4 using an Electrocap and Sensorium EPA-6 amplifiers. Vertical and horizontal eye movements were recorded from electrode sites above and below the right eye and from near the outer canthi of each eye. Artifact-corrupted records were removed from the analyses. Artifact corruption was defined as an amplitude difference greater than 120 μV peak-to-peak within 500 ms or a blink in the EOG channel. All EEG impedances were less than 5 kΩ. Signals were amplified (gain = 18,000), and amplifier frequency cutoff settings of 0.03 and 200 Hz were used. Signals were digitized at 1,024 Hz using a twelve-bit digitizer. Multichannel records were obtained in two conditions: eyes closed, resting and eyes open, resting. Continuous artifact-free records were obtained from each subject in both conditions. Given the results shown in Fig. 2, measures were calculated as a function of epoch duration (1-8 s).

Comparing measures in between-state discriminations
The psychophysiological utility of each measure was assessed by determining its ability to discriminate between the eyes open, no task and eyes closed, no task conditions. For concreteness of presentation, the experiment is described by considering the first measure, the product moment correlation, which is denoted by r. The EEGs are ten-channel recordings. Thus for a single participant there are 45 distinct channel pairs. The correlation between channel i and channel j, r_ij, is measured in each condition to give 45 values of (r_ij)_closed and 45 values of (r_ij)_open. The operational question becomes: can we discriminate between states by comparing (r_ij)_closed against (r_ij)_open? As noted above, there were thirteen participants in the study. This gives 585 (number of participants × number of channel pairs) pairs of (r_ij)_closed versus (r_ij)_open values. They are compared in a paired t test. The test produces a value of t and the corresponding probability of the null hypothesis. In this application the null hypothesis supposes that there is no difference in between-channel correlations between the eyes open and eyes closed conditions. A high value of t, and hence a low value of P NULL, indicates a successful discrimination. This process is performed for all four measures. As operationalized in this study, the comparative assessment of these measures of correlation can now be stated in a single question. Which measure gives the largest value of t and the lowest value of P NULL? Concerns have been expressed (Gevins 1987) about the amount of data required to estimate mutual information. The calculations have, therefore, been repeated for 1, 2, …, 8 s epochs.
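The bookkeeping of this comparison can be sketched in Python as follows. The channel labels are those of the recording montage; the function names are hypothetical. As a labeled simplification, P NULL is obtained from the normal approximation to Student's t distribution, which is reasonable for the 584 degrees of freedom of this design but is not necessarily the calculation used in the study.

```python
import math
from itertools import combinations

CHANNELS = ["Fz", "Cz", "Pz", "Oz", "F3", "F4", "C3", "C4", "P3", "P4"]
CHANNEL_PAIRS = list(combinations(CHANNELS, 2))  # 45 distinct channel pairs


def paired_t(closed, opened):
    """Paired t statistic and degrees of freedom for matched observations."""
    diffs = [c - o for c, o in zip(closed, opened)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    t = mean_d / math.sqrt(var_d / n)
    return t, n - 1


def p_null_normal_approx(t):
    """Two-sided probability of the null hypothesis.

    Assumption: normal approximation to Student's t, adequate only when
    the number of degrees of freedom is large.
    """
    return math.erfc(abs(t) / math.sqrt(2.0))


if __name__ == "__main__":
    n_participants = 13
    print(len(CHANNEL_PAIRS), "channel pairs,",
          n_participants * len(CHANNEL_PAIRS), "paired values")
```

In use, `closed` and `opened` would each hold the 585 values of one measure, ordered so that entry k in both lists refers to the same participant and the same channel pair.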
The values of these four measures are shown in Fig. 3. The results are consistent with expectations. There is a greater between-channel correlation (Pearson, Spearman, Kendall) in the eyes closed condition. Similarly, there is a greater between-channel predictability (mutual information) in the eyes closed condition.
The uncertainties shown in Fig. 3 [...] The null hypothesis is, however, rejected for 1 s durations by mutual information, where P NULL < 10^-5. All four measures reject the null hypothesis at epoch durations greater than or equal to 2 s. In all cases, the value of t obtained with mutual information is greater than the value obtained with the other measures. A further understanding of the between-state discrimination can be obtained by examining the restatement of the results that is given in the second panel of the diagram, where -log10(P NULL) is plotted as a function of epoch duration. A value of +5, for example, on this graph corresponds to P NULL = 10^-5. The values of -log10(P NULL) obtained with mutual information are consistently greater than those obtained with the other measures.

Robustness to noise
Gevins (1987) raised questions concerning the sensitivity of mutual information calculations to noise. Notably, he did so in the context of the Callaway and Harris (1974) study, where the voltage time series were encoded by polarity and sign of the derivative. We have investigated noise sensitivity in the case of direct voltage time series calculations by testing the robustness of these measures to additive noise. All four measures were found to be robust to noise, but as in the previous calculations, mutual information outperformed the other three measures. In this experiment, normally distributed random numbers with zero mean were added to each of the original EEG signals. The random number generator was based on Park and Miller (1988) and incorporated a Bays-Durham shuffle (Knuth 1981) followed by a Box-Muller transformation (Press et al. 1992). The variance of the additive noise was progressively increased to give signal to noise ratios of 10, 5 and 0 dB. At SNR = 10 dB all four measures failed to discriminate between conditions when 1 s epochs were examined. All four measures successfully made the discrimination for greater epoch lengths, but as in the case of uncorrupted signals, a greater statistical separation was obtained with mutual information.
At higher noise levels (lower SNR) the degree of between state discrimination as quantified by P NULL is reduced, but the pattern observed with SNR = 10 dB is preserved. Specifically, all four measures fail to discriminate between eyes closed and eyes open with 1 s epochs. All four measures successfully discriminate at longer epochs, and the degree of discrimination obtained with mutual information is greater than that observed with the other three measures.
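The construction of noise-corrupted signals at a prescribed signal to noise ratio can be sketched as below. Two assumptions are made that the text does not specify: signal power is taken to be the variance of the mean-removed signal, and SNR in dB is 10 log10 of the ratio of signal power to noise variance. The sketch also uses the generator in Python's standard library rather than the Park-Miller generator with Bays-Durham shuffle and Box-Muller transformation described above; the function names are hypothetical.

```python
import math
import random


def signal_power(signal):
    """Variance of the mean-removed signal (assumed definition of power)."""
    mean = sum(signal) / len(signal)
    return sum((s - mean) ** 2 for s in signal) / len(signal)


def add_gaussian_noise(signal, snr_db, rng):
    """Return the signal plus zero-mean Gaussian noise at the requested SNR."""
    noise_variance = signal_power(signal) / (10.0 ** (snr_db / 10.0))
    noise_sd = math.sqrt(noise_variance)
    return [s + rng.gauss(0.0, noise_sd) for s in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB, recovered from a clean and a corrupted signal."""
    noise = [b - a for a, b in zip(clean, noisy)]
    return 10.0 * math.log10(signal_power(clean) / signal_power(noise))


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for an EEG epoch: 10 Hz sinusoid, 8 s at 1,024 Hz.
    clean = [math.sin(2.0 * math.pi * 10.0 * k / 1024.0) for k in range(8192)]
    for snr_db in (10, 5, 0):
        noisy = add_gaussian_noise(clean, snr_db, rng)
        print(snr_db, "dB requested,", round(measured_snr_db(clean, noisy), 2), "dB measured")
```

At 0 dB the noise variance equals the signal power, so the requested levels of 10, 5 and 0 dB correspond to noise variances of 0.1, about 0.32 and 1.0 times the signal power.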

Discussion
Three results were obtained in these calculations. First, a nonlinear measure, mutual information, effectively discriminated between states with less data, specifically a 1 s epoch, when other measures failed to discriminate between conditions. Second, at all epoch durations tested, the measure of between-state discrimination was greater for mutual information. Third, discrimination based on mutual information was more robust to noise.
[Figure caption] a Comparison of correlation measures using original data from 13 subjects. As before, squares identify results from the Pearson product moment correlation. Diamonds identify results from the Spearman rank order correlation. The letter x identifies results from the Kendall rank order correlation and circles identify results obtained with mutual information. b Comparison of correlation measures using data from 13 subjects following addition of Gaussian noise giving a signal to noise ratio of SNR = 10 dB. c As b, with SNR = 5 dB. d As b, with SNR = 0 dB. Symbols identifying different measures in b, c and d follow the pattern of a. e Example segment of an EEG signal recorded from a single subject at electrode site Pz in the eyes closed condition. f Component of the EEG signal shown in e after addition of Gaussian noise, SNR = 10 dB (shown in black). g As f, with SNR = 5 dB. h As f, with SNR = 0 dB. In f, g and h the original signal is shown in red for comparison
Cogn Neurodyn (2014) 8:1-15
The limitations of this study should be recognized. Three points should be addressed. First, the study is based on signals obtained from thirteen participants. Because the method that is best for one database is not necessarily best in all cases, a different outcome may be obtained with different data. Second, in this study the test criterion was the ability to discriminate between the eyes-open and eyes-closed conditions. It is possible that a measure other than mutual information would be more effective if a different test criterion were implemented. Third, this study was limited to a comparison of four time domain measures of correlation. Several other measures have been used to quantify correlation and should be considered. Reshef et al. (2011) have constructed a maximal information coefficient that has some properties in common with mutual information. Additional methods include coherence (Nunez et al. 1997, 1999), phase locking index (Hurtado et al. 2004; Sazonov et al. 2009), imaginary coherency (Stam et al. 2007a, b; Nolte et al. 2004) and phase lag index (Stam et al. 2007a, b, 2009). As outlined by several authors (Cao and Slobounov 2010; Schiff 2005; Guevara et al. 2005), care must be exercised in the application of these procedures. Recently more sophisticated procedures for assessing correlation have been investigated. Stam and van Dijk (2002) and Wendling et al. (2009) have used methods based on embedded data (Takens 1981) to quantify correlation. Cao and Slobounov (2010) analyzed nineteen-channel resting EEGs in a three-step process. First, independent component analysis (Hyvärinen et al. 2001) was used to identify independent processes. Second, a source reconstruction algorithm (standardized low resolution electromagnetic tomography, sLORETA; Pascual-Marqui 2002) was used to identify cortical regions associated with functional activity.
Third, using this localization, graph theory was used to quantify connectivity in the resting state. These procedures should be incorporated into an expanded comparison study. The Wendling et al. (2009) results obtained with computationally generated data indicated that no single procedure was best for all cases. This is almost certainly true for biological data. The importance of using more than one measure was further indicated by the results of Dauwels et al. (2010) who found that different measures of synchronization were not well correlated. They concluded that ''therefore they each seem to capture a specific kind of interdependence.'' Our best recommendation is to perform functional connectivity studies with several methods including both original scalp signals and estimates of current source density and compare the results.
It is possible to use mutual information calculations in synchronization studies. In this experimental design, the original EEG signal is bandpass filtered into specified frequency bands. Given the restricted spectrum of the filtered signal, it is possible to estimate its phase by calculating the Hilbert transform (Boashash 1992;Pikovsky et al. 2001). Mutual information calculations can then determine if there is a nonrandom relationship between phase functions measured at different electrode sites.
While recognizing the limitations of this study, the results suggest that when implemented with an adaptive partition of the joint probability distribution, mutual information provides an effective noise-robust measure of correlation. This result may extend beyond functional connectivity studies to include analysis of CNS causal networks and analysis of CNS small world networks, which are briefly considered.
Investigation of CNS causal relationships, the time dependent directional movement of information, may be important in the study of traumatic brain injury. As previously noted, Goldstein's pioneering work on the behavioral neurology of traumatic brain injury led him to conclude that restitution of function following injury resulted from adaptation rather than from repair. This suggests that post-injury alteration of causal networks may provide a sensitive measure of altered CNS function following injury. While measures like correlation, coherence and mutual information can be used to establish the presence of correlative relationships between signals they do not provide any information about the direction of information movement. Additional procedures must be introduced. In most cases, the quantitative assessment of causal relationships between variables is constructed on the following idea. If measuring variable X improves the prediction of variable Y, then Y is, in this limited operational sense, causally dependent on X. It should be stressed that this relationship is not necessarily unidirectional. It can also be the case that with the same data, measuring Y also improves the prediction of X. This conceptualization of causality appears in Wiener (1956) and may be original with Wiener.
An early implementation of this operationalization of causality was published by Granger (1969) in the econometrics literature and popularized by Sims (1972). Granger causality is constructed using linear regression models. If past values of X are useful in predicting the current value of Y in a linear regression, then X is said to be a causal drive of time series Y. As with any statistical procedure, causality tests based on linear regression must be implemented with care. A growing literature has identified circumstances that lead to spurious identification of linear causality (Breitung and Swanson 2002;He and Maekawa 2001).
An extension of mutual information may provide a noise-robust measure of causality. Recall that the mutual information of time series X and Y, I(X, Y), is the average number of bits of one variable that can be predicted by measuring the other. Mutual information can be shown to be symmetrical, that is, I(X, Y) = I(Y, X). Therefore, while mutual information can establish the presence of a nonrandom relationship between time series, it cannot identify causal relationships. However, a time-lagged mutual information, in which one of the two variables is time shifted, can be used to determine whether, for example, measuring variable X in the past allows prediction of future values of variable Y. We can shift time series X by lag s and calculate I(X_s, Y) as a function of s. Similarly, we can calculate I(X, Y_s). If measuring X_s allows better prediction of Y than measuring Y_s allows of X, then it can be argued that information is transferred from X to Y. The magnitude of the mutual information and the time lag that produces the greatest value can be used to quantify both the magnitude of the information transfer and the time delay associated with that transfer. A number of investigators have proposed using lagged mutual information to investigate information transfer in distributed systems (Kaneko 1986; Vastano and Swinney 1988; Albano et al. 1999). The procedure has a long history in electroencephalography. Inouye et al. (1983) used an ''entropy analysis,'' which would now be described as directed mutual information, to quantify the direction of information flow and concluded that the dominant longitudinal direction of alpha activity was anterior to posterior. A subsequent publication (Inouye et al. 1993) used directed mutual information to show a change in information flow during a cognitively demanding arithmetic task. Mars and his colleagues (Mars and Lopes da Silva 1983; Mars et al. 1985) used mutual information to quantify time delays in the transmission of epileptic seizures. Several other investigators have used lagged mutual information to quantify between-channel information transfer in multichannel EEGs (Xu et al. 1997; Chen et al. 2000; Lopes da Silva 1987). Schreiber (2000), however, presented examples in which standard lagged mutual information failed to detect information exchange. This motivated the construction of a related measure, transfer entropy, that successfully identified these relationships. The Schreiber results should be considered in the light of the previously cited Duckrow and Albano (2003) calculations, which demonstrated the sensitivity of mutual information calculations to the choice of algorithm. This may have been a factor in the Schreiber study. Madulara et al. (2012) calculated transfer entropy using the EEG records analyzed in this paper. Mutual information was generally lower in the eyes open than in the eyes closed condition. In contrast, transfer entropies increased by a factor of two in the eyes open condition. As would be anticipated, the largest one-way transfer entropies were observed to and from the occipital lobe. Consistent with our previous recommendations, we suggest computing both measures (lagged mutual information and transfer entropy). Clinical utility is the final arbiter.
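The lag scan described above, in which X is shifted by s and I(X_s, Y) is examined as a function of s, can be sketched as follows. This is an illustrative sketch only. The helper names, the rank-based equiprobable binning, the bin-count rule and the synthetic signals are assumptions made here, not the implementation used in the cited studies. Positive lags pair past X with later Y. Negative lags pair past Y with later X.

```python
import math
import random

def equiprobable_bins(values, n_bins):
    """Assign each value to one of n_bins bins of nearly equal occupancy, by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins

def mutual_information(x, y):
    """Histogram estimate of I(X, Y) in bits on an equiprobable partition."""
    n = len(x)
    n_bins = max(2, math.isqrt(n // 5))  # assumed rule: null occupancy of about 5 per cell
    bx, by = equiprobable_bins(x, n_bins), equiprobable_bins(y, n_bins)
    joint, ox, oy = {}, [0] * n_bins, [0] * n_bins
    for i, j in zip(bx, by):
        joint[(i, j)] = joint.get((i, j), 0) + 1
        ox[i] += 1
        oy[j] += 1
    return sum(c / n * math.log2(c * n / (ox[i] * oy[j]))
               for (i, j), c in joint.items())

def lagged_mi(x, y, max_lag):
    """Return {s: I(X_s, Y)}. For s > 0, x[t - s] is paired with y[t]."""
    n = len(x)
    profile = {}
    for s in range(-max_lag, max_lag + 1):
        if s >= 0:
            xs, ys = x[:n - s], y[s:]     # past x against later y
        else:
            xs, ys = x[-s:], y[:n + s]    # past y against later x
        profile[s] = mutual_information(xs, ys)
    return profile

# Synthetic example (assumed data): y repeats x three samples later, plus noise.
random.seed(1)
x = [random.gauss(0, 1) for _ in range(2000)]
y = [random.gauss(0, 1) if t < 3 else x[t - 3] + 0.1 * random.gauss(0, 1)
     for t in range(2000)]
profile = lagged_mi(x, y, max_lag=5)
best_lag = max(profile, key=profile.get)
```

In this synthetic case the profile peaks at the positive lag built into the data, which is the asymmetry that lagged mutual information is intended to reveal. Transfer entropy is not implemented here.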
Stated in abstract terms, a network is a collection of nodes and connections between the nodes. A small world network is defined as a network that has dense local clusters that are connected by a limited number of long range connections. In a seminal paper, Watts and Strogatz (1998) showed how small world networks can be characterized quantitatively. Small world networks are highly efficient. They can support a high degree of dynamical complexity with a minimum investment in connections (Latora and Marchiori 2001). This is an attractive metaphor for describing the central nervous system. Local networks provide areas of specialization, but these specialized domains can communicate efficiently with the entire brain by long range connections. When applied to multichannel EEG data, the electrode sites are the nodes and the connections are identified by correlation measures. Three types of networks can be identified. In a binary network, a connection is either present or absent. Operationally this is established by applying a threshold value (connection present/absent) to a measure of correlation. In a weighted network, the value of a connection's strength is assigned on a continuum determined by the correlation measure. In directed networks, the direction of information transfer, not just the strength of the connection, is incorporated into the analysis. These methods are now being utilized in the analysis of the central nervous system (Smith-Bassett and Bullmore 2006; Sporns and Honey 2006; Stam and Reijneveld 2007). Altered small world networks have been observed in clinical populations including patients with CNS tumors (Bartolomei et al. 2006), epilepsy (Ponten et al. 2007; van Dellen et al. 2009), schizophrenia (Rubinov et al. 2009), and Alzheimer's disease (Stam et al. 2007a, b). As would be anticipated, alterations in networks are associated with traumatic brain injury (Cao and Slobounov 2010; Nakamura et al. 2009; Tsirka et al. 2011; Zouridakis et al. 2011; Castellanos et al. 2011a, b). The calculations presented in this paper and in Madulara et al. (2012) suggest that, when calculated using an adaptive partition of the joint probability distribution, mutual information, lagged mutual information and transfer entropy can provide computationally efficient, noise-robust metrics for the analysis of CNS small world networks.
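The construction of binary and weighted networks from a matrix of between-channel correlation values, as described above, can be sketched as follows. The four-channel matrix, the threshold of 0.5, the use of absolute values and the function names are hypothetical choices made for illustration. The clustering coefficient is the local measure of Watts and Strogatz (1998): the fraction of a node's neighbor pairs that are themselves connected.

```python
def binary_network(corr, threshold):
    """Connection present (1) when |corr| meets the threshold; no self-connections."""
    n = len(corr)
    return [[1 if i != j and abs(corr[i][j]) >= threshold else 0
             for j in range(n)] for i in range(n)]

def weighted_network(corr):
    """Connection strength on a continuum given by |corr|; no self-connections."""
    n = len(corr)
    return [[abs(corr[i][j]) if i != j else 0.0 for j in range(n)]
            for i in range(n)]

def clustering_coefficient(adj, node):
    """Fraction of the node's neighbor pairs that are connected to each other."""
    nbrs = [j for j, a in enumerate(adj[node]) if a]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(adj[a][b] for idx, a in enumerate(nbrs) for b in nbrs[idx + 1:])
    return 2.0 * links / (k * (k - 1))

# Hypothetical correlation values for four electrode sites.
corr = [
    [1.0, 0.8, 0.7, 0.1],
    [0.8, 1.0, 0.6, 0.2],
    [0.7, 0.6, 1.0, 0.5],
    [0.1, 0.2, 0.5, 1.0],
]
adj = binary_network(corr, threshold=0.5)      # edges: 0-1, 0-2, 1-2, 2-3
weights = weighted_network(corr)
degrees = [sum(row) for row in adj]
c0 = clustering_coefficient(adj, 0)            # neighbors 1 and 2 are connected
c2 = clustering_coefficient(adj, 2)            # one of three neighbor pairs connected
```

A directed network would require an asymmetric measure, such as lagged mutual information or transfer entropy, in place of the symmetric matrix used here.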
The mathematical results showing the efficiency of networks composed of highly connected local regions with limited, but essential, long range connections can inform the discussion of CNS localization of function. The localizationist conceptualization began with Broca's localization of expressive aphasia to the third left frontal convolution (Broca 1861) and Wernicke's localization of receptive aphasia to the posterior section of the superior temporal convolution (Wernicke 1908). By the early twentieth century, however, several neurologists argued against a strict localizationist model (Tesak and Code 2008). Kurt Goldstein was a significant contributor to the debate (Goldstein 1927; Ludwig 2012). Goldstein's views were complex and it would be an oversimplification to describe them as inflexibly antilocalizationist (Ludwig 2012). For example, in his Lokalisation in der Großhirnrinde, Goldstein recognizes Broca's ''flawless establishment of the dependency of the impairment of articulated speech from a lesion in the third left frontal convolution'' (Goldstein 1927, translated Ludwig 2012). He similarly accepts Wernicke's identification of the role of the superior temporal convolution in some presentations of receptive aphasia, but based on clinical observations Goldstein concluded that language functions could not be decomposed into discrete anatomically isolated components. Goldstein's acceptance of localizationist results, combined with his argument for the incompleteness of a localizationist account, caused Geschwind (1997) to describe his views as a ''paradoxical position.'' Ludwig proposes that the paradox can be resolved by recognizing that Goldstein introduced a distinction between weak localization (the correlation of symptoms with lesions) and strong localization (the implementation of a process exclusively in a defined locality).
We suggest that a quantitative examination of these questions can be constructed by comparing CNS network geometries generated by language dependent ERP tasks in healthy controls and in patients presenting well characterized aphasias.
Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

Appendix: Measures of correlation
Pearson product moment correlation
Let $\{X\} = \{x_1, x_2, \ldots, x_{N_D}\}$ and $\{Y\} = \{y_1, y_2, \ldots, y_{N_D}\}$ be time series of paired observations. $N_D$ is the number of elements in each set. The product moment correlation coefficient $r$ is given by
$$r = \frac{\sum_{i=1}^{N_D} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N_D} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{N_D} (y_i - \bar{y})^2}}$$
where $\bar{x}$ is the mean of {X} and $\bar{y}$ is the mean of {Y}. There are several procedures for calculating the probability of the null hypothesis of no correlation between {X} and {Y} (Press et al. 1992). A robust procedure that was used here is based on a t-distribution where
$$t = r \sqrt{\frac{\nu}{1 - r^2}}$$
and $\nu = N_D - 2$ is the number of degrees of freedom. The probability of the null hypothesis is
$$P_{NULL} = I_{\nu/(\nu + t^2)}\left(\frac{\nu}{2}, \frac{1}{2}\right)$$
where $I_x(a, b)$ is the incomplete beta function. The 95 % confidence limits for $r$, $r_{Low}$ and $r_{High}$, can be computed by converting $r$ to Fisher's z.
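The quantities in this subsection can be illustrated with a short sketch using only the Python standard library. The function names and the five-point data set are hypothetical. The confidence limits use the usual Fisher transform with standard error 1/sqrt(N_D - 3), which is an assumption about the details of the conversion mentioned above. The incomplete beta function is not in the standard library, so the probability of the null hypothesis is not computed here.

```python
import math

def pearson_r(x, y):
    """Product moment correlation coefficient of paired observations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t = r * sqrt(nu / (1 - r^2)) with nu = n - 2 degrees of freedom."""
    nu = n - 2
    return r * math.sqrt(nu / (1.0 - r * r))

def fisher_confidence_limits(r, n, z_crit=1.959964):
    """Approximate 95 % limits for r via Fisher's z (assumed standard error 1/sqrt(n - 3))."""
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)

# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(x, y)
t = t_statistic(r, len(x))
r_low, r_high = fisher_confidence_limits(r, len(x))
```

With so few points the confidence interval is very wide, which illustrates why the probability of the null hypothesis should be reported alongside $r$.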
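
Spearman rank order correlation
As before, let {X} and {Y} be time series of $N_D$ paired observations. $\{R_X\} = \{R_{X_1}, R_{X_2}, \ldots, R_{X_{N_D}}\}$ gives the ranks of the values of X. In cases of ties the average ranks are entered. $\{R_Y\}$ is defined analogously. The Spearman rank order correlation, $\rho_S$, is the product moment correlation of ranks,
$$\rho_S = \frac{\sum_{i=1}^{N_D} (R_{X_i} - \bar{R}_X)(R_{Y_i} - \bar{R}_Y)}{\sqrt{\sum_{i=1}^{N_D} (R_{X_i} - \bar{R}_X)^2}\,\sqrt{\sum_{i=1}^{N_D} (R_{Y_i} - \bar{R}_Y)^2}}$$
When there are no ties this becomes
$$\rho_S = 1 - \frac{6 \sum_{i=1}^{N_D} d_i^2}{N_D (N_D^2 - 1)}, \qquad d_i = R_{X_i} - R_{Y_i}$$
It can be shown that, in the absence of ties, this expression is identical to the Pearson product moment correlation calculated on the ranks. The probability of the null hypothesis (no correlation) is calculated as before with $t$ taking the value
$$t_S = \rho_S \sqrt{\frac{N_D - 2}{1 - \rho_S^2}}$$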
The Spearman rank order correlation is less sensitive to outliers than the product moment correlation. Importantly, the rank order correlation can detect nonlinear correlations provided that the relation between X and Y is approximately monotonic. A helpful example is given in Triola (2008, p. 713). The rank order correlation is less efficient than the product moment correlation in the sense that the nonparametric measure requires 100 observations to achieve the same results as 91 observations analyzed with the Pearson correlation (Triola 2008, p. 677).
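The rank-based calculation, including the average-rank treatment of ties and the sensitivity to monotonic but nonlinear relations, can be sketched as follows. The function names and data are hypothetical. The example uses a cubic relation, for which the rank order correlation is exactly 1 while the product moment correlation is not.

```python
import math

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def pearson_r(x, y):
    """Product moment correlation coefficient of paired observations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Product moment correlation of the (average) ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

def spearman_rho_no_ties(x, y):
    """Shortcut formula 1 - 6 * sum(d^2) / (N (N^2 - 1)); valid only without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(average_ranks(x), average_ranks(y)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

# Hypothetical data: a monotonic but nonlinear relation.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [v ** 3 for v in x]
rho_cubic = spearman_rho(x, y)   # ranks agree perfectly
r_cubic = pearson_r(x, y)        # below 1 because the relation is not linear
```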

Kendall rank order correlation
Consider two consecutive paired observations $(X_i, Y_i)$ and $(X_{i+1}, Y_{i+1})$. If both X and Y increase, then $X_{i+1} - X_i$, $Y_{i+1} - Y_i$, and $(X_{i+1} - X_i)(Y_{i+1} - Y_i)$ are positive. If both variables decrease between observations $i$ and $i+1$, then $(X_{i+1} - X_i)(Y_{i+1} - Y_i)$ is again positive. If one variable increases while the other decreases, so that the two are negatively correlated between these two observations, then $(X_{i+1} - X_i)(Y_{i+1} - Y_i)$ is negative. The Kendall rank correlation coefficient is constructed by examining these relationships over all possible pairs of observations. If $(X_{i+1} - X_i)(Y_{i+1} - Y_i)$ is positive, then a counter $\kappa$ is increased by 1. If $(X_{i+1} - X_i)(Y_{i+1} - Y_i)$ is negative, then $\kappa$ is decreased by 1. If it is zero, then $\kappa$ is unchanged. These comparisons are made not just across temporally adjacent pairs, that is between $(X_i, Y_i)$ and $(X_{i+1}, Y_{i+1})$, but rather for all possible $(X_i, Y_i)$ and $(X_j, Y_j)$ pairs. There are $N_D(N_D - 1)/2$ distinct pairs, giving $\kappa$ a maximum possible value of $N_D(N_D - 1)/2$. Kendall's $\tau$ is the normalized value of $\kappa$,
$$\tau = \frac{\kappa}{N_D (N_D - 1)/2}$$
$\tau$ has a value between -1 and +1. Under the null hypothesis of no association between {X} and {Y}, $\tau$ is approximately normally distributed with zero mean and has the standard deviation (Press et al. 1992, p. 637)
$$\sigma_\tau = \sqrt{\frac{4 N_D + 10}{9 N_D (N_D - 1)}}$$
The probability of the null hypothesis is computed from the complementary error function,
$$P_{NULL} = \mathrm{erfc}\left(\frac{|\tau|}{\sqrt{2}\,\sigma_\tau}\right)$$
Again letting $R_{X_i}$ and $R_{Y_i}$ denote the ranks of {X} and {Y}, it is seen that $(R_{X_i} - R_{X_j})(R_{Y_i} - R_{Y_j})$ has the same sign as $(X_i - X_j)(Y_i - Y_j)$ and therefore $\kappa$ calculated from ranks is identical to $\kappa$ calculated using X and Y values. $\tau$ is therefore seen to be a nonparametric correlation that does not make any assumptions about the distributions of {X} and {Y}. It is generally observed that $\rho_S$ and $\tau$ are highly correlated. This expectation is borne out in the calculations presented here. $\tau$ provides a means of identifying monotonic correlations. A more general search for correlations that would include non-monotonic associations requires alternative measures.
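The pair-counting construction of Kendall's coefficient can be written directly as a double loop over all distinct pairs. The following is a minimal sketch with a hypothetical function name and data. It uses the normalization by N_D(N_D - 1)/2 stated above, so it makes no special correction for ties, and it runs in time proportional to N_D squared.

```python
import math

def kendall_tau(x, y):
    """Return (tau, p_null) from the sign of (x_j - x_i)(y_j - y_i) over all pairs."""
    n = len(x)
    kappa = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            product = (x[j] - x[i]) * (y[j] - y[i])
            if product > 0:
                kappa += 1      # both variables change in the same direction
            elif product < 0:
                kappa -= 1      # the variables change in opposite directions
    tau = kappa / (n * (n - 1) / 2.0)
    sigma = math.sqrt((4.0 * n + 10.0) / (9.0 * n * (n - 1)))
    p_null = math.erfc(abs(tau) / (math.sqrt(2.0) * sigma))
    return tau, p_null

# Hypothetical data: 8 pairs change in the same direction, 2 in opposite directions.
x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]
tau, p_null = kendall_tau(x, y)

# A monotonic transformation of x leaves tau unchanged, as the rank argument implies.
tau_cubed, _ = kendall_tau([v ** 3 for v in x], y)
```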

Mutual information
Let {X} and {Y} be time series of paired observations. Again, $N_D$ is the number of elements in each set. The mutual information of variables X and Y, denoted I(X, Y), is the average number of bits of variable Y that can be predicted by measuring variable X. It can be shown (Cover and Thomas 1991) that mutual information is symmetrical; I(X, Y) = I(Y, X). For finite data sets I(X, Y) can be approximated by estimating the probability distributions of each variable and their joint probability distribution. Each variable's distribution is approximated by a histogram. Let $N_X$ be the number of bins in the histogram of variable X. $O_X(i)$ is the occupancy of the i-th bin and $P_X(i) = O_X(i)/N_D$ is the probability of occupation of the i-th bin. (The procedure for determining $N_X$ and the upper and lower bounds of each element of the partition is described presently.) $N_Y$ is the number of elements in the histogram of variable Y. In the general case $N_X$ and $N_Y$ are not necessarily equal. $O_Y(j)$ and $P_Y(j)$, $j = 1, 2, \ldots, N_Y$, are the corresponding occupancies and probabilities. $P_{XY}(i, j)$, the joint probability distribution, is the probability that an (x, y) pair is an element in the i-th bin of the partition of the X axis and the j-th bin of the partition of the Y axis. Mutual information is defined by
$$I(X, Y) = \sum_{i=1}^{N_X} \sum_{j=1}^{N_Y} P_{XY}(i, j) \log_2 \frac{P_{XY}(i, j)}{P_X(i) P_Y(j)}$$
where there is no contribution to the sum if $P_{XY}(i, j) = 0$. If variables X and Y are statistically independent, then $P_{XY}(i, j) = P_X(i) P_Y(j)$ and I(X, Y) = 0. Thus in a calculation of mutual information, the null hypothesis is statistical independence of variables X and Y, in which case I(X, Y) is indistinguishable from zero. Let $E_{XY-NULL}(i, j)$ be the expected occupancy of element (i, j) of the XY partition if X and Y are statistically independent,
$$E_{XY-NULL}(i, j) = N_D P_X(i) P_Y(j)$$
and let $O_{XY}(i, j)$ be the observed occupancy of that element. The null hypothesis is tested with the chi-square statistic
$$\chi^2 = \sum_{i=1}^{N_X} \sum_{j=1}^{N_Y} \frac{\left(O_{XY}(i, j) - E_{XY-NULL}(i, j)\right)^2}{E_{XY-NULL}(i, j)}$$
The number of degrees of freedom, $\nu$, is given by $\nu = (N_X - 1)(N_Y - 1)$. The probability of the null hypothesis of statistical independence is
$$P_{NULL} = Q\left(\frac{\nu}{2}, \frac{\chi^2}{2}\right)$$
Q is the incomplete gamma function.
$$Q(a, x) = \frac{1}{\Gamma(a)} \int_x^{\infty} e^{-t} t^{a-1}\, dt$$
When examining finite data sets, the estimated value of I(X, Y) is critically dependent on the partition of X and Y values used to estimate $P_X(i)$, $P_Y(j)$ and $P_{XY}(i, j)$. Several different procedures can be used to estimate these distributions. We apply here a specific implementation of an algorithm using a nonuniform partition that was introduced in Cellucci et al. (2005). This algorithm considers the special case where the same number of elements, $N_E$, is used to partition the X and Y variables; $N_E = N_X = N_Y$. The bins span the range $x_{min}$ to $x_{max}$ on the X axis and $y_{min}$ to $y_{max}$ on the Y axis. In this algorithm, the widths of the bins are varied independently on each axis to meet the criterion of uniform occupancy; that is, each element has occupancy $N_D/N_E = O_X(i) = O_Y(j)$, giving $P_X(i) = P_Y(j) = 1/N_E$. It should be understood, however, that the values of $P_{XY}(i, j)$ will not be uniform. The equiprobable partition of each axis ensures that
$$\sum_{i=1}^{N_E} P_{XY}(i, j) = \sum_{j=1}^{N_E} P_{XY}(i, j) = 1/N_E$$
but the individual $P_{XY}(i, j)$ values will differ. The assumption of a partition giving $P_X(i) = P_Y(j) = 1/N_E$ gives
$$E_{XY-NULL}(i, j) = N_D P_X(i) P_Y(j) = N_D/N_E^2$$
When estimating $P_{XY}(i, j)$ we must address the question: what is the appropriate value of $N_E$? This is the two dimensional analog of the histogram problem: what is the appropriate number of bins in a histogram? The morphology of the distribution cannot be detected if the number of bins is too small. This is seen by considering the limiting case of a single bin. Conversely, if the number of bins is too large, occupancies are zero or one and again the shape of the distribution cannot be determined. The number of bins for either a one dimensional or two dimensional distribution should be as large as possible, but not too large. In this algorithm, $N_E$ is determined by applying a variant of the Cochran criterion (Cochran 1954) to $E_{XY-NULL}(i, j)$.
This criterion requires $E_{XY-NULL}(i, j) \geq 5$ for at least 80 % of the elements of the partition. We impose a more conservative criterion and require $E_{XY-NULL}(i, j) \geq 5$ in all elements. $N_E$ is the largest positive integer satisfying this criterion. We have previously derived an expression for $E_{XY-NULL}(i, j)$ for an equiprobable partition of the X and Y axes. Our criterion on $E_{XY-NULL}(i, j)$ becomes
$$E_{XY-NULL}(i, j) = N_D/N_E^2 \geq 5$$
so $N_E$ is the largest integer meeting the criterion $(N_D/5)^{1/2} \geq N_E$. If, for example, $N_D = 8{,}192$, then $(N_D/5)^{1/2} = 40.477$ and $N_E = 40$. $O_X(i)$ and $O_Y(j)$ will be either 204 or 205. The occupancies differ between bins because 8,192 is not a multiple of 40. The upper and lower bounds of each element of the partition are varied to give the best possible approximation of $P_X(i) = P_Y(j) = 1/N_E$. When the bin assignments of X and Y values in the time series are known, $P_{XY}(i, j)$ can be determined. The estimate of mutual information and the probability of the null hypothesis then follow from the previous formulas.
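The steps described in this subsection (choosing N_E, partitioning each axis into bins of nearly uniform occupancy, and computing the mutual information together with a chi-square statistic against the independence null) can be sketched as follows. This is a simplified illustration and not the implementation of Cellucci et al. (2005). The function names, the rank-based assignment of bins and the synthetic data are assumptions made here. The incomplete gamma function is not in the Python standard library, so the sketch returns the chi-square value and degrees of freedom rather than the probability itself.

```python
import math
import random

def choose_n_e(n_d):
    """Largest integer N_E with N_D / N_E**2 >= 5."""
    return math.isqrt(n_d // 5)

def equiprobable_bins(values, n_e):
    """Assign each value to one of n_e bins of nearly equal occupancy, by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_e // len(values)
    return bins

def mi_and_chi_square(x, y):
    """Return (I(X, Y) in bits, chi-square, degrees of freedom, N_E)."""
    n_d = len(x)
    n_e = choose_n_e(n_d)
    if n_e < 2:
        raise ValueError("too few observations for a partition")
    bx, by = equiprobable_bins(x, n_e), equiprobable_bins(y, n_e)
    occ = [[0] * n_e for _ in range(n_e)]
    for i, j in zip(bx, by):
        occ[i][j] += 1
    ox = [sum(row) for row in occ]
    oy = [sum(occ[i][j] for i in range(n_e)) for j in range(n_e)]
    mi, chi2 = 0.0, 0.0
    for i in range(n_e):
        for j in range(n_e):
            expected = ox[i] * oy[j] / n_d   # N_D * P_X(i) * P_Y(j)
            chi2 += (occ[i][j] - expected) ** 2 / expected
            if occ[i][j] > 0:                # empty cells contribute nothing
                mi += occ[i][j] / n_d * math.log2(occ[i][j] / expected)
    return mi, chi2, (n_e - 1) ** 2, n_e

# The worked example from the text: N_D = 8,192 gives N_E = 40.
n_e_example = choose_n_e(8192)
bins_example = equiprobable_bins(list(range(8192)), n_e_example)
occupancies = sorted({bins_example.count(k) for k in range(n_e_example)})

# Synthetic signals (assumed data): fully dependent versus independent.
random.seed(2)
x = [random.gauss(0, 1) for _ in range(1000)]
y = [random.gauss(0, 1) for _ in range(1000)]
mi_same, chi2_same, nu, n_e = mi_and_chi_square(x, x)
mi_indep, chi2_indep, _, _ = mi_and_chi_square(x, y)
```

For independent signals the estimate is small but not exactly zero, which is why the chi-square test against the null is needed. If SciPy is available, the regularized upper incomplete gamma function `scipy.special.gammaincc(nu / 2, chi2 / 2)` could supply the probability of the null hypothesis.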