# Time domain measures of inter-channel EEG correlations: a comparison of linear, nonparametric and nonlinear measures

## Authors

## Abstract

Correlations between ten-channel EEGs obtained from thirteen healthy adult participants were investigated. Signals were obtained in two behavioral states: eyes open no task and eyes closed no task. Four time domain measures were compared: Pearson product moment correlation, Spearman rank order correlation, Kendall rank order correlation and mutual information. The psychophysiological utility of each measure was assessed by determining its ability to discriminate between conditions. The sensitivity to epoch length was assessed by repeating calculations with 1, 2, 3, …, 8 s epochs. The robustness to noise was assessed by performing calculations with noise corrupted versions of the original signals (SNRs of 0, 5 and 10 dB). Three results were obtained in these calculations. First, mutual information effectively discriminated between states with less data. Pearson, Spearman and Kendall failed to discriminate between states with a 1 s epoch, while a statistically significant separation was obtained with mutual information. Second, at all epoch durations tested, the measure of between-state discrimination was greater for mutual information. Third, discrimination based on mutual information was more robust to noise. The limitations of this study are discussed. Further comparisons should be made with frequency domain measures, with measures constructed with embedded data and with the maximal information coefficient.

### Keywords

EEG; Quantitative EEG; Pearson product moment correlation; Spearman rank order correlation; Kendall rank order correlation; Mutual information

## Introduction

The connectivity of the human central nervous system is its most distinctive feature. Classically, connectivity was investigated anatomically. An alternative view, which emphasized the movement of information, emerged in the twentieth century. Like many concepts, the seemingly straightforward idea of connectivity was found to be far more complicated than originally anticipated when it was examined with sufficient care. This can be seen in the report of the 2002 Functional Connectivity Workshop (Lee et al. 2003). Three distinct conceptualizations of connectivity have emerged: anatomical, functional and effective. Anatomical connectivity might seem to be the least problematical, and arguably it is, but nonetheless complications present themselves. A complete anatomical description requires not merely knowledge of geometrical proximity but an understanding of receptor subtypes and the availability of neurotransmitters (Lee et al. 2003). Functional connectivity is defined as the “temporal correlations between spatially remote neurophysiological events” (Friston et al. 1993a), and effective connectivity is defined as “the influences that one neural system exerts over another either directly or indirectly” (Friston et al. 1993b). Horwitz (2003), using the word “elusive,” found that all three conceptualizations of connectivity present subtleties of definition and that these problems were compounded when an attempt was made to integrate results obtained from different observational technologies. His analysis led to three conclusions. First, “we should think of functional (and effective) connectivity not as a single concept or quantity, but rather as forming a class of concepts with multiple members.” Second, “functional and effective connectivity must be operationally defined by each investigator who evaluates these quantities.” Third, “it is crucial to relate each of the macroscopic definitions to an underlying neural substrate.”

Fingelkurts et al. (2005) concurred in recognizing that theoretical and methodological clarifications are needed to bring precision to the analysis of CNS connectivity. They argue that the time scale of neuroanatomical change is such that an examination of anatomical connectivity cannot provide a basis for a dynamical investigation of perceptual and cognitive processes. They further argue that effective connectivity is identified by first establishing functional connectivity and combining it with a model specifying the causal links between participating units. They therefore conclude that “functional connectivity is the most central and challenging of the three conceptions of brain connectivity for theories about neural interactions.” Given the millisecond time scale of dynamical behavior in the central nervous system, Fingelkurts et al. argue for an essential role of EEG and MEG in investigations of functional connectivity. We concur, and the analysis of temporal correlations of EEG signals is the focus of this contribution. Four time domain procedures for quantifying correlations are compared. A physiological criterion, the ability to discriminate between behavioral states, is used as an adjudicating criterion. Additional measures that should be incorporated in an expanded study are considered in the “Discussion” section of this paper.

When using scalp EEG signals in the analysis of functional connectivity, an additional question should be considered. Can the analysis be conducted with the original scalp signals, or is it essential to transform these signals to provide an estimate of the current source density? It is not our present purpose to participate in this debate. Conclusions about the comparative effectiveness of different measures for identifying correlations in scalp signals, which is our objective, will be applicable to calculations with current source density estimates. Two additional observations in this regard can be made. First, in practice, calculations should be performed with both the original voltage signals and the transformed signals, and the results should be compared. Second, we should bear in mind Horwitz’s very valuable observation that each investigator should state the operational definition of connectivity being implemented.

The earliest example of interregional EEG correlation measurement that has come to our attention is Imahori and Suhara (1949; cited by Gevins 1987), where hand-calculated autocorrelations of short EEG segments were presented. The use of autocorrelation and cross-correlation to study electroencephalograms is reported to have been suggested by Norbert Wiener in 1949 to a group of researchers at the Massachusetts General Hospital (Barlow 1997). Among this group were Mary Brazier and James Casby, who in 1950 started their pioneering work on correlation analysis of the EEG using an electronic digital correlator at the Massachusetts Institute of Technology (Brazier and Casby 1952). An important continuing application of cross-correlation calculations is the correlation of EEGs with templates of averaged event related potentials, where the procedure is used to locate single trial event related potentials (ERPs) in background EEG signals (McGillem and Aunon 1987, reviewed by Spencer 2005). This procedure was introduced by Woody (1967) to detect epileptic spikes. It was first applied to ERP signals by Kutas et al. (1977). This method continues to be applied in the analysis of epileptic seizures (Filligoi et al. 2011) and in the construction of brain computer interfaces (Cabestaing et al. 2007).

The study of CNS correlations evolved to include more sophisticated measures. An important step in this evolution was the introduction of mutual information, a nonlinear measure of correlation, to the analysis of EEGs. The earliest application of mutual information in electroencephalography that we have seen is Callaway and Harris (1974), where it was called the coefficient of information transmission. In this application, mutual information was not calculated directly from voltage time series. Signals were digitized at 250 Hz, and each sample was coded for polarity (positive or negative) and derivative (increasing or decreasing). Callaway and Harris showed that a reading task increased occipital to left hemisphere coupling while a visual processing task increased occipital to right hemisphere coupling. In a subsequent publication (Yagi et al. 1976), Callaway and his colleagues investigated the sensitivity of this measure to epoch length and sampling frequency. Mars and Lopes da Silva (1987) showed that mutual information can identify significant correlations that are not detected by linear measures. Other applications of this measure in electroencephalography were published by Xu et al. (1997), Albano et al. (2000) and Chen et al. (2000). Limiting factors in the use of mutual information have been the data requirements of the estimation, computation times and uncertainty about the accuracy of the estimate. This point is addressed presently.

Pathological conditions associated with altered functional connectivity (representative examples)

| Condition | Representative studies |
|---|---|
| Alzheimer’s disease | Georgopoulos et al. (2007), Güntekin et al. (2008), Locatelli et al. (1998), Rosenbaum et al. (2008), Stam et al. (2006, 2007a, b, 2009), Zhou et al. (2008) |
| Epileptic seizures | Ponten et al. (2007) |
| Intra-arterial amobarbital injection | Douw et al. (2010) |
| Autism spectrum disorder | Belmonte et al. (2004), Just et al. (2004), Kana et al. (2007), Murias et al. (2007), Rippon et al. (2006), Vidal et al. (2006) |
| Brain tumors | |
| Multiple sclerosis | |
| Preterm birth | Mullen et al. (2011) |
| PTSD | |
| Schizophrenia | Breakspear et al. (2003), Georgopoulos et al. (2007), Lawrie et al. (2002), Lynall et al. (2010), Michelyannis et al. (2006), Symond et al. (2005) |
| Stroke | Grefkes and Fink (2012) |
| Traumatic brain injury | Cao and Slobounov (2010), Castellanos et al. (2010, 2011a, b), Ham and Sharp (2012), Kasahara et al. (2010), Kumar et al. (2009), Nakamura et al. (2009), Sponheim et al. (2011), Tsirka et al. (2011) |

## Correlation measures assessed

Four time domain measures for quantifying relationships between time series are compared in this investigation: Pearson product moment correlation, Spearman rank order correlation, Kendall rank order correlation and mutual information. These measures will be used to quantify between-channel correlations in EEGs recorded from healthy participants in two behavioral conditions: eyes open, no task and eyes closed, no task. The psychophysiological utility of each measure is assessed by determining its ability to discriminate between these conditions.

A brief presentation of the mathematical properties of these measures is given in the “Appendix”. Qualitative descriptions are given here. The Pearson product moment correlation quantifies linear correlations between variables. The Spearman rank order correlation is the product moment correlation of ranks, and the Kendall rank order correlation uses the relative ordering of ranks. The mutual information of two time series is the average number of bits of each that can be predicted by measuring the other. The numerical estimation of mutual information can be computationally demanding, and the accuracy of the estimate can be sensitive to the algorithm used. This was demonstrated by the comparison studies conducted by Quian Quiroga et al. (2002) and by Duckrow and Albano (2003). In a valuable study, Quian Quiroga et al. compared five measures of interhemispheric correlations (nonlinear dependencies, phase synchronization, mutual information, cross correlation and coherence). Except for mutual information, the measures showed qualitatively similar results, and, importantly, the computations identified interhemispheric dependencies that were not apparent on conventional visual examination performed by a board-certified electroencephalographer. Quian Quiroga et al. used a fixed bin-width histogram method for estimating the joint probability distributions. Estimating the joint probability distribution is a critical element in the estimation of mutual information (see the “Appendix” for the mathematical details). Using the same data, Duckrow and Albano used the Fraser–Swinney (1986) adaptive partition when estimating joint probability distributions. This computation of mutual information produced results consistent with the other measures. Several methods for estimating mutual information are reviewed in Khan et al. (2007). In the calculations presented here, we used the algorithm constructed in Cellucci et al. (2005). This is a computationally efficient procedure.
In test calculations it requires 0.5% of the computation time required by the Fraser–Swinney algorithm (comparison calculations reported in Cellucci et al. 2005). Also, in contrast with other algorithms, the Cellucci algorithm incorporates an explicit calculation of the probability of the null hypothesis of no predictive relationship between the two variables. This statistical validation is particularly important in calculations with noisy psychophysiological data.
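The sensitivity of a histogram-based estimate to the choice of partition can be illustrated with a short Python sketch. It is neither the Cellucci et al. (2005) algorithm nor the Fraser–Swinney partition; as a simplified stand-in, it contrasts fixed bin-width bins with equiprobable (marginal-quantile) bins, a basic form of adaptive partition. The function name `mutual_information_bits`, the choice of 8 bins and the synthetic data containing one large outlier are assumptions made for illustration.

```python
import numpy as np

def mutual_information_bits(x, y, bins=8, equiprobable=True):
    """Plug-in estimate of I(X, Y) in bits from a two-dimensional histogram.

    equiprobable=True places bin edges at marginal quantiles (a simple adaptive
    partition); equiprobable=False uses fixed-width bins spanning the data range.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if equiprobable:
        levels = np.linspace(0.0, 1.0, bins + 1)
        edges_x, edges_y = np.quantile(x, levels), np.quantile(y, levels)
    else:
        edges_x = np.linspace(x.min(), x.max(), bins + 1)
        edges_y = np.linspace(y.min(), y.max(), bins + 1)
    joint, _, _ = np.histogram2d(x, y, bins=[edges_x, edges_y])
    p_xy = joint / joint.sum()                 # joint probabilities
    p_x = p_xy.sum(axis=1, keepdims=True)      # marginal of X (column vector)
    p_y = p_xy.sum(axis=0, keepdims=True)      # marginal of Y (row vector)
    nonzero = p_xy > 0                         # 0 * log 0 is taken as 0
    ratio = p_xy[nonzero] / (p_x * p_y)[nonzero]
    return float(np.sum(p_xy[nonzero] * np.log2(ratio)))

# Hypothetical data: a linear relationship plus one extreme artifact sample.
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = x + 0.3 * rng.normal(size=2000)
x[0], y[0] = 100.0, 100.0

mi_fixed = mutual_information_bits(x, y, equiprobable=False)
mi_adaptive = mutual_information_bits(x, y, equiprobable=True)
print(f"fixed-width bins: {mi_fixed:.3f} bits, equiprobable bins: {mi_adaptive:.3f} bits")
```

With fixed-width bins, the single outlier stretches the range so that nearly every sample falls into one bin and the estimate collapses toward zero; the quantile partition is unaffected by the outlier and recovers a substantial dependence.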

P_{NULL} that is numerically indistinguishable from zero. Again, this is as it should be. An important distinction between measures is seen when the third signal, which is parabolically correlated, is examined. The Pearson product moment correlation failed to detect a linear correlation, P_{NULL} = 0.9912. The Spearman and Kendall measures, which can identify monotonic nonlinear relationships, also failed to reject the null hypothesis: P_{NULL} = 0.9928 and P_{NULL} = 0.9989, respectively. In contrast, mutual information identified a nonrandom relationship in the parabolic data. The reported probability of the null hypothesis is indistinguishable from zero.

Correlation calculations (modified from Cellucci et al. 2005)

| Measure | Normally distributed random | Linearly correlated | Parabolically correlated |
|---|---|---|---|
| Pearson r | r = −0.0037 | r = 0.9934 | r = 0.0001 |
| Pearson P_{NULL} | | | 0.9912 |
| Spearman ρ | | | |
| Spearman P_{NULL} | | | 0.9928 |
| Kendall τ | τ = 0.0027 | τ = 0.9270 | τ ≤ 10 |
| Kendall P_{NULL} | | | 0.9989 |
| Mutual information (bits) | I = 0.1356 | I = 2.9186 | I = 3.0304 |
| Mutual information P_{NULL} | | | |

P_{NULL} is numerically indistinguishable from zero in all four cases. If one considers X ∈ [0, 6], then the Pearson product moment correlation, Spearman rank order correlation and Kendall rank order correlation fail to reject the null hypothesis. For these measures, P_{NULL} is 0.959, 0.964 and 0.944, respectively. Mutual information, however, continues to identify a nonrandom relationship, and P_{NULL} remains zero. Thus, in the case of the three classical measures of correlation, we have the seemingly paradoxical result that evidence for a relationship is lost as more data become available.
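The loss of evidence over a wider sampling range can be reproduced in a short Python sketch. The relationship y = (x − 3)² is a hypothetical choice, not the function used in the original example: it is monotonic on [0, 3] and symmetric on [0, 6]. The three classical tests come from `scipy.stats`; mutual information is estimated on an equiprobable partition with a shuffle-surrogate P_{NULL}, a simplification that is not the Cellucci et al. (2005) algorithm. The helper names are illustrative.

```python
import numpy as np
from scipy import stats

def mutual_information_bits(x, y, bins=8):
    """Plug-in MI estimate (bits) on an equiprobable (marginal-quantile) partition."""
    levels = np.linspace(0.0, 1.0, bins + 1)
    joint, _, _ = np.histogram2d(x, y, bins=[np.quantile(x, levels), np.quantile(y, levels)])
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    nz = p_xy > 0
    return float(np.sum(p_xy[nz] * np.log2(p_xy[nz] / (p_x * p_y)[nz])))

def mi_p_null(x, y, n_surrogates=199, seed=0):
    """Fraction of shuffled surrogates whose MI reaches the observed MI (add-one corrected)."""
    rng = np.random.default_rng(seed)
    observed = mutual_information_bits(x, y)
    exceed = sum(mutual_information_bits(x, rng.permutation(y)) >= observed
                 for _ in range(n_surrogates))
    return (1 + exceed) / (n_surrogates + 1)

def p_null_table(x, y):
    """P_NULL for the three classical tests and for mutual information."""
    _, p_pearson = stats.pearsonr(x, y)
    _, p_spearman = stats.spearmanr(x, y)
    _, p_kendall = stats.kendalltau(x, y)
    return {"pearson": p_pearson, "spearman": p_spearman,
            "kendall": p_kendall, "mutual_information": mi_p_null(x, y)}

# A symmetric integer grid keeps (x - 3)**2 exactly tied for x = 3 - d and x = 3 + d.
offset = np.arange(-300, 301) / 100.0   # x - 3, from -3 to 3
x = offset + 3.0                        # x in [0, 6]
y = offset ** 2                         # hypothetical y = (x - 3)**2

half = p_null_table(x[:301], y[:301])   # x in [0, 3]: monotonic decreasing
full = p_null_table(x, y)               # x in [0, 6]: symmetric
print("x in [0, 3]:", half)
print("x in [0, 6]:", full)
```

On the half range every measure rejects the null hypothesis; on the full range the linear and rank measures return P_{NULL} close to 1 while the surrogate test for mutual information still rejects it. Note that shuffled surrogates ignore autocorrelation, so for real EEG a surrogate scheme that preserves temporal structure would be the more cautious choice.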

Two conclusions follow from the examples considered here. (1) Nonlinear measures should be used in combination with linear and nonparametric measures. (2) Evidence for time domain correlation should be examined as a function of epoch duration.

## Electroencephalographic data

The University’s Institutional Review Board reviewed and approved all procedures involving human subjects. Informed consent was obtained from each participant. There were thirteen participants, all healthy adults without a history of head injury or serious psychiatric illness. Multichannel monopolar recordings, referenced to linked earlobes, were obtained from F_{Z}, C_{Z}, P_{Z}, O_{Z}, F_{3}, F_{4}, C_{3}, C_{4}, P_{3}, and P_{4} using an Electrocap and Sensorium EPA-6 amplifiers. Vertical and horizontal eye movements were recorded from electrode sites above and below the right eye and from near the outer canthi of each eye. Artifact-corrupted records were removed from the analyses. Artifact corruption was defined as an amplitude difference greater than 120 μV peak-to-peak within 500 ms or a blink in the EOG channel. All EEG impedances were less than 5 kΩ. Signals were amplified (gain = 18,000) with amplifier frequency cutoff settings of 0.03 and 200 Hz. Signals were digitized at 1,024 Hz using a twelve-bit digitizer. Multichannel records were obtained in two conditions: eyes closed, resting and eyes open, resting. Continuous artifact-free records were obtained from each subject in both conditions. Given the results shown in Fig. 2, measures were calculated as a function of epoch duration (1–8 s).

## Comparing measures in between-state discriminations

The psychophysiological utility of each measure was assessed by determining its ability to discriminate between the eyes open, no task and eyes closed, no task conditions. For concreteness, the procedure is described for the first measure, the product moment correlation, denoted by r. The EEGs are ten-channel recordings, so for a single participant there are 45 distinct channel pairs. The correlation between channel i and channel j, r_{ij}, is measured in each condition to give 45 values of (r_{ij})_{closed} and 45 values of (r_{ij})_{open}. The operational question becomes: can we discriminate between states by comparing (r_{ij})_{closed} against (r_{ij})_{open}? As noted above, there were thirteen participants in the study. This gives 585 (number of participants × number of channel pairs) (r_{ij})_{closed} versus (r_{ij})_{open} pairs, which are compared in a paired *t* test. The test produces a value of t and the corresponding probability of the null hypothesis. In this application the null hypothesis is that between-channel correlations do not differ between the eyes open and eyes closed conditions. A large value of |t|, and hence a low value of P_{NULL}, indicates a successful discrimination.
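The bookkeeping of this design (45 channel pairs per participant, 13 × 45 = 585 paired values, one paired *t* test) can be sketched in Python. The EEG itself is replaced by hypothetical synthetic data: a shared source plus independent channel noise, with a stronger shared component assumed in the eyes-closed condition purely so the example has an effect to detect. The helper names are illustrative; `scipy.stats.ttest_rel` performs the paired test.

```python
import numpy as np
from scipy import stats

FS_HZ = 1024              # sampling rate reported for these recordings
N_CHANNELS = 10           # F_Z, C_Z, P_Z, O_Z, F_3, F_4, C_3, C_4, P_3, P_4
N_PARTICIPANTS = 13

def pairwise_correlations(epoch):
    """Pearson r for every distinct channel pair of a (channels, samples) epoch."""
    r = np.corrcoef(epoch)                        # channels x channels matrix
    i, j = np.triu_indices(epoch.shape[0], k=1)   # upper triangle, no diagonal
    return r[i, j]                                # 45 values for 10 channels

def synthetic_epoch(rng, shared_gain, seconds):
    """Hypothetical stand-in for EEG: a shared source plus independent channel noise."""
    n = FS_HZ * seconds
    shared = rng.normal(size=n)
    return shared_gain * shared + rng.normal(size=(N_CHANNELS, n))

rng = np.random.default_rng(42)
closed, opened = [], []
for _ in range(N_PARTICIPANTS):
    # Assumption for illustration only: a stronger shared component with eyes closed.
    closed.append(pairwise_correlations(synthetic_epoch(rng, 0.6, seconds=1)))
    opened.append(pairwise_correlations(synthetic_epoch(rng, 0.5, seconds=1)))
closed = np.concatenate(closed)   # 13 x 45 = 585 values
opened = np.concatenate(opened)

t_stat, p_null = stats.ttest_rel(closed, opened)
print(f"{closed.size} paired values, t = {t_stat:.2f}, P_NULL = {p_null:.3g}")
```

Replacing `pairwise_correlations` with a function returning Spearman, Kendall or mutual information values for each pair, and varying `seconds` from 1 to 8, gives the other arms of the comparison.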

This process is performed for all four measures. As operationalized in this study, the comparative assessment of these measures of correlation can now be stated in a single question: which measure gives the largest value of t and the lowest value of P_{NULL}? Concerns have been expressed (Gevins 1987) about the amount of data required to estimate mutual information. The calculations have, therefore, been repeated for 1, 2, …, 8 s epochs.

*t* test for epoch durations of 1, 2, …, 8 s. In the case of 1 s durations, Pearson, Spearman and Kendall correlations do not discriminate between the two behavioral conditions: they fail to reject the null hypothesis, with respective values of P_{NULL} of 0.807, 0.854 and 0.699. The null hypothesis is, however, rejected for 1 s durations by mutual information, where P_{NULL} < 10^{−5}. All four measures reject the null hypothesis at epoch durations greater than or equal to 2 s. In all cases, the value of t obtained with mutual information is greater than the value obtained with the other measures. A further understanding of the between-state discrimination can be obtained from the restatement of the results given in the second panel of the diagram, where −log_{10}(P_{NULL}) is plotted as a function of epoch duration. A value of +5 on this graph, for example, corresponds to P_{NULL} = 10^{−5}. The values of −log_{10}(P_{NULL}) obtained with mutual information are consistently greater than those obtained with the other measures.

## Robustness to noise

At SNR = 10 dB all four measures failed to discriminate between conditions when 1 s epochs were examined. All four measures successfully made the discrimination for greater epoch lengths, but as in the case of uncorrupted signals, a greater statistical separation was obtained with mutual information.

At higher noise levels (lower SNR), the degree of between-state discrimination as quantified by P_{NULL} is reduced, but the pattern observed with SNR = 10 dB is preserved. Specifically, all four measures fail to discriminate between eyes closed and eyes open with 1 s epochs. All four measures successfully discriminate at longer epochs, and the degree of discrimination obtained with mutual information is greater than that observed with the other three measures.
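This section does not state how the noise was generated. One common way to produce a signal at a prescribed SNR is to add white Gaussian noise whose power is set from the signal power, SNR_dB = 10 log₁₀(P_signal / P_noise). The Python sketch below does that; the noise type, the use of mean-removed signal power and the 10 Hz test signal are all assumptions, not the procedure used in the study.

```python
import numpy as np

def add_noise_at_snr(signal, snr_db, rng):
    """Add white Gaussian noise so that 10*log10(P_signal / P_noise) equals snr_db.

    Signal power is the mean square of the mean-removed signal (an assumption).
    """
    signal = np.asarray(signal, dtype=float)
    p_signal = np.mean((signal - signal.mean()) ** 2)
    p_noise = p_signal / 10.0 ** (snr_db / 10.0)
    return signal + rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)

# Hypothetical 8 s, 10 Hz test signal sampled at 1,024 Hz.
fs = 1024
t = np.arange(8 * fs) / fs
clean = np.sin(2 * np.pi * 10.0 * t)

rng = np.random.default_rng(7)
achieved = {}
for snr_db in (10, 5, 0):
    noisy = add_noise_at_snr(clean, snr_db, rng)
    noise = noisy - clean
    achieved[snr_db] = 10 * np.log10(np.mean(clean ** 2) / np.mean(noise ** 2))
    print(f"target {snr_db:>2} dB, achieved {achieved[snr_db]:.2f} dB")
```

Applied independently to each EEG channel, this yields the 10, 5 and 0 dB corrupted records on which the four measures can be recomputed.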

## Discussion

Three results were obtained in these calculations. First, a nonlinear measure, mutual information, effectively discriminated between states with less data, specifically a 1 s epoch, when other measures failed to discriminate between conditions. Second, at all epoch durations tested, the measure of between-state discrimination was greater for mutual information. Third, discrimination based on mutual information was more robust to noise.

The limitations of this study should be recognized. Three points should be addressed. First, the study is based on signals obtained from thirteen participants. Because the method that is best for one database is not necessarily best in all cases, a different outcome may be obtained with different data. Second, in this study the test criterion was the ability to discriminate between the eyes-open and eyes-closed conditions. It is possible that a measure other than mutual information would be more effective if a different test criterion were implemented. Third, this study was limited to a comparison of four time domain measures of correlation. Several other measures have been used to quantify correlation and should be considered. Reshef et al. (2011) have constructed a maximal information coefficient that has some properties in common with mutual information. Additional methods include coherence (Nunez et al. 1997, 1999), phase locking index (Stam et al. 2009; Hurtado et al. 2004; Sazonov et al. 2009), imaginary coherency (Stam et al. 2007a, b; Nolte et al. 2004) and phase lag index (Stam et al. 2007a, b, 2009). As outlined by several authors (Cao and Slobounov 2010; Schiff 2005; Guevara et al. 2005), care must be exercised in the application of these procedures. Recently, more sophisticated procedures for assessing correlation have been investigated. Stam and van Dijk (2002) and Wendling et al. (2009) have used methods based on embedded data (Takens 1981) to quantify correlation. Cao and Slobounov (2010) analyzed nineteen-channel resting EEGs in a three-step process. First, independent component analysis (Hyvärinen et al. 2001) was used to identify independent processes. Second, a source reconstruction algorithm, standardized low resolution electromagnetic tomography (sLORETA; Pascual-Marqui et al. 2002; Pascual-Marqui 2002), was used to identify cortical regions associated with functional activity.
Third, using this localization, graph theory was used to quantify connectivity in the resting state. These procedures should be incorporated into an expanded comparison study. The Wendling et al. (2009) results obtained with computationally generated data indicated that no single procedure was best for all cases. This is almost certainly true for biological data. The importance of using more than one measure was further indicated by the results of Dauwels et al. (2010) who found that different measures of synchronization were not well correlated. They concluded that “therefore they each seem to capture a specific kind of interdependence.” Our best recommendation is to perform functional connectivity studies with several methods including both original scalp signals and estimates of current source density and compare the results.

It is possible to use mutual information calculations in synchronization studies. In this experimental design, the original EEG signal is bandpass filtered into specified frequency bands. Given the restricted spectrum of the filtered signal, it is possible to estimate its phase by calculating the Hilbert transform (Boashash 1992; Pikovsky et al. 2001). Mutual information calculations can then determine if there is a nonrandom relationship between phase functions measured at different electrode sites.
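The filter, Hilbert transform and phase mutual information steps can be sketched in Python with `scipy.signal`. The 8–12 Hz band, the fourth-order zero-phase Butterworth filter, the 8 fixed-width phase bins, the 256 Hz sampling rate and the synthetic channels (a shared 10 Hz rhythm with a fixed phase offset, plus one unrelated channel) are all assumptions chosen for illustration.

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt

def band_phase(x, fs, low_hz, high_hz, order=4):
    """Instantaneous phase of x after zero-phase Butterworth bandpass filtering."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    return np.angle(hilbert(sosfiltfilt(sos, x)))

def phase_mutual_information(phi_a, phi_b, bins=8):
    """Plug-in MI (bits) between two phase series on fixed bins over [-pi, pi]."""
    edges = np.linspace(-np.pi, np.pi, bins + 1)
    joint, _, _ = np.histogram2d(phi_a, phi_b, bins=[edges, edges])
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    nz = p_ab > 0
    return float(np.sum(p_ab[nz] * np.log2(p_ab[nz] / (p_a * p_b)[nz])))

# Hypothetical two-channel data: a shared 10 Hz rhythm with a fixed phase offset.
fs = 256
t = np.arange(60 * fs) / fs
rng = np.random.default_rng(3)
ch1 = np.sin(2 * np.pi * 10 * t) + rng.normal(size=t.size)
ch2 = np.sin(2 * np.pi * 10 * t - 0.5) + rng.normal(size=t.size)
ch3 = rng.normal(size=t.size)             # unrelated channel

phi1, phi2, phi3 = (band_phase(c, fs, 8.0, 12.0) for c in (ch1, ch2, ch3))
mi_coupled = phase_mutual_information(phi1, phi2)
mi_unrelated = phase_mutual_information(phi1, phi3)
print(f"coupled: {mi_coupled:.2f} bits, unrelated: {mi_unrelated:.2f} bits")
```

Because phase is roughly uniformly distributed on [−π, π], fixed-width phase bins are close to equiprobable, which sidesteps most of the partition sensitivity discussed earlier.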

While recognizing the limitations of this study, the results suggest that when implemented with an adaptive partition of the joint probability distribution, mutual information provides an effective noise-robust measure of correlation. This result may extend beyond functional connectivity studies to include analysis of CNS causal networks and analysis of CNS small world networks, which are briefly considered.

Investigation of CNS causal relationships, the time dependent directional movement of information, may be important in the study of traumatic brain injury. As previously noted, Goldstein’s pioneering work on the behavioral neurology of traumatic brain injury led him to conclude that restitution of function following injury resulted from adaptation rather than from repair. This suggests that post-injury alteration of causal networks may provide a sensitive measure of altered CNS function following injury. While measures like correlation, coherence and mutual information can be used to establish the presence of correlative relationships between signals, they do not provide any information about the direction of information movement. Additional procedures must be introduced. In most cases, the quantitative assessment of causal relationships between variables is constructed on the following idea. If measuring variable X improves the prediction of variable Y, then Y is, in this limited operational sense, causally dependent on X. It should be stressed that this relationship is not necessarily unidirectional: with the same data, measuring Y may also improve the prediction of X. This conceptualization of causality appears in Wiener (1956) and may be original with Wiener.

An early implementation of this operationalization of causality was published by Granger (1969) in the econometrics literature and popularized by Sims (1972). Granger causality is constructed using linear regression models. If past values of X are useful in predicting the current value of Y in a linear regression, then X is said to be a causal driver of time series Y. As with any statistical procedure, causality tests based on linear regression must be implemented with care. A growing literature has identified circumstances that lead to spurious identification of linear causality (Breitung and Swanson 2002; He and Maekawa 2001).
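The regression comparison behind Granger causality can be written as a nested-model F test: an autoregression of Y on its own past (restricted model) against one that also includes past values of X (full model). The Python sketch below uses ordinary least squares with NumPy and the F distribution from `scipy.stats`; the model order of 2 and the simulated system in which x drives y with a one-sample delay are assumptions for illustration, and the sketch omits the stationarity and lag-selection checks a real analysis would need.

```python
import numpy as np
from scipy import stats

def lagged_matrix(series, order):
    """Columns series[t-1], ..., series[t-order] for t = order, ..., n-1."""
    n = len(series)
    return np.column_stack([series[order - k: n - k] for k in range(1, order + 1)])

def granger_f_test(x, y, order=2):
    """Does the past of x improve a linear prediction of y? Returns (F, P_NULL)."""
    target = y[order:]
    ones = np.ones((len(target), 1))
    restricted = np.hstack([ones, lagged_matrix(y, order)])   # y's own past only
    full = np.hstack([restricted, lagged_matrix(x, order)])   # plus x's past
    beta_r = np.linalg.lstsq(restricted, target, rcond=None)[0]
    beta_f = np.linalg.lstsq(full, target, rcond=None)[0]
    rss_r = np.sum((target - restricted @ beta_r) ** 2)
    rss_f = np.sum((target - full @ beta_f) ** 2)
    df_num, df_den = order, len(target) - full.shape[1]
    f_stat = ((rss_r - rss_f) / df_num) / (rss_f / df_den)
    return f_stat, stats.f.sf(f_stat, df_num, df_den)

rng = np.random.default_rng(11)
n = 2000
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    # Hypothetical system: y is driven by the previous sample of x.
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()

f_xy, p_xy = granger_f_test(x, y, order=2)   # does x help predict y?
f_yx, p_yx = granger_f_test(y, x, order=2)   # does y help predict x?
print(f"x -> y: F = {f_xy:.1f}, P = {p_xy:.3g};  y -> x: F = {f_yx:.2f}, P = {p_yx:.3g}")
```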

An extension of mutual information may provide a noise-robust measure of causality. Recall that the mutual information of time series X and Y, I(X, Y), is the average number of bits of one variable that can be predicted by measuring the other. Mutual information can be shown to be symmetrical, that is, I(X, Y) = I(Y, X). Therefore, while mutual information can establish the presence of a nonrandom relationship between time series, it cannot identify causal relationships. However, a time-lagged mutual information, in which one of the two variables is time shifted, can be used to determine whether, for example, measuring variable X in the past allows prediction of future values of variable Y. We can shift time series X by lag τ and calculate I(X_{τ}, Y) as a function of τ. Similarly, we can calculate I(X, Y_{τ}). If measuring X_{τ} allows better prediction of Y than measuring Y_{τ} allows prediction of X, then it can be argued that information is transferred from X to Y. The magnitude of the mutual information and the time lag that produces the greatest value can be used to quantify both the magnitude of the information transfer and the time delay associated with that transfer. A number of investigators have proposed using lagged mutual information to investigate information transfer in distributed systems (Kaneko 1986; Vastano and Swinney 1988; Albano et al. 1999). The procedure has a long history in electroencephalography. Inouye et al. (1983) used an “entropy analysis,” which would now be described as directed mutual information, to quantify the direction of information flow and concluded that the dominant longitudinal direction of alpha activity was anterior to posterior. A subsequent publication (Inouye et al. 1993) used directed mutual information to show changes in information flow during a cognitively demanding arithmetic task. Mars and his colleagues (Mars and Lopes da Silva 1983; Mars et al.
1985) used mutual information to quantify time delays in the transmission of epileptic seizures. Several other investigators have used lagged mutual information to quantify between-channel information transfer in multichannel EEGs (Xu et al. 1997; Chen et al. 2000; Lopes da Silva 1987). Schreiber (2000), however, presented valuable examples in which standard lagged mutual information failed to detect information exchange. This motivated the construction of a related measure, transfer entropy, that successfully identified these relationships. The Schreiber results should be considered in light of the previously cited Duckrow and Albano (2003) calculations, which demonstrated the sensitivity of mutual information calculations to the choice of algorithm. This may have been a factor in the Schreiber study. Madulara et al. (2012) calculated transfer entropy using the EEG records analyzed in this paper. Mutual information was generally lower in the eyes open than in the eyes closed condition. In contrast, transfer entropies increased by a factor of two in the eyes open condition. As would be anticipated, the largest one-way transfer entropies were observed to and from the occipital lobe. Consistent with our previous recommendations, we suggest computing both measures (lagged mutual information and transfer entropy). Clinical utility is the final arbiter.
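Both quantities can be sketched in Python with simple plug-in entropies on symbolized data. Lagged mutual information is computed as I(X_{τ}, Y) for a range of τ, and transfer entropy as the conditional mutual information I(Y_{t+1}; X_{t+1−lag} | Y_{t}), built from joint entropies with a target history of one sample. The equiprobable 4-symbol partition, the history length and the simulated system (x drives y with a three-sample delay) are assumptions; this is not the estimator used by Schreiber (2000) or by Madulara et al. (2012).

```python
import numpy as np

def symbolize(v, bins=4):
    """Map a series to symbols 0..bins-1 using equiprobable (quantile) bin edges."""
    inner_edges = np.quantile(v, np.linspace(0.0, 1.0, bins + 1)[1:-1])
    return np.digitize(v, inner_edges)

def entropy_bits(*columns):
    """Shannon entropy (bits) of the joint distribution of integer symbol columns."""
    _, counts = np.unique(np.column_stack(columns), axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def lagged_mi(x, y, lag, bins=4):
    """I(X_lag, Y): MI between x shifted back by `lag` samples and y."""
    sx, sy = symbolize(x, bins), symbolize(y, bins)
    a, b = sx[: len(sx) - lag], sy[lag:]
    return entropy_bits(a) + entropy_bits(b) - entropy_bits(a, b)

def transfer_entropy(source, target, lag=1, bins=4):
    """TE source -> target: I(target[t+1]; source[t+1-lag] | target[t]), lag >= 1."""
    s, g = symbolize(source, bins), symbolize(target, bins)
    n = len(g)
    y_next, y_past, x_past = g[lag:], g[lag - 1: n - 1], s[: n - lag]
    return (entropy_bits(y_next, y_past) + entropy_bits(y_past, x_past)
            - entropy_bits(y_past) - entropy_bits(y_next, y_past, x_past))

rng = np.random.default_rng(5)
n = 5000
x = rng.normal(size=n)
y = np.empty(n)
y[:3] = rng.normal(size=3)
y[3:] = 0.9 * x[:-3] + 0.3 * rng.normal(size=n - 3)   # hypothetical 3-sample delay

mi_by_lag = [lagged_mi(x, y, lag) for lag in range(11)]
best_lag = int(np.argmax(mi_by_lag))
te_xy = transfer_entropy(x, y, lag=3)
te_yx = transfer_entropy(y, x, lag=3)
print(f"peak lag = {best_lag}, TE x->y = {te_xy:.3f} bits, TE y->x = {te_yx:.3f} bits")
```

The lag that maximizes I(X_{τ}, Y) recovers the simulated delay, and the transfer entropy is clearly asymmetric, which is the directional information that symmetric mutual information alone cannot supply.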

Stated in abstract terms, a network is a collection of nodes and connections between the nodes. A small world network is defined as a network that has dense local clusters that are connected by a limited number of long range connections. In a seminal paper, Watts and Strogatz (1998) showed how small world networks can be characterized quantitatively. Small world networks are highly efficient. They can support a high degree of dynamical complexity with a minimum investment in connections (Latora and Marchiori 2001). This is an attractive metaphor for describing the central nervous system. Local networks provide areas of specialization, but these specialized domains can communicate efficiently with the entire brain by long range connections. When applied to multichannel EEG data, the electrode sites are the nodes and the connections are identified by correlation measures. Three types of network can be constructed. In a binary network, a connection is either present or absent. Operationally this is established by applying a threshold to a measure of correlation (connection present if the threshold is reached, absent otherwise). In a weighted network, a connection’s strength takes a value on a continuum determined by the correlation measure. In a directed network, the direction of information transfer, not just the strength of the connection, is incorporated into the analysis. These methods are now being utilized in the analysis of the central nervous system (Smith-Bassett and Bullmore 2006; Sporns and Honey 2006; Stam and Reijneveld 2007). Altered small world networks have been observed in clinical populations including patients with CNS tumors (Bartolomei et al. 2006), epilepsy (Ponten et al. 2007; van Dellen et al. 2009), schizophrenia (Rubinov et al. 2009), and Alzheimer’s disease (Stam et al. 2007a, b). As would be anticipated, alterations in networks are associated with traumatic brain injury (Cao and Slobounov 2010; Nakamura et al. 2009; Tsirka et al. 2011; Zouridakis et al.
2011; Castellanos et al. 2011a, b). The calculations presented in this paper and in Madulara et al. (2012) suggest that when calculated using an adaptive partition of the joint probability distribution, mutual information, lagged mutual information and transfer entropy can provide computationally efficient, noise-robust metrics for the analysis of CNS small world networks.
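A binary network and the two Watts–Strogatz quantities (mean clustering coefficient and characteristic path length) can be computed from a channel-by-channel correlation matrix with a few lines of NumPy. In the Python sketch below, the threshold of 0.5 and the 10-channel matrix (strong correlations only between channels within two positions of each other on a ring) are hypothetical; they are chosen because the resulting ring lattice has known values, C = 0.5 and L = 5/3. A full small world analysis would also compare these values against randomized networks, which is not shown.

```python
import numpy as np

def binary_network(corr, threshold):
    """Binary adjacency matrix: edge where |correlation| >= threshold, no self-loops."""
    adjacency = (np.abs(corr) >= threshold).astype(int)
    np.fill_diagonal(adjacency, 0)
    return adjacency

def clustering_coefficient(adjacency):
    """Mean local clustering: closed triangles at each node / possible triangles."""
    degree = adjacency.sum(axis=1)
    closed_walks = np.diag(adjacency @ adjacency @ adjacency)   # 2 x triangles per node
    possible = degree * (degree - 1)
    local = np.divide(closed_walks, possible, out=np.zeros(len(degree)), where=possible > 0)
    return float(local.mean())

def characteristic_path_length(adjacency):
    """Mean shortest-path length over all ordered node pairs (Floyd-Warshall)."""
    n = adjacency.shape[0]
    dist = np.where(adjacency > 0, 1.0, np.inf)
    np.fill_diagonal(dist, 0.0)
    for k in range(n):
        dist = np.minimum(dist, dist[:, [k]] + dist[[k], :])
    return float(dist[~np.eye(n, dtype=bool)].mean())

# Hypothetical 10-channel correlation matrix with ring-lattice structure.
n = 10
idx = np.arange(n)
gap = np.abs(idx[:, None] - idx[None, :])
circular_distance = np.minimum(gap, n - gap)
corr = np.where(circular_distance <= 2, 0.8, 0.1)
np.fill_diagonal(corr, 1.0)

A = binary_network(corr, threshold=0.5)
C = clustering_coefficient(A)
L = characteristic_path_length(A)
print(f"edges = {A.sum() // 2}, C = {C:.3f}, L = {L:.3f}")
```

A weighted network would keep the correlation values instead of thresholding them, and a directed network would replace the symmetric correlation matrix with an asymmetric measure such as transfer entropy.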

The mathematical results showing the efficiency of networks composed of highly connected local regions with limited, but essential, long range connections can inform the discussion of CNS localization of function. The localizationist conceptualization began with Broca’s localization of expressive aphasia to the third left frontal convolution (Broca 1861) and Wernicke’s localization of receptive aphasia to the posterior section of the superior temporal convolution (Wernicke 1908). By the early twentieth century, however, several neurologists argued against a strict localizationist model (Tesak and Code 2008). Kurt Goldstein was a significant contributor to the debate (Goldstein 1927; Ludwig 2012). Goldstein’s views were complex, and it would be an oversimplification to describe them as inflexibly antilocalizationist (Ludwig 2012). For example, in his Lokalisation in der Großhirnrinde, Goldstein recognizes Broca’s “flawless establishment of the dependency of the impairment of articulated speech from a lesion in the third left frontal convolution” (Goldstein 1927, translated Ludwig 2012). He similarly accepts Wernicke’s identification of the role of the superior temporal convolution in some presentations of receptive aphasia, but on the basis of clinical observations he concluded that language functions could not be decomposed into discrete, anatomically isolated components. Because Goldstein accepted localizationist results while arguing that a localizationist account was incomplete, Geschwind (1997) described his views as a “paradoxical position.” Ludwig proposes that the paradox can be resolved by recognizing that Goldstein introduced a distinction between weak localization (the correlation of symptoms with lesions) and strong localization (the implementation of a process exclusively in a defined locality).

We suggest that a quantitative examination of these questions can be constructed by comparing CNS network geometries generated by language-dependent ERP tasks in healthy controls and in patients with well-characterized aphasias.

## Acknowledgments

The opinions and assertions contained herein are the private opinions of the authors and are not to be construed as official or reflecting the views of the United States Department of Defense. PER and BMR would like to acknowledge support from the Traumatic Injury Research Program of the Uniformed Services University of the Health Sciences, from the Defense Medical Research and Development Program and from the United States Marine Corps Systems Command. JDB, LCCA and AMA were supported in part by the Department of Science and Technology, Republic of the Philippines.

## Appendix: Measures of correlation

### Pearson product moment correlation

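Given {X} and {Y}, time series of paired observations, \( N_D \) is the number of elements in each set. The product moment correlation coefficient r is given by

\[ r = \frac{\sum_{i=1}^{N_D} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{N_D} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{N_D} (Y_i - \bar{Y})^2}} \]

where \( \bar{X} \) and \( \bar{Y} \) are the means of {X} and {Y}. The significance of r is assessed with the statistic

\[ t = r\sqrt{\frac{\nu}{1 - r^2}} \]

where \( \nu = N_D - 2 \) is the number of degrees of freedom. The probability of the null hypothesis (no correlation) is

\[ P_{\text{NULL}} = I_{\nu/(\nu + t^2)}\left( \frac{\nu}{2}, \frac{1}{2} \right) \]

where \( I_x(a, b) \) is the incomplete beta function.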
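The calculation of r, t (with ν = N_D − 2) and the null probability \( I_{\nu/(\nu+t^2)}(\nu/2, 1/2) \) can be sketched in pure Python. This is a minimal sketch, not the authors’ code: it assumes the standard continued-fraction evaluation of the incomplete beta function (modified Lentz method, as described in Numerical Recipes), and the names `pearson`, `incomplete_beta` and `_betacf` are hypothetical. In practice a library routine such as `scipy.stats.pearsonr` would usually be used instead.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def incomplete_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction directly where it converges fastest,
    # otherwise use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a).
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def pearson(xs, ys):
    """Return (r, t, p_null) for the paired samples xs and ys."""
    n_d = len(xs)
    if n_d != len(ys) or n_d < 3:
        raise ValueError("need at least three paired observations")
    mx, my = sum(xs) / n_d, sum(ys) / n_d
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    r = sxy / math.sqrt(sxx * syy)
    nu = n_d - 2  # degrees of freedom
    if abs(r) >= 1.0:
        return r, math.copysign(math.inf, r), 0.0
    t = r * math.sqrt(nu / (1.0 - r * r))
    p_null = incomplete_beta(nu / 2.0, 0.5, nu / (nu + t * t))
    return r, t, p_null
```

For example, `pearson([1, 2, 3], [1, 3, 2])` gives r = 0.5 with ν = 1; because the t distribution with one degree of freedom is the Cauchy distribution, the null probability is exactly 2/3, which makes a convenient check.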

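Confidence limits on r, \( r_{\text{Low}} \) and \( r_{\text{High}} \), can be computed by converting r to Fisher’s z,

\[ z = \frac{1}{2}\ln\left( \frac{1 + r}{1 - r} \right) \]

which is approximately normally distributed with standard deviation \( 1/\sqrt{N_D - 3} \). Limits computed for z are transformed back to r with \( r = \tanh(z) \). The approximation is adequate when \( N_D > 10 \).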
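The Fisher transformation can be sketched with the Python standard library alone. This sketch assumes a two-sided 95 % level as a default (the confidence level is not stated here), and the function name `fisher_confidence_limits` is hypothetical. It uses the identity \( \tfrac{1}{2}\ln[(1+r)/(1-r)] = \operatorname{artanh}(r) \).

```python
import math
from statistics import NormalDist


def fisher_confidence_limits(r, n_d, level=0.95):
    """Return (r_low, r_high) via Fisher's z transformation (intended for n_d > 10)."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n_d <= 3:
        raise ValueError("need more than three observations")
    z = math.atanh(r)                    # Fisher's z = 0.5 * ln((1 + r) / (1 - r))
    sigma_z = 1.0 / math.sqrt(n_d - 3)   # approximate standard deviation of z
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)  # two-sided normal quantile
    # Build the interval in z, then map both ends back to the r scale.
    return math.tanh(z - z_crit * sigma_z), math.tanh(z + z_crit * sigma_z)
```

Because tanh is monotonic, the limits are ordered correctly, but they are asymmetric about r whenever r ≠ 0, which is the main reason for working in z rather than in r.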

### Spearman rank order correlation

Given {X} and {Y}, time series of \( N_D \) paired observations, \( \{ R_X \} = \{ R_{X_1}, R_{X_2}, \ldots, R_{X_{N_D}} \} \) gives the ranks of the values of X. In cases of ties the average ranks are entered. \( \{ R_Y \} \) is defined analogously. The Spearman rank order correlation, \( \rho_S \), is the product moment correlation of ranks:

\[ \rho_S = \frac{\sum_{i=1}^{N_D} (R_{X_i} - \bar{R}_X)(R_{Y_i} - \bar{R}_Y)}{\sqrt{\sum_{i=1}^{N_D} (R_{X_i} - \bar{R}_X)^2}\,\sqrt{\sum_{i=1}^{N_D} (R_{Y_i} - \bar{R}_Y)^2}} \]

\( \rho_S \) is therefore the Pearson product moment correlation computed on ranks rather than on the original values. The probability of the null hypothesis (no correlation) is calculated as before, with t taking the value

\[ t_S = \rho_S\sqrt{\frac{N_D - 2}{1 - \rho_S^2}} \]
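A short sketch of the Spearman calculation: values are replaced by ranks, tied values receive the average of the ranks they span, and the Pearson formula is applied to the ranks. The names `average_ranks` and `spearman` are hypothetical. The null probability would then follow from \( t_S \) exactly as for r, via the incomplete beta function with ν = N_D − 2; that step is omitted here to keep the block short.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block over all entries tied with values[order[start]].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Return (rho_s, t_s): the product moment correlation of ranks and its t statistic."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n_d = len(rx)
    mx, my = sum(rx) / n_d, sum(ry) / n_d
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    rho_s = sxy / math.sqrt(sxx * syy)
    if abs(rho_s) >= 1.0:
        return rho_s, math.copysign(math.inf, rho_s)
    t_s = rho_s * math.sqrt((n_d - 2) / (1.0 - rho_s * rho_s))
    return rho_s, t_s
```

Because only ranks enter the calculation, a strictly increasing but nonlinear relation such as Y = X³ gives \( \rho_S = 1 \), whereas the Pearson r of the raw values would be less than 1.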

### Kendall rank order correlation

Given {X} and {Y}, time series of \( N_D \) paired observations, consider two successive observations, \( (X_i, Y_i) \) and \( (X_{i+1}, Y_{i+1}) \). If both X and Y increase, then \( X_{i+1} - X_i \), \( Y_{i+1} - Y_i \) and \( (X_{i+1} - X_i)(Y_{i+1} - Y_i) \) are positive. If both variables decrease between observation i and i + 1, then \( (X_{i+1} - X_i)(Y_{i+1} - Y_i) \) is again positive. If one variable increases while the other decreases between these two observations, then \( (X_{i+1} - X_i)(Y_{i+1} - Y_i) \) is negative. The Kendall rank correlation coefficient is constructed by examining these relationships over all possible pairs of observations. If the product is positive, then the variable κ is increased by 1. If it is negative, then κ is decreased by 1. If it is zero, then κ is unchanged. These comparisons are made not just across temporally adjacent pairs, that is between \( (X_i, Y_i) \) and \( (X_{i+1}, Y_{i+1}) \), but rather for all possible \( (X_i, Y_i) \) and \( (X_j, Y_j) \) pairs, using the sign of \( (X_j - X_i)(Y_j - Y_i) \). There are \( N_D(N_D - 1)/2 \) distinct pairs, giving κ a maximum possible value of \( N_D(N_D - 1)/2 \). Kendall’s τ is the normalized value of κ; in the absence of ties,

\[ \tau = \frac{2\kappa}{N_D(N_D - 1)} \]
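The counting procedure for κ translates directly into a double loop over all \( N_D(N_D - 1)/2 \) pairs. This is a minimal O(N_D²) sketch with a hypothetical function name `kendall`; it uses the normalization τ = 2κ/[N_D(N_D − 1)] without any tie correction. Library routines such as `scipy.stats.kendalltau` default to a tie-corrected variant (τ_b), so their values can differ when ties are present.

```python
def kendall(xs, ys):
    """Return (kappa, tau) by comparing all N_D (N_D - 1) / 2 pairs of observations."""
    n_d = len(xs)
    kappa = 0
    for i in range(n_d - 1):
        for j in range(i + 1, n_d):
            product = (xs[j] - xs[i]) * (ys[j] - ys[i])
            if product > 0:
                kappa += 1  # both variables move in the same direction
            elif product < 0:
                kappa -= 1  # the variables move in opposite directions
            # product == 0 (a tie in X or in Y): kappa is unchanged
    tau = 2.0 * kappa / (n_d * (n_d - 1))
    return kappa, tau
```

For X = (1, 2, 3, 4) and Y = (1, 3, 2, 4), five of the six pairs increase together and one pair moves in opposite directions, so κ = 5 − 1 = 4 and τ = 8/12 = 2/3.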

Again letting \( R_{X_i} \) and \( R_{Y_i} \) denote the ranks of {X} and {Y}, it is seen that \( (R_{X_i} - R_{X_j})(R_{Y_i} - R_{Y_j}) \) has the same sign as \( (X_i - X_j)(Y_i - Y_j) \), and therefore κ calculated from ranks is identical to κ calculated using X and Y values. τ is therefore a nonparametric correlation that makes no assumptions about the distributions of {X} and {Y}. It is generally observed that \( \rho_S \) and τ are highly correlated, and this expectation is borne out in the calculations presented here. τ provides a means of identifying monotonic correlations. A more general search for correlations, one that includes non-monotonic associations, requires alternative measures.
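Both properties described here can be checked numerically: τ is unchanged by a monotonic transformation of either variable, and it can be zero even when Y is an exact function of X. The sketch below reuses the untied normalization τ = 2κ/[N_D(N_D − 1)]; the function name `kendall_tau` and the test data are hypothetical illustrations.

```python
import math


def kendall_tau(xs, ys):
    """Kendall's tau without tie correction: 2 * kappa / (N_D (N_D - 1))."""
    n_d = len(xs)
    kappa = 0
    for i in range(n_d - 1):
        for j in range(i + 1, n_d):
            s = (xs[j] - xs[i]) * (ys[j] - ys[i])
            kappa += (s > 0) - (s < 0)  # +1, -1 or 0
    return 2.0 * kappa / (n_d * (n_d - 1))


xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]  # exact but non-monotonic dependence, Y = X**2

# exp() is strictly increasing, so every pairwise sign, and hence tau, is unchanged.
tau_raw = kendall_tau(xs, ys)
tau_transformed = kendall_tau(xs, [math.exp(y) for y in ys])

print(tau_raw, tau_transformed)  # 0.0 0.0: tau does not detect Y = X**2
```

The concordant pairs on the rising branch of the parabola exactly cancel the discordant pairs on the falling branch, so κ = 0 even though Y is completely determined by X.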

### Mutual information

Let {X} and {Y} be time series of paired observations. Again, \( N_D \) is the number of elements in each set. The mutual information of variables X and Y, denoted I(X, Y), is the average number of bits of information about variable Y that are obtained by measuring variable X. It can be shown (Cover and Thomas 1991) that mutual information is symmetrical: I(X, Y) = I(Y, X). For finite data sets I(X, Y) can be approximated by estimating the probability distribution of each variable and their joint probability distribution. Each variable’s distribution is approximated by a histogram. Let \( N_X \) be the number of bins in the histogram of variable X. \( O_X(i) \) is the occupancy of the i-th bin and \( P_X(i) = O_X(i)/N_D \), i = 1, 2, …, \( N_X \), is the probability of occupation of the i-th bin. (The procedure for determining \( N_X \) and the upper and lower bounds of each element of the partition is described presently.) \( N_Y \) is the number of bins in the histogram of variable Y. In the general case \( N_X \) and \( N_Y \) are not necessarily equal. \( O_Y(j) \) and \( P_Y(j) \), j = 1, 2, …, \( N_Y \), are the corresponding occupancies and probabilities.

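\( P_{XY}(i, j) \), the joint probability distribution, is the probability that an (x, y) pair is an element of the i-th bin of the partition of the X axis and the j-th bin of the partition of the Y axis. Mutual information is defined by

\[ I(X, Y) = \sum_{i=1}^{N_X} \sum_{j=1}^{N_Y} P_{XY}(i, j) \log_2\left( \frac{P_{XY}(i, j)}{P_X(i)\,P_Y(j)} \right) \]

where elements with \( P_{XY}(i, j) = 0 \) contribute zero to the sum. If variables X and Y are statistically independent, then \( P_{XY}(i, j) = P_X(i)P_Y(j) \) and I(X, Y) = 0. Thus in a calculation of mutual information, the null hypothesis is statistical independence of variables X and Y, in which case I(X, Y) is indistinguishable from zero. Let \( E_{\text{XY-NULL}}(i, j) \) be the expected occupancy of element (i, j) of the XY partition if X and Y are independent. Under the assumption of independence \( E_{\text{XY-NULL}}(i, j) \) becomes

\[ E_{\text{XY-NULL}}(i, j) = N_D\,P_X(i)\,P_Y(j) \]

Let \( O_{XY}(i, j) \) be the observed occupancy of each element of the partition. The corresponding value of \( \chi^2 \) is

\[ \chi^2 = \sum_{i=1}^{N_X} \sum_{j=1}^{N_Y} \frac{\left[ O_{XY}(i, j) - E_{\text{XY-NULL}}(i, j) \right]^2}{E_{\text{XY-NULL}}(i, j)} \]

with \( \nu = (N_X - 1)(N_Y - 1) \) degrees of freedom. The probability of the null hypothesis of statistical independence is

\[ P_{\text{NULL}} = Q(\chi^2 \mid \nu) = \frac{\Gamma(\nu/2, \chi^2/2)}{\Gamma(\nu/2)} \]

where \( \Gamma(a, x) \) is the upper incomplete gamma function.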
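Given a joint occupancy table \( O_{XY}(i, j) \), the mutual information, the expected null occupancies and \( \chi^2 \) all follow from the marginal sums. The sketch below assumes the partition has already been chosen and every marginal bin is occupied; the name `mi_and_chi2` is hypothetical. The null probability is the upper tail of the \( \chi^2 \) distribution with ν degrees of freedom, available for instance as `scipy.stats.chi2.sf(chi2, nu)`.

```python
import math


def mi_and_chi2(occupancy):
    """Return (I in bits, chi2, nu) from a joint occupancy table O_XY[i][j]."""
    n_x, n_y = len(occupancy), len(occupancy[0])
    n_d = sum(sum(row) for row in occupancy)
    p_x = [sum(row) / n_d for row in occupancy]
    p_y = [sum(occupancy[i][j] for i in range(n_x)) / n_d for j in range(n_y)]
    if min(p_x) == 0 or min(p_y) == 0:
        raise ValueError("every marginal bin must be occupied")
    mutual_info = 0.0
    chi2 = 0.0
    for i in range(n_x):
        for j in range(n_y):
            p_xy = occupancy[i][j] / n_d
            if p_xy > 0:  # empty elements contribute zero to I(X, Y)
                mutual_info += p_xy * math.log2(p_xy / (p_x[i] * p_y[j]))
            expected = n_d * p_x[i] * p_y[j]  # E_XY-NULL(i, j)
            chi2 += (occupancy[i][j] - expected) ** 2 / expected
    nu = (n_x - 1) * (n_y - 1)
    return mutual_info, chi2, nu
```

Two limiting cases illustrate the behaviour: a uniform 2 × 2 table gives I = 0 and \( \chi^2 = 0 \), while the table with 20 counts on the diagonal and none off it gives I = 1 bit and \( \chi^2 = 40 \) with ν = 1.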

Evaluating these expressions requires estimates of \( P_X(i) \), \( P_Y(j) \) and \( P_{XY}(i, j) \). Several different procedures can be used to estimate these distributions. We apply here a specific implementation of an algorithm using a nonuniform partition that was introduced in Cellucci et al. (2005). This algorithm considers the special case where the same number of elements, \( N_E \), is used to partition the X and Y variables: \( N_E = N_X = N_Y \). The bins span the range \( x_{\min} \) to \( x_{\max} \) on the X axis and \( y_{\min} \) to \( y_{\max} \) on the Y axis. In this algorithm, the widths of the bins are varied independently on each axis to meet the criterion of uniform occupancy; that is, each element has occupancy \( O_X(i) = O_Y(j) = N_D/N_E \), giving \( P_X(i) = P_Y(j) = 1/N_E \). It should be understood, however, that the values of \( P_{XY}(i, j) \) will not be uniform. The equi-probable partition of each axis fixes only the marginal sums,

\[ \sum_{j=1}^{N_E} P_{XY}(i, j) = P_X(i) = \frac{1}{N_E}, \qquad \sum_{i=1}^{N_E} P_{XY}(i, j) = P_Y(j) = \frac{1}{N_E} \]

and the individual \( P_{XY}(i, j) \) values will be different. The assumption of a partition giving \( P_X(i) = P_Y(j) = 1/N_E \) gives

\[ I(X, Y) = \sum_{i=1}^{N_E} \sum_{j=1}^{N_E} P_{XY}(i, j) \log_2\left( N_E^2\,P_{XY}(i, j) \right) \]
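One simple way to approximate an equi-probable partition is to assign bins by rank: the k-th smallest of the \( N_D \) values (k = 0, 1, …) goes to bin \( \lfloor k N_E / N_D \rfloor \), so that bin occupancies differ by at most one. This rank-based assignment is an assumption for illustration, not necessarily the bound-adjustment procedure of Cellucci et al. (2005); in particular, tied values may be split across neighbouring bins. The names `equiprobable_bins` and `mutual_information_equiprobable` are hypothetical, and the estimator uses \( P_X(i) = P_Y(j) = 1/N_E \) exactly, which is only approximate when \( N_D \) is not a multiple of \( N_E \).

```python
import math


def equiprobable_bins(values, n_e):
    """Assign bin indices 0..n_e-1 by rank so that occupancies differ by at most one."""
    n_d = len(values)
    order = sorted(range(n_d), key=lambda k: values[k])
    bins = [0] * n_d
    for rank, k in enumerate(order):
        bins[k] = rank * n_e // n_d  # hypothetical rank-based assignment
    return bins


def mutual_information_equiprobable(xs, ys, n_e):
    """I(X, Y) in bits as the sum of P_XY * log2(N_E**2 * P_XY) over occupied elements."""
    n_d = len(xs)
    bx, by = equiprobable_bins(xs, n_e), equiprobable_bins(ys, n_e)
    counts = {}
    for i, j in zip(bx, by):
        counts[(i, j)] = counts.get((i, j), 0) + 1
    # Only occupied elements appear in counts, so empty elements contribute zero.
    return sum((c / n_d) * math.log2(n_e * n_e * c / n_d) for c in counts.values())
```

When Y = X, all probability lies on the diagonal, \( P_{XY}(i, i) = 1/N_E \), and the formula gives \( I = \log_2 N_E \); for example, 100 samples with \( N_E = 4 \) give exactly 2 bits.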

Before \( P_{XY}(i, j) \) can be estimated, we must address the question: what is the appropriate value of \( N_E \)? This is the two dimensional analog of the histogram problem: what is the appropriate number of bins in a histogram? The morphology of the distribution cannot be detected if the number of bins is too small, as is seen by considering the limiting case of a single bin. Conversely, if the number of bins is too large, occupancies are zero or one and again the shape of the distribution cannot be determined. The number of bins for either a one dimensional or two dimensional distribution should be as large as possible, but not too large. In this algorithm, \( N_E \) is determined by applying a variant of the Cochran criterion (Cochran 1954) to \( E_{\text{XY-NULL}}(i, j) \). This criterion requires \( E_{\text{XY-NULL}}(i, j) \ge 5 \) for at least 80 % of the elements of the partition. We impose a more conservative criterion and require \( E_{\text{XY-NULL}}(i, j) \ge 5 \) in all elements. \( N_E \) is the largest positive integer satisfying this criterion. We have previously derived an expression for \( E_{\text{XY-NULL}}(i, j) \) for an equi-probable partition of the X and Y axes. Our criterion on \( E_{\text{XY-NULL}}(i, j) \) becomes

\[ E_{\text{XY-NULL}}(i, j) = N_D\,P_X(i)\,P_Y(j) = \frac{N_D}{N_E^2} \ge 5 \]

Therefore \( N_E \) is the largest integer meeting the criterion \( (N_D/5)^{1/2} \ge N_E \). If, for example, \( N_D = 8{,}192 \), then \( (N_D/5)^{1/2} = 40.477 \) and \( N_E = 40 \). \( O_X(i) \) and \( O_Y(j) \) will be either 204 or 205; the occupancies differ by one because 8,192 is not a multiple of 40. The upper and lower bounds of each element of the partition are varied to give the best possible approximation of \( P_X(i) = P_Y(j) = 1/N_E \). When the bin assignments of the X and Y values in the time series are known, \( P_{XY}(i, j) \) can be determined. The estimate of mutual information and the probability of the null hypothesis then follow from the previous formulas.
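The choice of \( N_E \) and the resulting marginal occupancies can be verified with integer arithmetic. Because \( N_E^2 \) is an integer, \( N_E^2 \le N_D/5 \) is equivalent to \( N_E^2 \le \lfloor N_D/5 \rfloor \), so `math.isqrt` (Python 3.8+) gives \( N_E \) without floating-point rounding. The function names `choose_n_e` and `marginal_occupancies` are hypothetical.

```python
import math


def choose_n_e(n_d, min_expected=5):
    """Largest N_E with N_D / N_E**2 >= min_expected in every element of the partition."""
    # N_E**2 <= N_D / m is equivalent to N_E**2 <= N_D // m because N_E**2 is an integer.
    return math.isqrt(n_d // min_expected)


def marginal_occupancies(n_d, n_e):
    """Occupancies of an equi-probable partition: each bin holds N_D // N_E or one more."""
    base, extra = divmod(n_d, n_e)
    return [base + 1] * extra + [base] * (n_e - extra)


n_e = choose_n_e(8192)
occ = marginal_occupancies(8192, n_e)
print(n_e, min(occ), max(occ), occ.count(205), occ.count(204))  # 40 204 205 32 8
```

Since 8,192 = 40 × 204 + 32, thirty-two bins hold 205 values and the remaining eight hold 204, consistent with \( \sqrt{8{,}192/5} \approx 40.477 \) rounding down to \( N_E = 40 \).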

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.