Cognitive Neurodynamics

Volume 8, Issue 1, pp 1–15

Time domain measures of inter-channel EEG correlations: a comparison of linear, nonparametric and nonlinear measures

Authors

  • J. D. Bonita
    • Department of Physics, Mindanao State University-Iligan Institute of Technology
  • L. C. C. Ambolode II
    • Department of Physics, Mindanao State University-Iligan Institute of Technology
  • B. M. Rosenberg
    • Thomas Jefferson University College of Medicine
  • C. J. Cellucci
    • Aquinas, LLC
  • T. A. A. Watanabe
    • Lannister-Finn
    • Department of Military and Emergency Medicine, Uniformed Services University of the Health Sciences
  • A. M. Albano
    • Physics Department, Bryn Mawr College
Open Access, Review Paper

DOI: 10.1007/s11571-013-9267-8

Abstract

Inter-channel correlations in ten-channel EEGs obtained from thirteen healthy adult participants were investigated. Signals were recorded in two behavioral states: eyes open, no task and eyes closed, no task. Four time domain measures were compared: Pearson product moment correlation, Spearman rank order correlation, Kendall rank order correlation and mutual information. The psychophysiological utility of each measure was assessed by determining its ability to discriminate between conditions. The sensitivity to epoch length was assessed by repeating the calculations with 1, 2, 3, …, 8 s epochs. The robustness to noise was assessed by repeating the calculations with noise-corrupted versions of the original signals (SNRs of 0, 5 and 10 dB). Three results were obtained. First, mutual information discriminated between states with less data: Pearson, Spearman and Kendall failed to discriminate between states with a 1 s epoch, while mutual information gave a statistically significant separation. Second, at all epoch durations tested, the measure of between-state discrimination was greater for mutual information. Third, discrimination based on mutual information was more robust to noise. The limitations of this study are discussed. Further comparisons should be made with frequency domain measures, with measures constructed with embedded data and with the maximal information coefficient.

Keywords

EEG Quantitative EEG Pearson product moment correlation Spearman rank order correlation Kendall rank order correlation Mutual information

Introduction

The connectivity of the human central nervous system is its most distinctive feature. Classically, connectivity was investigated anatomically. An alternative view, which emphasized the movement of information, emerged in the twentieth century. Like many concepts, the seemingly straightforward idea of connectivity proved far more complicated than originally anticipated once it was examined with sufficient care, as the report of the 2002 Functional Connectivity Workshop shows (Lee et al. 2003). Three distinct conceptualizations of connectivity have emerged: anatomical, functional and effective. Anatomical connectivity might seem to be the least problematic, and arguably it is, but complications arise even here: a complete anatomical description requires not merely knowledge of geometrical proximity but an understanding of receptor subtypes and the availability of neurotransmitters (Lee et al. 2003). Functional connectivity is defined as the “temporal correlations between spatially remote neurophysiological events” (Friston et al. 1993a), and effective connectivity is defined as “the influences that one neural system exerts over another either directly or indirectly” (Friston et al. 1993b). Horwitz (2003), using the word “elusive,” found that all three conceptualizations of connectivity present subtleties of definition and that these problems are compounded when results obtained with different observational technologies are integrated. His analysis led to three conclusions. First, “we should think of functional (and effective) connectivity not as a single concept or quantity, but rather as forming a class of concepts with multiple members.” Second, “functional and effective connectivity must be operationally defined by each investigator who evaluates these quantities.” Third, “it is crucial to relate each of the macroscopic definitions to an underlying neural substrate.”

Fingelkurts et al. (2005) concurred that theoretical and methodological clarification is needed to bring precision to the analysis of CNS connectivity. They argue that neuroanatomical change is too slow for anatomical connectivity to provide a basis for a dynamical investigation of perceptual and cognitive processes. They further argue that effective connectivity is identified by first establishing functional connectivity and then combining it with a model specifying the causal links between participating units. They therefore conclude that “functional connectivity is the most central and challenging of the three conceptions of brain connectivity for theories about neural interactions.” Given the millisecond time scale of dynamical behavior in the central nervous system, Fingelkurts et al. argue that EEG and MEG play an essential role in investigations of functional connectivity. We concur, and the analysis of temporal correlations in EEG signals is the focus of this contribution. Four time domain procedures for quantifying correlations are compared, and a physiological criterion, the ability to discriminate between behavioral states, is used to adjudicate between them. Additional measures that should be incorporated in an expanded study are considered in the “Discussion” section of this paper.

When scalp EEG signals are used in the analysis of functional connectivity, an additional question should be considered: can the analysis be conducted with the original scalp signals, or is it essential to transform these signals to obtain an estimate of the current source density? It is not our present purpose to participate in this debate. Our objective is to compare the effectiveness of different measures for identifying correlations in scalp signals, and conclusions about their comparative effectiveness will be applicable to calculations with current source density estimates. Two further observations can be made. First, in practice, calculations should be performed with both the original voltage signals and the transformed signals, and the results should be compared. Second, we should bear in mind Horwitz’s valuable observation that each investigator must state the operational definition of connectivity being implemented.

The earliest example of interregional EEG correlation measurement that has come to our attention is Imahori and Suhara (1949, cited by Gevins 1987), who presented hand-calculated autocorrelations of short EEG segments. Norbert Wiener is reported to have suggested the use of autocorrelation and cross-correlation to study electroencephalograms to a group of researchers at the Massachusetts General Hospital in 1949 (Barlow 1997). Among this group were Mary Brazier and James Casby, who in 1950 began their pioneering work on correlation analysis of the EEG using an electronic digital correlator at the Massachusetts Institute of Technology (Brazier and Casby 1952). An important continuing application of cross-correlation is the correlation of EEGs with templates of averaged event-related potentials, where the procedure is used to locate single-trial event-related potentials (ERPs) in background EEG signals (McGillem and Aunon 1987, reviewed by Spencer 2005). This procedure was introduced by Woody (1967) to detect epileptic spikes and was first applied to ERP signals by Kutas et al. (1977). It continues to be applied in the analysis of epileptic seizures (Filligoi et al. 2011) and in the construction of brain-computer interfaces (Cabestang et al. 2007).

The study of CNS correlations evolved to include more sophisticated measures. An important step in this evolution was the introduction of mutual information, a nonlinear measure of correlation, to the analysis of EEGs. The earliest application of mutual information in electroencephalography that we have seen is Callaway and Harris (1974), where it was called the coefficient of information transmission. In this application, mutual information was not calculated directly from voltage time series. The signals were digitized at 250 Hz, and each sample was coded for polarity (positive or negative) and derivative (increasing or decreasing). Callaway and Harris showed that a reading task increased occipital to left hemisphere coupling, while a visual processing task increased occipital to right hemisphere coupling. In a subsequent publication (Yagi et al. 1976), Callaway and his colleagues investigated the sensitivity of this measure to epoch length and sampling frequency. Mars and Lopes da Silva (1987) showed that mutual information can identify significant correlations that are not detected by linear measures. Other applications of this measure in electroencephalography were published by Xu et al. (1997), Albano et al. (2000) and Chen et al. (2000). The factors limiting the use of mutual information have been the amount of data required for the estimation, the computation time and uncertainty about the accuracy of the estimate. These points are addressed below.
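The polarity-and-derivative coding described above can be illustrated with a short Python sketch. It is an assumption-laden reconstruction, not Callaway and Harris’s procedure: the forward first difference used for the derivative sign, the treatment of zero values as negative and the plug-in (count-based) estimate of mutual information are our choices, and the function names are hypothetical.

```python
import math
from collections import Counter


def code_sample_signs(signal):
    """Code each sample as (polarity, direction), giving one of four symbols.

    Polarity is 1 for a positive sample and 0 otherwise (zero counts as
    negative, an assumption); direction is 1 if the next sample is larger and
    0 otherwise. The last sample has no forward difference, so the coded
    sequence is one sample shorter than the input.
    """
    return [
        (1 if signal[k] > 0 else 0, 1 if signal[k + 1] > signal[k] else 0)
        for k in range(len(signal) - 1)
    ]


def symbolic_mutual_information(a, b):
    """Plug-in mutual information, in bits, between two equal-length symbol sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(a)
    joint = Counter(zip(a, b))
    count_a = Counter(a)
    count_b = Counter(b)
    # I = sum p(a,b) log2[p(a,b) / (p(a) p(b))], probabilities taken as relative counts
    return sum(
        (c / n) * math.log2(c * n / (count_a[sa] * count_b[sb]))
        for (sa, sb), c in joint.items()
    )
```

Coding two channels with `code_sample_signs` and passing the results to `symbolic_mutual_information` gives a value between 0 and 2 bits, since each coded sequence uses at most four symbols.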

While of general interest in CNS physiology, the quantitative characterization of interregional correlations is of particular importance in the study of traumatic brain injury (TBI). The development of current thought about functional connectivity following TBI has many contributors, but two individuals who must appear in any account of this history are John Hughlings-Jackson (1835–1911) and Kurt Goldstein (1878–1965). Both concluded that the recovery of function following brain injury, typically partial, argued against a strong localization model of CNS organization (Hughlings-Jackson 1874, 1882; Goldstein 1934). In addition to rejecting strong localization, Goldstein’s work with CNS-injured soldiers after World War I led him to conclude that recovery did not result from repair but rather from adaptation (Zeitlinger 2001). Hughlings-Jackson’s and Goldstein’s views concerning the nonlocalization of deficit are consistent with recent research identifying failures of distributed synchronous networks in the etiology of neuropsychiatric disorders (Herrmann and Demiralp 2005; Schnitzler and Gross 2005; Stam 2005; Uhlhaas and Singer 2006). While Goldstein’s views on the failure of repair and his emphasis on adaptation following traumatic brain injury must be reconsidered in light of the discovery of neurogenesis in the adult mammal, the evidence indicates that, at least for the present, they remain essentially correct. One would predict that this process of adaptation results in altered patterns of correlation in the post-injury central nervous system. This expectation has been realized in the recent literature; representative examples, drawn from a large literature, are listed in Table 1.
In summary, studies of altered functional connectivity following traumatic brain injury use three kinds of data: EEG signals, MEG signals and fractional anisotropy measures of axonal tracts characterized by diffusion tensor imaging. This contribution is directed to EEG-based assessments. Three classes of analysis measures are used in these EEG studies: time domain measures, frequency domain measures and measures constructed with embedded data. The focus here is on time domain measures. We explicitly recognize that further comparative studies should include the additional measures described in the “Discussion” section of this paper.
Table 1

Pathological conditions associated with altered functional connectivity (representative examples)

| Condition | Representative studies |
|---|---|
| Alzheimer’s disease | Georgopoulos et al. (2007), Güntekin et al. (2008), Locatelli et al. (1998), Rosenbaum et al. (2008), Stam et al. (2006, 2007a, b, 2009), Zhou et al. (2008) |
| Epileptic seizures | Ponten et al. (2007) |
| Intra-arterial amobarbital injection | Douw et al. (2010) |
| Autism spectrum disorder | Belmonte et al. (2004), Just et al. (2004), Kana et al. (2007), Murias et al. (2007), Rippon et al. (2006), Vidal et al. (2006) |
| Brain tumors | Bartolomei et al. (2006), Bosma et al. (2008) |
| Multiple sclerosis | Georgopoulos et al. (2007), Lenne et al. (2012) |
| Preterm birth | Mullen et al. (2011) |
| PTSD | Lanius et al. (2004), Shaw (2002) |
| Schizophrenia | Breakspear et al. (2003), Georgopoulos et al. (2007), Lawrie et al. (2002), Lynall et al. (2010), Micheloyannis et al. (2006), Symond et al. (2005) |
| Stroke | Grefkes and Fink (2012) |
| Traumatic brain injury | Cao and Slobounov (2010), Castellanos et al. (2010, 2011a, b), Ham and Sharp (2012), Kasahara et al. (2010), Kumar et al. (2009), Nakamura et al. (2009), Sponheim et al. (2011), Tsirka et al. (2011) |

Correlation measures assessed

Four time domain measures for quantifying relationships between time series are compared in this investigation: Pearson product moment correlation, Spearman rank order correlation, Kendall rank order correlation and mutual information. These measures will be used to quantify between-channel correlations in EEGs recorded from healthy participants in two behavioral conditions: eyes open, no task and eyes closed, no task. The psychophysiological utility of each measure is assessed by determining its ability to discriminate between these conditions.
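As a point of reference, the three classical measures can be written from their definitions in a few lines of Python. This is a minimal sketch with hypothetical function names: tied values receive average ranks for Spearman, while Kendall’s τ is computed in its simplest (tau-a) form, which assumes no ties and takes O(n²) time. In practice a library such as SciPy (`scipy.stats.pearsonr`, `spearmanr`, `kendalltau`) would normally be used; note that `kendalltau` reports tau-b by default, which coincides with tau-a only when there are no ties.

```python
import math


def pearson_r(x, y):
    """Product moment correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks 1..n, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the product moment correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / number of pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)
```

For the monotonic but nonlinear pair x = [1, 2, 3, 4, 5], y = x², Pearson’s r is about 0.98, while Spearman’s ρ and Kendall’s τ are both exactly 1, because only the ordering of the values enters the rank measures.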

A brief presentation of the mathematical properties of these measures is given in the “Appendix”; qualitative descriptions are given here. The Pearson product moment correlation quantifies linear correlation between variables. The Spearman rank order correlation is the product moment correlation of the ranks, and the Kendall rank order correlation is based on the relative ordering of the ranks. The mutual information of two time series is the average number of bits of each that can be predicted by measuring the other. The numerical estimation of mutual information can be computationally demanding, and the accuracy of the estimate can be sensitive to the algorithm used. This was demonstrated by the comparison studies of Quian Quiroga et al. (2002) and Duckrow and Albano (2003). In a valuable study, Quian Quiroga et al. compared five measures of interhemispheric correlation (nonlinear dependencies, phase synchronization, mutual information, cross correlation and coherence). Except for mutual information, the measures gave qualitatively similar results and, importantly, the computations identified interhemispheric dependencies that were not apparent on conventional visual examination by a board-certified electroencephalographer. Quian Quiroga et al. used a fixed bin-width histogram method to estimate the joint probability distributions. Estimating the joint probability distribution is a critical element in the estimation of mutual information (see the “Appendix” for the mathematical details). Using the same data, Duckrow and Albano estimated the joint probability distributions with the Fraser–Swinney (1986) adaptive partition, and their computation of mutual information produced results consistent with the other measures. Several methods for estimating mutual information are reviewed in Khan et al. (2007). In the calculations presented here, we used the algorithm described in Cellucci et al. (2005), which is computationally efficient.
In test calculations, it requires 0.5% of the computation time required by the Fraser–Swinney algorithm (comparison calculations reported in Cellucci et al. 2005). In contrast with other algorithms, the Cellucci algorithm also incorporates an explicit calculation of the probability of the null hypothesis of no predictive relationship between the two variables. This statistical validation is particularly important in calculations with noisy psychophysiological data.
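The two ingredients just described, an adaptive partition of the joint distribution and an explicit probability of the null hypothesis, can be sketched in Python. This is our illustrative reading, not a reproduction of the Cellucci et al. (2005) code: each axis is divided into equally populated bins by rank, the number of bins k is chosen so that roughly five or more points are expected per joint cell, the null probability treats 2N ln(2)·I as a χ² statistic with (k − 1)² degrees of freedom (a G-test), and the χ² tail is approximated with the Wilson–Hilferty transformation to stay within the standard library. All function names are hypothetical.

```python
import math
import random
from collections import Counter
from statistics import NormalDist


def equiprobable_labels(values, n_bins):
    """Assign each value to one of n_bins equally populated bins, by rank (ties broken by position)."""
    n = len(values)
    order = sorted(range(n), key=values.__getitem__)
    labels = [0] * n
    for rank, index in enumerate(order):
        labels[index] = rank * n_bins // n
    return labels


def chi2_sf_wilson_hilferty(stat, dof):
    """Approximate upper-tail probability of a chi-square variable (Wilson-Hilferty)."""
    if stat <= 0.0:
        return 1.0
    z = ((stat / dof) ** (1.0 / 3.0) - (1.0 - 2.0 / (9.0 * dof))) / math.sqrt(2.0 / (9.0 * dof))
    return 1.0 - NormalDist().cdf(z)


def mutual_information_equiprobable(x, y):
    """Mutual information (bits) on an equiprobable partition, with an approximate PNULL."""
    n = len(x)
    # k bins per axis, chosen so that about n / k**2 >= 5 points are expected per joint cell.
    n_bins = max(2, math.isqrt(n // 5))
    lx = equiprobable_labels(x, n_bins)
    ly = equiprobable_labels(y, n_bins)
    joint = Counter(zip(lx, ly))
    cx, cy = Counter(lx), Counter(ly)
    mi = sum((c / n) * math.log2(c * n / (cx[a] * cy[b])) for (a, b), c in joint.items())
    # Under independence, 2 N ln(2) I is approximately chi-square with (k - 1)**2 dof.
    g_stat = 2.0 * n * math.log(2.0) * mi
    p_null = chi2_sf_wilson_hilferty(g_stat, (n_bins - 1) ** 2)
    return mi, p_null


def demo(n=1000, seed=7):
    """Strongly coupled versus independent Gaussian pairs (illustrative data only)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    coupled = [v + 0.1 * rng.gauss(0.0, 1.0) for v in x]
    independent = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return (mutual_information_equiprobable(x, coupled),
            mutual_information_equiprobable(x, independent))
```

With equiprobable marginals, the estimate cannot exceed log₂ k bits for k bins per axis, so the record length, through k, bounds the attainable value.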

An important property of mutual information is identified by examining the computational results presented in Fig. 1 and Table 2 (modified from Cellucci et al. 2005, following an example in Mars and Lopes da Silva 1987). The first test signal consists of normally distributed random numbers. With each measure, the probability of the null hypothesis is large (0.68 to 0.79); that is, each measure correctly failed to detect a nonrandom relationship between variables X and Y. For the linearly correlated signals, each measure reports a PNULL that is numerically indistinguishable from zero; again, this is as it should be. An important distinction between the measures appears when the third, parabolically correlated, signal is examined. The Pearson product moment correlation failed to detect a relationship (PNULL = 0.9912). The Spearman and Kendall measures, which can identify monotonic nonlinear relationships, also failed to reject the null hypothesis (PNULL = 0.9928 and PNULL = 0.9989, respectively), because the parabola is not monotonic over the interval examined. In contrast, mutual information identified a nonrandom relationship in the parabolic data: the reported probability of the null hypothesis is indistinguishable from zero.
https://static-content.springer.com/image/art%3A10.1007%2Fs11571-013-9267-8/MediaObjects/11571_2013_9267_Fig1_HTML.gif
Fig. 1

Three test signals used in the calculations reported in Table 2. In all cases x runs from −3 to +3 in steps of 0.0006. a y consists of normally distributed random numbers with zero mean and unit variance. b y = x + 0.2 × ε, where ε is the first test signal. c y = x² + 0.2 × ε. Ten thousand points were used in the calculations; every tenth point is plotted on the diagram (modified from Cellucci et al. 2005)

Table 2

Correlation calculations (modified from Cellucci et al. 2005)

| Measure | Normally distributed random | Linearly correlated | Parabolically correlated |
|---|---|---|---|
| Pearson r | −0.0037 | 0.9934 | 0.0001 |
| Pearson PNULL | 0.7112 | ≈ 0 | 0.9912 |
| Spearman ρS | −0.0040 | 0.9936 | ≤ 10⁻⁴ |
| Spearman PNULL | 0.6854 | ≈ 0 | 0.9928 |
| Kendall τ | 0.0027 | 0.9270 | ≤ 10⁻⁵ |
| Kendall PNULL | 0.6845 | ≈ 0 | ≈ 0.9989 |
| Mutual information I (bits) | 0.1356 | 2.9186 | 3.0304 |
| Mutual information PNULL | 0.7851 | ≈ 0 | ≈ 0 |

An additional lesson can be learned by considering the example shown in Fig. 2. In this pair of signals, X runs from 0 to 6 in steps of 0.0006, and
$$ Y = \begin{cases} 2X + 0.1\,\varepsilon, & 0 \le X \le 3 \\ 12 - 2X + 0.1\,\varepsilon, & 3 < X \le 6 \end{cases} $$
where, as before, ε is normally distributed with zero mean and unit variance. If the signals are examined over the first half of the diagram, X ∈ [0, 3], all four measures detect a significant relationship; PNULL is numerically indistinguishable from zero in all four cases. If X ∈ [0, 6] is considered, the Pearson product moment correlation, Spearman rank order correlation and Kendall rank order correlation fail to reject the null hypothesis (PNULL = 0.959, 0.964 and 0.944, respectively), because the increasing and decreasing halves of the relationship cancel. Mutual information, however, continues to identify a nonrandom relationship, and its PNULL remains indistinguishable from zero. Thus, for the three classical measures of correlation, we have the seemingly paradoxical result that evidence for a relationship is lost as more data become available.
https://static-content.springer.com/image/art%3A10.1007%2Fs11571-013-9267-8/MediaObjects/11571_2013_9267_Fig2_HTML.gif
Fig. 2

Non-monotonically correlated test signals. X runs from 0 to 6 in steps of 0.0006. Y = 2X + 0.1ε for X ∈ [0, 3] and Y = 12 − 2X + 0.1ε for X ∈ (3, 6]. All four measures detect a correlation for X ∈ [0, 3]. Only mutual information detects a nonrandom relationship when the paired signals are analyzed for X ∈ [0, 6]. Ten thousand points were used in the calculations; every tenth point is plotted on the diagram

Two conclusions follow from the examples considered here. (1) Nonlinear measures should be used in combination with linear and nonparametric measures. (2) Evidence for time domain correlation should be examined as a function of epoch duration.

Electroencephalographic data

The University’s Institutional Review Board reviewed and approved all procedures involving human subjects. Informed consent was obtained from each of the thirteen participants, who were healthy adults without a history of head injury or serious psychiatric illness. Multichannel monopolar recordings, referenced to linked earlobes, were obtained from FZ, CZ, PZ, OZ, F3, F4, C3, C4, P3 and P4 using an Electrocap and Sensorium EPA-6 amplifiers. Vertical and horizontal eye movements were recorded from electrode sites above and below the right eye and near the outer canthus of each eye. Artifact-corrupted records, defined by an amplitude difference greater than 120 μV peak-to-peak within 500 ms or by a blink in the EOG channel, were removed from the analyses. All EEG impedances were less than 5 kΩ. Signals were amplified (gain = 18,000) with amplifier frequency cutoffs of 0.03 and 200 Hz and digitized at 1,024 Hz with a 12-bit digitizer. Continuous artifact-free records were obtained from each participant in two conditions: eyes closed, resting and eyes open, resting. Given the results shown in Fig. 2, measures were calculated as a function of epoch duration (1–8 s).
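The epoching and amplitude-based rejection described above can be sketched in Python. This illustrates the stated criterion under assumptions of ours rather than reproducing the study’s pipeline: epochs are taken as non-overlapping, amplitudes are assumed to be in μV, the 120 μV limit is applied to every 500 ms sliding window of every channel, and the EOG blink criterion is not modelled. Function names are hypothetical.

```python
from collections import deque


def max_peak_to_peak(samples, window):
    """Largest (max - min) over all runs of `window` consecutive samples, in O(n).

    Two monotonic deques hold indices of candidate maxima and minima for the
    current window; partial windows at the start are subsets of the first full
    window, so including them does not change the result.
    """
    max_q, min_q = deque(), deque()
    best = 0.0
    for i, value in enumerate(samples):
        while max_q and samples[max_q[-1]] <= value:
            max_q.pop()
        max_q.append(i)
        while min_q and samples[min_q[-1]] >= value:
            min_q.pop()
        min_q.append(i)
        # Drop the index that just left the window [i - window + 1, i].
        if max_q[0] <= i - window:
            max_q.popleft()
        if min_q[0] <= i - window:
            min_q.popleft()
        best = max(best, samples[max_q[0]] - samples[min_q[0]])
    return best


def epochs_from_record(channels, fs=1024, epoch_s=1):
    """Split equal-length channel lists into non-overlapping epochs of epoch_s seconds."""
    step = fs * epoch_s
    n = len(channels[0])
    return [[ch[start:start + step] for ch in channels]
            for start in range(0, n - step + 1, step)]


def is_artifact(epoch, fs=1024, limit_uv=120.0, window_s=0.5):
    """True if any channel exceeds limit_uv peak-to-peak within any window_s window."""
    window = int(fs * window_s)
    return any(max_peak_to_peak(ch, window) > limit_uv for ch in epoch)


def clean_epochs(channels, fs=1024, epoch_s=1):
    """Artifact-free epochs of the requested duration (1 to 8 s in the study)."""
    return [ep for ep in epochs_from_record(channels, fs, epoch_s) if not is_artifact(ep, fs)]
```

Calling `clean_epochs` once per duration from 1 to 8 s would yield the epoch sets over which each correlation measure is computed.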

Comparing measures in between-state discriminations

The psychophysiological utility of each measure was assessed by determining its ability to discriminate between the eyes open, no task and eyes closed, no task conditions. For concreteness, the procedure is described for the first measure, the product moment correlation, denoted by r. The EEGs are ten-channel recordings, so for a single participant there are 10 × 9/2 = 45 distinct channel pairs. The correlation between channel i and channel j, rij, is measured in each condition to give 45 values of (rij)closed and 45 values of (rij)open. The operational question becomes: can we discriminate between states by comparing (rij)closed against (rij)open? With thirteen participants, there are 13 × 45 = 585 matched pairs of (rij)closed and (rij)open values. They are compared in a paired t test, which produces a value of t and the corresponding probability of the null hypothesis. Here the null hypothesis is that between-channel correlations do not differ between the eyes-open and eyes-closed conditions. A large absolute value of t, and hence a low value of PNULL, indicates a successful discrimination.
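The bookkeeping and the test can be sketched in Python with the standard library alone. The sketch assumes each measure has already been computed for every participant, condition and channel pair, stored in nested dictionaries of our own design; the two-sided PNULL uses a normal approximation to Student’s t, which is close for the 584 degrees of freedom here, and the −log10 transform used for plotting is included. All names are hypothetical.

```python
import itertools
import math
import statistics
from statistics import NormalDist

CHANNELS = ["FZ", "CZ", "PZ", "OZ", "F3", "F4", "C3", "C4", "P3", "P4"]
# All unordered channel pairs (i, j) with i before j: 10 * 9 / 2 = 45 pairs.
CHANNEL_PAIRS = list(itertools.combinations(CHANNELS, 2))


def matched_vectors(values_closed, values_open):
    """Flatten {participant: {pair: value}} dicts into matched lists (13 x 45 = 585 here)."""
    closed, opened = [], []
    for participant in sorted(values_closed):
        for pair in CHANNEL_PAIRS:
            closed.append(values_closed[participant][pair])
            opened.append(values_open[participant][pair])
    return closed, opened


def paired_t_test(closed, opened):
    """Paired t test on matched values; returns (t, two-sided PNULL).

    PNULL uses a normal approximation to Student's t (adequate for large n);
    in double precision 1 - cdf cannot resolve PNULL below roughly 1e-16.
    """
    if len(closed) != len(opened):
        raise ValueError("closed and opened must be matched one-to-one")
    diffs = [c - o for c, o in zip(closed, opened)]
    n = len(diffs)
    t = statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    p_null = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, p_null


def neg_log10(p_null, floor=1e-300):
    """-log10(PNULL), as plotted against epoch duration; the floor avoids log10(0)."""
    return -math.log10(max(p_null, floor))
```

Repeating `paired_t_test` for each measure and each epoch duration gives the t and −log10(PNULL) values that are compared across measures.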

This process is performed for all four measures. As operationalized in this study, the comparative assessment of these measures of correlation reduces to a single question: which measure gives the largest value of t and the lowest value of PNULL? Because concerns have been expressed (Gevins 1987) about the amount of data required to estimate mutual information, the calculations were repeated for 1, 2, …, 8 s epochs.

The values of these four measures are shown in Fig. 3. The results are consistent with expectations. There is a greater between-channel correlation (Pearson, Spearman, Kendall) in the eyes closed condition. Similarly, there is a greater between-channel predictability (mutual information) in the eyes closed condition.
https://static-content.springer.com/image/art%3A10.1007%2Fs11571-013-9267-8/MediaObjects/11571_2013_9267_Fig3_HTML.gif
Fig. 3

Correlation measures as a function of epoch length. The mean values of Pearson r, Spearman rho, Kendall tau and mutual information are calculated for the indicated epoch duration. Values in red are group means and standard deviations for the eyes-closed condition. Values in black were obtained with eyes-open data

The uncertainties shown in Fig. 3 are standard deviations of group means. When correlations obtained in the eyes-closed condition are compared with those obtained in the eyes-open condition, the appropriate comparison is not based on group means and standard deviations but on matched channel pairs. For example, the C3–C4 correlation observed in the eyes-closed condition is compared against the C3–C4 correlation observed in the eyes-open condition. The collective statistical result of this paired test is shown in Fig. 4. The upper panel shows the t values obtained in the eyes-open versus eyes-closed paired t test for epoch durations of 1, 2, …, 8 s. With 1 s epochs, the Pearson, Spearman and Kendall correlations do not discriminate between the two behavioral conditions: they fail to reject the null hypothesis, with PNULL = 0.807, 0.854 and 0.699, respectively. Mutual information, however, rejects the null hypothesis for 1 s epochs (PNULL < 10⁻⁵). All four measures reject the null hypothesis at epoch durations of 2 s or longer. At every epoch duration, the value of t obtained with mutual information is greater than the values obtained with the other measures. Panel b restates these results by plotting −log10(PNULL) as a function of epoch duration; a value of +5 on this graph, for example, corresponds to PNULL = 10⁻⁵. The values of −log10(PNULL) obtained with mutual information are consistently greater than those obtained with the other measures.
https://static-content.springer.com/image/art%3A10.1007%2Fs11571-013-9267-8/MediaObjects/11571_2013_9267_Fig4_HTML.gif
Fig. 4

Comparison of correlation measures as a function of epoch length. a Values of t obtained in an eyes closed versus eyes open paired t test as a function of epoch duration. b Values of −log10 PNULL for the corresponding probabilities of the null hypothesis. a, b Squares identify results from the Pearson product moment correlation. Diamonds identify results from the Spearman rank order correlation. The letter x identifies results from the Kendall rank order correlation and circles identify results obtained with mutual information

Robustness to noise

Gevins (1987) raised questions concerning the sensitivity of mutual information calculations to noise. Notably, he did so in the context of the Callaway and Harris (1974) study, in which the voltage time series were encoded by polarity and sign of the derivative. We investigated noise sensitivity for calculations performed directly on voltage time series by testing the robustness of these measures to additive noise. All four measures were found to be robust to noise, but, as in the previous calculations, mutual information outperformed the other three measures. In this experiment, normally distributed random numbers with zero mean were added to each of the original EEG signals. The random number generator was based on Park and Miller (1988) and incorporated a Bays–Durham shuffle (Knuth 1981) followed by a Box–Muller transformation (Press et al. 1992). The variance of the additive noise was progressively increased to give signal-to-noise ratios of 10, 5 and 0 dB. Panels f, g and h of Fig. 5 give a qualitative impression of each signal-to-noise ratio. The noise-corrupted signal, which is the input used in the calculations, is shown in black; the original signal is superimposed in red for reference.
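The noise model can be sketched in Python. It follows the textbook recipe named in the text (a Park–Miller minimal-standard generator, a Bays–Durham shuffle table in the style of Numerical Recipes’ `ran1`, and a Box–Muller transformation), but it is our sketch, not the code used in the study. We also assume that SNR is the ratio of signal variance to noise variance, so the noise standard deviation for a target SNR in dB is σ_s·10^(−SNR/20); the class and function names are hypothetical.

```python
import math
import statistics


class ShuffledParkMiller:
    """Park-Miller minimal standard generator with a Bays-Durham shuffle (sketch)."""

    M = 2**31 - 1    # modulus (a Mersenne prime)
    A = 16807        # multiplier from Park and Miller (1988)
    TABLE_SIZE = 32

    def __init__(self, seed=1):
        if not 0 < seed < self.M:
            raise ValueError("seed must lie in [1, 2**31 - 2]")
        self.state = seed
        for _ in range(8):  # discard a few initial values
            self._lcg()
        self.table = [self._lcg() for _ in range(self.TABLE_SIZE)]
        self.last = self._lcg()

    def _lcg(self):
        self.state = (self.A * self.state) % self.M
        return self.state

    def uniform(self):
        """Uniform deviate strictly inside (0, 1)."""
        # Bays-Durham: the previous output selects which table entry is emitted,
        # and that slot is refilled with a fresh generator value.
        j = self.last * self.TABLE_SIZE // self.M
        self.last = self.table[j]
        self.table[j] = self._lcg()
        return self.last / self.M

    def gaussian_pair(self):
        """Two independent N(0, 1) deviates via the Box-Muller transformation."""
        u1, u2 = self.uniform(), self.uniform()
        radius = math.sqrt(-2.0 * math.log(u1))
        angle = 2.0 * math.pi * u2
        return radius * math.cos(angle), radius * math.sin(angle)


def add_noise(signal, snr_db, rng):
    """Return signal plus zero-mean Gaussian noise scaled to the requested SNR (dB)."""
    noise_sd = statistics.pstdev(signal) * 10.0 ** (-snr_db / 20.0)
    noise = []
    while len(noise) < len(signal):
        noise.extend(rng.gaussian_pair())
    return [s + noise_sd * g for s, g in zip(signal, noise)]
```

At 0 dB the noise variance equals the signal variance, at 5 dB it is about 32% of it, and at 10 dB it is 10%.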
https://static-content.springer.com/image/art%3A10.1007%2Fs11571-013-9267-8/MediaObjects/11571_2013_9267_Fig5_HTML.gif
Fig. 5

Robustness of correlation measures to additive Gaussian noise. a Comparison of correlation measures using original data from 13 subjects. As before, squares identify results from the Pearson product moment correlation, diamonds identify results from the Spearman rank order correlation, the letter x identifies results from the Kendall rank order correlation and circles identify results obtained with mutual information. b Comparison of correlation measures using data from 13 subjects after addition of Gaussian noise giving a signal-to-noise ratio of SNR = 10 dB. Symbols identifying different measures follow the pattern of a. c As in b, with SNR = 5 dB. d As in b, with SNR = 0 dB. e Example segment of an EEG signal recorded from a single subject at electrode site Pz in the eyes-closed condition. f The EEG signal shown in e after addition of Gaussian noise, SNR = 10 dB (shown in black); the original signal is shown in red for comparison. g As in f, with SNR = 5 dB. h As in f, with SNR = 0 dB

At SNR = 10 dB, all four measures failed to discriminate between conditions when 1 s epochs were examined. All four measures discriminated successfully at longer epochs, but, as with the uncorrupted signals, a greater statistical separation was obtained with mutual information.

At higher noise levels (lower SNR), the degree of between-state discrimination as quantified by PNULL is reduced, but the pattern observed at SNR = 10 dB is preserved. Specifically, all four measures fail to discriminate between eyes closed and eyes open with 1 s epochs. All four measures discriminate successfully at longer epochs, and the degree of discrimination obtained with mutual information is greater than that obtained with the other three measures.

Discussion

Three results were obtained in these calculations. First, a nonlinear measure, mutual information, effectively discriminated between states with less data, specifically a 1 s epoch, when other measures failed to discriminate between conditions. Second, at all epoch durations tested, the measure of between-state discrimination was greater for mutual information. Third, discrimination based on mutual information was more robust to noise.

The limitations of this study should be recognized. Three points should be addressed. First, the study is based on signals obtained from thirteen participants. Because the method that is best for one database is not necessarily best in all cases, a different outcome may be obtained with different data. Second, the test criterion in this study was the ability to discriminate between the eyes-open and eyes-closed conditions. A measure other than mutual information might be more effective if a different test criterion were used. Third, this study was limited to a comparison of four time domain measures of correlation. Several other measures have been used to quantify correlation and should be considered. Reshef et al. (2011) have constructed a maximal information coefficient that has some properties in common with mutual information. Additional methods include coherence (Nunez et al. 1997, 1999), phase locking index (Stam et al. 2009; Hurtado et al. 2004; Sazonov et al. 2009), imaginary coherency (Stam et al. 2007a, b; Nolte et al. 2004) and phase lag index (Stam et al. 2007a, b, 2009). As outlined by several authors (Cao and Slobounov 2010; Schiff 2005; Guevara et al. 2005), care must be exercised in the application of these procedures. More sophisticated procedures for assessing correlation have recently been investigated. Stam and van Dijk (2002) and Wendling et al. (2009) have used methods based on embedded data (Takens 1981) to quantify correlation. Cao and Slobounov (2010) analyzed nineteen-channel resting EEGs in a three-step process. First, independent component analysis (Hyvärinen et al. 2001) was used to identify independent processes. Second, a source reconstruction algorithm, standardized low resolution electromagnetic tomography (sLORETA; Pascual-Marqui et al. 2002; Pascual-Marqui 2002), was used to identify cortical regions associated with functional activity. Third, using this localization, graph theory was used to quantify connectivity in the resting state. These procedures should be incorporated into an expanded comparison study. The results obtained by Wendling et al. (2009) with computationally generated data indicated that no single procedure was best for all cases, and this is almost certainly also true for biological data. The importance of using more than one measure is further indicated by the results of Dauwels et al. (2010), who found that different measures of synchronization were not well correlated. They concluded that “therefore they each seem to capture a specific kind of interdependence.” Our recommendation is to perform functional connectivity studies with several methods, applied to both original scalp signals and estimates of current source density, and to compare the results.

It is possible to use mutual information calculations in synchronization studies. In this experimental design, the original EEG signal is bandpass filtered into specified frequency bands. Given the restricted spectrum of the filtered signal, it is possible to estimate its phase by calculating the Hilbert transform (Boashash 1992; Pikovsky et al. 2001). Mutual information calculations can then determine if there is a nonrandom relationship between phase functions measured at different electrode sites.
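The phase-based procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure used in this study: the sampling rate (256 Hz), the 8 to 12 Hz band, the fourth-order Butterworth filter and the uniform 8-bin partition of phase are hypothetical choices, and the signals are synthetic. It relies on SciPy's `butter`, `sosfiltfilt` and `hilbert`.

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt


def band_phase(x, fs, low, high, order=4):
    """Zero-phase Butterworth bandpass of x, then instantaneous phase (radians)."""
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, x)
    # hilbert() returns the analytic signal x + iH[x]; its angle is the phase.
    return np.angle(hilbert(filtered))


def phase_mutual_information(phi_x, phi_y, n_bins=8):
    """Mutual information (bits) between two phase series, uniform bins on [-pi, pi]."""
    edges = np.linspace(-np.pi, np.pi, n_bins + 1)
    joint, _, _ = np.histogram2d(phi_x, phi_y, bins=[edges, edges])
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of phi_x (column vector)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of phi_y (row vector)
    nz = p_xy > 0                           # empty cells contribute nothing
    return float(np.sum(p_xy[nz] * np.log2(p_xy[nz] / (p_x * p_y)[nz])))


# Synthetic example: two noisy channels sharing a 10 Hz rhythm, one unrelated channel.
rng = np.random.default_rng(0)
fs = 256.0                                  # hypothetical sampling rate (Hz)
t = np.arange(0, 8, 1 / fs)                 # one 8 s epoch
alpha = np.sin(2 * np.pi * 10 * t)
x = alpha + 0.5 * rng.standard_normal(t.size)
y = np.roll(alpha, 5) + 0.5 * rng.standard_normal(t.size)   # delayed copy of the rhythm
z = rng.standard_normal(t.size)
phi_x, phi_y, phi_z = (band_phase(s, fs, 8.0, 12.0) for s in (x, y, z))
mi_xy = phase_mutual_information(phi_x, phi_y)
mi_xz = phase_mutual_information(phi_x, phi_z)
```

With these settings the phase-locked pair should give a markedly larger value than the unrelated pair. A uniform phase partition is used only because phase is bounded; the equiprobable partition described in the Appendix could be substituted.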

With these limitations in mind, the results suggest that, when implemented with an adaptive partition of the joint probability distribution, mutual information provides an effective, noise-robust measure of correlation. This result may extend beyond functional connectivity studies to the analysis of CNS causal networks and of CNS small world networks, which are briefly considered below.

Investigation of CNS causal relationships, the time-dependent directional movement of information, may be important in the study of traumatic brain injury. As previously noted, Goldstein’s pioneering work on the behavioral neurology of traumatic brain injury led him to conclude that restitution of function following injury resulted from adaptation rather than from repair. This suggests that post-injury alteration of causal networks may provide a sensitive measure of altered CNS function following injury. While measures like correlation, coherence and mutual information can be used to establish the presence of correlative relationships between signals, they do not provide any information about the direction of information movement. Additional procedures must be introduced. In most cases, the quantitative assessment of causal relationships between variables is constructed on the following idea. If measuring variable X improves the prediction of variable Y, then Y is, in this limited operational sense, causally dependent on X. It should be stressed that this relationship is not necessarily unidirectional: with the same data, measuring Y may also improve the prediction of X. This conceptualization of causality appears in Wiener (1956) and may be original with Wiener.

An early implementation of this operationalization of causality was published by Granger (1969) in the econometrics literature and popularized by Sims (1972). Granger causality is constructed using linear regression models. If past values of X are useful in predicting the current value of Y in a linear regression, beyond the prediction obtained from past values of Y alone, then X is said to be a causal driver of time series Y. As with any statistical procedure, causality tests based on linear regression must be implemented with care. A growing literature has identified circumstances that lead to spurious identification of linear causality (Breitung and Swanson 2002; He and Maekawa 2001).
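A minimal sketch of a bivariate Granger test in Python, assuming ordinary least squares, a fixed model order p = 2 (a hypothetical choice; in practice the order is usually selected, for example with an information criterion) and an F test comparing the restricted model (past of y only) with the unrestricted model (past of y and x). The simulated system, in which x drives y at lag 1, is illustrative only.

```python
import numpy as np
from scipy.stats import f as f_dist


def lag_matrix(v, p):
    """Columns v[t-1], ..., v[t-p] for t = p, ..., n-1."""
    n = len(v)
    return np.column_stack([v[p - k:n - k] for k in range(1, p + 1)])


def granger_test(x, y, p=2):
    """F statistic and p-value for 'past x improves the linear prediction of y'."""
    target = y[p:]
    ones = np.ones((len(target), 1))
    restricted = np.hstack([ones, lag_matrix(y, p)])   # intercept + y's own past
    full = np.hstack([restricted, lag_matrix(x, p)])   # ... plus x's past

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r, rss_f = rss(restricted), rss(full)
    df_num, df_den = p, len(target) - full.shape[1]
    f_stat = ((rss_r - rss_f) / df_num) / (rss_f / df_den)
    return f_stat, float(f_dist.sf(f_stat, df_num, df_den))


# Simulated system: x is white noise and drives y with a one-sample delay.
rng = np.random.default_rng(1)
n = 2000
x = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.2 * y[t - 1] + 0.8 * x[t - 1] + 0.5 * rng.standard_normal()

f_xy, p_xy = granger_test(x, y)   # does the past of x help predict y?
f_yx, p_yx = granger_test(y, x)   # does the past of y help predict x?
```

Running the test in both directions, as done here, is what allows the bidirectional case noted above to be detected.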

An extension of mutual information may provide a noise-robust measure of causality. Recall that the mutual information of time series X and Y, I(X, Y), is the average number of bits of one variable that can be predicted by measuring the other. Mutual information can be shown to be symmetrical, that is, I(X, Y) = I(Y, X). Therefore, while mutual information can establish the presence of a nonrandom relationship between time series, it cannot identify causal relationships. However, a time-lagged mutual information in which one of the two variables is time shifted can be used to determine if, for example, measuring variable X in the past allows prediction of future values of variable Y. We can shift time series X by lag τ and calculate I(Xτ, Y) as a function of τ. Similarly, we can calculate I(X, Yτ). If measuring Xτ allows better prediction of Y than the other way around, then it can be argued that information is transferred from X to Y. The magnitude of the mutual information and the time lag which produces the greatest value can be used to quantify both the magnitude of the information transfer and the time delay associated with that transfer. A number of investigators have proposed using lagged mutual information to investigate information transfer in distributed systems (Kaneko 1986; Vastano and Swinney 1988; Albano et al. 1999). The procedure has a long history in electroencephalography. Inouye et al. (1983) used an “entropy analysis”, which would now be described as directed mutual information, to quantify the direction of information flow and concluded that the dominant longitudinal direction of alpha activity was anterior to posterior. A subsequent publication (Inouye et al. 1993) used directed mutual information to show a change in information flow during a cognitively demanding arithmetic task. Mars and his colleagues (Mars and Lopes da Silva 1983; Mars et al. 1985) used mutual information to quantify time delays in the transmission of epileptic seizures. Several other investigators have used lagged mutual information to quantify between-channel information transfer in multichannel EEGs (Xu et al. 1997; Chen et al. 2000; Lopes da Silva 1987). Schreiber (2000), however, presented valuable examples in which standard lagged mutual information failed to detect information exchange. This motivated the construction of a related measure, transfer entropy, that successfully identified these relationships. The Schreiber results should be considered in the light of the previously cited Duckrow and Albano (2003) calculations, which demonstrated the sensitivity of mutual information calculations to the choice of algorithm. This may have been a factor in the Schreiber study. Madulara et al. (2012) calculated transfer entropy using the EEG records analyzed in this paper. Mutual information was generally lower in the eyes open than in the eyes closed condition. In contrast, transfer entropies increased by a factor of two in the eyes open condition. As would be anticipated, the largest one-way transfer entropies were observed to and from the occipital lobe. Consistent with our previous recommendations, we suggest computing both measures (lagged mutual information and transfer entropy). Clinical utility is the final arbiter.
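The lagged mutual information I(Xτ, Y) can be sketched as follows. This is an illustrative Python sketch on synthetic data, not the algorithm used in the cited studies: the rank-based equiprobable binning and the bin count NE = floor((ND/5)^(1/2)) follow the Appendix only loosely, the helper names are hypothetical, and transfer entropy is not implemented.

```python
import math

import numpy as np


def equiprobable_bins(v, n_bins):
    """Assign each sample to one of n_bins bins holding (almost) equal numbers of samples."""
    ranks = np.argsort(np.argsort(v, kind="stable"), kind="stable")
    return ranks * n_bins // len(v)


def mutual_information(x, y):
    """MI in bits with equiprobable marginal bins, N_E = floor(sqrt(N_D / 5))."""
    n_bins = max(2, math.isqrt(len(x) // 5))
    bx, by = equiprobable_bins(x, n_bins), equiprobable_bins(y, n_bins)
    joint = np.zeros((n_bins, n_bins))
    np.add.at(joint, (bx, by), 1)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    nz = p_xy > 0
    return float(np.sum(p_xy[nz] * np.log2(p_xy[nz] / (p_x * p_y)[nz])))


def lagged_mutual_information(x, y, max_lag):
    """I(x[t - tau], y[t]) for tau = 0, ..., max_lag."""
    return np.array([mutual_information(x[: len(x) - tau], y[tau:])
                     for tau in range(max_lag + 1)])


# Synthetic example: y is a noisy copy of x delayed by 7 samples.
rng = np.random.default_rng(2)
n, delay = 4000, 7
raw = rng.standard_normal(n + delay)
x = raw[delay:]                                   # x[t] = raw[t + delay]
y = raw[:n] + 0.5 * rng.standard_normal(n)        # y[t] = x[t - delay] + noise
forward = lagged_mutual_information(x, y, 15)     # past of x -> present of y
reverse = lagged_mutual_information(y, x, 15)     # past of y -> present of x
best_lag = int(np.argmax(forward))
```

The lag at which the forward curve peaks estimates the transfer delay, and the asymmetry between the forward and reverse curves indicates its direction.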

Stated in abstract terms, a network is a collection of nodes and connections between the nodes. A small world network is defined as a network that has dense local clusters connected by a limited number of long range connections. In a seminal paper, Watts and Strogatz (1998) showed how small world networks can be characterized quantitatively. Small world networks are highly efficient. They can support a high degree of dynamical complexity with a minimum investment in connections (Latora and Marchiori 2001). This is an attractive metaphor for describing the central nervous system. Local networks provide areas of specialization, but these specialized domains can communicate efficiently with the entire brain by long range connections. When applied to multichannel EEG data, the electrode sites are the nodes and the connections are identified by correlation measures. Three types of networks can be constructed. In a binary network, a connection is either present or absent. Operationally this is established by applying a threshold to a measure of correlation: the connection is present when the measure exceeds the threshold and absent otherwise. In a weighted network, the value of a connection’s strength is assigned on a continuum determined by the correlation measure. In directed networks, the direction of information transfer, not just the strength of the connection, is incorporated into the analysis. These methods are now being utilized in the analysis of the central nervous system (Smith-Bassett and Bullmore 2006; Sporns and Honey 2006; Stam and Reijneveld 2007). Altered small world networks have been observed in clinical populations including patients with CNS tumors (Bartolomei et al. 2006), epilepsy (Ponten et al. 2007; van Dellen et al. 2009), schizophrenia (Rubinov et al. 2009), and Alzheimer’s disease (Stam et al. 2007a, b). As would be anticipated, network alterations are also associated with traumatic brain injury (Cao and Slobounov 2010; Nakamura et al. 2009; Tsirka et al. 2011; Zouridakis et al. 2011; Castellanos et al. 2011a, b). The calculations presented in this paper and in Madulara et al. (2012) suggest that when calculated using an adaptive partition of the joint probability distribution, mutual information, lagged mutual information and transfer entropy can provide computationally efficient, noise-robust metrics for the analysis of CNS small world networks.
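To make the binary-network construction concrete, the sketch below thresholds a correlation matrix and computes the two Watts and Strogatz (1998) quantities, the mean clustering coefficient C and the characteristic path length L. The threshold, the function names and the ring-shaped "correlation" matrix are hypothetical illustrations; for a ring lattice of 20 nodes with 4 neighbors each, C = 0.5 and L = 55/19.

```python
from collections import deque

import numpy as np


def binary_network(corr, threshold):
    """Adjacency matrix: a connection is present where |correlation| >= threshold."""
    adj = (np.abs(corr) >= threshold).astype(int)
    np.fill_diagonal(adj, 0)                       # no self-connections
    return adj


def clustering_coefficient(adj):
    """Mean Watts-Strogatz clustering coefficient of an undirected binary graph."""
    triangles = np.diag(adj @ adj @ adj) / 2.0     # triangles through each node
    k = adj.sum(axis=1)                            # node degrees
    possible = k * (k - 1) / 2.0
    c = np.divide(triangles, possible, out=np.zeros(len(k)), where=possible > 0)
    return float(c.mean())


def characteristic_path_length(adj):
    """Mean shortest-path length over all reachable node pairs (breadth-first search)."""
    n = len(adj)
    total, pairs = 0, 0
    for source in range(n):
        dist = np.full(n, -1)
        dist[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in np.flatnonzero(adj[u]):
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        reached = dist > 0
        total += int(dist[reached].sum())
        pairs += int(reached.sum())
    return total / pairs


# Hypothetical "correlation" matrix: strong only between nodes within 2 steps on a ring.
n = 20
idx = np.arange(n)
offset = np.abs(idx[:, None] - idx[None, :])
circular = np.minimum(offset, n - offset)
corr = np.where(circular <= 2, 0.9, 0.1)
adj = binary_network(corr, threshold=0.5)
C = clustering_coefficient(adj)
L = characteristic_path_length(adj)
```

In applications the values depend strongly on the threshold, and they are usually interpreted relative to those of randomized networks with the same degree distribution.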

The mathematical results showing the efficiency of networks composed of highly connected local regions with limited, but essential, long range connections can inform the discussion of CNS localization of function. The localizationist conceptualization began with Broca’s localization of expressive aphasia to the third left frontal convolution (Broca 1861) and Wernicke’s localization of receptive aphasia to the posterior section of the superior temporal convolution (Wernicke 1908). By the early twentieth century, however, several neurologists argued against a strict localizationist model (Tesak and Code 2008). Kurt Goldstein was a significant contributor to the debate (Goldstein 1927; Ludwig 2012). Goldstein’s views were complex and it would be an oversimplification to describe his views as inflexibly antilocalizationist (Ludwig 2012). For example, in his Lokalisation in der Großhirnrinde, Goldstein recognizes Broca’s “flawless establishment of the dependency of the impairment of articulated speech from a lesion in the third left frontal convolution” (Goldstein 1927, translated Ludwig 2012). He similarly accepts Wernicke’s identification of the role of the superior temporal convolution in some presentations of receptive aphasia, but based on clinical observations Goldstein concluded that language functions could not be decomposed into discrete anatomically isolated components. Goldstein’s acceptance of localizationist results but his argument for the incompleteness of a localizationist account caused Geschwind (1997) to describe his views as a “paradoxical position.” Ludwig proposes that the paradox can be resolved by recognizing that Goldstein introduced a distinction between weak localization (the correlation of symptoms with lesions) and strong localization (the implementation of a process exclusively in a defined locality). 
We suggest that a quantitative examination of these questions can be constructed by comparing CNS network geometries generated by language dependent ERP tasks in healthy controls and in patients presenting well characterized aphasias.

Acknowledgments

The opinions and assertions contained herein are the private opinions of the authors and are not to be construed as official or reflecting the views of the United States Department of Defense. PER and BMR would like to acknowledge support from the Traumatic Injury Research Program of the Uniformed Services University of the Health Sciences, from the Defense Medical Research and Development Program and from the United States Marine Corps Systems Command. JDB, LCCA and AMA were supported in part by the Department of Science and Technology, Republic of the Philippines.

Appendix: Measures of correlation

Pearson product moment correlation

Let \( \{ {\text{X}}\} = \{ {\text{x}}_{1} ,{\text{x}}_{2} , \ldots ,{\text{x}}_{{{\text{N}}_{\text{D}} }} \} \) and \( \{ {\text{Y}}\} = \{ {\text{y}}_{1} ,{\text{y}}_{2} , \ldots ,{\text{y}}_{{{\text{N}}_{\text{D}} }} \} \) be time series of paired observations. ND is the number of elements in each set. The product moment correlation coefficient r is given by
$$ {\text{r}} = \frac{{\sum\nolimits_{{{\text{i}} = 1}}^{{{\text{N}}_{\text{D}} }} {({\text{x}}_{\text{i}} - \overline{\text{x}} )({\text{y}}_{\text{i}} - \overline{\text{y}} )} }}{{\left\{ {\sum\nolimits_{{{\text{i}} = 1}}^{{{\text{N}}_{\text{D}} }} {({\text{x}}_{\text{i}} - \overline{\text{x}} )^{2} } } \right\}^{1/2} \left\{ {\sum\nolimits_{{{\text{i}} = 1}}^{{{\text{N}}_{\text{D}} }} {({\text{y}}_{\text{i}} - \overline{\text{y}} )^{2} } } \right\}^{1/2} }} $$
where \( \overline{\text{x}} \) is the mean of {X} and \( \overline{\text{y}} \) is the mean of {Y}. There are several procedures for calculating the probability of the null hypothesis of no correlation between {X} and {Y} (Press et al. 1992). A robust procedure that was used here is based on a t-distribution where
$$ {\text{t}} = {\text{r}}\left\{ {\frac{{{\text{N}}_{\text{D}} - 2}}{{1 - {\text{r}}^{2} }}} \right\}^{1/2} $$
and ν = ND − 2 is the number of degrees of freedom. The probability of the null hypothesis is
$$ {\text{P}}_{\text{NULL}} = I_{{\frac{\upnu }{{\upnu + {\text{t}}^{2} }}}} \left( {\frac{\upnu }{2},\frac{1}{2}} \right). $$
\( I_{x}(a, b) \) is the regularized incomplete beta function.
The 95 % confidence limits for r, rLow and rHigh, can be computed by converting r to Fisher’s z.
$$ {\text{z}} = \frac{1}{2}\ln \left\{ {\frac{{1 + {\text{r}}}}{{1 - {\text{r}}}}} \right\} $$
z is approximately normally distributed with standard deviation \( \upsigma = 1/({\text{N}}_{\text{D}} - 3)^{1/2} \) (Press et al. 1992, p. 632). The 95 % confidence bounds are \( {\text{z}}_{\text{Low}} = {\text{z}} - 1.96\upsigma \) and \( {\text{z}}_{\text{High}} = {\text{z}} + 1.96\upsigma \). The corresponding values of r are found from
$$ {\text{r}} = \frac{{{\text{e}}^{{2{\text{z}}}} - 1}}{{{\text{e}}^{{2{\text{z}}}} + 1}} $$
Press et al. (1992, p. 631) suggest that this should be a legitimate calculation for ND > 10.
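The Pearson calculation, its t-based null probability and the Fisher z confidence interval can be sketched in Python as follows. This is an illustrative sketch on synthetic data: it assumes SciPy's `betainc(a, b, x)` as the regularized incomplete beta function and uses `scipy.stats.pearsonr` only as a cross-check. It also assumes |r| < 1, since the t and z transforms diverge at r = ±1.

```python
import numpy as np
from scipy.special import betainc
from scipy.stats import pearsonr


def pearson_with_stats(x, y):
    """Pearson r, P_NULL from the t statistic, and a 95 % Fisher-z confidence interval."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    dx, dy = x - x.mean(), y - y.mean()
    r = float(np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2)))
    nu = n - 2                                              # degrees of freedom
    t = r * np.sqrt(nu / (1.0 - r**2))
    p_null = float(betainc(nu / 2.0, 0.5, nu / (nu + t**2)))  # I_x(nu/2, 1/2)
    z = 0.5 * np.log((1.0 + r) / (1.0 - r))                 # Fisher's z
    sigma = 1.0 / np.sqrt(n - 3)
    # tanh(z) = (e^{2z} - 1) / (e^{2z} + 1) maps the z bounds back to r
    r_low, r_high = np.tanh(z - 1.96 * sigma), np.tanh(z + 1.96 * sigma)
    return r, p_null, (float(r_low), float(r_high))


rng = np.random.default_rng(3)
x = rng.standard_normal(50)
y = 0.3 * x + rng.standard_normal(50)
r, p_null, (r_low, r_high) = pearson_with_stats(x, y)
ref_r, ref_p = pearsonr(x, y)        # SciPy's two-sided test, used as a cross-check
```

The interval is symmetric in z but generally asymmetric in r, which is the reason for working on the Fisher z scale.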

Spearman rank order correlation

As before let {X} and {Y} be time series of ND paired observations. \( \{ {\text{R}}_{\text{X}} \} = \left\{ {{\text{R}}_{{{\text{X}}_{1} }} ,{\text{R}}_{{{\text{X}}_{2} }} , \ldots ,{\text{R}}_{{{\text{X}}_{{{\text{N}}_{\text{D}} }} }} } \right\} \) gives the ranks of the values of X. In cases of ties the average ranks are entered. {RY} is defined analogously. The Spearman rank order correlation, ρS, is the product moment correlation of ranks.
$$ \uprho_{\text{S}} = \frac{{\sum\nolimits_{{{\text{i}} = 1}}^{{{\text{N}}_{\text{D}} }} {\left( {{\text{R}}_{{{\text{X}}_{\text{i}} }} - \overline{\text{R}}_{\text{X}} } \right)\left( {{\text{R}}_{{{\text{Y}}_{\text{i}} }} - \overline{\text{R}}_{\text{Y}} } \right)} }}{{\left\{ {\sum\nolimits_{{{\text{i}} = 1}}^{{{\text{N}}_{\text{D}} }} {\left( {{\text{R}}_{{{\text{X}}_{\text{i}} }} - \overline{\text{R}}_{\text{X}} } \right)^{2} } } \right\}^{1/2} \left\{ {\sum\nolimits_{{{\text{i}} = 1}}^{{{\text{N}}_{\text{D}} }} {\left( {{\text{R}}_{{{\text{Y}}_{\text{i}} }} - \overline{\text{R}}_{\text{Y}} } \right)^{2} } } \right\}^{1/2} }} $$
When there are no ties this becomes
$$ \uprho_{\text{S}} = 1 - \frac{{6\sum\nolimits_{{{\text{i}} = 1}}^{{{\text{N}}_{\text{D}} }} {\left( {{\text{R}}_{{{\text{X}}_{\text{i}} }} - {\text{R}}_{{{\text{Y}}_{\text{i}} }} } \right)^{2} } }}{{{\text{N}}_{\text{D}}^{3} - {\text{N}}_{\text{D}} }} $$
It can be shown that, in the absence of ties, this shortcut expression is identical to the Pearson product moment correlation computed on the ranks. The probability of the null hypothesis (no correlation) is calculated as before with t taking the value tS.
$$ {\text{t}}_{\text{S}} = \uprho_{\text{S}} \left\{ {\frac{{{\text{N}}_{\text{D}} - 2}}{{1 - \uprho_{\text{S}}^{2} }}} \right\}^{1/2} $$
The Spearman rank order correlation is less sensitive to outliers than the product moment correlation. Importantly, the rank order correlation can detect nonlinear correlations provided that the relation between X and Y is approximately monotonic. A helpful example is given in Triola (2008, p. 713). The rank order correlation is less efficient than the product moment correlation in the sense that the nonparametric measure requires 100 observations to achieve the same results as 91 observations analyzed with the Pearson correlation (Triola 2008, p. 677).
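A short Python sketch of the Spearman calculation: the product moment correlation of average ranks, checked against the no-ties shortcut and against `scipy.stats.spearmanr`. The data are synthetic; the second pair illustrates the point above that a strictly monotonic but nonlinear relation gives ρS = 1 while the Pearson correlation stays below 1.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr


def spearman_rho(x, y):
    """Product moment correlation of the ranks (ties receive their average rank)."""
    rx, ry = rankdata(x), rankdata(y)
    dx, dy = rx - rx.mean(), ry - ry.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2)))


def spearman_no_ties(x, y):
    """Shortcut 1 - 6 sum d_i^2 / (N^3 - N); valid only when there are no ties."""
    d = rankdata(x) - rankdata(y)
    n = len(d)
    return float(1.0 - 6.0 * np.sum(d**2) / (n**3 - n))


rng = np.random.default_rng(4)
x = rng.standard_normal(200)
y = x**3 + 0.5 * rng.standard_normal(200)    # monotonic but nonlinear, plus noise
rho = spearman_rho(x, y)
rho_shortcut = spearman_no_ties(x, y)
ref_rho = spearmanr(x, y)[0]                 # SciPy cross-check

# Noiseless monotonic relation: the ranks agree exactly, Pearson does not reach 1.
rho_cubic = spearman_rho(x, x**3)
pearson_cubic = float(np.corrcoef(x, x**3)[0, 1])
```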

Kendall rank order correlation

Consider two consecutive paired observations (Xi, Yi) and (Xi+1, Yi+1). If both X and Y increase, then Xi+1−Xi, Yi+1−Yi, and (Xi+1−Xi)(Yi+1−Yi) are positive. If both variables decrease between observation i and i + 1, then (Xi+1−Xi)(Yi+1−Yi) is again positive. If one variable increases while the other decreases between these two observations, then (Xi+1−Xi)(Yi+1−Yi) is negative. The Kendall rank correlation coefficient is constructed by examining these relationships over all possible pairs of observations. If (Xi+1−Xi)(Yi+1−Yi) is positive, then variable κ is increased by 1. If (Xi+1−Xi)(Yi+1−Yi) is negative, then κ is decreased by 1. If it is zero, then κ is unchanged. These comparisons are made not just across temporally adjacent pairs, that is between (Xi, Yi) and (Xi+1, Yi+1), but rather for all possible (Xi, Yi) and (Xj, Yj) pairs with i < j, using the sign of (Xj−Xi)(Yj−Yi). There are ND(ND − 1)/2 distinct pairs, giving κ a maximum possible value of ND(ND − 1)/2. Kendall’s τ is the normalized value of κ.
$$ \uptau = \frac{\upkappa }{{{\text{N}}_{\text{D}} ({\text{N}}_{\text{D}} - 1)/2}} $$
τ has a value between −1 and +1. Under the null hypothesis of no association between {X} and {Y}, τ is approximately normally distributed with zero mean and standard deviation (Press et al. 1992, p. 637)
$$ \upsigma = \left\{ {\frac{{4{\text{N}}_{\text{D}} + 10}}{{9{\text{N}}_{\text{D}} ({\text{N}}_{\text{D}} - 1)}}} \right\}^{1/2} $$
The probability of the null hypothesis is computed from the complementary error function.
$$ {\text{P}}_{\text{NULL}} = \frac{2}{\sqrt \uppi }\int\limits_{|\uptau |/\sqrt 2 \upsigma }^{\infty } {{\text{e}}^{{ - {\text{t}}^{2} }} {\text{dt}}} $$

Again letting \( {\text{R}}_{{{\text{X}}_{\text{i}} }} \) and \( {\text{R}}_{{{\text{Y}}_{\text{i}} }} \) denote the ranks of {X} and {Y}, it is seen that \( ({\text{R}}_{{{\text{X}}_{\text{i}} }} - {\text{R}}_{{{\text{X}}_{\text{j}} }} )({\text{R}}_{{{\text{Y}}_{\text{i}} }} - {\text{R}}_{{{\text{Y}}_{\text{j}} }} ) \) has the same sign as (Xi − Xj)(Yi − Yj), and therefore κ calculated from ranks is identical to κ calculated using X and Y values. τ is therefore a nonparametric correlation that does not make any assumptions about the distributions of {X} and {Y}. It is generally observed that ρS and τ are highly correlated. This expectation is borne out in the calculations presented here. τ provides a means of identifying monotonic correlations. A more general search for correlations, including non-monotonic associations, requires alternative measures.
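The pair-counting definition of κ and the normal approximation for PNULL can be sketched directly in Python. This is an O(ND²) illustration on synthetic data, assuming no ties (with ties, SciPy's default tau-b normalization differs from κ/[ND(ND − 1)/2]). `scipy.stats.kendalltau` is used only as a cross-check, and the invariance of τ under strictly increasing transformations is checked as well.

```python
import numpy as np
from scipy.special import erfc
from scipy.stats import kendalltau


def kendall_tau(x, y):
    """Kendall's tau (normalized kappa) and P_NULL from the normal approximation."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    kappa = 0
    for i in range(n - 1):
        # +1 for each concordant pair (i, j > i), -1 for each discordant pair, 0 for ties
        kappa += int(np.sum(np.sign((x[i + 1:] - x[i]) * (y[i + 1:] - y[i]))))
    tau = kappa / (n * (n - 1) / 2)
    sigma = np.sqrt((4 * n + 10) / (9 * n * (n - 1)))
    p_null = float(erfc(abs(tau) / (np.sqrt(2.0) * sigma)))
    return tau, p_null


rng = np.random.default_rng(5)
x = rng.standard_normal(60)
y = 0.3 * x + rng.standard_normal(60)
tau, p_null = kendall_tau(x, y)
tau_transformed, _ = kendall_tau(np.exp(x), y**3)   # strictly increasing transforms
ref = kendalltau(x, y, method="asymptotic")         # SciPy cross-check (no ties here)
```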

Mutual information

Let {X} and {Y} be time series of paired observations. Again, ND is the number of elements in each set. The mutual information of variables X and Y, denoted I(X, Y), is the average number of bits of variable Y that can be predicted by measuring variable X. It can be shown (Cover and Thomas 1991) that mutual information is symmetrical; I(X, Y) = I(Y, X). For finite data sets I(X, Y) can be approximated by estimating the probability distributions of each variable and their joint probability distribution. Each variable’s distribution is approximated by a histogram. Let NX be the number of bins in the histogram of variable X. OX(i), i = 1, 2, …, NX, is the occupancy of the i-th bin and PX(i) = OX(i)/ND is the probability of occupation of the i-th bin. (The procedure for determining NX and the upper and lower bound of each element of the partition is described presently.) NY is the number of bins in the histogram of variable Y. In the general case NX and NY are not necessarily equal. OY(j) and PY(j), j = 1, 2, …, NY, are the corresponding occupancies and probabilities.

PXY(i, j), the joint probability distribution, is the probability that an (x, y) pair is an element in the i-th bin of the partition of the X axis and the j-th bin of the partition of the Y axis. Mutual information is defined by
$$ {\text{I}}({\text{X}},{\text{Y}}) = \sum\limits_{{{\text{i}} = 1}}^{{{\text{N}}_{\text{X}} }} {\sum\limits_{{{\text{j}} = 1}}^{{{\text{N}}_{\text{Y}} }} {{\text{P}}_{\text{XY}} ({\text{i}},{\text{j}})} } \log_{2} \left\{ {\frac{{{\text{P}}_{\text{XY}} ({\text{i}},{\text{j}})}}{{{\text{P}}_{\text{X}} ({\text{i}}){\text{P}}_{\text{Y}} ({\text{j}})}}} \right\} $$
where there is no contribution to the sum if PXY(i, j) = 0. If variables X and Y are statistically independent, then PXY(i, j) = PX(i)PY(j) and I(X, Y) = 0. Thus in a calculation of mutual information, the null hypothesis is statistical independence of variables X and Y, in which case I(X, Y) is indistinguishable from zero. Let EXY-NULL(i, j) be the expected occupancy of element (i, j) of the XY partition if X and Y are independent. Under the assumption of independence EXY-NULL(i, j) becomes
$$ {\text{E}}_{{{\text{XY}} - {\text{NULL}}}} (i,j) = {\text{N}}_{\text{D}} {\text{P}}_{\text{X}} ({\text{i}}){\text{P}}_{\text{Y}} ( {\text{j)}} $$
Let OXY(i, j) be the observed occupancy in each element of the partition. The corresponding value of χ2 is
$$ \upchi^{2} = \sum\limits_{{{\text{i}} = 1}}^{{{\text{N}}_{\text{X}} }} {\sum\limits_{{{\text{j}} = 1}}^{{{\text{N}}_{\text{Y}} }} {\frac{{\{ {\text{E}}_{{{\text{XY}} - {\text{NULL}}}} ({\text{i}},{\text{j}}) - {\text{O}}_{\text{XY}} ({\text{i}},{\text{j}})\}^{2} }}{{{\text{E}}_{{{\text{XY}} - {\text{NULL}}}} ({\text{i}},{\text{j}})}}} } $$
The number of degrees of freedom, υ, is given by υ = (NX − 1)(NY − 1). The probability of the null hypothesis of statistical independence is
$$ {\text{P}}_{\text{NULL}} = {\text{Q}}\left( {\frac{\upsilon }{2},\frac{{\upchi^{2} }}{2}} \right) $$
Q is the regularized upper incomplete gamma function.
$$ {\text{Q}}({\text{a}},{\text{x}}) = \frac{1}{{\Upgamma ({\text{a}})}}\int\limits_{\text{x}}^{\infty } {{\text{e}}^{{ - {\text{t}}}} } {\text{t}}^{{{\text{a}} - 1}} {\text{dt}} $$
$$ \Upgamma (a) = \int\limits_{0}^{\infty } {{\text{e}}^{{ - {\text{t}}}} } {\text{t}}^{{{\text{a}} - 1}} {\text{dt}} $$
When examining finite data sets the estimated value of I(X, Y) is critically dependent on the partition of X and Y values used to estimate PX(i), PY(j) and PXY(i, j). Several different procedures can be used to estimate these distributions. We apply here a specific implementation of an algorithm using a nonuniform partition that was introduced in Cellucci et al. (2005). This algorithm considers the special case where the same number of elements, NE, is used to partition the X and Y variables; NE = NX = NY. The bins span the range xmin to xmax on the X axis and ymin to ymax on the Y axis. In this algorithm, the widths of the bins are varied independently on each axis to meet the criterion of uniform occupancy; that is, each bin has occupancy OX(i) = OY(j) = ND/NE, giving PX(i) = PY(j) = 1/NE. It should be understood, however, that the values of PXY(i, j) will not be uniform. The equi-probable partition of each axis ensures that
$$ \sum\limits_{{{\text{i}} = 1}}^{{{\text{N}}_{\text{E}} }} {{\text{P}}_{\text{XY}} ({\text{i}},{\text{j}}) = \sum\limits_{{{\text{j}} = 1}}^{{{\text{N}}_{\text{E}} }} {{\text{P}}_{\text{XY}} ({\text{i}},{\text{j}}) = 1/{\text{N}}_{\text{E}} } } $$
The individual PXY(i, j) values, however, will generally differ from one another. The assumption of a partition giving PX(i) = PY(j) = 1/NE gives
$$ {\text{E}}_{{{\text{XY}} - {\text{NULL}}}} ({\text{i}},{\text{j}}) = {\text{N}}_{\text{D}} {\text{P}}_{\text{X}} ({\text{i}}){\text{P}}_{\text{Y}} ({\text{j}}) = {\text{N}}_{\text{D}} /{\text{N}}_{\text{E}}^{2} $$
When estimating PXY(i, j) we must address the question: what is the appropriate value of NE? This is the two-dimensional analog of the histogram problem: what is the appropriate number of bins in a histogram? The morphology of the distribution cannot be detected if the number of bins is too small, as is seen by considering the limiting case of a single bin. Conversely, if the number of bins is too large, occupancies are zero or one and again the shape of the distribution cannot be determined. The number of bins for either a one-dimensional or two-dimensional distribution should be as large as possible, but not too large. In this algorithm, NE is determined by applying a variant of the Cochran criterion (Cochran 1954) to EXY-NULL(i, j). This criterion requires EXY-NULL(i, j) ≥ 5 for at least 80 % of the elements of the partition. We impose a more conservative criterion and require EXY-NULL(i, j) ≥ 5 in all elements. NE is the largest positive integer satisfying this criterion. We have previously derived an expression for EXY-NULL(i, j) for an equi-probable partition of the X and Y axes. Our criterion on EXY-NULL(i, j) becomes
$$ {\text{E}}_{{{\text{XY}} - {\text{NULL}}}} ({\text{i}},{\text{j}}) = {\text{N}}_{\text{D}} /{\text{N}}_{\text{E}}^{2} \ge 5 $$
NE is the largest integer meeting the criterion \( ({\text{N}}_{\text{D}} /5)^{1/2} \ge {\text{N}}_{\text{E}} \). If, for example, ND = 8,192, then \( ({\text{N}}_{\text{D}} /5)^{1/2} = 40.477 \) and NE = 40. OX(i) and OY(j) will be either 204 or 205. The occupancies differ by one because 8,192 is not a multiple of 40 (8,192/40 = 204.8). The upper and lower bounds of each element of the partition are varied to give the best possible approximation of PX(i) = PY(j) = 1/NE. When the bin assignments of X and Y values in the time series are known, PXY(i, j) can be determined. The estimate of mutual information and the probability of the null hypothesis then follow from the previous formulas.
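The equiprobable-partition estimate and its χ² test can be sketched in Python. This is a simplified illustration in the spirit of Cellucci et al. (2005), not the authors' code: assigning bins by rank is an assumption made here to obtain near-equal occupancies, and EXY-NULL(i, j) is computed from the observed marginals, which reduces to ND/NE² when the occupancies are exactly equal. SciPy's `gammaincc(a, x)` is the regularized upper incomplete gamma function Q(a, x). With ND = 8,192 the sketch gives NE = 40 and occupancies of 204 or 205.

```python
import math

import numpy as np
from scipy.special import gammaincc


def equiprobable_bins(v, n_e):
    """Rank the samples (ties broken by order) and cut the ranks into n_e nearly equal groups."""
    ranks = np.argsort(np.argsort(v, kind="stable"), kind="stable")
    return ranks * n_e // len(v)


def mutual_information_test(x, y):
    """MI (bits), chi-squared P_NULL and N_E for an equiprobable N_E x N_E partition."""
    n_d = len(x)
    n_e = math.isqrt(n_d // 5)              # largest N_E with N_D / N_E**2 >= 5
    bx, by = equiprobable_bins(x, n_e), equiprobable_bins(y, n_e)
    observed = np.zeros((n_e, n_e))
    np.add.at(observed, (bx, by), 1)        # O_XY(i, j)
    p_x = np.bincount(bx, minlength=n_e) / n_d
    p_y = np.bincount(by, minlength=n_e) / n_d
    p_xy = observed / n_d
    independent = np.outer(p_x, p_y)        # P_X(i) P_Y(j)
    nz = p_xy > 0                           # empty cells contribute nothing
    mi = float(np.sum(p_xy[nz] * np.log2(p_xy[nz] / independent[nz])))
    expected = n_d * independent            # E_XY-NULL(i, j), about N_D / N_E**2
    chi2 = float(np.sum((expected - observed) ** 2 / expected))
    dof = (n_e - 1) ** 2                    # (N_X - 1)(N_Y - 1) with N_X = N_Y = N_E
    p_null = float(gammaincc(dof / 2.0, chi2 / 2.0))   # Q(dof/2, chi2/2)
    return mi, p_null, n_e


rng = np.random.default_rng(6)
n_d = 8192
x = rng.standard_normal(n_d)
y_dep = x + rng.standard_normal(n_d)        # related channel
y_ind = rng.standard_normal(n_d)            # unrelated channel
occupancy = np.bincount(equiprobable_bins(x, 40))
mi_dep, p_dep, n_e = mutual_information_test(x, y_dep)
mi_ind, p_ind, _ = mutual_information_test(x, y_ind)
```

Note that the unrelated pair still yields a small positive MI estimate; this finite-sample bias is the reason the χ² probability, rather than the raw MI value, is used to decide whether a relationship is present.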

Copyright information

© The Author(s) 2013

Open Access. This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.