# CuBIC: cumulant based inference of higher-order correlations in massively parallel spike trains

- First Online:

- Received:
- Revised:
- Accepted:

DOI: 10.1007/s10827-009-0195-x

- Cite this article as:
- Staude, B., Rotter, S. & Grün, S. J Comput Neurosci (2010) 29: 327. doi:10.1007/s10827-009-0195-x

## Abstract

Recent developments in electrophysiological and optical recording techniques enable the simultaneous observation of large numbers of neurons. A meaningful interpretation of the resulting multivariate data, however, presents a serious challenge. In particular, the estimation of higher-order correlations that characterize the cooperative dynamics of groups of neurons is impeded by the combinatorial explosion of the parameter space. The resulting requirements with respect to sample size and recording time has rendered the detection of coordinated neuronal groups exceedingly difficult. Here we describe a novel approach to infer higher-order correlations in massively parallel spike trains that is less susceptible to these problems. Based on the superimposed activity of all recorded neurons, the cumulant-based inference of higher-order correlations (CuBIC) presented here exploits the fact that the absence of higher-order correlations imposes also strong constraints on correlations of lower order. Thus, estimates of only few lower-order cumulants suffice to infer higher-order correlations in the population. As a consequence, CuBIC is much better compatible with the constraints of *in vivo* recordings than previous approaches, which is shown by a systematic analysis of its parameter dependence.

### Keywords

Multichannel recordings Spike train analysis Higher-order correlations Cell assembly hypothesis## 1 Introduction

More than 50 years after its first conception (Hebb 1949), the idea that entities of thought or perception are represented by the coordinated activity of (large) neuronal groups has lost nothing of its attraction (e.g., Wennekers et al. 2003; Harris 2005; Sakurai and Takahashi 2006). Now known as the “assembly hypothesis”, the concept of a single, unified principle underlying such diverse computational tasks as association and pattern completion (e.g., Palm 1982), binding (e.g., Singer and Gray 1995), language processing (e.g., Wennekers et al. 2006), or memory formation and retrieval (e.g., Pastalkova et al. 2008) has inspired numerous theoretical and experimental neuroscientists. However, whether or not the dynamic formation of cell assemblies constitutes a fundamental principle of cortical information processing remains a controversial issue of current research (e.g., Singer 1999; Shadlen and Movshon 1999; van Vreeswijk 2006). While initially mainly technical problems limited the experimental surge for support of the assembly hypothesis (state-of-the-art electrophysiological setups in the last century allowed to record only few neurons simultaneously), the recent advent of multi-electrode array and optical imaging techniques reveals fundamental shortcomings of available analysis tools (Brown et al. 2004).

Based on the efficacy of synchronized presynaptic spikes to reliably generate output spikes (Abeles 1982; König et al. 1995; Diesmann et al. 1999), the temporal coordination of spike timing is a commonly accepted signature of assembly activity (e.g., Gerstein et al. 1989; Abeles 1991; Singer et al. 1997; Harris 2005). Consequently, approaches to detect assembly activity have focused on the detection of correlated spiking. Statistically, coordinated spiking of large neuronal groups has been associated to higher-order correlations among the corresponding spike trains (Martignon et al. 1995, 2000), where “genuine higher-order correlations” are assumed, if coincident spikes of a neuronal group cannot be explained/predicted by the firing rate and pairwise correlations alone. Mathematical frameworks for the estimation of such correlations exist (Nakahara and Amari 2002; Martignon et al. 1995; Schneider and Grün 2003; Gütig et al. 2003), however, run into combinatorial difficulties, as they assign one correlation parameter for each group of neurons and thus require in the order of 2^{N} parameters for a population of *N* simultaneously recorded neurons. Comparing the number of parameters to the available sample size of typical electrophysiological recordings (e.g. a population of *N* = 100 neurons implies ∼10^{30} parameters while 100 s of data sampled at 10 kHz provide ∼10^{6} samples) illustrates the principal infeasibility of this approach. In fact, the estimation of such higher-order correlations runs into severe practical problems even for populations of *N*∼10 neurons (Martignon et al. 1995, 2000; Del Prete et al. 2004).

Due to the limitations of multivariate approaches, most studies in favor of cortical cell-assemblies resort to pairwise interactions. And indeed, the existence and functional relevance of pairwise interactions has been demonstrated in various cortical systems and behavioral paradigms (e.g., Eggermont 1990; Vaadia et al. 1995; Kreiter and Singer 1996; Riehle et al. 1997; Kohn and Smith 2005; Sakurai and Takahashi 2006; Fujisawa et al. 2008). However, although pairwise analysis can indicate highly correlated groups of neurons (Berger et al. 2007; Fujisawa et al. 2008), knowledge of higher-order correlations is essential to conclude on large, synchronously firing assemblies (see examples in Fig. 4 for illustration). The importance to transcend pairwise descriptions of parallel spike trains is further underscored by the strong impact of higher-order correlations on the dynamics of cortical neurons (e.g., Bohte et al. 2000; Kuhn et al. 2003). Taken together, we believe that the increasing number of simultaneously observable neurons can only be exploited by analysis techniques that go beyond mere pairwise correlations but are nevertheless applicable given the limited sample size of electrophysiological data (Brown et al. 2004).

In this study, we present a novel cumulant based inference procedure for higher-order correlations (CuBIC). The method is specifically designed to detect the presence of higher-order correlations under the constraints of the limited sample size typical of standard experimental paradigms. This is achieved by combining three basic ideas. First, CuBIC is based on the binned and summed spiking activity across all recorded neurons (population spike count). This transforms the multivariate problem to estimate all ∼2^{N} model parameters into a parsimoniously parametrized univariate problem, thereby dramatically reducing the required sample size. Furthermore, pooling the spiking activity avoids the need for sorting the multi-unit spike trains recorded on a single electrode into isolated single neuron spike trains. Correlations among the individual spike trains are measured by the cumulants of the population spike count. Importantly, the cumulant correlations of higher order used here do not conform to the higher-order parameters of the exponential family used by, e.g., Schneidman et al. (2006). Second, we use the compound Poisson process (e.g., Snyder and Miller 1991; Daley and Vere-Jones 2005) as a flexible, intuitive, and analytically tractable model of correlated spiking activity. Here, correlations among spike trains are modeled by the insertion of additional coincident events in continuous time (Holgate 1964; Ehm et al. 2007; Johnson and Goodman 2008; Staude et al. 2010). As a consequence, the conceptual difference between the discretely sampled data (population spike count) and the continuous time spike trains is an explicit feature of the present framework. And third, we formalize the observation that even small correlations of lower order can imply synchronized spiking of large neuronal groups (Amari et al. 2003; Benucci et al. 2004; Schneidman et al. 2006).

The first and second ideas are elaborated in Section 2. Based on the compound Poisson process, Section 3 thoroughly formalizes the third idea, where we analytically derive confidence intervals for a hierarchy of statistical hypothesis tests. The tests are then combined to compute a lower bound \(\hat{\xi}\) for the order of correlations in a given data set . Importantly, the inferred lower bound \(\hat{\xi}\) can considerably exceed the order of the estimated correlations. Thereby, CuBIC avoids the direct estimation of higher-order correlations, which is practically infeasible for orders of \(m\gtrsim5\), but nevertheless reveals their presence. As shown by extensive Monte Carlo simulations (Section 4), using the third cumulant suffices to reliably detect existing correlations of order > 10 in large neuronal pools (here *N*∼100 neurons with average count-correlation coefficients of *c*∼0.01). Thus, CuBIC achieves an unprecedented sensitivity for higher-order correlations in scenarios with reasonable sample sizes, i.e. experiment durations of *T* < 100 s. The Discussion section relates the cumulant-based correlations used here to the higher-order interaction parameters of the exponential family (e.g., Martignon et al. 1995; Nakahara and Amari 2002; Shlens et al. 2006) and critically discusses the implications of the hypothesized compound Poisson process. Preliminary results have been presented previously in abstract form (Staude et al. 2007).

## 2 Measurement & model

### 2.1 Measurement

*T*denotes transpose), one pattern for every time bin of width

*h*. The components of

**X**are the binned, discretized spike trains, i.e.

*X*

_{i}(

*s*) is the spike count of the

*i*th neuron in the interval [

*s*,

*s*+

*h*). Given these patterns, we define the population spike count

*Z*(

*s*) at

*s*as the total spike count in the

*s*th bin (Fig. 1)

The variable *Z*(*s*), from here on referred to as the “complexity” of the population at *s*, counts the number of spikes that fall into the interval [*sh*,(*s* + 1)*h*), irrespective of the neuron IDs that emitted these spikes. In the case where the *X*_{i} are binary (“1” for one or more spikes in the bin, “0” for no spike), *Z*(*s*) is simply the number of neurons that spike in the time slice. As opposed to most other frameworks for correlation analysis (e.g., Aertsen et al. 1989; Martignon et al. 1995; Grün et al. 2002a; Nakahara and Amari 2002; Shlens et al. 2006), however, the method presented in this study does not assume binary variables. As a consequence, our term “complexity” as the “number of spikes per bin in the population” does not comply with the “number of active neurons per bin” used in binary frameworks.

*Z*(

*s*) in different bins as i.i.d. random variables, we here assume that

*Z*(

*s*) and

*Z*(

*s*+

*k*) are independent for

*k*≠ 0 (zero memory), and that the distribution of

*Z*(

*s*) does not depend on the time bin

*s*(stationarity). We name the resulting distribution of population spike counts

Despite ignoring the specific neuron IDs that contribute to the patterns **X**(*s*), the complexity distribution nevertheless contains information about the correlation structure of the population. For instance, if the counting variables {*X*_{i}}_{i ∈ {1,...,N}} are independent Poisson counts, the corresponding population spike count, being the sum of the independent Poisson variables, is again a Poisson variable, and thus the complexity distribution *f*_{Z} is a Poisson distribution. Correlations change the relative probabilities for patterns of high and low complexities as compared to the independent case, as can be seen by comparing the complexity distribution of the correlated population and its independent Poisson fit in Fig. 1 (blue bars and dashed gray line, respectively; see also Grün et al. 2008a and Louis and Grün 2009). We quantify such deviations from independence by means of the cumulants *κ*_{m}[*Z*] of the population spike count *Z*.

#### 2.1.1 Correlations and cumulants

Like the more familiar (raw) moments \(\operatorname{\mathrm{E}}[Z^{m}]\) of a random variable *Z*, the cumulants *κ*_{m}[*Z*] characterize the shape of its distribution (see Appendix A and, e.g., Stratonovich 1967; Gardiner 2003). Most common are the first two cumulants, the expectation and the variance. These admit simple expressions in terms of the moments: \(\kappa_{1}[Z]=\operatorname{\mathrm{E}}[Z]\) and \(\kappa_{2}[Z]=\operatorname{\mathrm{E}}[Z^{2}]-\operatorname{\mathrm{E}}[Z]^{2}=\operatorname{\mathrm{Var}}[Z]\). Similar expressions for higher cumulants, however, are exceedingly complicated (see Stuart and Ord 1987 for explicit expressions for *m* ≤ 10).

*m*= 2 and \(Z=\sum_{i=1}^{N}X_{i}\), for instance, we have the well-known variance-covariance relationship

Equation (1) shows that \(\operatorname{\mathrm{Var}}[\sum_{i=1}^{N}X_{i}]=\sum_{i=1}^{N}\operatorname{\mathrm{Var}}[X_{i}]\) if \(\sum_{i\neq j}\operatorname{\mathrm{Cov}}[X_{i},X_{j}]=0\), which is the case if the *X*_{i} are (pairwise) uncorrelated. Cumulant correlations of higher order, the so-called “mixed” or “connected” cumulants, are a straightforward generalization of the covariance in exactly this sense: if the population has neither pairwise nor triple-wise cumulant correlations, then \(\kappa_{3}[Z]=\sum_{i=1}^{N}\kappa_{3}[X_{i}]\) (see Appendix A for a concise definition, and Gardiner 2003, for a general introduction). Higher-order cumulant correlations generalize the covariance also with respect to the fact that \(\operatorname{\mathrm{E}}[X_{1}X_{2}]=\operatorname{\mathrm{E}}[X_{1}]\operatorname{\mathrm{E}}[X_{2}]\) implies \(\operatorname{\mathrm{Cov}}[X_{1},X_{2}]=0\). That is, if the multivariate random vector **X** decomposes into independent subgroups, then the cumulant correlations vanish (Streitberg 1990). For notational consistency, we will from now on stick to the cumulant notation, e.g., use “first/second cumulant” instead of the more familiar terms “mean/variance”.

A further consequence of Eq. (1) is that *κ*_{2}[*Z*] is influenced only by the single process statistics (via \(\sum_{i=1}^{N}\operatorname{\mathrm{Var}}[X_{i}]\)) and *pairwise* correlations (via \(\sum_{i\neq j}\operatorname{\mathrm{Cov}}[X_{i},X_{j}]\)). No higher-order correlations contribute to the second cumulant. This holds also for higher cumulants, i.e. *κ*_{m}[*Z*] depends on correlations among the *X*_{i} of maximal order *m*. And finally, in the same way that *κ*_{2}[*Z*] depends on pairwise correlations *and* the single process statistics, also *κ*_{m}[*Z*] does not measure pure *m*th order correlations, but depends on correlations of all orders up to *m*. While a correction of the second cumulant for the influence of the single process statistics would be straightforward (subtracting \(\sum_{i=1}^{N}\operatorname{\mathrm{Var}}[X_{i}]\) in Eq. (1), see Appendix B), correcting higher cumulants for the influence of correlations of lower order is exceedingly complicated. We therefore employ a parametric model for *Z*, the compound Poisson process (see next section), the parameters of which can be interpreted straightforwardly in terms of higher-order correlations among the *X*_{i}.

We would like to stress that the higher-order correlations defined by cumulants differ strongly from the higher-order parameters of the exponential family used by, e.g. Martignon et al. (1995), Nakahara and Amari (2002), Shlens et al. (2006). The relationship between these two frameworks is discussed in more detail in Section 5.1 (see also Staude et al. 2010).

### 2.2 Model

*discretized,*i.e. binned, population spike count

*Z*(

*s*) of the previous section, the proposed model operates in

*continuous*, i.e. unbinned time. That is, we model the process \(z(t)=\sum_{i=1}^{N}x_{i}(t)\), where \(x_{i}(t)=\sum_{j}\delta(t-t_{j}^{i})\) denotes the

*i*th unbinned, continuous-time spike train (

*i*= 1,...,

*N*) with spike-event times \(t_{j}^{i}\). The model we propose for

*z*(

*t*) is that of a compound Poisson process (CPP)

*t*

_{j}constitute a Poisson process, and the marks

*a*

_{j}are i.i.d. integer-valued random variables, drawn independently for all

*t*

_{j}. The marks

*a*

_{j}determine the number of neurons that fire at time

*t*

_{j}, and will be referred to as the “amplitude” of the event at time

*t*

_{j}. The probability that an event has a specific amplitude is determined by the amplitude distribution

*f*

_{A}, i.e. \(f_{A}(\xi)=\text{Pr}\{a_{j}=\xi\}\) (see Fig. 1). The Poisson process that generates the events

*t*

_{j}is called the “carrier process” of the model and its rate

*ν*is the “carrier rate”. Processes of this type are also referred to as generalized, or marked, Poisson processes (see e.g. Snyder and Miller 1991 for a general definition and Ehm et al. 2007 for an application to spike train analysis).

To interpret the CPP as the lumped process of a correlated population of *N* spike trains, and to utilize the CPP for the generation of artificial data, the events of *z*(*t*) are assigned to individual processes *x*_{i}(*t*) (*i* = 1,...,*N*). The simplest model draws the *a*_{j} process IDs that receive a spike at *t*_{j} as a random subset from {1,...*N*}, independently for all event times *t*_{j} (Fig. 1). This results in a homogeneous population of correlated Poisson processes, where all processes have the same rate, all pairs of processes have identical correlations, and so on. The SIP/MIP models presented in Kuhn et al. (2003) are special cases of this more general model (see also Appendix B for examples). As the inference procedure CuBIC presented in the remainder of this study is based solely on the summed activity, however, it is not affected by the details of the copying procedure. In particular, CuBIC does not presuppose that the population is homogeneous.

### 2.3 Relating measurement and model

*ν*and

*f*

_{A}in terms of correlations among the counting variables

*X*

_{i}, we now relate the former to the cumulants of the population spike count

*Z*. First of all, note that as the carriers process

*z*(

*t*) is a Poisson process, the population spike counts in different bins are i.i.d. For

*l*= 1,...,

*N*, define the processes

*y*

_{l}(

*t*) that determine all event times

*t*

_{j}in Eq. (2) with given amplitude

*a*

_{j}=

*l*(compare Appendix A). Then the CPP

*z*(

*t*) admits the representation \(z(t)=\sum_{l=1}^{N}l\cdot y_{l}(t)\). As a consequence, the discretized population spike count satisfies \(Z=\sum_{l=1}^{N}l\, Y_{l}\), where the

*Y*

_{l}are the counting variables obtained from the processes

*y*

_{l}(

*t*) using a bin size

*h*. As the event times

*t*

_{j}of

*z*(

*t*) follow a Poisson process and the subsequent amplitudes

*a*

_{j}are independent, the

*y*

_{l}(

*t*) are independent Poisson processes. The rate

*ν*

_{l}of

*y*

_{l}(

*t*) is given by

*ν*

_{l}=

*f*

_{A}(

*l*)·

*ν*(

*l*= 1,2,...). Hence, the

*Y*

_{l}are Poisson variables with parameter

*ν*

_{l}

*h*. As a consequence, all cumulants of

*Y*

_{l}are identical and given by

*κ*

_{m}[

*Y*

_{l}] =

*ν*

_{l}

*h*(

*m*= 1,2,...; Gardiner 2003). The scaling behavior of the cumulants \(\kappa_{m}[lY_{l}]=l^{m}\kappa_{m}[Y_{l}]\) for all

*m*(Mattner 1999) yields

*ν*

_{l}=

*f*

_{A}(

*l*)·

*ν*and \(\sum_{l=1}^{N}l^{m}\cdot f_{A}(l)=\operatorname{\mathrm{E}}[A^{m}]=:\mu_{m}[A]\), we finally have

Equation (3) is the central equation of this study and requires a few more remarks. First, it implies that the first two cumulants of the population spike count, *κ*_{1}[*Z*] and *κ*_{2}[*Z*], are determined solely by the carrier rate *ν* and the first two moments of the amplitude distribution, *μ*_{1}[*A*] and *μ*_{2}[*A*]. As *κ*_{1}[*Z*] and *κ*_{2}[*Z*] determine firing rates and pairwise correlations in the population (Appendix B, Eqs. (24) and (26)), we conclude that all CPP models with identical carrier rate *ν* and identical *μ*_{1}[*A*] and *μ*_{2}[*A*] generate populations with identical rates and pairwise correlations. Differences in higher-order correlations can thus be modeled by choosing different higher moments for *f*_{A} (see Section 2.4 for the precise relationship between the entries of the amplitude distribution and higher-order correlations in the population). This makes the CPP a very flexible and convenient model to generate artificial data sets with identical firing rates and pairwise correlations, yet different higher-order correlations (see Section 4 and Appendix B for examples, and Kuhn et al. 2003; Ehm et al. 2007; Staude et al. 2010 for alternative parametrizations).

Second, the moments of the strictly positive, integer-valued variable *A* increase with the order, i.e. satisfy *μ*_{m}[*A*] ≤ *μ*_{m + 1}[*A*] for *m* ≥ 1. By Eq. (3), this implies that also the cumulants of *Z* increase with the order, i.e. *κ*_{m}[*Z*] ≤ *κ*_{m + 1}[*Z*]. This is a reformulation of the fact that insertion of joint spikes as done in the CPP generates only positive correlations (e.g., Brette 2009; Johnson and Goodman 2008, Johnson and Goodman, unpublished manuscript), and it shows that certain combinations of cumulants cannot be realized by the CPP.

### 2.4 Terminology

The conceptual difference between the measured data (the counting variables *X*_{i} and the complexity distribution *f*_{Z}) and the parametric model that we assume to underlie this measurement (the CPP) is crucial for the remainder of this study. To illustrate this difference, note that *f*_{Z} depends on the bin size *h* used in the discretization. Larger bin widths increase the probabilities for high complexities, while smaller bin widths increase the probability for empty bins. The parameters of the CPP, on the other hand, do not change with the bin size. For instance, if all events of *z*(*t*) have amplitude *a*_{j} = 1, i.e. the amplitude distribution has a single peak at *ξ* = 1, the population consists of independent Poisson processes, irrespective of the bin size used for the analysis, or even the number of recorded neurons. In other words: patterns of complexity *Z* ≥ 2 can occur by chance, while events of amplitude *ξ* ≥ 2 do not occur by chance but imply correlations in the population. The order of the correlation, in turn, is determined by the amplitudes of the events in the corresponding CPP (see Appendix A for the proof):

**Theorem 1**

*Let*\(z(t)=\sum_{i=1}^N x_i(t)\)* be a compound Poisson process with amplitude distribution **f*_{A}*, and let *\({\bf{X}}=(X_1,\ldots,X_N)^T\)* be the vector of counting variables obtained from the x*_{i}(*t*)* with a bin width h**. Then the components of ***X*** have correlations of order m** if and only if f*_{A}* assigns non-zero probabilities to amplitudes *≥ *m.*

The above theorem confirms the intuitive conception that, within the framework of the CPP, correlations of a certain order *m* require injected coincidences into at least *m* processes (events of amplitudes ≥ *m*). Also, a population with only pairwise but no higher-order correlations has events of maximal amplitude *m* = 2. Importantly, correlations in the above theorem are defined strictly on the basis of the discretized counting variables *X*_{i}. As a consequence, they do not resolve (and do not depend on) the perfect temporal precision of the coincident events in the CPP: if the events of *z*(*t*) were copied into the individual processes with a temporal jitter that is small with respect to the bin size *h*, the correlations among the counting variables will hardly be affected (see Section 5.3.3 for a more detailed discussion). Note also that events of amplitude *ξ* induce not only correlations of order *ξ*, but also of orders < *ξ*.

## 3 Cumulant based inference of higher-order correlations (CuBIC)

*c*= 1. As

*c*= 1 implies identity for all pairs of spike trains, all four spike trains of this example must be identical (Fig. 2, left). In other words, a spike in one neuron implies joint spike events in all the other neurons. Thus, the data must have correlation of order 4.

*m*of the measured correlations (here

*m*= 2). CuBIC formalizes this example in the framework of the CPP, and generalizes it in two aspects:

- 1.
Assume the correlation coefficients had been somewhat smaller than 1: do we still need correlation of order 4 to explain the measured pairwise correlation? Or would correlation of order 3 as in Fig. 2, middle, or even of order 2 as in Fig. 2, right, suffice to explain the measured correlations? In other words: What order of correlation is minimally required to explain the measured pairwise correlations?

- 2.
Can we formulate a similar reasoning for measured correlations of higher order? What order of correlation is minimally required to explain the measured correlations of third (fourth,...) order?

### 3.1 Testing pairwise correlations (*m* = 2)

We approach the first aspect with a hierarchy of statistical hypothesis tests, labeled by the integer *ξ*. For fixed *ξ*, the null hypothesis \(H_{0}^{2,\xi}\) states that the pairwise correlations (thus the superscript “2”) in the population are compatible with the assumption that there is no correlation beyond order *ξ*; the alternative \(H_{a}^{2,\xi}\) states that correlation of order higher than *ξ* is necessary to explain the pairwise correlations. Rejection of e.g. \(H_{0}^{2,4}\) in favor of \(H_{a}^{2,4}\) thus implies the existence of correlation of at least order 5. The test statistics to decide between these alternatives is the second sample cumulant of the population spike count, and the compound Poisson process provides the framework to analytically derive the required confidence bounds.

#### 3.1.1 The null hypothesis \(H_{0}^{2,\xi}\)

The formulation of a null hypothesis requires a careful distinction between the random variable that describes the experiment, and the data sample that is tested against the hypothesis. For a fixed value of the test parameter *ξ*, this section formulates the null hypothesis \(H_{0}^{2,\xi}\), using the random variable *Z*′ that describes the population spike count of a neuronal population. Section 3.1.2 explains how a given data sample {*Z*′_{1},...,*Z*′_{L}} is tested against \(H_{0}^{2,\xi}\).

*Z*′, and (b) the maximal order of correlation in the model is

*ξ*. Using the amplitude distribution

*f*

_{A}to parametrize its correlation structure, this CPP model is the solution of the constrained maximization problem

*ν*and

*f*

_{A}are the CPP parameters that determine the population spike count

*Z*. We rewrite Eq. (4) by observing that the second constraint (\(f_{A}(k)=0\text{ for }k>\xi\)) together with Eq. (3) implies \(\kappa_{2}[{Z }]=\mu_{2}[A]\nu h=\sum_{l=1}^{\xi}l^{2}f_{A}(l)\nu h\). With

*ν*

_{l}=

*f*

_{A}(

*l*)·

*ν*(Section 2.3), we thus have

In Eq. (6), both the function to maximize (\(\vec{\xi}_{2}\cdot\vec{\nu}_{\xi}h\)) and the constraint (\(\vec{\xi}_{1}\cdot\vec{\nu}_{\xi}h=\kappa_{1}[Z']\)) depend linearly on the parameters \(\vec{\nu}_{\xi}=(\nu_{1},\ldots,\nu_{\xi})\). Problems of this type, so-called Linear Programming Problems, are uniquely solvable, e.g. using the Simplex Method (Press et al. 1992, Chapter 10.8). The solution yields the upper bound for the second cumulant \(\kappa_{2,\xi}^{*}\) and the corresponding parameter vector \(\vec{\nu}_{\xi}^{*}\). The carrier rate and amplitude distribution of the CPP that maximizes Eq. (6) are then given by \(\nu^{*}=\sum_{l=1}^{\xi}\nu_{l}^{*}\) and \(f_{A^{*}}^{}(l)=\nu_{l}^{*}/\nu^{*}\).

*ξ*. Then, the second cumulant of its population spike count

*Z*′ must be smaller than the upper bound \(\kappa_{2,\xi}^{*}\) computed in the previous section. We thus formulate the null hypothesis

*ξ*.

#### 3.1.2 Test statistics and their distributions

*κ*

_{1}[

*Z*′] and

*κ*

_{2}[

*Z*′] of the random variable

*Z*′. Given only a data sample {

*Z*′

_{1},...,

*Z*′

_{L}} of size

*L*, we estimate these quantities by the standard sample mean and (unbiased) sample variance (Stuart and Ord 1987)

*k*

_{2}under the null hypothesis. To derive this distribution, recall that \(\operatorname{\mathrm{E}}[k_{2}]=\kappa_{2}[Z']\) and \(\operatorname{\mathrm{Var}}[k_{2}]=\frac{\kappa_{4}[Z']}{L}-\frac{2\kappa_{2}[Z']^{2}}{L-1}\), where the

*κ*

_{i}[

*Z*′] are the unknown cumulants of the variable

*Z*′ that underlies the sample (Stuart and Ord 1987). The trick of CuBIC is to assume in \(H_{0}^{2,\xi}\) that

*Z*′ is the population spike count of the CPP that solves Eq. (6) after the unknown cumulant

*κ*

_{1}[

*Z*′] has been substituted with its estimate

*k*

_{1}. Under this assumption, all cumulants of

*Z*′ can be computed by inserting the model parameters

*ν*

^{*}and \(f_{A^{*}}\) into Eq. (3), which yields

*L*> 10.000 (roughly corresponding to 10–100 s of data, see Section 4.2 and Fig. 6(d)), the distribution of

*k*

_{2}under \(H_{0}^{2,\xi}\) is well approximated by a normal distribution. Taken together, under \(H_{0}^{2,\xi}\) the test statistics

*k*

_{2}is normal with mean \(\kappa_{2,\xi}^{*}\) and variance given by Eq. (7), such that the

*p*-value for the test against \(H_{0}^{2,\xi}\) is

The rejection of the null hypothesis \(H_{0}^{2,\xi}\) for a given *ξ* implies that the pairwise correlations in the data are not compatible with the assumption that there is no correlation beyond order *ξ*. Thus, rejecting \(H_{0}^{2,\xi}\) implies that *ξ* + 1 is a lower bound for the order of correlation.

### 3.2 Testing higher-order correlations (*m* > 2)

*m*= 2) to exploit also estimated correlations of higher order (

*m*> 2). The difference to the tests with

*m*= 2 is that the upper bound for the

*m*th cumulant involves all cumulants up to order

*m*− 1. The generalization of the maximization problem (Eq. (4)) reads

*κ*

_{i}[

*Z*′] are the cumulants of the random variable

*Z*′. Using the

*ν*-parametrization of Eq. (5), we can rewrite Eq. (8) as the Linear Programming Problem

^{m − 1}is a (

*ξ*×

*m*− 1)-dimensional coefficient matrix with entries \(\Xi_{i,l}^{m-1}=l^{i}\), and \(\vec{\kappa}_{m-1}[Z']=(\kappa_{1}[Z'],\ldots,\kappa_{m-1}[Z'])^{T}\) is the column-vector of the first

*m*− 1 cumulants of

*Z*′. Solving Eq. (9) yields both the maximal

*m*th cumulant \(\kappa_{m,\xi}^{*}\) and the rate vector \(\vec{\nu}_{\xi}^{*}\), which again can be used to compute the corresponding model parameters

*ν*

^{*}and \(f_{A}^{*}\).

*m*= 2, the (

*m*,

*ξ*)-null hypothesis \(H_{0}^{m,\xi}\) states that correlation of orders 1,2,...,

*m*are compatible with the assumption that there is no correlation beyond order

*ξ*. This implies that

*κ*

_{m}[

*Z*′] falls below the upper bound \(\kappa_{m,\xi}^{*}\), and we define

*Z*′

_{1},...,

*Z*′

_{L}} against \(H_{0}^{m,\xi}\), we estimate correlations of order

*m*by the

*m*th sample cumulant of the population spike count, the so-called the

*m*th

*k*-statistics

*k*

_{m}(Stuart and Ord 1987; the sample mean and unbiased sample variance defined above are the first two

*k*-statistics). And again, we assume that

*Z*′ is the population spike count of the CPP model that solves Eq. (4) after the unknown

*κ*

_{i}[

*Z*′] have been replaced by the estimates

*k*

_{i}(

*i*= 1,...,

*m*− 1). Then, the test statistics

*k*

_{m}is normally distributed with mean \(\kappa_{m,\xi}^{*}\) and variance \(\operatorname{\mathrm{Var}}[k_{m}]\), where expressions for \(\operatorname{\mathrm{Var}}[k_{m}]\) can be found in the literature (Stuart and Ord 1987). The

*p*-value of \(H_{0}^{m,\xi}\) is thus given by

Note that after substituting the true cumulant vector \(\vec{\kappa}_{m-1}[Z']\) by its estimate \(\vec{k}_{m-1}:=(k_{1},\ldots,k_{m-1})^{T}\) in Eq. (9), the resulting constraint \(\Xi^{m-1}\vec{\nu}_{\xi}=\vec{k}_{m-1}\) consists of *m* − 1 equations for the *ξ* positive parameters *ν*_{1},...,*ν*_{ξ}. Unfortunately, there can be combinations of estimated cumulants for which these equations cannot be solved. In this case, Eq. (9) does not posses a solution, and the corresponding null hypothesis cannot be tested (see also Section 3.3).

### 3.3 Computing the lower bound

In the preceding section, CuBIC was presented as a collection of hypothesis tests \(H_{0}^{m,\xi}\), labeled by the indices *m* (the order of the estimated cumulant) and *ξ* (the maximal order of correlation in the null). We now combine these tests to infer a lower bound \(\hat{\xi}\) for the order of correlation in a given data set. To do so, recall that rejecting \(H_{0}^{m,\xi}\) for given *m* and *ξ* implies that the combination of the first *m* cumulants requires correlation of order > *ξ*. As a consequence, every rejected hypothesis \(H_{0}^{m,\xi}\) implies that *ξ* + 1 is a lower bound for the highest order of correlation in the data. The question thus is: which of these bounds should we use?

Evidently, we aim to infer the highest order of correlation that is present in the data. Hence, the objective is to find the maximum of all the lower bounds that were obtained from the hierarchy of tests. We thus aim for the pair (*m*,*ξ*) with the highest value of *ξ* such that the corresponding null hypothesis \(H_{0}^{m,\xi}\) is rejected.

*m*, the inner loop searches the highest

*ξ*for which \(H_{0}^{m,\xi}\) is rejected; the outer loop iterates over subsequent orders

*m*. The free parameters of the algorithm are the test level

*α*, and, to ensure termination of the loops, upper bounds for the cumulant (

*m*

_{max}) and the maximal order of correlation assumed in the null (

*ξ*

_{max}; see Section 5 for their choice). After initializing the test variables (

*m*= 2,

*ξ*= 1, \(\hat{\xi}_{m}=1\)), we estimate the first

*m*cumulants from the data by computing the corresponding

*k*-statistics. Next, we check if Eq. (9) is solvable for the current values of

*m*and

*ξ*(yellow box). This check consists of two steps. The first step checks if the first

*m*− 1 estimated cumulants increase with order

*m*, a requirement of the CPP model assumed to underlie the data (left rhomb in yellow box; compare last paragraph of Section 2.3). If this is not the case, then

*k*

_{1}≤

*k*

_{2}≤ ... ≤

*k*

_{m′ − 1}is false for all

*m*′ ≥

*m*, which implies that Eq. (9) cannot be solved for

*m*′ ≥

*m*. If the solution, i.e. the maximal cumulant \(\kappa_{m',\xi}^{*}\) and the model parameters

*ν*

^{*}and \(f_{A^{*}}\), are not available, however, the corresponding

*p*-values (Eq. (10)) cannot be computed. In this case, no hypothesis \(H_{0}^{m',\xi}\) with

*m*′ ≥

*m*can be tested, and the procedure is terminated. If the first

*m*− 1 estimated cumulants are in principle compatible with the CPP, we check in the second step if the current value of

*ξ*can solve Eq. (9) (right rhomb in yellow box). If this is not the case,

*ξ*is increased by 1 until either Eq. (9) can be solved, or

*ξ*reaches its upper bound

*ξ*

_{max}. In the former case, the data are tested against \(H_{0}^{m,\xi}\) (blue box), in the latter case the test is skipped. At rejection of \(H_{0}^{m,\xi}\) (rightward arrow at the box

*p*

_{m,ξ}<

*α*) we set \(\hat{\xi}_{m}=\xi+1\) and, unless

*ξ*reached the upper bound

*ξ*

_{max}, repeat the inner loop with

*ξ*set to

*ξ*+ 1. The loop terminates with \(\hat{\xi}_{m}=\max\{\xi|H_{0}^{m,\xi}\text{ rejected}\}+1\), which is the largest lower bound for the current value of

*m*. As we reject the “zeroth” hypothesis \(H_{0}^{m,0}\) by convention (every data set has correlations of order > 0), this holds even if the tests for the current value of

*m*were skipped. The inner loop is repeated with new

*m*set to

*m*+ 1, until

*m*reaches the upper bound

*m*

_{max}. Finally, the lower bounds obtained for the different values of

*m*are compared, and their maximum is returned as the absolute lower bound \(\hat{\xi}\). A MatLab-implementation of the proposed algorithm is available upon request.

## 4 Simulation results

### 4.1 Illustration

*N*= 100) with identical firing rates (

*λ*= 10 Hz) and pairwise correlations (

*c*= 0.01 in

*N*

_{C}= 30 and

*c*= 0 in the remaining 70 neurons). The difference between the data sets lies solely in their higher-order correlation structure, i.e. their maximal order of correlation (Fig. 4, top panels). All sets have amplitudes at

*ξ*= 1 that generate independent “background spikes”. Correlations are induced by additional events of amplitudes

*ξ*

_{syn}> 1 that differ across data sets:

*ξ*

_{syn}= 2 in Set1 (left),

*ξ*

_{syn}= 7 in Set2 (middle), and

*ξ*

_{syn}= 15 in Set3 (right). Thus, Set1 has correlation of order 1 and 2, Set2 has correlations up to order 7, and Set3 has correlations up to order 15. The identical pairwise correlations in the three data sets are achieved by adjusting the rate \(\nu_{{\xi_{syn}}}=\nu\cdot f_{A}({\xi_{syn}})\) of the higher-order events (see Appendix B for the relationship between the model parameters

*ν*,

*f*

_{A}and the population statistics

*N*,

*N*

_{C},

*λ*and

*c*). Test parameters were set to

*ξ*

_{max}= 15 and

*m*

_{max}= 4.

The similarity of the raster displays and population spike counts (Fig. 4, 2nd and 3rd from top) of the three data sets may illustrate that such standard visualization methods cannot reveal the differences in the higher-order properties of the data sets. Larger probabilities for patterns of high complexity induced by the higher-order correlations present in Set2 and Set3 are visible in the complexity distributions (third row of panels) only when plotted on a logarithmic scale (green lines). The low rates of the higher-order events imposed by the small pairwise correlations (*ν*_{7} = 2.07 Hz in Set2, *ν*_{15} = 0.41 Hz in Set3) makes their detection in the three distributions (bars) very difficult on a linear scale.

The identical pairwise correlations in all three data sets are reflected in identical results for tests with *m* = 2. For all data sets, \(H_{0}^{2,\xi}\) is rejected (*p* < 0.05, indicated by red marks) only for *ξ* = 1, implying that the pairwise correlations are significant and events of amplitude 1 are not enough to explain the data. As \(H_{0}^{2,\xi}\) is retained for *ξ* > 1 in all data sets (*p* > 0.05), the lower bound for *m* = 2 is \(\hat{\xi}_{2}=2\) (see Fig. 3), i.e. the tests with *m* = 2 do not imply correlation of order ≥ 2 in any of the data sets.

Test results with *m* = 3 differ strongly for the three data sets (Fig. 4, lower panels, right graphs). For Set1, only \(H_{0}^{3,1}\) is rejected and all \(H_{0}^{3,\xi}\) for *ξ* ≥ 2 are retained. Hence \(\hat{\xi}_{3}=2\). For Set2, \(H_{0}^{3,\xi}\) is rejected for *ξ* = 1,...,6, yielding \(\hat{\xi}_{3}=7\), while for Set3 the rejection of all null-hypotheses for *ξ* ≤ 12 yields \(\hat{\xi}_{3}=13\). In all tests with *m* = 4 (data not shown), the smallest values for *ξ* that solved Eq. (9) yielded non-significant *p* values (for Set1 *p*_{4,2} = 0.5, for Set2 *p*_{4,9} = 0.8, for Set3 *p*_{4,16} = 0.67). Hence, no hypothesis with *m* = 4 was rejected except for \(H_{0}^{4,0}\) which is rejected by convention. Thus, \(\hat{\xi}_{4}=1\) in all data sets. For the total lower bounds \(\hat{\xi}=\max_{m}\{{\hat{\xi}_{m}}\}=\max{\{\hat{\xi}_{2},\hat{\xi}_{3},\hat{\xi}_{4}\}}\), we thus obtain \(\hat{\xi}=2\) for Set1, \(\hat{\xi}=7\) for Set2, and \(\hat{\xi}=13\) for Set3.

The lower bound corresponds to the maximal order of correlation in Set1 (*ξ*_{syn} = 2) and Set2 (*ξ*_{syn} = 7). Only for Set3 the lower bound falls short of the true maximum by 2 (*ξ*_{syn} = 15), but nevertheless indicates correlations of higher order than in Set2. Taken together, the three examples illustrate that CuBIC reliably detects differences in higher-order statistics, even if firing rates (here *λ* = 10 Hz) and very small pairwise correlations (here *c* = 0.01 in 30 out of 100 neurons) are identical.

### 4.2 Parameter dependence of test performance

Several parameters are likely to influence CuBIC’s performance. For instance, the number of recorded neurons, *N*, the collection of firing rates, *λ*_{1},...,*λ*_{N}, the duration of the experiment, *T*, and the bin size, *h*, influence the amount of available data and can thus be expected to effect test results. Also the correlation structure, i.e. the rate and the order of injected coincidences, parametrized by the *N* − 1 probabilities of the amplitude distribution *f*_{A}, is likely to affect the performance of the method. However, some of the parameters mentioned above are in fact redundant with respect to the test statistics used here, i.e. the *k*-statistics of the population spike count *Z*. For instance, *Z* depends only on the summed firing rate \(\Lambda=\sum_{i=1}^{N}\lambda_{i}\), but not on the precise combination of the rates of individual neurons. To analyze the parameter dependence of CuBIC (Section 4.2.3), we focus on parameters that are relevant in experimental data and avoid the above-mentioned redundancies. Furthermore, we restrict our analysis to tests with the third cumulant, thus study the inner loop in Fig. 3 for *m* = 3. To keep notation simple, we drop the subscript 3 and write \(\hat{\xi}=\hat{\xi}_{3}\) in the remainder of this section. The upper bound for *ξ* was *ξ*_{max} = 30 in all simulations.

#### 4.2.1 Parameters

*T*, the total firing rate of the population, \(\Lambda=\sum_{i=1}^{N}\lambda_{i}\), the bin width used to discretize the data,

*h*, and two parameters,

*ξ*

_{syn}and

*ρ*, that characterize the correlation structure (amplitude distribution,

*f*

_{A}) of the data. We here consider amplitude distributions that consist of two isolated peaks only: a “background-peak” at

*ξ*= 1 and a second peak at

*ξ*=

*ξ*

_{syn}that determines the maximal order of correlation in the data (see for examples Fig. 4, top row). The relative height of the two peaks is parametrized by the moments ratio

*i*th moment of

*A*.

*μ*

_{i}[

*A*] in Eq. (11) by the cumulants of

*Z*via Eq. (3) shows that

*ρ*equals the Fano Factor of the population spike count, i.e. \(\rho=\frac{\kappa_{2}[Z]}{\kappa_{1}[Z]}=\frac{\operatorname{\mathrm{Var}}[Z]}{\operatorname{\mathrm{E}}[Z]}\) (see e.g., Kumar et al. 2008; Kriener et al. 2008). This interpretation is valid for arbitrary amplitude distributions, and provides a method to estimate

*ρ*from data. For a more intuitive interpretation of

*ρ*, consider a population of

*N*neurons with identical firing rates, where a sub-population of

*N*

_{C}neurons forms a homogeneously correlated subgroup, while the remaining

*N*−

*N*

_{C}neurons fire independently (compare rasters displays in Fig. 4). In this case

*ρ*relates to the (average) pairwise count correlation coefficient

*c*of the correlated subgroup

*N*=

*N*

_{C}, this simplifies to \(c=\frac{\rho-1}{N-1}\) (compare e.g., Kumar et al. 2008; Kriener et al. 2008). In either case, the correlation parameter

*ρ*determines the effect of higher-order events on the pairwise correlations in the population.

#### 4.2.2 Quantification of test performance

*ρ*gradually sharpens and shifts \(p(\hat{\xi})\) to higher values of \(\hat{\xi}\) (Fig. 5(b)). CuBIC performs optimally if the lower bound corresponds to the maximal order of correlation in all Monte-Carlo simulations, i.e. if \(\hat{\xi}={\xi_{syn}}\) for all data sets. In this case, \(p(\hat{\xi})\) is a delta-peak located at

*ξ*

_{syn}(Fig. 5(b), top panel).

CuBIC performs optimally if \(p(\hat{\xi})\) is a delta peak as in Fig. 5(b), top panel, i.e. if *ξ*_{05} + 1 = *ξ*_{syn} = *ξ*_{95}. More generally, a value of *ξ*_{05} ≫ 0 indicates that CuBIC reliably infers the existence of higher-order correlations (\(\hat{\xi}\gg0\) in 95% of all cases), while a value of *ξ*_{95} ≤ *ξ*_{syn} guarantees that the actual order is not overestimated (\(\hat{\xi}>{\xi_{syn}}\) in less than 5% of all cases, no false positives).

*k*

_{2}<

*k*

_{1}, represent untestable data sets and yield lower bounds of \(\hat{\xi}=1\). To avoid border effects for extremely small pairwise correlations (

*ρ*< 1.05), we set \(\hat{\xi}=1\) not only if

*k*

_{2}<

*k*

_{1}but also if

*k*

_{2}does not significantly exceed

*k*

_{1}, i.e. if \(H_{0}^{2,1}\) is retained (marked as dotted lines in Fig 6(c)).

#### 4.2.3 Results

Figure 6 shows the dependence of CuBIC’s performance on the population firing rate Λ (Fig. 6(a)), the duration of the experiment *T* (Fig. 6(b)), the correlation strength (population Fano Factor) *ρ* (Fig. 6(c)) and the bin width *h* (Fig. 6(d)) for different values of *ξ*_{syn} (crosses in Fig. 6(a–d) denote the default parameters that are fixed while only one of them is varied).

Changes in the population firing rate Λ, the duration of the experiment *T*, and the correlation strength *ρ* have qualitatively very similar effects on the performance (Fig. 6(a), (b), and (c), respectively). For all analyzed values of *ξ*_{syn} (dark blue, green, and yellow lines) and small values of either of the parameters, i.e. Λ = 10 Hz (Fig. 6(a)), *T* = 1 s (Fig. 6(b)) or *ρ* = 1.0087 (Fig. 6(c)), the distributions of lower bounds \(p(\hat{\xi})\) extend from *ξ*_{05} = 0 (lower lines) to *ξ*_{95} < *ξ*_{syn} (upper lines below colored triangles). Hence, lower bounds might not exceed the trivial value of \(\hat{\xi}=1\) for such small parameter values. Increasing either of the parameters gradually shifts \(p(\hat{\xi})\), i.e. *ξ*_{05} and *ξ*_{95}, to higher values of \(\hat{\xi}\), thus indicating improved performance. The exact quality of the performance, captured by the width of \(p(\hat{\xi})\), i.e. the distance between *ξ*_{05} and *ξ*_{95}, and in particular its difference from *ξ*_{syn} (colored triangles, compare also Fig. 5(b)), depends crucially on the order of correlation *ξ*_{syn} present in the data.

For *ξ*_{syn} = 7 (dark blue lines), CuBIC performs optimally (*ξ*_{05} + 1 = 7 = *ξ*_{95}, compare Fig. 5(b)) for experiment durations of *T* > 100 s (Fig. 6(b)) and a wide range population Fano Factors (*ρ* ≥ 1.17, Fig. 6(c)). Such optimality is not guaranteed for the analyzed range of the population firing rate Λ, although population rates above Λ = 400 Hz yield a lower percentile of *ξ*_{05} = 5, indicating close-to-optimal performance (\(\hat{\xi}\geq6={\xi_{syn}}-1\) in 95% of all cases, Fig. 6(a)). Smaller values of either of the parameters shift \(p(\hat{\xi})\) to lower values, indicating impaired performance. Nevertheless, as the lower percentile *ξ*_{05} is considerably larger than zero even for population rates as low as Λ = 20 Hz, experiment durations of *T* = 6 s and population Fano Factors of *ρ* = 1.02, CuBIC reliably detects the presence of higher-order correlations for a wide range of parameter combinations.

In the other extreme, where the same pairwise correlations (identical value of *ρ*) are realized by correlations of order *ξ*_{syn} = 30 (yellow lines), \(p(\hat{\xi})\) falls short of its optimum (*ξ*_{05} + 1 = 30 = *ξ*_{95}, see Fig. 5(b), top panel) even for high values of all investigated parameters, i.e. population rates of Λ = 10^{4} Hz, experiment durations of *T* = 10^{3} s and correlation strengths of *ρ* = 5.35 (Fig. 6(a), (b) and (c), respectively). Note, however, that also here *ξ*_{05} ≫ 0 even for moderate values of Λ, *T* and *ρ*, implying that the presence of higher-order correlations is reliably detected, albeit not their actual order. For the default values Λ = 10^{3} Hz, *T* = 100 s and *ρ* = 1.087 (yellow crosses in Fig. 6(a–d)), for instance, lower bounds for *ξ*_{syn} = 30 fall between \(\hat{\xi}=20\) and \(\hat{\xi}=24\) in 90% of the analyzed data sets (*ξ*_{05} = 19, *ξ*_{95} = 24).

Test performance for *ξ*_{syn} = 15 (green lines) is somewhere in between the often optimal results for *ξ*_{syn} = 7 and the more indicative results for *ξ*_{syn} = 30. In short, optimal performance is achieved only for high population Fano Factors (*ρ* ≥ 3.75, rightmost black triangle in Fig. 6(c)), while the presence of higher-order correlations is reliably detected (\(\hat{\xi}\gg0\)) for a wide range of parameters (Λ > 70 Hz, Fig. 6(a), *T* ≥ 80 s, Fig. 6(b), and *ρ* ≥ 1.02 Hz, Fig. 6(c)).

The qualitative similarity between Fig. 6(a) and (b) result from the identical effect of the varied parameters, Λ and *T*, on the expected spike count *n* = Λ*T* (Fig. 6(a), (b), top *x* axis). As *n* affects the quality of the estimators *k*_{i} , both Λ and *T* influence test results in a similar manner. The similar *ρ*-dependence results from the fact that increasing *ρ* increases the probability for high-amplitude events *f*_{A}(*ξ*_{syn}), which simplifies their detection (Fig. 6(c)).

Changes in the bin width *h* (Fig. 6(d)) affect CuBIC’s performance for all values of *ξ*_{syn} in a very similar way. Increasing *h* decreases the lower percentile from *ξ*_{05} ≫ 0 at *h* = 0.1 ms to *ξ*_{05} = 0 for *h* = 100 ms, irrespective of *ξ*_{syn}. The upper percentiles *ξ*_{95} (upper lines) can even show non-monotonic behavior: for *ξ*_{syn} = 15 (green) and *ξ*_{syn} = 30 (yellow), *ξ*_{95} increases for values between 1 ≤ *h* ≤ 60 ms and decreases for *h* > 60 ms. The median of \(p(\hat{\xi})\), which lies roughly in the middle of *ξ*_{05} and *ξ*_{95}, however, remains approximately constant for *h* ≤ 60 ms, at least for *ξ*_{syn} = 15 and *ξ*_{syn} = 30 (green and yellow, respectively). Therefore, the average lower bound remains approximately constant as long as *h* ≤ 10 (*L* ≥ 10^{4}, top axis), but only the variability of \(\hat{\xi}\), i.e. the distance between *ξ*_{05} and *ξ*_{95}, increases for this range of *h*.

We wish to stress that the larger variability of \(\hat{\xi}\) for increased bin sizes is not so much due to the increased average spike count per bin *μ* = Λ*h*, but rather the resulting decreased sample size \(L=\frac{T}{h}\) (Fig. 6(d), top *x*-axis). The expected spike count *μ* = Λ*h* changes with the population rate Λ and the bin width *h* in the exact same manner. However, while increasing *μ* via Λ (Fig. 6(a)) from the default value Λ = 10^{3} Hz (crosses) to Λ = 10^{4} Hz does not alter test performance for *ξ*_{syn} = 7 (dark blue lines), increasing *μ* via the bin width from *h* = 1 ms to *h* = 10 ms (Fig. 6(d)) drastically reduces the lower percentile *ξ*_{05} (lower dark blue line in Fig. 6(d)). The decrease in sample size increases the variance of our test statistic (compare Eq. (7)) and therefore decreases test power.

#### 4.2.4 Summary of results

The simulation results of the previous section show that CuBIC reliably detects the presence of higher-order correlations for a wide range of parameter values. As CuBIC is not designed to directly estimate the order of correlation, the inferred lower bound might not always correspond to the maximal order of correlation present in a given data set. The most important parameter is the sample size *L*, which is required to be ≥ 10^{4} to reliably detect higher-order correlations (Fig. 6(d)), corresponding to e.g. an experiment of *T* = 50 s duration analyzed with a bin size of *h* = 5 ms. Furthermore, total spike counts of *n* = 10^{4} (e.g. *N* = 100 neurons firing at 1 Hz for *T* = 100 s; Fig. 6(a) and (b)) and a correlation parameter (population Fano Factor) in the order of *ρ* = 1.087 suffice for the test to reliably detect the presence of higher-order correlations (lower percentiles *ξ*_{05} > 0, green and dark blue lower lines Fig. 6(a–c)) if correlations in the data are of moderate order (*ξ*_{syn} = 7 and *ξ*_{syn} = 15, green and dark blue lines).

## 5 Discussion

The identification of active cell-assemblies in simultaneously recorded spike trains has been the topic of extensive research (Gerstein et al. 1985, 1989; Sakurai 1998; Martignon et al. 2000; Grün et al. 2002a, b; Harris 2005; Schrader et al. 2008). Analysis tools that go beyond pairs of neurons, however, face severe limitations due to limited sample sizes of typical electrophysiological recordings (Martignon et al. 1995, 2000; Brown et al. 2004). The cumulant based inference of higher-order correlations (CuBIC) presented in this study avoids the need for extensive sample sizes. This is achieved by a combination of three ingredients: (1) pooling, i.e. using the population spike count and its cumulants to measure correlations in the population; (2) using the compound Poisson process (CPP) as a parametric model for correlated spiking populations; and (3) exploiting the constraining relations among correlations of lower and higher order in large neuronal pools.

Initially, we presented CuBIC as a hierarchy of hypothesis tests \(H_{0}^{m,\xi}\), indexed by the order of the estimated cumulants *m* and the maximal order of correlation *ξ* allowed in the null. The algorithm of Fig. 3 combines these tests to infer a lower bound \(\hat{\xi}\) on the order of correlation in a given data set (see also Section 3.3). As free parameters, the algorithm requires upper bounds for *m* (*m*_{max}) and *ξ* (*ξ*_{max}) to be chosen by the experimenter. To chose *m*_{max}, recall that CuBIC is extremely sensitive for higher-order correlations already with *m* = 3 (Fig. 6). For the parameters analyzed here, tests with *m* = 4 increased \(\hat{\xi}\) only occasionally, as the smallest *ξ* that solved Eq. (9) typically yielded non-significant *p*-values (data not shown). Furthermore, the reliable estimation of high cumulants is known to require vast sample sizes. Therefore, although tests with *m* > 3 can in principle yield additional information, we do not expect that choosing the bound for *m* larger *m*_{max} = 4 improves test results in situations with sample sizes compatible with electrophysiological experiments. To chose the upper bound for *ξ*, note that *ξ*_{max} is also an upper bound for \(\hat{\xi}\). As the maximal order of correlation in the data has to be smaller than the number of recorded neurons *N*, we know that \(\hat{\xi}\leq N\) must hold. This suggests to chose *ξ*_{max} = *N*, as higher values of *ξ*_{max} cannot increase \(\hat{\xi}\). If the analysis was based on multi-unit activity and the number of recorded neurons is unknown, an a posteriori criterion on whether the chosen value for *ξ*_{max} was high enough can be obtained by inspecting tests with *ξ* = *ξ*_{max}. That is, if \(H_{0}^{m,\xi_{max}}\) was rejected for some *m*, hypothesis with *ξ* > *ξ*_{max} might also have been rejected, potentially resulting in higher \(\hat{\xi}\). Thus, if tests against \(H_{0}^{m,\xi_{max}}\) are retained for all *m*, the chosen *ξ*_{max} was high enough.

The procedure was calibrated with rather restricted model classes, namely amplitude distributions with two isolated peaks. Thus, the question arises how differences in the correlation structure would influence CuBICs performance. In the numerical examples of this study, CuBIC used only the first three cumulants of the data, which depend on the first three moments of the amplitude distribution (compare Eq. (3)). As a consequence, if the data under consideration had a different amplitude distribution, however with similar first three moments, CuBIC will produce similar results. If, for instance, the additional isolated peak at *ξ*_{syn} is exchanged by a somewhat distributed peak centered at *ξ*_{syn}, the inferred lower bound will be similar.

Importantly, CuBIC is not designed to directly estimate the “true” maximal order of correlation in the data. Furthermore, the neuron IDs that realize the correlations are currently not resolved. It is important to keep in mind, however, that approaches that pose such specific question to the data face severe limitations with respect to the size of the analyzed populations (e.g., Shlens et al. 2006; see also Section 5.1). Thus, we regard the less specific lower bound \(\hat{\xi}\) obtained by CuBIC as the price to pay when analyzing large populations. Renouncing the specific questions is precisely what allows CuBIC to detect higher-order correlations in data sets with previously unreached population size (*N*∼100) and sample sizes compatible with physiological experiments (see Figs. 4 and 6).

### 5.1 Model dependence of higher-order correlations

The role of correlations for cortical information processing has been lively debated (Abeles 1991; Shadlen and Newsome 1994; Riehle et al. 1997; Singer 1999; Abbott and Dayan 1999; Kohn and Smith 2005; Schneidman et al. 2006; Shlens et al. 2006; Laurent 2006; Pillow et al. 2008). While in the meantime pairwise correlations are widely accepted as a relevant feature of parallel spike trains, the relevance of higher-order correlations is controversial. In fact, several recent studies raised doubts about the importance of higher-order interactions. In two studies that explore multi-neuron firing patterns in primate (Shlens et al. 2006) and salamander (Schneidman et al. 2006) retina, data were collected for fairly long times (∼1 hour), and distributions of the binary patterns *P*_{N} for sub-populations with *N* ≤ 10 were estimated. Based on this, a surrogate distribution was generated that reproduces the pairwise correlations in the population, but disregards all higher-order parameters: the “maximum entropy distribution” *P*_{2}. A comparison of *P*_{N} with *P*_{2} provides estimates on the amount of information contained in the higher-order parameters. The results of both studies indicate that only a negligible fraction of the total information is contained in the higher-order parameters (less than 2% in primate retina according to Shlens et al. 2006, and less than 10% in salamander retina according to Schneidman et al. 2006), and a recent study reports similar values for neo-cortical networks (Tang et al. 2008). In other words: almost the entire information available in the distribution of multi-neuron firing patterns can already be inferred from pairwise correlations. So why bother estimating higher-order parameters?

The consequences of the maximum entropy results for the relevance of CuBIC are difficult to judge. First of all, the maximum entropy approach does not appear to be very sensitive for higher-order correlations. As shown by Shlens et al. (2006), additional higher-order coincidences with a rate of ∼0.3Hz among the *N* = 7 analyzed neurons would not have altered their results. This “coincidence rate”, however, lies in the parameter range of Set3 of Fig. 4 (here *ν*_{15} = 0.41Hz). In this range CuBIC detected correlations of order ≥ 13, illustrating the comparatively low sensitivity of the maximum entropy approach for existing higher-order correlations. Second, the combinatorial explosion of the parameter space limits maximum entropy approaches to only relatively small populations (\(N\lesssim10\)), unless additional assumptions are imposed on the data (Montani et al. 2009; Shlens et al. 2009). Theoretical evidence suggests, however, that the relevance of higher-order correlations increases with the size of the neuronal population (Roudi et al. 2009). In fact, using *N* = 24 unsorted multi-unit spike trains of the rodent whisker system, Montani et al. (2009) report a significant increase in information if additional to the pairwise interactions also triplet interactions are taken into account. Taken together, we firmly believe that the analysis of large groups with methods that are sensitive to higher-order correlations is inevitable to uncover their role in cortical computation.

*X*

_{i}(

*s*) = 1 if the

*i*th neuron has one or more spikes in the

*s*th bin, and

*X*

_{i}(

*s*) = 0 otherwise. Assuming stationarity and no memory, the multivariate distribution function of the binary pattern vector

**X**admits the representation

*θ*. For example,

*θ*

_{ijk}quantifies third-order interactions of the triplet (

*X*

_{i},

*X*

_{j},

*X*

_{k}). Importantly, these higher-order parameters are neither equivalent, nor related in any straightforward way to the cumulant correlations employed by CuBIC (see Section 2.1.1 and Appendix A). A major difference, for instance, results from the equivalence of cumulant-based correlations with the concept of coincident events addressed by the CPP (Theorem 1). Given that these coincidences are perfectly synchronized, changes in the bin width

*h*do not affect cumulant-based correlations. In contrast, as changes in the bin width alter the shape of

*p*

_{X}, the interaction parameters

*θ*inevitably change with

*h*. Observe that the bin size dependence of

*θ*does not result from the fact that larger bins measure correlations on a broader time scale as the time scale of correlation was assumed to be instantaneous, but is rather inherent to the exponential family. More generally, cumulant correlations determine to what extent expectation values of products factorize (e.g., Stratonovich 1967, and our Section 2.1.1), while the higher-order parameters in the exponential family measure to what extent the probability of certain binary patterns can be explained by the probabilities of its sub-patterns (e.g., Gütig et al. 2003). Taken together, the two frameworks are designed for different purposes, and can hence be expected to be sensitive to different features of the data. A better insight into their relationship presents an important path for future research (see also Staude et al. 2010, for a more detailed discussion).

Our main motivation to use cumulant correlations is that they enable us to analyze large populations of simultaneously recorded spike trains. As opposed to the *θ* parameters, cumulant correlations can be inferred from the population spike count *Z*,. Thereby, the multivariate problem to estimate correlations for all 2^{N} − 1 subgroups is transferred into a parsimoniously parametrized univariate problem with *N* parameters. The small sample size required by CuBIC and its ability to analyze populations of *N* > 100 neurons is a direct consequence of this property. Furthermore, the compound Poisson process presents a very intuitive interpretation of cumulant correlations (Theorem 1) that directly implements the concept of temporally coordinated spikes (see also Staude et al. 2008). And finally, the exponential family captures imprecise correlations only at the cost of ignoring spikes by clipping, as it requires the variables *X*_{i} to be binary (e.g., Del Prete et al. 2004), whereas cumulant correlations can be measured among arbitrary spike counts. We conclude that the adequate framework to analyze a given data set should be carefully chosen in view of the specific scientific question. For the detection of temporally coordinated spikes in large neuronal populations, we believe that cumulant-based correlations as exploited by CuBIC are well-suited.

### 5.2 Relation to other tools

While algorithms for the generation of correlated spike trains are becoming available (e.g., Kuhn et al. 2003; Niebur 2007; Staude et al. 2008; Brette 2009; Macke et al. 2008; Krumin and Shoham 2009), only few of these models can be exploited for data analysis. In an approach similar to ours, Ehm et al. (2007) derive estimators of the continuous time parameters (analogous to our carrier rate and amplitude distribution) from the Poissonized bin counts (here: “population spike count”) via Fourier inversion of the empirical characteristic function. Instead of inferring lower bounds as in CuBIC, “empirical de-Poissonization” presented by Ehm et al. (2007) thus directly estimates the orders of correlation in the data. However, the more explicit parameters appear to impose constraints on the data, such that particular care is required if the expected number of spikes per bin exceeds a critical value. A comparison of CuBIC and the empirical de-Poissonization to resolve their respective advantages and disadvantages is currently under way (Reimer et al. 2009).

Besides the higher-order approaches described above, the majority of available analysis tools for massively parallel spike trains either exploit pairwise statistics to infer inter-correlated groups by using some sort of clustering (Gerstein and Aertsen 1985; Gerstein et al. 1985; Berger et al. 2007; Grün et al. 2008b; Fujisawa et al. 2008), or test the occurrence of specific patterns against the assumption of complete (Grün et al. 2002a; Pipa et al. 2008) or partial independence (Berger et al. 2010). The three examples depicted in Fig. (4), however, illustrate the limitations of both approaches. First, the three data sets have identical second order statistics, making it impossible for pairwise approaches to differentiate between them. Second, the combinatorial explosion prevents the analysis of populations sizes of *N*~100 neurons with pattern-based approaches. And third, it is unclear whether either approach is sensitive enough to detect correlated groups when pairwise correlation are as low as discussed here.

A novel approach that is recently gaining attention starts from the analytically tractable generalized linear model (GLM), and estimates its parameters by maximum likelihood techniques (Paninski 2004; Truccolo et al. 2005; Okatan et al. 2005; Pillow et al. 2008). The power of this framework is that estimated parameters can be interpreted in terms of network properties like e.g. synaptic interactions between nerve cells, stimulus-response properties and the history dependence of individual spike trains. To understand how such anatomical and biophysical attributes of neuronal populations relate to higher-order correlations in the joint spiking of groups of neurons as detected by CuBIC is subject of current research.

Constraining relations among correlations of different order have also been studied by Amari et al. (2003), Johnson and Goodman (unpublished manuscript) and Benucci et al. (2004). The latter study defined a combinatorial algorithm to obtain upper bounds for certain observables. Closer to our approach is the study of Johnson and Goodman (unpublished manuscript), who derive explicit constraints among correlation coefficients of different orders within the CPP framework. However, both studies do not provide confidence intervals, hence are not applicable to data analysis in a straightforward way. Amari et al. (2003) investigated the relation between population variance and higher-order correlations in the information geometric framework (Amari and Nagaoka 2000) for the limit of large population sizes (*N*→ ∞). The result is similar to our findings in the sense that a high population variance implies higher-order correlations. However, results from large *N* provide only qualitative suggestions for data analysis.

### 5.3 Consequences of underlying model

An important ingredient of the present study is to use the CPP as a model for correlated spike trains. A consequence of this model-driven approach is that the reliability of obtained test results depends crucially on how well the CPP is suited to describe populations of spiking neurons in general, and assembly activity in particular.

#### 5.3.1 Stationarity

We presented the CPP as a model of a stationary population, i.e. where individual spike trains have constant firing rates. However, cortical spike trains typically undergo both stimulus-driven and internally generated changes in firing rate. To analyze also such non-stationary data sets, the relatively small required sample size enables us to use a “sliding-window” approach (see e.g., Grün et al. 2002b), where data are cut into quasi-stationary windows of e.g. 100 ms and the statistics is therein obtained over trials. For the case of 100 trials analyzed with a bin width of *h* = 1 ms, this would yield *L* = 10^{4} samples, which is perfectly in the required range for CuBIC to yield lower bounds close to the actual order of correlation (compare Fig. 6(d)). An alternative could be to directly include the non-stationarities in the model (Staude and Rotter 2009). Both approaches present important aspects of future research.

#### 5.3.2 Single-process properties

Recent studies of cortical spike trains criticize the use of Poisson processes as models for single spike trains (Amarasingham et al. 2006; Nawrot et al. 2008). Without doubt, inclusion of process types deviating from Poisson in the generating model presents a major challenge for the presented procedure. However, criticism against the Poisson model for cortical spike trains is typically motivated by either non-exponential interval statistics (e.g., Maimon and Assad 2009), or non-Poissonian spike count distributions for large bin sizes (*h* > 50 ms) and high firing rates under ’optimal’ stimulation (*λ* > 50 Hz; e.g. Amarasingham et al. 2006). Both of these aspects are of little impact if the counting statistics is sampled with small (*h* ≤ 10 ms) bin sizes. In such a scenario, and in particular for low firing rates (e.g. Lennie 2003; Lee et al. 2006; Olshausen and Field 2006; Maldonado et al. 2008), neurons will have at most one spike per bin, leading to almost identical counting statistics for Poisson processes and other process types (e.g. gamma processes; van Vreeswijk 2006; Nawrot et al. 2008). Note also that the CPP does not directly model individual processes as Poisson processes, but only assumes the carrier processes to be Poisson. Importantly, the latter can be a good approximation even if individual processes deviate from strict Poisson statistics. For instance, if background spikes are Poisson and high-amplitude events are very regular but have a very low rate (as in Fig. 4), the carrier process is still well approximated by a Poisson process. More generally, the sum of increasingly sparse point processes converge to a Poisson process (Daley and Vere-Jones 2005, but see Lindner 2006; Câteau and Reyes 2006). Although these aspects need to be considered and will be subject of further research, we therefore expect CuBIC to yield reliable results even if single processes deviate from the Poisson assumption (Staude et al. 2007).

#### 5.3.3 Precision of coincidences

Cortical correlations are reported on a variety of time scales, from precise correlations within a few few ms (Grün et al. 1999; Kohn and Smith 2005; Pazienti et al. 2008; Desbordes et al. 2008) to slow covariations of firing rates on a time scale of > 50ms (e.g., Kohn and Smith 2005; Smith and Kohn 2008). The perfect temporal precision of the joint spikes generated by the CPP may thus be questioned as a biologically plausible model of experimental spike trains. In the present study, the CPP was not intended to realistically model spike trains in continuous time, however. We presented the CPP strictly to interpret the correlated counting variables *X*_{i}, obtained with a given bin width *h*. Allowing a temporal jitter in the synchronous events, e.g. by jittering the coincident events during the inserting process (see e.g. Staude et al. 2008; Brette 2009), results in comparable counting variables *X*_{i} if this jitter is small compared to the applied bin size (Pazienti et al. 2008). Although its sensitivity for such imprecise coincidences remains to be assessed systematically, we expect CuBIC to tolerate imprecise coincidences with small but realistic jitter.

A somewhat different issue is the detection of correlations with systematic temporal offset, e.g. due to functional or “effective” connectivity (Aertsen et al. 1989; Fujisawa et al. 2008). As an extension of CuBIC to detect also non-zero-lag higher-order correlations, one may shift individual spike trains against each other and then analyze this shifted population (cmp. Grün et al. 1999). As tests for different shifts will generally not be independent, however, appropriate corrections that account for multiple testing have to be applied. Furthermore, the combinatorial explosion that emerges from shifting all spike train against each other with different delays requires smart algorithms to reduce the resulting computational cost.

#### 5.3.4 Positivity of the correlations

A limitation of the CPP is that the insertion of synchronous spike events models spike coordination, not spike avoidance. As a consequence, correlations in the CPP can only be positive (Staude et al. 2008; Brette 2009). The CPP cannot model negatively correlated spike counts, that is, troughs in the cross-correlation function. Whether or not this is relevant for CuBICs applicability depends crucially on the properties of electrophysiological data. The rare reports of negative correlations, especially among cortical spike trains (e.g., Nevet et al. 2007), however, makes us confident that this is not an important limitation.

## 6 Conclusions

Based on Hebb’s original ideas (Hebb 1949), the coordinated spiking of large neuronal groups is the conceptual foundation of various “brain theories” (e.g., Eggermont 1990; Abeles 1991). However, limitations of available analysis tools has rendered the exploitation of the increasing number of simultaneously observable neurons extremely difficult. The CuBIC method presented in this study is a novel procedure to detect higher-order correlations in massively parallel spike trains, which overcomes these limitations. In particular, the sample size required by our method lies in a range that is compatible with electrophysiological experiments. CuBIC assumes data to be stationary and single processes the sum of which is appropriately modeled by the CPP, and its extension to account for dynamically changing, non-Poissonian data is the most important aspect of future work. Taken together, we regard CuBIC as a unique opportunity to analyze large populations of neurons for coordinated spike timing.

## Acknowledgements

We would like to thank Abigail Morrison, Imke Reimer, and Stefano Cardanobile for helpful comments on earlier versions of this manuscript. Funded by NaFöG Berlin, the German Ministry for Education and Research (BMBF grants 01GQ01413 and 01GQ0420 to the BCCN Berlin and Freiburg, respectively), the IGPP Freiburg, the Stifterverband für die Deutsche Wissenschaft, the Initiative and Networking Fund of the Helmholtz Association within the Helmholtz Alliance on Systems Biology, and the German Research Foundation (DFG SFB 780).

**Open Access**

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.