Spike-triggered covariance: geometric proof, symmetry properties, and extension beyond Gaussian stimuli

The space of sensory stimuli is complex and high-dimensional. Yet, single neurons in sensory systems are typically affected by only a small subset of the vast space of all possible stimuli. A proper understanding of the input–output transformation represented by a given cell therefore requires the identification of the subset of stimuli that are relevant in shaping the neuronal response. As an extension to the commonly-used spike-triggered average, the analysis of the spike-triggered covariance matrix provides a systematic methodology to detect relevant stimuli. As originally designed, the consistency of this method is guaranteed only if stimuli are drawn from a Gaussian distribution. Here we present a geometric proof of consistency, which provides insight into the foundations of the method, in particular, into the crucial role played by the geometry of stimulus space and symmetries in the stimulus–response relation. This approach leads to a natural extension of the applicability of the spike-triggered covariance technique to arbitrary spherical or elliptic stimulus distributions. The extension only requires a subtle modification of the original prescription. Furthermore, we present a new resampling method for assessing statistical significance of identified relevant stimuli, applicable to spherical and elliptic stimulus distributions. Finally, we exemplify the modified method and compare it to other prescriptions given in the literature.


Introduction
Neurons in sensory systems are often exquisitely tuned to specific stimulus features. Thus, a first step in the characterization of their input-output transformation is to identify which aspects of the stimulus affect a neuron's activity level and which do not. As the space of possible stimuli is typically high-dimensional, an exhaustive exploration of all candidate stimuli appears impractical. But fortunately for neuroscientists, individual neurons often seem to be remarkably selective and only care about subspaces of low dimensionality. The identification of such low-dimensional spaces of relevant stimuli is a crucial challenge in sensory neuroscience.
In the simplest scenario, an analysis may aim at identifying a single relevant dimension in stimulus space, corresponding to a particular stimulus feature. This is suited, for example, for neurons whose response properties are well captured by their receptive fields. Neurons in the early visual system are often described by their spatio-temporal receptive fields (Hartline 1940;Kuffler 1953;Hubel and Wiesel 1962;Meister and Berry 1999;Reich et al. 2000) and neurons in the auditory system by their spectro-temporal receptive fields (Eggermont et al. 1983a, b;Kim and Young 1994;deCharms et al. 1998). A standard technique for assessing the receptive field from electrophysiological experiments is to measure the spike-triggered average (STA) under stimulation with a broad-band signal (de Boer and Kuyper 1968;Bryant and Segundo 1976;de Boer and de Jongh 1978;Eggermont et al. 1983a;de Ruyter van Steveninck and Bialek 1988;Chichilnisky 2001;Nykamp and Ringach 2002;Schwartz et al. 2006), typically white noise. The technique largely owes its immense popularity to its computational simplicity, obviating the need for complex parameter fitting. The analysis consists of collecting all stimulus segments that precede a spike and averaging them together. This amounts to correlating the measured spikes with the applied stimulus, so the method also falls under the name of "reverse correlation".
In many cases, however, a single stimulus feature is insufficient to describe a neuron's response characteristics. If the cell is sensitive to several features and pools them in a nonlinear fashion, its stimulusresponse relation may not be well captured by just the receptive field. A well-known example is the energy model of complex cells in visual cortex (Adelson and Bergen 1985), which comprises two spatial Gabor filters whose outputs are squared before summation. As the model responds equally to positive and negative visual contrasts, the STA is identical to zero. Other methods are thus required to characterize neurons with symmetric response characteristics. In other cases, the STA may provide an approximate model of the cell's stimulus-response relation, but adding further stimulus components considerably improves the accuracy of the model.
For these reasons, spike-triggered covariance (STC) analysis has emerged as a popular extension of the STA (Bryant and Segundo 1976;de Ruyter van Steveninck and Bialek 1988;Paninski 2003;Bialek and de Ruyter van Steveninck 2005;Brenner et al. 2000;Schwartz et al. 2002;Rust et al. 2004;Simoncelli et al. 2004). In STC analysis, the stimulus segments that precede a spike are characterized through a principal component analysis, which allows the extraction of mul-tiple relevant stimulus dimensions. The basic idea is to detect differences in variance between the distribution of spike-producing stimulus segments and the prior distribution of all stimulus segments. Although alternative techniques exist that are applicable under more general stimulation (Paninski 2003;Paninski et al. 2004;Sharpee et al. 2004;Pillow and Simoncelli 2006;Park and Pillow 2011) or in connection with additional postspike dynamics (Keat et al. 2001;Aldworth et al. 2005;Pillow et al. 2005Pillow et al. , 2008Dimitrov and Gedeon 2006;Gollisch 2006), STC analysis has retained considerable popularity, just like the STA, because of its relative computational simplicity.
The statistics of the applied stimulus play an important role for applying STA and STC analysis. For neurons whose firing probability depends on a single stimulus direction, the STA provides a consistent and unbiased estimator of the relevant direction when the probability distribution of all applied stimuli displays spherical symmetry (Chichilnisky 2001;Paninski 2003). This condition states that all stimulus segments that have the same magnitude (i.e. the same Euclidean norm) must also have the same probability of occurrence. If the distribution of stimuli is not spherically symmetric, the STA is typically biased, as it deviates in a systematic fashion from the relevant stimulus dimension. Moreover, as this bias does not depend on the amount of available data, the estimate provided by the STA is then not consistent. The simplest way to fulfill the criterion of spherical symmetry is to draw each stimulus component from the same Gaussian distribution. But there are, of course, many other ways to construct spherical distributions of stimuli.
Via a simple extension, it is straightforward to apply STA analysis also to stimuli with an elliptic distribution. Here, elliptic distribution refers to any distribution that can be obtained from a spherical distribution by a linear transformation that stretches or compresses individual directions in stimulus space. The spectrum of the stimulus is therefore non-white, and different stimulus components are correlated with each other. To apply STA analysis, the stretching transformation simply has to be "undone" after the spike-producing stimulus segments have been averaged (Theunissen et al. 2001;Paninski 2003;Schwartz et al. 2006).
Surprisingly, the requirements concerning the stimulus distribution are more restrictive for STC analysis, where stimuli need to follow not just a spherically symmetric, but a Gaussian distribution to guarantee that the analysis provides a consistent estimator of the relevant stimulus space (Paninski 2003;Sharpee et al. 2004;Simoncelli et al. 2004;Schwartz et al. 2006). Given the otherwise tight analogy between STA and STC analysis, this difference appears puzzling. For STA analysis, the requirement of a spherically symmetric stimulus distribution is best understood in a geometric picture of why the technique works (Chichilnisky 2001). The insight and intuition supplied by the geometric proof thus calls for a similar perspective on STC analysis.
Here, we provide such a geometric derivation for STC analysis, leading to a simple proof of why the technique works, that is, of the consistency of the method. Furthermore, the geometric approach highlights the importance of spherical symmetry also for STC analysis and suggests a simple modification of the procedure that makes it applicable to stimulus ensembles with general spherical symmetry, not necessarily Gaussian. We further extend this approach to arbitrary elliptic stimulus distributions, containing correlations between different stimulus components. To facilitate identification of relevant stimulus dimensions for finite data sets, we then introduce a new statistical test for significance of relevant stimulus dimensions. Finally, we compare the obtained prescription with others that have been used in the literature.

Linear-nonlinear models
As for many other analyses of neuronal stimulusresponse relationships, describing sensory stimuli as vectors in a (potentially high-dimensional) space of stimuli has provided a useful perspective for spiketriggered average and spike-triggered covariance analyses. We here denote a stimulus as a column vector s in an N-dimensional space, The individual components of s can represent, for example, the strength of stimulation at different points in time ( Fig. 1(a)), different spatial locations ( Fig. 1(b)), or a combination of the two (Fig. 1(c)). Of course, space could also be supplemented or substituted by any other relevant stimulus attribute, for example, spectral components. Pure temporal binnings ( Fig. 1(a)) represent the simplest scenario, when only the history of an otherwise one-dimensional stimulus needs to be taken into account. They are used, for example, when neurons in the visual system are stimulated with changing ambient light intensity that contains no spatial structure, or when an auditory neuron is analyzed for its responses can also be used to represent any other non-temporal aspect of the stimulus. Spatial and temporal dimensions may be combined into a unified spatio-temporal representation (c), for example, to study visual spatio-temporal receptive fields to the temporal modulation (i.e. the envelope) of a pure tone. The stimulus vector s is then defined at discrete times t, and the components of s(t) represent the stimulus strength within time bins of length t during the recent past, for example, by sampling time discretely, The dimension N of the vectors should be chosen large enough to encompass the relevant stimulus structure.
The methods of spike-triggered average and spiketriggered covariance constitute rigorous estimators of neuronal filtering characteristics when the spikegenerating process is well described by a linearnonlinear (LN) model. In such models, the stimulus s is first filtered by one or several linear filters k m . We denote the number of filters by M. Typically, there are far fewer filters than stimulus dimensions, M N. The filters are, just as the stimuli, represented by Ndimensional vectors Applying the filter k m to a stimulus s yields The filtered signals are then transformed into the probability of generating a spike in response to stimulus s, P(spike|s), where the variable "spike" here takes the value 1 for a generated spike or 0 for no spike.
The transformation occurs through a static nonlinear function ϕ with M input variables, According to Eq. (5), the stimulus affects the spike probability only through its projections onto the filters k m . The filters therefore demarcate relevant directions in stimulus space (Paninski 2003), corresponding to stimulus features that affect the spike probability. The subspace spanned by these filters, K = span(k 1 , k 2 , . . . , k M ), is called the relevant subspace. Any stimulus vector that is orthogonal to the relevant subspace does not affect the spiking probability because it does not affect the inputs into the function ϕ. The orthogonal complement of K therefore constitutes the irrelevant subspace K ⊥ . The aim of spike-triggered covariance analysis is to identify the relevant and the irrelevant stimulus subspaces. The stimuli s that enter the LN model come from a prior stimulus distribution P(s), typically determined by the experimenter when presenting sensory stimuli. Both STA and STC analysis rely on comparing this prior stimulus distribution to the distribution of stimuli that precede spikes, P(s|spike).
Often, the prior stimulus distribution is chosen to be Gaussian white noise with fixed variance σ 2 , The Gaussian white noise distribution has the remarkable property that, in addition to being spherically symmetric, it may be written as a product of distributions, one for each stimulus component, Thus, each stimulus component is independent of the others. If the spiking probability only depends on a few stimulus directions, the stimulus distributions P(s) and P(s|spike) only differ along these directions. Along any orthogonal stimulus direction, the two distributions coincide. The invariance along irrelevant directions forms the basis of spike-triggered analysis for Gaussian white noise stimuli: Relevant stimulus directions are identified as those where the distribution of spikegenerating stimuli differs from the prior distribution. The Gaussian white stimulus constitutes a special case of a distribution with spherical symmetry, for which the prior distribution P(s) depends only on the absolute value |s| = √ s T s of its argument, that is, For non-Gaussian stimuli, different stimulus directions are not independent of one another. As a consequence, the distributions P(s) and P(s|spike) not only differ inside the relevant space K, but typically also along the irrelevant directions in K ⊥ . In Fig. 2, the prior stimulus distribution and the spike-triggered stimulus distribution are shown for 2-dimensional toy examples. In each case, the spike probability only depends on one of the two stimulus components, as illustrated by the nonlinear functions ϕ in Fig. 2 Fig. 2(b)). When stimuli come from a spherically symmetric, non-Gaussian, annular distribution, however, the two distributions differ also along the irrelevant direction ( Fig. 2(c)). The annular shape of the prior distribution imposes a constraint, linking the values of relevant and irrelevant components. A change in variance along the relevant direction hence induces a change in variance along the irrelevant direction as well. Consequently, at first sight, it may seem that STC analysis would be inapplicable to these cases. Simply looking for directions in stimulus space along which the variance of P(s|spike) differs from the variance of P(s) would lead to the erroneous classification of the irrelevant direction as relevant. Below we show, however, that the more realistic case of higher-dimensional stimuli brings in additional structure not apparent in this 2-dimensional toy example. The clue lies in the fact that in all irrelevant directions, the variances of P(s) and P(s|spike) differ by exactly the same amount. This constancy typically makes the irrelevant directions distinguishable from the relevant ones, even in the non-Gaussian case.
3 Geometric picture of STC analysis for spherically symmetric stimulus distributions

Basic definitions
We first consider spike-triggered covariance analysis for stimuli that have a spherically symmetric prior distribution, Gaussian or not. Extensions beyond the spherical case are discussed in Section 4. To simplify the notation, we assume that the mean value of the prior stimulus distribution has already been subtracted from all stimulus vectors, that is, we choose the origin of the coordinate system so that the prior distribution of stimuli P(s) has zero mean, ds P(s) s = 0.
The prior covariance matrix C p of a spherically symmetric stimulus distribution is proportional to the unit matrix. Here, we set the units in stimulus space such that C p coincides with the identity matrix, where the product s s T is the matrix A neuron with a firing probability given by Eq. (5) is only sensitive to the projection of the actual stimulus s on the relevant space K. Covariance analysis provides a systematic procedure to find K, based on the first two moments of the distribution P(s|spike). The spiketriggered average s is the first moment of the distribution of spike-triggered stimuli where r is the total average spike probability per stimulus presentation, and the second equality in Eq. (12) derives from Bayes' rule. This rearrangement makes the dependence on the prior stimulus distribution P(s) explicit, which will turn out useful in the derivations below. Throughout this paper, all averages · are calculated over the distribution of spike-triggered stimuli, P(s|spike). When working with experimental data, the distribution P(s|spike) is not directly available. Therefore, one typically works with the sample STAˆ s , which is the average of all stimulus segments s(t spike ) that precede the measured spikes at times t spike , For large enough data sets, the law of large numbers ensures that the stimulus segments s(t spike ) sample the spike-triggered distribution P(s|spike) thoroughly, so thatˆ s approaches s , as defined in Eq. (12). The covariance of the distribution of spike-triggered stimuli, P(s|spike), is captured by the spike-triggered covariance matrix This matrix is typically estimated from experimental data by the sample covariance matrix Again, for large enough data sets, s(t spike ) samples the distribution P(s|spike) thoroughly, so thatĈ s approaches C s as defined in Eq. (15). Standard STC analysis is based on the fact that when the prior stimuli are drawn from a Gaussian white distribution and sufficient amounts of data are available, the diagonalization of C s yields two types of eigenvalues. Those corresponding to irrelevant directions are equal to 1, that is, to the variance of the prior stimulus distribution. The ones corresponding to relevant directions may have any (non-negative) value, depending on the variance along each direction.
Limited sampling adds noise to the eigenvalues so that those corresponding to irrelevant dimensions do not all lie exactly at unity, but scatter around this level. Statistical methods can then be used to assess whether deviations from unity significantly indicate the existence of a relevant direction (Touryan et al. 2002;Rust et al. 2005;Schwartz et al. 2006). A relevant stimulus direction with an eigenvalue that happens to lie very close to unity, however, may be missed by the method.
Even in the limit of infinite amounts of data, however, relevant directions could escape detection by the eigenvalue analysis of the STC matrix. A deviation from unity is a sufficient, but not a necessary condition for an eigenvalue to denote a relevant direction (Paninski 2003;Pillow and Simoncelli 2006); its eigenvalue may happen to lie exactly at unity. This can occur, for example, when the prior stimulus distribution is Gaussian and the nonlinearity ϕ is an exponential function of one of its arguments because exponential nonlinearities leave the variance along the corresponding relevant direction unchanged. This limitation of STC analysis reflects the fact that the method is based entirely on second-order statistics of spike-triggered stimuli. Typically, a simple remedy is thus to explicitly include the STA in the identification of the relevant stimulus space (Rust et al. , 2005Simoncelli et al. 2004;Schwartz et al. 2006), as relevant directions that do not show a difference in stimulus variance between prior and spike-triggered stimulus ensemble can be expected to show a difference in the stimulus average. These remarks also apply to STC analysis for non-Gaussian, spherically symmetric stimulus distributions, as discussed below. Therefore, the possibility that a rel-evant stimulus direction might not be revealed through the spectrum of eigenvalues must be kept in mind. Having said this, for simplicity we assume in the following that eigenvalues of relevant directions do not "by chance" coincide with the eigenvalues of irrelevant directions, as again, this can generally be picked up by analyzing the STA. In addition, the issue of limited sampling and significance testing will be put off until Section 5.

A geometric derivation
As a basis for applying spike-triggered covariance analysis to any stimulus distribution with spherical symmetry, including non-Gaussian stimuli, we show that the irrelevant space is spanned by eigenvectors of C s with the same degenerate eigenvalue. It then follows that the relevant space can be identified as the subspace that is spanned by eigenvectors of C s whose eigenvalues deviate from the baseline level of (typically a large number of) degenerate eigenvalues. In this derivation, we work directly with the probability distribution of spike-triggered stimuli and thus do not consider noise from finite sampling. We thereby provide a proof of consistency of the method, which means that it yields the correct relevant subspace in the limit of infinite data sampling.
The key point of the proof is to show that any vector of the irrelevant space is an eigenvector of C s . This statement is geometrically derived below and immediately implies that the whole irrelevant subspace K ⊥ is a degenerate eigenspace of C s , so that all stimulus vectors of the irrelevant space have the same eigenvalue: Consider two non-parallel vectors v 1 and v 2 that belong to the irrelevant space. According to the statement above, they must be eigenvectors of C s with eigenvalues α 1 and α 2 . Their sum lies also within the irrelevant space and is thus also an eigenvector. Let β be the To prove that each (non-zero) vector of the irrelevant space is an eigenvector of C s , we draw out an argument analogous to the geometric proof of consistency of STA analysis by Chichilnisky (2001). Specifically, we want to show that for any with a real eigenvalue λ. As a first step, we show that the spike-triggered average s belongs to K. It follows that s is perpendicular to v, so that the s s T v-term in Eq. (17) yields zero because s T v = 0.
To this end, we essentially repeat the argument of Chichilnisky (2001) and thus summarize the derivation here only briefly: For every stimulus s, a unique counterpart s * can be found by taking the mirror image of s with respect to the relevant subspace K (Fig. 3(a)). Concretely, with s K denoting the projection of s onto K and s K ⊥ = s − s K denoting the projection of s onto K ⊥ , we have The vectors s and s * have equal length, so their probabilities within the stimulus ensemble are the same, P(s) = P(s * ). Since their projections on K are the same, the associated spike probabilities are also equal, P(spike|s) = P(spike|s * ). Therefore, in calculating the spike-triggered average, s and s * are weighted equally in Eq. (12). Given that, by construction, the components of s and s * orthogonal to K are equal with opposite sign ( Fig. 3(a)), these components cancel out in the STA for all pairs (s, s * ). As a consequence, the spiketriggered average s has no component orthogonal to K and thus lies in the relevant subspace.
As the s s T v-term in Eq. (17) vanishes, we now have to show the eigenvalue property of v for the integral term of the equation. To do so, we use a geometric argument very similar to the one above for the STA. We consider a vector v from the irrelevant subspace K ⊥ ( Fig. 3(b)). Let us denote the hyperplane that is orthogonal to v by v ⊥ . Now, for every stimulus vector s, a unique vector s * can be found that is the mirror image of s with respect to the hyperplane v ⊥ , see Fig. 3(b). Assuming that v has unit length, s * is simply obtained as Again, s and s * have equal length so that P(s) = P(s * ). Also, the projections of s and s * on the relevant subspace K are identical because s and s * have the same projections on v ⊥ and because K is a subspace of v ⊥ . Thus, the spike probabilities for these two stimuli are the same: P(spike|s) = P(spike|s * ). We can therefore perform the integral in Eq. (17) over s * instead of over s, or alternatively, substitute ss T by (ss T + s * s * T )/2. Applying C s to the vector v then yields Investigating the terms ss T v and s * s * T v, we see that s T v and s * T v are equal in magnitude, but with opposite sign because of the mirror-image properties of s and s * , see Fig. 3 This vector is parallel to v; the components orthogonal to v cancel out. Since this argument holds for every s, C s v is proportional to v, which is exactly the condition for v being an eigenvector of C s , C s v = λv. We conclude that for a spherically symmetric stimulus distribution, an eigenvalue analysis of C s yields a set of degenerate eigenvalues whose eigenvectors span the irrelevant space. For non-Gaussian stimuli, the numerical value of this eigenvalue baseline generally cannot be predicted and depends on the details of the nonlinearity ϕ within the LN model (Paninski 2003). For Gaussian stimuli, on the other hand, the baseline level of irrelevant eigenvalues is always equal to the prior variance (here fixed at 1) because individual stimulus components are independent and are thus not affected by changes of variance in other directions (Paninski 2003).
As a consequence, STC analysis can be applied to the general case of spherically symmetric stimulus distributions, not necessarily Gaussian. The relevant space, however, must now be identified as the one spanned by the eigenvectors whose eigenvalues depart from the baseline level of irrelevant eigenvalues. Note that in practical applications, the degeneracy of irrelevant eigenvalues is broken up by finite sampling effects and the eigenvalues scatter around the baseline. As explained in Section 5, statistical methods can be used to test whether the scatter is consistent with pure finite sampling effects of otherwise degenerate eigenvalues.
An alternative derivation of the main result of this subsection is provided by group theory. The argument is based in the fact that the firing probability P(spike|s) remains invariant under any coordinate transformation operating only on the irrelevant subspace K ⊥ and leaving the relevant subspace K unchanged. There are many such transformations, since the irrelevant subspace K ⊥ is usually high-dimensional. All these coordinate transformations are called symmetries of the firing probability. The symmetries of the firing probability are also symmetries of the covariance matrix. As shown by many examples in quantum mechanics and solid state theory, the symmetry of a linear operator determines the degeneracy of its eigenvalues. Based on these ideas, in Appendix A, we rederive the main result of this subsection, using only symmetry arguments.

Disambiguation of relevant and irrelevant spaces
Relevant directions are associated with eigenvalues that pop out as outliers of the baseline degenerate spectrum. Therefore, they can be easily identified only when the irrelevant space is high-dimensional, so that the spectrum reveals a highly degenerate eigenspace. Fortunately, in most practical cases the dimensionality of the relevant stimulus space is considerably smaller than the dimensionality of the complete stimulus space. We thus generally search for a small number of relevant stimulus directions immersed in a much larger stimulus space.
When working with small-dimensional stimulus spaces, however, it may not be apparent from the eigenvalue spectrum alone which eigenvectors belong to the relevant and which to the irrelevant space. In the scenario of Fig. 2(c2), for example, the stimulus space has only two dimensions, and STC analysis therefore just gives two (different) eigenvalues. The question then arises of how to test whether one of these directionsor more generally whether a given subspace with degenerate eigenvalue-actually corresponds to the irrelevant space.
As an example, imagine that the spectrum of eigenvalues reveals two (small-dimensional) degenerate subspaces, and we wish to determine which is relevant and which (if any) is irrelevant. As a first attempt, one could investigate the nonlinearity along one stimulus direction belonging to the hypothesized irrelevant space. The nonlinearity can be obtained by evaluating the probability that stimuli having a given projection value along the selected direction produce a spike, irrespective of its projection on the relevant space. One may call such test an evaluation of the "marginal nonlinearity". In practical applications, probabilities are estimated from histograms (Chichilnisky 2001). When using Gaussian stimulus distributions, the marginal nonlinearity is (approximately) flat if the selected direction indeed belongs to the irrelevant space. For non-Gaussian prior distributions, however, the dependencies between different stimulus directions can cause a non-constant marginal nonlinearity even along irrelevant dimensions, and this method may thus not be conclusive.
As an alternative, we suggest to evaluate the "conditional nonlinearity", obtained in the following way: For each direction of the hypothesized relevant space, choose a fixed target value (for example, zero) to condition the nonlinearity. Then compute the nonlinearity along a direction of the hypothesized irrelevant space by using only those stimuli whose corresponding projections on the putative relevant directions lie in a small window around the target values. The conditional nonlinearity is largely unaffected by the dependence between relevant and irrelevant stimulus directions; it should therefore be nearly flat if indeed the hypothesis about the irrelevant space was correct. The method works well as long as the putative relevant subspace is low dimensional and sufficient data are available. A disadvantage is that the amount of required data increases exponentially with the dimensionality of the relevant subspace.
Note that one can construct special scenarios where even the conditional nonlinearity does not disambiguate which subspace is relevant and which is irrelevant. One example is shown in Fig. 2(c3) where the two dimensions x and y are connected through the stimulus distribution by x 2 + y 2 = 1 and the spike probability is a function of x 2 . Under the constraint of this particular stimulus distribution, this model cannot be distinguished from an equivalent model description where y is considered a relevant direction, with the spike probability a function of y 2 = 1 − x 2 . More generally, the disambiguation based on conditional nonlinearities fails whenever the prior stimuli are sampled from the surface of a high-dimensional sphere (introducing a constraint that lets the square of one component be expressed in terms of the other components) and the spiking probability depends on a quadratic form of some or all of the relevant stimulus components. Identification of the actual relevant directions, defined by the nonlinearity (Fig. 2(a2)) independently of the applied stimulus distribution, then has to rely on other sources of information, for example, prior expectations about which stimulus components should be relevant (typically the expectation that the relevant subspace is small-dimensional) or additional experiments performed with a different stimulus distribution. It is interesting to note that STA analysis, naturally, suffers from the same ambiguity in the considered scenarios. When the nonlinearity depends on a quadratic form of the inputs, for example, when it is an even function, the STA must be identical to zero.

Covariance analysis with or without subtracting the STA
Coming back to the geometrical derivation, note that the s s T -term in Eq. (17) played essentially no role in the proof. We had shown that applying this term to any irrelevant direction v just yields zero; the derivation that v is an eigenvector of the STC matrix is thus valid also if this term is left out when calculating the STC matrix. It follows that the STC method works independently of whether the STA is subtracted from the spike-triggered ensemble or not, for example when computing the sample covariance matrix, Eq. (16); in both cases, all irrelevant directions yield degenerate eigenvalues (see also the first example below). Furthermore, it also follows that the method works if the STA is projected out of the stimulus ensemble, so that only dimensions orthogonal to the STA are taken into account (Rust et al. , 2005Simoncelli et al. 2004;Schwartz et al. 2006). As the STA is part of the relevant subspace K, projecting it out simply reduces the dimensionality of the relevant subspace by one and does not affect the irrelevant subspace. The complete relevant subspace may be reconstructed by combining the STA with the relevant directions obtained from the reduced STC analysis. This approach can be useful to avoid the scenario where a relevant direction might not be detected by STC analysis alone because it happens to have the same eigenvalue as the irrelevant directions.

Examples
In this section, two examples are presented. The first one compares the results of covariance analysis with and without subtracting the STA in the spike-triggered covariance matrix, Eq. (16). The second one discusses the difference between Gaussian and non-Gaussian stimuli.
STC with and without subtracting the STA Covariance analysis can be equally performed with and without subtracting the STA in the calculation of the spiketriggered covariance matrix, as shown in Fig. 4.
In this example, there are two relevant directions, k 1 and k 2 (panel (b1)). The spiking probability ( Fig. 4(a)) is highest for stimuli whose projection on k 1 is large and whose projection on k 2 is large in absolute value, as reflected by the distribution of spike-generating stimuli (panel (b2)). The STA is proportional to k 1 . Figure 4(c) shows the eigenvalues and eigenvectors obtained when diagonalizing the covariance matrix C s with the STA subtracted. The largest eigenvalue (panel (c1)) represents the direction where the spike-generating stimuli have maximal variance, in this case, k 2 . The smallest eigenvalue corresponds to the direction with minimal variance: k 1 . The two relevant directions, k 1 and k 2 , are accurately captured by the relevant eigenvectors e 1 and e 2 , as shown in Fig. 4(c2).
In Fig. 4(d), we illustrate the diagonalization of the spike-triggered covariance matrix without subtracting the STA. The eigenvalues now represent the mean square projection of spike-generating stimuli along each direction. Two eigenvalues lie above the baseline level (panel (d1)). Although the eigenvalues are numerically different from those obtained in panel (c1), the eigenvectors coincide (compare panel (d2) with (c2)). The relevant filters, hence, can be obtained by diagonalizing C s either with or without subtracting the STA.
Comparing Gaussian and non-Gaussian stimuli In the example of Fig. 5, the difference between Gaussian and non-Gaussian prior stimuli is exemplified. Both applied stimulus distributions are spherically symmetric, and the eigenvalues of their prior covariance matrices are all equal to 1 (panel (a1)). The relevant space is spanned by the filters k 1 and k 2 , and these two vectors differ in their shape (panel (a2)) and frequency content (panel (a3)). The firing probability (panel (a4)) has rotational symmetry in the relevant space. When the stimulus is Gaussian (Fig. 5(b)), all irrelevant eigenvalues cluster around unity (panel (b2)). In contrast, for (b1) Relevant filters k 1 and k 2 . (b2) Prior (gray) and spike-generating (red) stimuli in the subspace spanned by k 1 and k 2 . The stimulus was Gaussian white noise with unit variance. (c1) and (d1) Eigenvalues of C s . The black line indicates the value 1. (c2) and (d2) Comparison between the relevant filters k 1 and k 2 and the eigenvectors e 1 and e 2 corresponding to the eigenvalues of matching color in (c1) and (d1). Number of analyzed spikes in each example: 5000 The fact that the baseline is below unity is actually a consequence of the shape of the prior stimulus distribution, which here has the form of a 20-dimensional spherical shell, and of the increased variance of the spike-triggered stimuli along the relevant stimulus components. Spikes only occur when the absolute value of the components along the relevant directions are large. Since the norm of each s is fixed, vectors with large relevant components necessarily have small irrelevant components. The degenerate eigenvalues at 0.8 < 1 reflect the reduced variance along irrelevant directions. However, if the nonlinearity of the model ϕ were changed so that spikes were only triggered by stimuli having small components along the relevant directions, the fixed stimulus norm would force these stimuli to have large irrelevant components. The baseline of the irrelevant eigenvalues would then be above unity. This argument holds for a shell-like prior distribution; for a different stimulus, say one where the radial component of the prior distribution is sharply peaked at the origin, the relationship between the variance along the relevant directions and the baseline of the irrelevant eigenvalues may be inverted: Increased variance along relevant directions corresponds to a baseline above unity; decreased variance, to a baseline below unity.
In both scenarios of Fig. 5, the two relevant stimulus directions are identified by the two outliers of the spectrum (panels (b2) and (c2)). Note that the obtained relevant eigenvectors and the original filters of the model do not match in a one-by-one fashion. The two pairs of vectors, however, span the same space, since each filter k m coincides with its projection k m on the space generated by e 1 and e 2 (panels (b4) and (c4)). The identification of the relevant space rather than of the individual model filters is, in fact, all that one can expect from STC analysis; in the expression of the firing probability, Eq. (5), the individual filters k m are not uniquely defined and could be exchanged for others that span the same relevant space, provided that the nonlinearity ϕ be appropriately adjusted. Thus, the expression of the firing probability used in Fig. 5 could have been formulated in a mathematically equivalent way, using a different pair of filters that span the same subspace.
In the present example, the identification of the subspace, but not the individual filters is particularly obvious because the firing probability has rotational symmetry in the two-dimensional relevant subspace. As discussed also in Appendix A, this symmetry leads to degenerate eigenvalues for the two relevant directions, as shown in panels (b2) and (c2). Therefore, the whole space generated by their linear combinations is an eigenspace of the covariance matrix.

Extension to elliptic stimulus distributions
In this section, we generalize the previous results to the case of elliptic prior stimulus distributions. Elliptic distributions represent a special case of non-white stimulus distributions. Individual stimulus components are now correlated, and the prior covariance matrix C p is no longer the unit matrix.
One way of dealing with an elliptic prior stimulus distribution when, in addition, the distribution is Gaussian has been pointed out by Bialek and de Ruyter van Steveninck (2005). When the eigenvalue analysis is carried out on the matrix C = C s − C p , relevant directions are marked by eigenvalues that deviate from the baseline of zero and are obtained from the corresponding eigenvectors after premultiplication with C −1 p . The correction with C −1 p undoes the correlations that are induced by the prior stimulus distribution. However, this method requires a Gaussian distribution of stimuli. In the following, we aim at deriving an analogous procedure only relying on the elliptic symmetry of the prior stimulus distribution.
An elliptic stimulus distribution is one that can be made spherical by linearly rescaling the components of the stimulus along appropriately chosen N orthogonal axes. The procedure is the same as the one needed to transform an ellipsoid into a sphere: Each of the principal axes of the ellipsoid is divided by its length. Obtaining an extension of STC analysis is then straightforward: Transform stimuli so that they assume a spherical distribution, apply STC analysis to the transformed stimulus distribution, and then transform back the obtained relevant and irrelevant directions to the original stimulus space. We now go through this procedure step by step in order to arrive at a condensed prescription.

Transforming to a spherical distribution
We first need to identify the principal axes of the prior stimulus distribution P(s). These are the eigenvectors of the prior covariance matrix C p . Because C p is, like all covariance matrices, symmetric and positivesemidefinite, C p can be transformed by an orthogonal transformation O into a diagonal matrix D with realvalued, non-negative diagonal elements, Let us assume for the moment that all diagonal elements of D are larger than zero so that D has full rank. We can then calculate D 1/2 by taking the square root of each diagonal element of D and D −1/2 by in addition taking the inverse. The prior distribution P(s) is called elliptic if it can be transformed into a spherical distribution by defining new rescaled coordinates This transformation maps the original space of vectors s to the symmetric space of vectors s . The matrices required to perform covariance analysis can also be transformed to the symmetric space. The transformed stimuli have a prior covariance matrix that is equal to the identity matrix The spike-triggered covariance matrix in the transformed stimulus space C s can simply be obtained by calculating the spike-triggered covariance matrix in the original space and then transforming appropriately, Equation (24) follows from the fact that s s In the symmetric space, the stimulus distribution is spherical, so the results of Section 3 are applicable. The irrelevant directions can be obtained as the eigenvectors of C s whose eigenvalues constitute the baseline degenerate spectrum. The relevant space is the orthogonal complement of the irrelevant space. We now return the relevant and irrelevant directions back to the original space. In order to obtain the transformation rules for the relevant directions, care has to be taken to preserve the scalar products. The conditional firing probability given by Eq. (5) must remain unchanged when transforming from s to s. We therefore require ϕ(s ) = ϕ(s). This condition is fulfilled if relevant directions in the transformed space, w , are connected to relevant directions of the original space, w, through the condition for all original stimuli s and their transformed versions s . The transformation properties of the relevant directions w are then defined in terms of their scalar products to stimulus vectors. In mathematical terms, this means that the relevant directions w do not transform as the original vectors s, but as dual vectors (in physics, the terminology of covariant and contravariant vectors is also used). Hence, the transformation rule for relevant directions is and the backward transformation is Note that Eq. (27) is not equivalent to Eq. (22). Also note that the obtained w are not necessarily orthogonal to each other, in contrast to the eigenvectors that are obtained for spherically symmetric stimulus distributions. However, the set of relevant directions is still linearly independent and thus spans a relevant subspace of the same dimensionality as the relevant directions w in the symmetric space. For completeness, we also provide the transformation properties of irrelevant directions. In the symmetric space, irrelevant directions are orthogonal to relevant ones, since relevant and irrelevant directions are eigenvectors of a symmetric matrix. In the original space, irrelevant directions must still be orthogonal to relevant directions: this orthogonality is what defines them, because it ensures that they do not contribute to any of the scalar products k T m s in Eq. (5). Orthogonality is guaranteed if irrelevant directions are transformed with the same prescription as the stimulus vectors. Thus, if v is an irrelevant vector, then and backward,  (28) and (30), respectively.
In passing, we mention that when the spike probability ϕ contains a single relevant direction k 1 , the transformation to the symmetric space is also applicable to the calculation of the STA. As shown in Appendix B, this procedure leads to the well-known recipe of estimating the single relevant direction by premultiplying the STA by the inverse of the prior covariance matrix: k 1 ∝ C −1 p s (Theunissen et al. 2001;Paninski 2003;Schwartz et al. 2006).

STC analysis directly in the original stimulus space
For convenience, we now reformulate the whole procedure using only quantities defined in the original space. Operationally, this can spare us from the need to transform forth and back to the symmetric space. We first note that the two matrices C −1 p C s and C s C −1 p both have the same eigenvalue spectrum as C s . In technical terms, they are both similar matrices to C s : If P is the change-of-base matrix needed to transform relevant directions to the symmetric space (P = D 1/2 O T , as stated in Eq. (27)), then, following Eq. (24), C −1 p C s and C s are related by a similarity transformation: C −1 p C s = P −1 C s P. Similar matrices share the same eigenvalues and have related eigenvectors: If in the symmetric space w is a relevant eigenvector of C s with eigenvalue λ, then in the original space, w = P −1 w = OD −1/2 w , as in Eq. (28), is an eigenvector of C −1 p C s with eigenvalue λ. Thus, relevant directions w can be identified from an eigenvalue analysis of C −1 p C s . In the same way, if Q transforms irrelevant directions (that is, Q = D −1/2 O T , as stated in Eq. (29)), then C s C −1 p = Q −1 C s Q, which means that C s C −1 p and C s share the same eigenvalues and have related eigenvectors. Thus, all irrelevant directions are eigenvectors of C s C −1 p with degenerate eigenvalues. Note that C −1 p C s and C s C −1 p are generally not symmetric, and thus the eigenvectors of C −1 p C s and of C s C −1 p , respectively, do not form orthogonal sets. Yet, each set of eigenvectors still provides a basis for the stimulus space because the eigenvectors of C s provide a basis and the transformations P and Q both have full rank.
Furthermore, relevant and irrelevant directions remain orthogonal to each other. To see this, note that the two matrices C −1 p C s and C s C −1 p are adjoint matrices, since they are real-valued and (C −1 p C s ) T = C s C −1 p . Adjoint matrices have the same set of eigenvalues, and moreover, their eigenvectors form dual bases. Thus, if w is an eigenvector of C −1 p C s with eigenvalue λ and v is an eigenvector of C s C −1 p with eigenvalue μ = λ, then v ⊥ w. Therefore, relevant directions are confirmed to be perpendicular to irrelevant directions.
In summary, the problem of identifying relevant and irrelevant directions for general elliptic stimulus distributions may be entirely solved in the original space. Relevant directions are obtained as eigenvectors of C −1 p C s corresponding to eigenvalues that differ from the degenerate baseline level. Irrelevant directions can be obtained from the orthogonal complement or as the eigenvectors of C s C −1 P corresponding to eigenvalues of the degenerate baseline level. The fact that we need two different matrices, C −1 p C s and C s C −1 p , reflects the different transformation properties of relevant and irrelevant directions. Using the matrices C −1 p C s and C s C −1 P serves as an alternative to the eigenvalue analysis of the transformed STC matrix C s , Eq. (24), and then transforming the obtained eigenvectors according to Eqs. (28) and (30). In fact, these two methods yield identical eigenvalue spectra and (transformed) eigenvectors. Note that all derivations above still work if C s is calculated without subtracting the STA.

Regularization
So far, we have assumed that the prior covariance matrix C p has full rank so that it can be inverted. However, if D has one or more vanishing diagonal elements, Eq. (22) is ill-defined. This happens when one or more stimulus directions have zero variance, so the prior stimuli do not cover all dimensions of the stimulus space. A typical example is given by low-pass filtered stimuli sampled at high frequency. If the prior stimulus lacks one or more dimensions, there is no way to extract information about the firing probability in the missing dimensions. The best we can do is estimate the filters without these dimensions. There are two ways to proceed. One is to simply eliminate the missing dimensions, that is, to work with a rectangular D matrix (more rows than columns). The transformation matrix P is therefore also rectangular, and the symmetric space has a lower dimensionality than the original space. The analysis can be carried out as before, only that when returning back to the original space, the number of relevant plus irrelevant directions is smaller than the dimensionality of the original space. The other alternative is to employ the full D matrix, but when inverting it, to set the infinite-valued diagonal elements of D −1 equal to zero. The matrix C −1 p = OD −1 O T is then defined as the pseudoinverse of C p .
This regularization procedure is also advised when some of the eigenvalues of C p are not necessarily zero, but much smaller than others. The corresponding dimensions are not well represented in the prior stimulus distribution and should thus be eliminated in the analysis to avoid noise amplification when inverting C p . Noise comes from limited sampling. Setting the diagonal elements of D −1 to zero when, for example, the corresponding diagonal elements of D are smaller than a certain fraction of the maximal eigenvalue (say 5 %) is a simple, yet effective way to regularize the prior stimulus distribution Felsen et al. 2005).

Examples of covariance analysis with elliptic stimulus distributions
Here we present two examples where the prior stimulus distribution is elliptic. The first one is a Gaussian stimulus, for which different dimensions are independent from one another. The second one is a hollow-ellipsoidlike stimulus distribution, where components are coupled. In both cases, we employ the same nonlinearity and relevant filters as in Fig. 5. Our aim is to compare the results obtained by diagonalizing C −1 p C s with those of C = C s − C p . As expected, the two methods are equivalent when the stimulus is Gaussian, but produce different results when applied to non-Gaussian elliptic stimulus distributions. Figure 6 shows the diagonalization of C −1 p C s and C when the stimulus distribution is elliptic and Gaussian. An elliptic prior stimulus distribution gives rise to a non-uniform spectrum of prior eigenvalues (panel (a1)). We choose an example where the filters k 1 and k 2 have different temporal (panel (a2)) and frequency (panel (a3)) characteristics. Specifically, k 1 has more power at lower frequencies compared to k 2 . Since the prior stimulus has an exponentially decaying spectrum, its variance in the direction k 1 is larger than in the direction k 2 . Thus, the prior stimuli (gray dots) occupy an elongated region in stimulus space (panel (a4)). The firing probability is spherically symmetric in the subspace spanned by k 1 and k 2 . Hence, the ratio of the density of red and gray dots in panel (a4) has circular contour lines. The transformation to the symmetric space contracts the elongated directions, resulting in a spherical prior distribution. As a consequence, the firing probability no longer looks spherically symmetric in the space spanned by k 1 and k 2 . Due to this anisotropy, the degeneracy of the two relevant eigenvalues of C −1 p C s is broken (panel (b1)). In Fig. 6(b1), the total number of eigenvalues (9) is smaller than the dimensionality of the stimulus space (20). In this example, the prior stimulus has a small variance in 11 stimulus dimensions. To avoid noise amplification, the 11 sub-represented dimensions were eliminated before starting the analysis. The alternative strategy would have been to work with the pseudoinverse C −1 p . In that case, the spectrum of C −1 p C s would still have shown 20 eigenvalues, but 11 of them would have corresponded to the regularized directions and would therefore have been equal to zero.

Example with a Gaussian elliptic stimulus distribution
When diagonalizing C −1 p C s , two eigenvalues deviate noticeably from unity (panel (b1)). For comparison, the eigenvalue spectrum of C s itself is shown as an inset of panel (b1). This spectrum is similar to the one of C p , reflecting the fact that the spike-triggered stimulus distribution is strongly affected by the shape of the prior distribution. The eigenvectors e 1 and e 2 for the distinct eigenvalues of C −1 p C s correspond to the elongated direction in panel (a4) (largest eigenvalue) and the perpendicular direction (second largest eigenvalue), respectively. The eigenvectors e 1 and e 2 are linear combinations of the filters k 1 and k 2 , but they do not coincide with them (panel (b2)). The filters k 1 and k 2 , however, coincide with their projections k 1 and k 2 on the space spanned by e 1 and e 2 (panel (b3)). This means that k 1 and k 2 span the same relevant space as e 1 and e 2 .
In Fig. 6(c), the results of diagonalizing C are displayed. Two eigenvalues are clearly above zero (panel (c1)). The associated eigenvectors e 1 and e 2 have a large fraction of their power in the low-frequency range, contaminated by the most represented direction in the prior stimulus. Consequently, they do not (a4) Prior (gray) and spike-generating (red) stimuli. The firing probability for this example was the same as the one used in Fig. 5. (b1) Spectrum of eigenvalues of C −1 p C s . Stimulus dimensions whose variance was less than 1.5 % of the maximal variance were projected out before the analysis. The black line indicates the value 1. For comparison, the inset shows the eigenvalues of C s . (b2) Comparison of the filters k 1 and k 2 to the eigenvectors e 1 and e 2 corresponding to the eigenvalues of matching colors in (b1). (b3) Comparison of the filters k 1 and k 2 to their projections k 1 and k 2 on the space generated by e 1 and e 2 . (c1) Spectrum of eigenvalues of C. The black line indicates the value 0. In the insets, the raw eigenvectors e 1 and e 2 are shown (black lines), together with their corrected versions e 1 and e 2 (colored lines). (c2) Comparison of the filters k 1 and k 2 to the corrected eigenvectors e 1 and e 2 . (c3) Comparison of the filters k 1 and k 2 to their projections k 1 and k 2 on the space generated by e 1 and e 2 . Number of analyzed spikes in each example: 5000 generate the same subspace as the filters k 1 and k 2 . In order to correct for the ellipticity of the prior stimulus distribution, the eigenvectors must be premultiplied by C −1 p (Bialek and de Ruyter van Steveninck 2005), thus defining the corrected relevant eigenvectors e 1 and e 2 . A comparison between the original eigenvectors e 1 and e 2 and their corrected versions e 1 and e 2 is shown in the insets of panel (c1). The correction procedure diminishes the low-frequency content of the filters. Although the corrected eigenvectors do not coincide with the individual filters k 1 and k 2 (see panel (c2)), they generate the same subspace, as evidenced by the excellent match between the filters k i and their projections k i on the space generated by e 1 and e 2 . Example with a non-Gaussian elliptic stimulus distribution In Fig. 7, covariance analysis is performed on stimuli consisting of a collection of vectors lying on the surface of a 20-dimensional ellipsoid. Each vector is initially constructed from a spherical Gaussian distribution and then normalized to have unit length. Then, one Fourier component is multiplied by a factor of 4. Thus, the final prior stimuli lie on the surface of a 20-dimensional ellipsoid, with a variance of 16 in two Fourier components (two components, corresponding to sine and cosine phases at this frequency) and unit variance in the remaining directions (panel (a1)).
The two relevant filters (panels (a2) and (a3)) and the spiking probability are identical to those used in previous examples (Figs. 5 and 6). In the present case, however, the prior stimulus has approximately constant variance throughout the frequency range covered by the two filters (panel (a3)). Thus, the relevant space is almost fully included in the subspace spanned by the short directions of the prior stimulus and is perpendicular to the two elongated directions. Consequently, the prior stimulus has approximately the same variance in the directions of the two relevant filters, as seen by the circular symmetry of the gray dots in panel (a4). Thus, when transforming to the symmetric space, the two relevant directions are scaled by the same factor. The spectrum of C −1 p C s , hence, shows two roughly degenerate eigenvalues (panel (b1)). The associated eigenvectors e 1 and e 2 do not necessarily coincide with the filters k 1 and k 2 (panel (b2)). However, k 1 and k 2 perfectly coincide with their projections k 1 and k 2 on the space spanned by e 1 and e 2 , verifying that the eigenvectors of C −1 p C s span the same space as the filters. The eigenvalue spectrum of C has four clear outliers (panel (c1)). In this example, the two largest eigenvalues correspond to eigenvectors that, when corrected with C −1 p , span the space generated by the filters k 1 and k 2 . The smallest two eigenvalues, however, are spurious.
If a stimulus is Gaussian, different components are independent from one another. Hence, the distribution of spike-triggered stimuli along irrelevant directions coincides with the prior distribution. In this context, it makes sense to compensate for the stimulus ellipticity by subtracting C p from C s : The variance of C vanishes along irrelevant directions. For non-Gaussian stimuli as the one of Fig. 7, however, different components are not independent from one another. Hence, the variance of the spike-triggered stimuli does not coincide with the prior variance along irrelevant directions. The subtraction C s − C p is therefore not able to counteract the elliptic nature of the stimulus distribution, and now C may have non-vanishing variance along irrelevant directions. Since the magnitude of the residual variance depends on the magnitude of the prior variance, the degeneracy of the irrelevant directions is broken.
In the example of Fig. 7, we chose a prior stimulus whose spectrum is rather peculiar, and therefore, the loss of degeneracy of the irrelevant directions is very pronounced. More typically, the spectrum of the prior stimulus may decay in a fairly continuous way. The eigenvalues of C associated with irrelevant directions, hence, also decay continuously. Whether they appear as clear outliers, or just as a weirdly shaped spectrum, depends on the exact numerical value of the prior spectrum, on the nonlinearity ϕ, and on the amount of collected data. In any case, they are not degenerate.

Significance testing
The identification of the relevant stimulus space is based on recognizing degenerate eigenvalues. Due to finite sampling, however, the spectrum of eigenvalues for irrelevant directions is never perfectly degenerate, but rather shows some scatter around the value that would be expected in the limit of infinite amounts of data. The question then arises whether the scatter observed in a given spectrum represents true differences in variance along relevant directions or instead results from statistical fluctuations along irrelevant directions. Figure 8 displays the spectra of the same model as in Fig. 5(c), but now varying the number of spikes included in the analysis. In panel (a1), only 23 spikes are employed, and there, it is not possible to determine by naked eye which eigenvalues belong to the degenerate baseline level and which are the outliers. In panel (a3), on the other hand, with 5,000 spikes, the task appears trivial. To see how statistical fluctuations in the spectrum depend on the amount of available data, a useful visualization is to plot the evolution of the spectrum with increasing number of analyzed spikes (Agüera y Arcas and Fairhall 2003; Agüera y Arcas et al. 2003), as done in panel (a2). The eigenvalues corresponding to the irrelevant subspace converge progressively, whereas the ones associated with relevant directions branch off and settle at a distinct level. For a more systematic analysis of whether individual eigenvalues indicate relevant directions or not, we need a statistical test for the significance of deviations from degeneracy.
In the case of a Gaussian prior stimulus distribution, such a test is typically performed by randomly shifting the spike times and thereby generating artificial spike trains that, by construction, contain no relevant directions (Touryan et al. 2002;Rust et al. 2005;Schwartz et al. 2006). Hence, the resulting spike trains The spectrum of ranked eigenvalues shows a smooth decay when only few spikes are analyzed (23 spikes). (a2) As more and more spikes are included in the analysis, eigenvalues of irrelevant directions converge and eigenvalues of relevant directions become distinctly separated from the baseline. (a3) In the limit of large data sets (here 5,000 spikes), the spectrum arrives at a clear distinction between relevant and irrelevant eigenvalues. (b) and (c) Comparison of the eigenvalues obtained from the actual spikes (black circles) to 95 % confidence intervals (gray area delimited by red lines) obtained from randomly rotating spiketriggered stimuli in the hypothesized irrelevant space. (b) 5,000 analyzed spikes. (c) 50 analyzed spikes. (b1) and (c1) Rotations are performed in the full 20-dimensional space. Eigenvalues lie outside the confidence interval. (b2) and (c2) After the stimulus component in the direction of the eigenvector of the first eigenvalue is projected out, rotations are performed in the remaining 19-dimensional space. The second eigenvalue still lies outside the confidence interval. (b3) and (c3) When the stimulus components in the directions of the first two eigenvectors are projected out, rotations are performed in the remaining 18-dimensional space. Now all remaining eigenvalues fall inside the confidence interval, so the corresponding stimulus space retains a degree of spherical symmetry that is compatible with the irrelevant space generate eigenvalue spectra that deviate from degeneracy only through finite-sampling effects that result from the number of analyzed spikes. The actual spectrum is then compared to the range of values obtained from the randomly shifted spike trains. Only outliers that significantly deviate from the resampled range qualify as eigenvalues associated with relevant directions.
The procedure has to be performed in a nested fashion because the finite-size effects depend on the dimensionality of the investigated stimulus subspace. First, the full stimulus space is tested. If its eigenvalue spectrum is found to be inconsistent with having no relevant directions, the stimulus direction corresponding to the eigenvalue that deviates most from unity is identified as a relevant direction and projected out from all stimuli. Next, the analysis is repeated in the reduced stimulus space. The procedure is iterated until the remaining eigenvalue spectrum is consistent with no further relevant directions.
For the case of a non-Gaussian stimulus distribution, this procedure is not directly applicable. The reason is, again, that relevant and irrelevant stimulus directions are not independent. Randomly shifting spike times creates a new ensemble of stimulus segments whose statistics are then compared to the spiketriggered stimulus ensemble. However, the statistics of such an artificial stimulus ensemble differ from the spike-triggered ensemble even along irrelevant directions. The simplest example is that the variance along irrelevant directions within the spike-triggered stimulus ensemble differs from unity (as seen in the baseline level of eigenvalues in Fig. 5(c2)), but for a random stimulus ensemble, this variance is equal to the prior variance, set to unity.
We therefore use a different resampling strategy to test whether the eigenvalue spectrum of a candidate subspace is consistent with a spectrum expected from an irrelevant space. To do so, we note that the distribution of spike-triggered stimuli retains the spherical symmetry inside the irrelevant subspace. If the candidate subspace is indeed the irrelevant subspace, the distribution of projections of the spike-triggered stimuli on the proposed subspace must be spherically symmetric, at least inasmuch as can be expected for the analyzed amounts of data. The null-hypothesis that we aim to test is thus whether the observed distribution of eigenvalues is consistent with spherical symmetry within the candidate subspace, given finite sampling. Therefore, resampling is carried out by randomly rotating each spike-triggered stimulus within this subspace. The random rotation of each spike-triggered stimulus can easily be obtained by taking the projection of the stimulus onto the candidate subspace, computing its vector length, and replacing the projection by a vector with the same length in a random direction within the candidate subspace. Practically, the random direction can be obtained, for example, by randomly drawing vector components from a Gaussian distribution and then normalizing the obtained random vector.
The resulting resampled stimulus distribution is, by construction, spherically symmetric in the investigated subspace, but retains the original distribution of absolute values. Therefore, if the candidate subspace is indeed an irrelevant subspace, then the eigenvalues of the resampled stimuli necessarily scatter around the same baseline level as the eigenvalues of the actual spike-triggered stimuli. We thus perform STC analysis on the set of rotated spike-triggered stimuli and repeat this procedure many times in order to determine the mean value of each of the ranked eigenvalues as well as confidence intervals.
Just like the resampling procedure that is based on shifting spike times, this analysis is performed in a nested fashion. The procedure is illustrated in Fig. 8(b) and (c) for the model used in Fig. 5(c) with a spherically symmetric stimulus distribution on the surface of a 20dimensional sphere and two relevant directions. In the first round, all spike-triggered stimuli are rotated in the full N-dimensional stimulus space, and we test the null-hypothesis that there are no relevant directions, so the entire spike-triggered stimulus distribution is spherically symmetric. If the eigenvalues do not lie within the pre-specified confidence limits, say 95 % confidence intervals, the hypothesis is rejected, as is the case of panels (b1) and (c1). The eigenvector whose eigenvalue deviates most from the confidence interval is then identified as a relevant direction, and it is projected out from all spike-triggered stimuli for further significance testing. In the second round, we test the null-hypothesis that there are no relevant directions in the remaining stimulus space, based on the remaining N − 1 eigenvalues. Now, stimulus lengths are calculated in the remaining (N − 1)-dimensional subspace, and eigenvalue spectra of randomly rotated vectors are obtained from random vectors with the correct vector lengths in an (N − 1)-dimensional space. The procedure is iterated until the remaining eigenvalues lie within the specified confidence intervals. Figure 8(c) shows that the method correctly identifies two relevant filters in our example, even when using as few as 50 spikes and having no obvious degenerate baseline.
The significance test for the sphericity of stimulus distributions can easily be extended to elliptic distributions. The null-hypothesis should now state that the distribution of spike-triggered stimuli in the investigated subspace displays the same elliptic symmetry as the prior distribution. To use the resampling procedure, one can thus simply determine the stimulus length of a spike-triggered stimulus after applying the whitening transformation, Eq. (22), and then perform the rotation, the subsequent eigenvalue analysis, and the elimination of identified relevant directions in the transformed, spherically symmetric space.

Discussion
Spike-triggered covariance analysis is generally used for identifying multiple relevant dimensions from Gaussian stimuli by finding differences in variance between the prior and the spike-triggered stimulus distributions. It has been noted that in this form, the analysis is not applicable to non-Gaussian stimuli (Paninski 2003;Simoncelli et al. 2004;Schwartz et al. 2006), in contrast to spike-triggered average analysis. Here, we have provided a geometric picture of STC analysis and have thereby shown that STC analysis is applicable to general spherically symmetric distributions when the criterion for identifying relevant directions is modified. The modified criterion consists of detecting eigenvalues of the STC matrix that differ from the common baseline of degenerate eigenvalues, even if this baseline does not correspond to the variance of the prior distribution. Moreover, we have shown that the new approach can also be extended to elliptic stimulus distributions. We thus conclude that the consistency of STC analysis requires special symmetries in the prior stimulus distribution (spherical, or more generally, elliptic). Gaussianity, instead, is not indispensable.

Non-Gaussian stimuli in practice
Non-Gaussian spherical or elliptic stimulus distributions are, of course, not nearly as frequently encountered in experimental situations as Gaussian distributions, primarily because time series of Gaussian stimuli s can be obtained in a continuous fashion by drawing new stimulus components s(t) from a Gaussian distribution. No such continuous generation of stimuli with a spherical, non-Gaussian distribution is possible. Application of the extended method may thus become useful when experiments are performed with stimuli that do not contain temporal dimensions, for example, when the components of s represent spatial stimulus elements (cf. Fig. 1(b)). In this case, general spherical distributions allow more flexibility than Gaussian stimuli. They may serve, for example, to provide contrast normalization for each presented stimulus or to use stimulus distributions that drive the investigated neurons more efficiently than a Gaussian stimulus does (Ringach et al. 1997). For the visual system, an interesting scenario might be the investigation of flashed images (Gollisch and Meister 2008b) or images that are presented in a saccade-fixation context (Segev et al. 2007). Other relevant scenarios may come from experiments where the temporal domain is intrinsically less relevant because of slower data acquisition. This may occur, for example, when analyzing data from calcium imaging, voltage-sensitive-dye imaging, or similar experiments. Here, spike-triggered analyses might be transformed into fluorescence-triggered analyses: Instead of selecting stimulus segments based on whether they elicited a spike or not, all stimulus segments are considered, but weighted by the elicited response strength, i.e. the fluorescence signal. This is analogous to working with trial-averaged firing rates or intracellularly measured synaptic currents (Demb 2008;Schwartz and Rieke 2011). As the temporal resolution of fluorescence imaging experiments is often not sufficient for analyzing temporal stimulus integration characteristics, it makes sense to present stimuli in a one-by-one fashion instead of a continuously updating time series.
Finally, even temporal stimulus attributes may be analyzed with non-Gaussian spherical stimulus distributions if a continuous stimulus update is not required. One such scenario occurs when individual stimulus segments are locked to external events such as a saccade (Geffen et al. 2007) or a contrast switch (Baccus and Meister 2002).

Connection to Wiener series
STA and STC analyses are closely connected to the theory of Wiener series (Wiener 1958), which can be used to model the stimulus-response relation of a neuron by the first few terms of a functional expansion. For Gaussian white noise stimulation, this expansion can be systematically obtained by the Lee-Schetzen method, which derives the kernels of the Wiener series from various crosscorrelations between the stimulus and the response (Lee and Schetzen 1965), similar to the reverse correlation in STA and STC analysis. In fact, the first-order kernel of the Wiener series corresponds to the STA, and the second-order kernel of the series is obtained from the STC matrix.
For non-white Gaussian stimuli, this analogy still holds. The crosscorrelation method for obtaining the kernels of the Wiener series has been extended analogously to the derivation shown here in Section 4: apply a whitening transformation, obtain the Wiener kernels in the transformed space, and translate the results back to the original space, which in the case of Wiener series means to prepend the kernel operations with the whitening transformation (Lee and Schetzen 1965;Schetzen 1974).
Furthermore, efforts have been made to extend the method of Wiener series expansion or develop analogous functional expansion strategies for various types of non-Gaussian stimuli, such as spike-like inputs (Krausz 1975), noise stimuli with discrete input levels (Marmarelis 1977), superpositions of sinusoids (Victor and Knight 1979), and certain nonlinearly transformed Gaussian stimuli (Schetzen 1981). However, comparing these approaches to the treatment of non-Gaussian inputs in Section 3 highlights an important difference between the functional series models and the LNmodel-based approach as in the present work. For the latter, the goal is generally to find the filters k m that describe the system's response according to Eq. (5), independently of the applied stimulus. This means that ideally the exact same relevant space should be obtained if the system is probed with different stimulus distributions. By contrast, the functional series models typically aim at minimizing the quadratic error between the predicted and the actual response for a given stimulus and a given order of the series expansion. The optimal kernels, which achieve this minimal error, may well depend on the applied stimulus, in particular for non-Gaussian stimuli where the constraints on the prior stimulus distribution might actually be exploited for response prediction. It is thus not surprising that analogies between Wiener series and STA and STC analysis for non-Gaussian stimuli appear less straightforward; yet, further exploration of the relation between these approaches for different stimulus distributions should prove a promising route for future investigations to arrive at a deeper understanding of the scope of these methods.

A diversity of approaches in STC analysis
STC analysis has been formulated in several different versions, both in this work and in several previous studies. These different approaches are, of course, related to each other, though not always equivalent. In the following, we discuss some of their connections.
STC with and without subtracting the STA Not surprisingly, the STA always lies in the relevant subspace, as shown in Section 3.2. This means that STC analysis can be performed with the actual covariance matrix of spike-triggered stimuli as well as with a variant of this matrix where the STA is not subtracted. The latter might be referred to as a matrix of second moments rather than an actual covariance, and it has also been termed "non-centered spike-triggered covariance matrix" (Cantrell et al. 2010). While both variants generally identify the correct relevant subspace, the obtained eigenvalue spectra and individual eigenvectors are typically different, as exemplified in Fig. 4. Comparing the spectra of STC analysis with and without subtracting the STA may thus serve as a simple consistency check and flag certain cases where one of the approaches fails to identify all relevant directions because the corresponding eigenvalue happens to coincide with the baseline of the irrelevant spectrum. As mentioned earlier, on the other hand, such a scenario is typically also picked up by explicitly including the STA when determining the relevant subspace, or more generally, by using an estimator that directly combines information from changes in stimulus mean and variance .

Relevant directions as eigenvectors of C
In several previous studies (Brenner et al. 2000;Fairhall et al. 2006;Maravall et al. 2007), covariance analysis was based on diagonalizing the matrix C = C s − C p . For spherical stimulus distributions with unit variance of each stimulus component, C p is the identity matrix. The eigenvectors of C thus coincide with those of C s , and the eigenvalues are shifted downwards by one unit. The relevant and irrelevant directions of C, hence, coincide with those of C s .
For a Gaussian elliptic stimulus distribution, C still identifies the relevant subspace correctly (Bialek and de Ruyter van Steveninck 2005); the obtained relevant eigenvectors simply need to be premultiplied by C −1 p to correct for the correlations of the prior stimulus distribution. The eigenvalue spectrum and the individual eigenvectors now typically differ from those obtained with the procedures discussed in Section 4. Yet, the relevant spaces obtained with both methods coincide. Interestingly, we have seen above that relevant and irrelevant directions transform differently.
The difference also appears when diagonalizing C, since irrelevant directions must be premultiplied by C p , and not by C −1 p . This ensures that relevant directions remain orthogonal to irrelevant directions.
The investigation of elliptic stimulus distributions through C is only guaranteed to work with Gaussian stimulus distributions. As shown in Fig. 7, elliptic non-Gaussian stimulus distributions can lead to spurious eigenvalues deviating from the zero baseline level with this approach.
Relevant directions as stimulus directions of modif ied variance A relevant eigenvector w fulfills the equation C −1 p C s w = λw. With a little bit of algebra, it follows that the associated eigenvalue is λ = w T C s w / w T C p w . This expression represents a ratio of variances (Schwartz et al. 2006): the variance of the spike-eliciting stimuli to the variance of the prior stimuli, both measured along the direction of w. Thus, searching for directions in stimulus space where this ratio is "unusual", meaning that it differs from the baseline of variance ratios of irrelevant directions, serves as a way to identify relevant stimulus directions. This procedure is equivalent to the eigenvalue analysis of C −1 p C s . Covariance analysis reveals changes in variances and is thus sensitive only to modifications up to the second moment of the probability distribution; higher order effects are not considered. Conversely, if for Gaussian stimulation the variance is altered in direction k m , then for sure this direction belongs to K. Hence, the detectability by covariance analysis is a sufficient, but not a necessary condition, for a direction to be relevant in the sense of Eq. (5).

Further characterization of the relevant space
As mentioned before, Eq. (5) defines the relevant space K unambiguously, but different sets of the individual filters k m may be chosen without affecting the final spike probability. Even so, one may sometimes be interested in distinguishing between different directions inside K. In some cases, additional structure within the relevant subspace may suggest a particular choice of the filters. As an example, cluster analysis of spiketriggered stimuli has been used to find filters within the relevant subspace that likely match actual physiological pathways (Fairhall et al. 2006;Geffen et al. 2007;Gollisch and Meister 2008a).
Alternatively, relevant directions may be distinguished based on the magnitude and sign of the change in variance along each direction. The latter has been used to classify relevant directions as either excitatory or suppressive, depending on whether the variance of spike-triggered stimuli is increased or decreased compared to the prior stimulus (Schwartz et al. 2002;Rust et al. 2004Rust et al. , 2005Simoncelli et al. 2004;Schwartz et al. 2006). The classification into excitatory and suppressive stimulus directions through the magnitude of the eigenvalue makes sense only for relevant directions that are perpendicular to the STA. The STA itself, which usually functions as an excitatory stimulus direction, can be associated with an increase or decrease in variance, depending on the nonlinearity ϕ. Therefore, in those studies, the STA is typically projected out from each stimulus vector in order to then determine relevant directions orthogonal to the STA.
When the stimulus is not Gaussian, however, the size of an eigenvalue also reflects the effect of the interference between different stimulus directions imposed by the constraints of the prior stimulus distribution. For example, changing the nonlinearity along one relevant direction typically affects also the eigenvalues of other relevant directions. It is thus less straightforward to distinguish between excitatory and suppressive directions depending on the size of the eigenvalue. Yet, for most practical purposes, a distinction based on whether eigenvalues of relevant directions lie above or below the baseline level of irrelevant directions should provide a useful classification in terms of the excitatory or suppressive nature of relevant directions, given the constraints of the particular prior stimulus distribution.
Further characterization of the relevant space can come from observing degeneracies in the eigenvalue spectrum. These reflect fundamental properties of the firing probability, at least as long as the prior stimulus distribution is spherically symmetric. Even within the relevant space, eigenvalue degeneracies may be informative. A degeneracy in two or more relevant directions implies that the firing probability is endowed with additional symmetry properties: The variance is equally altered in several relevant directions. The eigenspaces associated with those symmetries are fundamental characteristics of the firing probability. Examples of such degeneracies in the relevant space have been found. Modeling studies have shown that resonator neurons are sometimes only selective to the stimulus frequency, but not to its phase (Mato and Samengo 2008). In such cases, covariance analysis detects a degenerate two-dimensional relevant eigenspace, generated by two periodic eigenvectors with a 90 • phaseshift. Linear combinations of these two eigenvectors generate a periodic stimulus with arbitrary phase. Experimental studies of visual neurons in the fly (Bialek and de Ruyter van Steveninck 2005) and complex cells in mammalian visual cortex (Touryan et al. 2002;Rust et al. 2005) have reported similar degeneracies; the relevant space contained degenerate, two-dimensional eigenspaces, characterized by well-defined location, orientation, and frequency, with arbitrary phase.

Conclusion
Here we have provided a geometric proof of consistency of spike-triggered covariance analysis. The geometric approach has led to an extension of the tech-nique to arbitrary (non-Gaussian) elliptic stimulus distributions. For spherical distributions, irrelevant directions typically constitute a large degenerate eigenspace of the spike-triggered covariance matrix. Relevant directions are detected as the eigenvectors whose eigenvalues depart from the baseline degenerate level. In contrast to the Gaussian case, the value of irrelevant eigenvalues is not known a priori; it depends on the nonlinearity ϕ. For elliptic stimulus distributions, STC analysis can be appropriately modified to account for the correlations in the stimulus. This can be achieved by performing eigenvalue analysis on a matrix equal to the product of the inverse of the prior covariance matrix and the spike-triggered covariance matrix.
Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

Appendix A
Using group-theoretical arguments, here we prove that, for spherical stimulus distributions, the irrelevant subspace K ⊥ is an eigenspace of C s . The prior distribution P(s) only depends on the length of s and therefore remains invariant under any orthogonal transformation of the s-space, that is, any transformation that preserves the lengths of all vectors. Orthogonal transformations fulfill the condition O T = O −1 . In addition, any transformation O i acting only on the irrelevant subspace K ⊥ also leaves the spike probability ϕ(k T 1 s, . . . , k T M s) invariant: There are many such orthogonal transformations O i . For example, all rotations whose rotation plane is contained in the irrelevant space and all reflections that invert stimulus directions within the irrelevant space leave ϕ invariant. These transformations, together with the ones obtained by combining them, constitute a group: the symmetry group of P(s|spike). More formally, these transformations define a representation of this symmetry group. Since there is no proper subspace of K ⊥ that remains invariant under the action of all the O i of the group, the irrelevant space is an irreducible space of the representation. The transformations O i only operate in the irrelevant space and thus leave the relevant vectors invariant. In particular, the STA is left unchanged: Starting from the integral in Eq. (12) and introducing an orthogonal change of variables s = O i s , we arrive at Using this same change of variables in the calculation of ss T , we see that Therefore, the matrix C s commutes with O i , In group theory, Schur's lemma states that if an operator commutes with all the matrices of an irreducible representation, then, inside the irreducible space of the representation, the operator is proportional to the unit matrix (Wiegner 1959;Tinkham 1964). Therefore, K ⊥ must be an eigenspace of C s . To illustrate the application of Schur's lemma in this case, we note that if v ∈ K ⊥ is an eigenvector of C s with eigenvalue λ, then Eq. (34) implies that O i v is also an eigenvector of C s with eigenvalue λ. By appropriately choosing O i , any vector v ∈ K ⊥ whose length is equal to |v| can be written as v = O i v. Therefore, the whole of K ⊥ is an eigenspace of C s . The same reasoning can be applied to the STC matrix when the spike-triggered average is not subtracted.
As a final remark, we point out that if P(s|spike) also contains a symmetry group inside the relevant space and if the associated irreducible space has dimension larger than 1, degeneracies also appear in K. Of course, P(s|spike) might have no symmetry in the relevant subspace. But if, for example, ϕ(k T 1 s, . . . , k T M s) only depends on (k T 1 s) 2 + (k T 2 s) 2 + . . . (k T M s) 2 , then when the prior stimulus distribution is spherical, also the relevant directions have degenerate eigenvalues. In Fig. 5, for example, the relevant eigenvalues of C s are degenerate. This is not the case in Fig. 4, where the firing probability is not symmetric. In Fig. 6, instead, the firing probability is indeed symmetric, but the prior stimulus is not. Thus the degeneracy with respect to the relevant eigenvectors of C −1 p C s is broken. In Fig. 7, the degeneracy is recovered, since the prior stimulus distribution is elliptic, but is (almost) spherical in K.

Appendix B
In Section 4, we extended STC analysis to general elliptic stimulus distributions. This extension was based on a transformation to a spherically symmetric stimulus distribution, for which the previously derived results were applicable. Here we provide the analogous extension for the STA.
When the spike probability ϕ contains a single relevant direction k 1 , the STA is proportional to k 1 if the stimulus distribution is spherical (Chichilnisky 2001). If the stimulus distribution is elliptic, new variables s can be defined via Eq. (22), such that the prior stimulus distribution P(s ) is spherical. In the transformed stimulus space, the STA s = ds P(s |spike) s is then proportional to the filter, In order to obtain the relation of STA and filter in the original space, we note that, because the integral in Eq. (35) can be transformed using s = D −1/2 O T s and ds P(s |spike) = dsP(s|spike), the transformation rule for the STA is the same as for individual stimuli, Eq. (22), Furthermore, the backward transformation for k 1 is that of a relevant stimulus direction, Eq. (28),