Introduction

Information about the external world is available to the central nervous system (CNS) only via the spike trains arriving from various sense organs. Hence, a basic task of any CNS is to extract from these sensory spike trains information about events in the outer world (Bialek et al. 1991). Sensory perception, however, has to overcome the high variability in the spike trains that represent one and the same sensory object. Two major noise sources contribute to the overall variability: (1) extrinsic noise, i.e. masking by ambient noise or interference from other objects, which distorts the signal even before it arrives at the sense organs (Richards and Wiley 1980; Römer and Lewald 1992; Brumm and Slabbekoorn 2005); and (2) intrinsic noise, which is due to stochastic events involved in the generation of neuronal signals (Borst and Theunissen 1999; Azouz and Gray 1999; Faisal and Laughlin 2007; Faisal et al. 2008) and becomes evident as trial-to-trial variability in the spike responses if an identical stimulus is presented repeatedly.

A priori, we cannot predict whether the effects of intrinsic neuronal noise are negligible compared to an overwhelming influence of external signal degradation.

We decided to systematically change the external noise in a model system to quantify its relative contribution to spike train variability and study its effect on behavior, as we reasoned that invariant object recognition in the presence of noise must be related to the degree of invariance of the spike trains that represent the object. An acoustic mate recognition system was chosen, which, on the one hand, guarantees that the recognition task is highly relevant, and, on the other hand, allows for a direct comparison of neuronal with behavioral performance. Our model species, the grasshopper Chorthippus biguttulus, uses simple, stereotypic sound signals (‘songs’) to attract and identify sexual partners. The decisive cues for song recognition reside in the temporal pattern of amplitude modulations, the signal’s envelope (von Helversen 1972; von Helversen and von Helversen 1997, 1998; Schmidt et al. 2008).

Amplitude modulations of acoustic signals are particularly vulnerable to distortion in complex environments, as sound reverberates, taking many paths to reach the receiver, which results in unpredictable amplitude modulations and, in particular, distortion of the fast modulations (Richards and Wiley 1980). In addition, sound signals may be masked by other sound sources, making external signal degradation potentially severe (Richards and Wiley 1980). Hence, the contribution of a multitude of noise sources in the outer world may conceivably exceed that of intrinsic ones.

We tested to what degree extrinsic and intrinsic noise affect the neuronal representation of sound patterns by successively degrading the envelope of a well-accepted female song with random amplitude fluctuations (Fig. 1a). The original and the degraded stimuli were then used both in behavioral experiments and during in vivo neurophysiological recordings from the metathoracic ganglion. The metathoracic ganglion houses an important stage of auditory information processing, comparable to the auditory brain stem nuclei of vertebrates (Stumpner and Ronacher 1994). More than 50 receptor neurons per ear converge onto 10–15 local interneurons that are connected to 15–20 ascending interneurons (Römer and Marquart 1984; Stumpner and Ronacher 1991) which carry the auditory information to the brain. We recorded spike trains from identified neurons belonging to these stages of auditory processing. Distinct cell types could be identified as individuals on the basis of their characteristic morphology and physiology.

Fig. 1

Stimuli and principle of data evaluation. a Upper panel: oscillogram of the original female song, which contained 12 similar syllables (s) separated by pauses (p); each syllable consisted of six sound pulses (see enlargement). Lower panel: the envelope of two song syllables before and after the addition of envelope noise at −3 and 3 dB NSR. b Amplitude spectra of the song envelopes shown in a (amplitude in arbitrary units). c Examples of spike train patterns in response to 12 syllables of a female grasshopper call for various levels of noise. Upper panel: the spike responses of TN1, a tonic local interneuron. Lower panel: AN12, a phasic ascending neuron. d Distance matrix of the spike trains of TN1. Metric distances between spike trains are color coded from blue to red (zero to maximum distance in percent). Square blocks along the diagonal contain the ‘intrinsic’ distance values for each degradation level (e.g. x_0: distances between spike trains in response to the original song). Squares along the right column indicate ‘extrinsic’ distance values (e.g. y_1: the distances between spike trains in response to the original song and the first degradation level)

The responses of auditory neurons to an incoming communication signal provide the neuronal basis for recognition. Hence, the difference between a too strongly corrupted and an acceptable signal should become evident in the dissimilarity between the respective spike trains. We applied a spike train metric (van Rossum 2001) to quantify dissimilarities between spike trains produced in response to the original signal and the degraded signals. Based on this dissimilarity measure, we used a cluster algorithm (Slonim et al. 2005) and a discrimination analysis to define a neuronal critical degradation level (nCDL) for each neuron, which was then compared to the behavioral song recognition performance of the animals. This neuroethological approach allowed a direct comparison of neuronal and behavioral performance under the influence of external and intrinsic noise.

Materials and methods

Animals

Behavioral experiments were performed on adult males of C. biguttulus. Electrophysiological experiments were performed on adult individuals of C. biguttulus and Locusta migratoria. Males and females were used for electrophysiology, as no sex-specific differences were found for auditory neurons in the metathoracic ganglion (own data and Stumpner, personal communication). The locusts were obtained from a commercial supplier; C. biguttulus were either caught in the field or reared as F1 offspring from our own breeding stock.

Acoustic stimuli

To compare neurophysiological with behavioral data, we used an acoustic recording of a C. biguttulus female which contained 12 similar syllables separated by pauses (p) of around 20 ms; each syllable consisted of six sound pulses (see Fig. 1a). The song duration was 1.2 s. The decisive cues for signal recognition reside in the temporal pattern of amplitude modulations of a broadband carrier signal (von Helversen and von Helversen 1997). Hence, the envelope of the original song (main energy between 10 and 100 Hz) was degraded by adding white noise with a 1 kHz cut-off frequency to the signal’s envelope (see lower traces in Fig. 1a). The envelope noise was identical across subsequent trials for a given degradation level (“frozen noise”), whereas for each successive level a new stochastic degradation was generated. Signal degradation was performed in 3 dB steps relative to the variance of the original envelope. The noise-to-signal ratio (NSR) is given in decibels by NSR = 10 log10(noise variance / original song variance). Consequently, a degradation level of 0 dB refers to a noise-to-signal ratio of one. The resulting envelopes were filled with the typical carrier frequency spectrum of female songs (von Helversen and von Helversen 1997). The overall stimulus intensity was kept constant, which meant that the relative contribution of the signal to the envelope spectrum was progressively reduced by noise. The upper panel in Fig. 1b shows the envelope spectrum of the original song, consisting of several major frequencies between 10 and 100 Hz. In the degraded signals, the amplitudes of these frequency components were reduced relative to the amplitudes of the noise frequencies (0–1,000 Hz) (Fig. 1b, middle and lower panel). Degrading the signal envelope adds power to the pauses; because the overall signal energy was kept constant, it also tended to subtract power from the syllables.
Nevertheless, since sensory systems act on a logarithmic scale, this procedure had only a minor effect: comparing the integrals of the original syllables and the most degraded syllables showed that the total energy changed by only −1.2 dB, and the main peaks within the syllables changed by −0.76 dB. These differences are small relative to the standard deviations of the intensity response curves of neurons (Weschke and Ronacher 2008). Note that measuring the ratio of signal and noise in terms of sound amplitudes is not equivalent to using the variance of the sound pressure level, as we have done. As a consequence, the signal is more strongly corrupted at any given dB level in this study than in studies that use amplitude ratios (Fig. 1a); the approximate conversion factor is slightly more than two (Bracewell 1999).
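The NSR definition above can be illustrated with a minimal sketch that rescales a noise sequence to a chosen NSR relative to an envelope. The function names, the toy envelope and the three example levels are assumptions made for illustration only; the 1 kHz low-pass filtering of the noise, the renormalization of overall intensity and the carrier filling described above are omitted.

```python
import math
import random

def variance(values):
    """Population variance of a sequence of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def nsr_db(noise, signal):
    """Noise-to-signal ratio in dB: 10 * log10(var(noise) / var(signal))."""
    return 10.0 * math.log10(variance(noise) / variance(signal))

def scale_noise_to_nsr(noise, signal, target_db):
    """Rescale `noise` so that its NSR relative to `signal` equals target_db."""
    target_variance = variance(signal) * 10.0 ** (target_db / 10.0)
    gain = math.sqrt(target_variance / variance(noise))
    return [gain * v for v in noise]

# Toy envelope (hypothetical): a 10 Hz modulation sampled at 1 kHz for 1 s.
envelope = [0.5 + 0.5 * math.sin(2.0 * math.pi * 10.0 * i / 1000.0)
            for i in range(1000)]

rng = random.Random(1)  # a fixed seed keeps the noise "frozen" across runs
degraded = {}
for level_db in (-3, 0, 3):
    raw = [rng.gauss(0.0, 1.0) for _ in envelope]  # new noise for each level
    noise = scale_noise_to_nsr(raw, envelope, level_db)
    degraded[level_db] = [e + n for e, n in zip(envelope, noise)]
```

At 0 dB the scaled noise has the same variance as the envelope, and each 3 dB step changes the noise variance by a factor of about two.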

Neurophysiology: stimulation

Acoustic stimuli were stored digitally and delivered by custom-made software (LabVIEW, National Instruments). Following a 100-kHz D/A-conversion (PCI-MIO-16E-1, National Instruments), the stimulus was routed through a computer-controlled attenuator (PA5; Tucker-Davis Technologies, Gainesville, FL, USA) and an audio amplifier (Mercury 2000; Jensen, Pulheim, Germany). Acoustic stimuli were broadcast unilaterally by speakers (D-28/2, Dynaudio, Skanderborg, Denmark) situated at a distance of 30 cm from the preparation. All experiments were performed in a Faraday cage lined with foam prisms to reduce echoes. Sound intensity was calibrated with a ½-inch microphone (type 4133, Brüel & Kjær) and a measuring amplifier (type 2209, Brüel & Kjær), positioned at the site of the preparation. Neurons were preferably stimulated at 60 dB (see behavioral experiments), except for 32 cells with higher threshold (27 cells at 70 dB, 5 cells at 80 dB) or non-monotonic intensity functions (9 cells at 50 dB). Each stimulus, original or degraded, was presented 10 times to the animal while recording the spike trains of identified neurons. For the computation of neuronal critical degradation levels, only cells tested at 60 dB stimulus intensity were used.

Neurophysiology: intracellular recordings

The conventional experimental procedures are described in detail elsewhere (Vogel and Ronacher 2007). Head, legs, wings and gut were removed and the animals were fixed ventrally onto a freestanding holder. The thorax was opened dorsally and the torso was filled with locust Ringer solution. During the experiments, the preparation was kept at a constant temperature (30 ± 2°C). Intracellular recordings were obtained from auditory neurons within the metathoracic ganglion using thin-walled glass capillaries, the tips of which were filled with a 3–5% solution of Lucifer yellow (Sigma–Aldrich, Taufkirchen, Germany) in 0.5 M LiCl. Recorded cells were considered as auditory neurons if they showed a reliable response to an acoustic search pulse. Neural responses were amplified (Bramp-01; npi electronic, Tamm, Germany) and recorded by a data-acquisition board (PCI-MIO-16E-1; National Instruments, Munich, Germany) at a sampling rate of 20 kHz. After completion of the stimulation protocol, the dye was iontophoretically injected into the recorded neuron. The thoracic ganglia were removed, fixed in 4% paraformaldehyde, dehydrated, and cleared in methylsalicylate. The stained cells were identified according to their characteristic morphology (Römer and Marquart 1984; Stumpner and Ronacher 1991). Online Resource 1 summarizes the cells recorded. Recordings covered neurons (N = 89) belonging to at least 14 neuron types and three different processing levels.

The nomenclature of the cell types reflects the level of auditory computation to which they belong: (1) receptor cells (REC); (2) local interneurons (LN), which receive direct input from the receptor cells and whose arborizations are confined to the thoracic and abdominal ganglia; and (3) ascending interneurons (AN), which receive input not directly from the receptor cells but from local interneurons, and whose axons provide direct input to the supraoesophageal ganglion (Marquart 1985; Stumpner 1988). The local neurons were further classified according to their branching pattern into (2a) SNxy, monosegmental neurons with branching patterns confined to the metathoracic ganglion; (2b) BSNxy, bisegmental neurons with branching patterns within the meta- and mesothoracic ganglia; and (2c) TNxy, T-shaped neurons with branching patterns extending down into the abdomen and up into the prothoracic ganglion.

As the carrier frequency spectrum of the female song had its main energy component around 8 kHz, only low frequency receptors were investigated in this study. The experimental protocol complied with German laws governing animal care.

Determining metric distances between spike trains

Spike times at a temporal resolution of 0.05 ms were extracted from the digitized recordings by applying a voltage threshold. The spike trains were convolved with a filter function to compute a metric distance between pairs of spike trains (van Rossum 2001). This approach is closely related to the spike train metric of Victor and Purpura (1997), as shown by Houghton and Sen (2008). As a filter we used an α-function, which mimics an excitatory postsynaptic potential (Machens et al. 2003). The width of the filter function was set by a time constant τ. By varying τ one adjusts the temporal resolution with which the spike trains are evaluated. For large τ (e.g. >500 ms) the metric glosses over the precise timing of spikes in favor of emphasizing the differences in the total spike count, while using a small τ (~5 ms) makes the distance measure depend on the temporal pattern of spikes. Using this spike train metric, we quantified how external noise warps the spike train pattern.
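The principle of such a filter-based metric can be sketched as follows: each spike is replaced by an α-function, and the distance is the integrated squared difference between the two filtered traces. This is a minimal illustration under stated assumptions, not the analysis code used here: the function names, the grid step `dt_ms`, the truncation of the kernel at 10 τ and the absence of any normalization are choices made for the example, and spike times are assumed to be non-negative.

```python
import math

def alpha_kernel(tau_ms, dt_ms):
    """Sampled alpha function t * exp(-t / tau), truncated at 10 * tau."""
    n = int(round(10.0 * tau_ms / dt_ms))
    return [(i * dt_ms) * math.exp(-(i * dt_ms) / tau_ms) for i in range(n)]

def filtered_train(spike_times_ms, duration_ms, tau_ms, dt_ms):
    """Place one alpha-function bump per spike on a regular time grid."""
    n = int(round(duration_ms / dt_ms))
    kernel = alpha_kernel(tau_ms, dt_ms)
    trace = [0.0] * n
    for t in spike_times_ms:
        start = int(round(t / dt_ms))
        for k, value in enumerate(kernel):
            if start + k >= n:
                break  # kernel tail runs past the end of the trace
            trace[start + k] += value
    return trace

def spike_train_distance(train_a, train_b, duration_ms, tau_ms=5.0, dt_ms=0.1):
    """Integrated squared difference of the two filtered spike trains."""
    fa = filtered_train(train_a, duration_ms, tau_ms, dt_ms)
    fb = filtered_train(train_b, duration_ms, tau_ms, dt_ms)
    return sum((a - b) ** 2 for a, b in zip(fa, fb)) * dt_ms

# A 1 ms shift of a single spike yields a smaller distance than a 20 ms shift.
small_shift = spike_train_distance([100.0], [101.0], duration_ms=300.0)
large_shift = spike_train_distance([100.0], [120.0], duration_ms=300.0)
```

With a small τ the distance is dominated by such timing differences, whereas with a large τ the filtered traces of shifted spikes overlap almost completely and differences in spike count dominate.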

The spike train metric served as the basis for the supervised clustering of spike trains. For instance, Wohlgemuth and Ronacher (2007) used a supervised cluster algorithm (Machens et al. 2003) to quantify how well the spike trains of auditory neurons in L. migratoria discriminate sinusoidal stimuli. For sinusoids, the temporal resolution τ that yielded the highest discrimination performance increased significantly from receptor neurons to ascending neurons. To test whether this result also holds for our stimulus ensemble, we determined the cell-specific optimal time constant for each neuron. Online Resource 2A shows the discriminability between the stimuli (original song + different degradation levels) as a function of the time constant τ for a sample of cells (for details of the classification algorithm, see Machens et al. 2003 and Wohlgemuth and Ronacher 2007). The box plots in Online Resource 2B illustrate the distributions of optimal τ values for receptors (median = 5 ms), local neurons (median = 3 ms), and ascending neurons (median = 8 ms). A Kruskal–Wallis test indicated significant differences between the computational levels within the metathoracic ganglion (χ² = 34.9, p < 0.001); Mann–Whitney post hoc comparisons showed no significant differences between receptor cells and local interneurons (Z = −1.678, p = 0.094, two-tailed), in accordance with the data of Wohlgemuth and Ronacher (2007). For ascending neurons, the optimal τ values were significantly higher than for receptors (Z = −3.568, p < 0.001) and local neurons (Z = −5.521, p < 0.001). The broad maxima in the graphs of discrimination performance versus τ (see Online Resource 2A), however, indicate that the specific choice of τ makes only a minor difference, as long as τ lies within a range of 2–10 ms. Nonetheless, for comparison purposes we computed the spike train distances at three different time resolutions: τ = 3, 5 and 10 ms.

Neuronal critical degradation level

To estimate a neuronal critical degradation level, we applied two distinct procedures. The basic idea of both is to obtain an estimate of how much a spike train can deviate from the “original” spike train pattern (i.e. the spike pattern in response to the original song) and still be classified as similar enough to represent the original stimulus. Obviously, the intrinsic variability, i.e. the differences between spike trains in response to repeated presentations of the same, original stimulus (see x_0 in Fig. 1d), sets the minimum margin for discriminating between the intrinsic and ‘extrinsic’ distance distributions (differences between spike trains in response to the original stimulus and successive degradation levels, see y_1 to y_n in Fig. 1d). We estimated the likelihood, given one randomly sampled intrinsic distance and one randomly sampled extrinsic distance, that the larger distance was the extrinsic one (Ronacher and Krahe 2000). We set a minimum criterion of 95% correct classification; the critical degradation level was defined as the level at which correct classification crossed this criterion. This procedure corresponds to a significance level of p = 0.05. A Wilcoxon test based on repeated, random subsampling of the data set yielded identical results (data not shown). As an alternative threshold criterion, we accepted an error rate of 7%, which corresponds to a d′ of 1.8 in signal detection theory (Klump and Baur 1990).
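The discrimination criterion described above can be sketched in a few lines. The example below evaluates all pairs of intrinsic and extrinsic distances exhaustively, which gives the expected value of the random-sampling estimate. Several details are assumptions made for illustration: the function names, counting ties as half, and reporting the first tested level that reaches the criterion rather than an interpolated value.

```python
def prob_extrinsic_larger(intrinsic, extrinsic):
    """Fraction of (intrinsic, extrinsic) pairs in which the extrinsic
    distance is the larger one; ties are counted as half (an assumption)."""
    wins = 0.0
    for d_in in intrinsic:
        for d_ex in extrinsic:
            if d_ex > d_in:
                wins += 1.0
            elif d_ex == d_in:
                wins += 0.5
    return wins / (len(intrinsic) * len(extrinsic))

def critical_level(levels_db, intrinsic, extrinsic_by_level, criterion=0.95):
    """First degradation level (in ascending order) at which the criterion
    is reached, or None if it is never reached."""
    for level in sorted(levels_db):
        p = prob_extrinsic_larger(intrinsic, extrinsic_by_level[level])
        if p >= criterion:
            return level
    return None

# Hypothetical distances in arbitrary units, for illustration only.
intrinsic = [1.0, 2.0, 3.0]
extrinsic_by_level = {-3: [2.0, 3.0, 4.0], 0: [4.0, 5.0, 6.0]}
ncdl = critical_level([-3, 0], intrinsic, extrinsic_by_level)
```

In this toy example the distributions still overlap at −3 dB, whereas at 0 dB every extrinsic distance exceeds every intrinsic one, so the criterion is first met at 0 dB.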

We also took a different approach, using an unsupervised information-based cluster algorithm (for details, see Slonim et al. 2005) to discriminate between intrinsic and extrinsic distances. This algorithm selects cluster boundaries that maximize the average similarity within each class; here, similarity was based on the metric distance between spike trains. Spike trains were assigned to only two classes: the ‘orig’ class, in which the spike trains are on average more similar to the ‘original’ spike trains, and a ‘noise’ class that groups together the spike trains evoked by the more strongly degraded stimuli. The boundary between the two classes was defined by the point at which the probability of belonging to a class dropped to 50%. If increasing noise led to a steady progression of spike trains being assigned to the ‘noise’ class, the boundary was termed simple. But this was not always the case. We therefore checked the consistency of the cluster boundaries by randomly choosing five (out of ten) spike trains for each stimulus class and repeating the cluster analysis; this procedure was repeated 100 times. If fewer than half of the randomly subsampled spike train sets yielded a simple boundary after clustering, the boundary was deemed unreliable. This was the case for 2 out of 22 local neurons and for 4 out of 19 ascending neurons. The mean standard deviation of the simple boundaries found in this consistency test was 1.5 dB for receptor cells, 1.7 dB for local interneurons and 1.9 dB for ascending neurons, compared to a step size of 3 dB between noise levels. To compute neuronal critical degradation levels for responses to distinct song components, the spike train was segmented into pause (20 ms duration), syllable (80 ms duration) or syllable onset (20 ms duration) windows, the positions of which were shifted by the corresponding median latency.

Behavioral tests

Males of C. biguttulus were used for the behavioral tests, in which the same stimuli were used as in electrophysiology. The playback experiments were performed at 30 ± 2°C in an anechoic chamber. Whenever a male started to sing, a model song was broadcast from a loudspeaker situated laterally. Each song was presented at least 10 times in pseudo-random order at an intensity of 60 dB SPL. The phonotactic behavior of the animal, i.e. whether or not it turned towards the loudspeaker, was monitored visually and recorded. The response probability to the different types of song models was calculated as the percentage of turning responses relative to the total number of stimulus presentations. We interpolated the degradation level at which the turning response decreased to 50% (Fig. 4a; arrow), which is an established threshold value for go/no-go tasks and corresponds to a d′ of 1.8 when the false alarm rates are low (≤3.8%) (Klump and Baur 1990; Dooling and Okanoya 1995). This procedure allowed us to determine the critical stimulus degradation above which the males ceased to turn, implying that signal recognition was severely impaired (Ronacher et al. 2000).
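The threshold interpolation can be illustrated with a short sketch. Linear interpolation between the two neighboring degradation levels is an assumption, since the interpolation method is not specified above; the function name and the response values are likewise hypothetical.

```python
def behavioral_cdl(levels_db, response_pct, criterion=50.0):
    """Linearly interpolate the degradation level at which the response
    first falls from at or above the criterion to below it."""
    points = list(zip(levels_db, response_pct))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 >= criterion > y1:
            return x0 + (y0 - criterion) * (x1 - x0) / (y0 - y1)
    return None  # the response never crosses the criterion

# Hypothetical response curve of one male (percent turning responses).
levels_db = [-9, -6, -3, 0, 3]
response_pct = [100.0, 90.0, 70.0, 30.0, 0.0]
bcdl = behavioral_cdl(levels_db, response_pct)
```

Here the response falls from 70% at −3 dB to 30% at 0 dB, so the interpolated 50% point lies halfway between the two levels, at −1.5 dB.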

Stimulus reconstructions

The linear filter that minimizes the mean square distance between the stimulus and the reconstruction from the spike train is simply the spike-triggered average of the stimulus, once one corrects for the spike train’s correlation with itself (Gabbiani et al. 1996). In applying such a linear reconstruction filter to these data, there is a risk of overfitting; while the grasshopper song syllables are not identical, they are highly stereotyped and unlike the Gaussian random stimuli often used to test linear reconstruction. We restricted the time window of the reconstruction filter to approximately one syllable period into the future and into the past, so as to limit the degree of overfitting. We used Gabbiani and Koch’s definition of the coding fraction as the ratio of the variance of the reconstruction to the variance of the signal. The coding fraction is a number bounded between zero and one (for perfect reconstruction).

Results

Neurophysiology

We investigated neurons of the first three consecutive processing stages in the auditory pathway of grasshoppers: auditory receptors, local neurons confined to the thoracic ganglia, and neurons ascending to the brain. To test how long the neuronal representation of the female song remains stable with respect to signal degradation, the envelope of a female song was progressively corrupted with random amplitude fluctuations (Fig. 1a, see “Materials and methods”). Spike raster plots of two representative neurons are shown in Fig. 1c. TN1 (upper panel) is included among the local neurons on the basis of its primary-like tonic responses, while AN12 (lower panel) is a phasically responding ascending neuron. For each recorded cell, distances between all pairs of spike trains were computed, as displayed in the color-coded matrix of Fig. 1d. Repeated presentation of the same stimulus did not evoke identical spike trains, as revealed by the non-zero distances between spike trains in response to 10 trials of the same stimulus; the square blocks along the diagonal (x_0, …, x_n) display the trial-to-trial variability due to ‘neuronal noise’. In the following, we term this type of noise ‘intrinsic’ to contrast it against the effects of external signal degradation. However, this does not mean that this type of noise originates only in the neuron recorded from. Rather it represents the combined effect of a multitude of stochastic events in upstream neurons, beginning with sensory transduction, spike generation in sensory neurons, and synaptic transmission. For the majority of recordings (~80%), there was no significant dependence between ‘intrinsic’ noise and degradation level on a single cell level (see Online Resource 3).

As the signal was progressively degraded, the mean distance between spike trains in response to the uncorrupted stimulus (x_0) and in response to a degradation level (y_i) increased. Figure 2a shows spike train distances for a sample of four neurons (to allow for a comparison between neurons that differed in spike rates, the distances were standardized by the respective mean spike rates). The dramatic change in the envelope of the auditory stimulus seen in Fig. 1a belies the increase in spike train distances, which turned out to be quite modest by comparison. Distances between spike trains in response to the original song and spike trains in response to 9 dB degradation were always lower than those between ‘orig’ spike trains and Poisson spike trains with the same firing rates (not shown). Figure 2b shows the ratios of the mean distance at the highest signal degradation, 9 dB (y_7 in Fig. 1d), and the intrinsic distance for the original song (x_0) for receptor cells, local and ascending neurons. For almost all neurons, the spike train distances due to external signal degradation underwent a less than twofold increase relative to the average intrinsic distance (ratio < 2). For many ascending neurons, increasing the external noise led to almost no change in the spike train distances, yielding a distance ratio of close to unity (open columns in Fig. 2b).

Fig. 2

Influence of signal degradation on spike train dissimilarities. a Mean distances between spike trains in response to the original song and progressively corrupted songs. Ordinate: spike train distances (in arbitrary units); abscissa: degradation level (in dB); ‘orig’ denotes the uncorrupted stimulus. The distance curves are shown for a receptor cell (REC), a local neuron (TN1) and two ascending neurons (AN3, AN6). For each curve representative standard deviations are shown. The value at ‘orig’ corresponds to the ‘intrinsic’ noise (i.e. trial-to-trial variability). b Histogram showing the ratios of the mean distance at 9 dB signal degradation and the mean intrinsic distance. Results of receptor cells, local and ascending neurons are shown as black, gray and open columns (see inset). c Comparison of the trial-to-trial variability (i.e. intrinsic distances for the ‘orig’ stimulus) between processing stages (medians and interquartile ranges; whiskers indicate 10th and 90th percentiles). d The slopes of the distance curves indicate the sensitivity of the neuronal spike pattern to extrinsic noise. N = 13 (REC), 42 (LN), 34 (AN)

The metric distances between the spike trains in response to repeated presentations of the uncorrupted song are a measure of the intrinsic neuronal noise, while the slope of the linear regression of distance plotted versus degradation level quantifies the influence of external signal degradation. We compared the ‘intrinsic distances’, i.e. the measure for trial-to-trial variability, and the slopes for the populations of receptors, local neurons, and ascending neurons using the spike train metric with a time constant of τ = 5 ms. Figure 2c shows the intrinsic distances of cells recorded in C. biguttulus and L. migratoria. Trial-to-trial variability increased significantly between neurons on consecutive processing stages (χ² = 34.13; p < 0.001; Kruskal–Wallis test). Mann–Whitney post hoc comparisons showed significant differences between receptor cells and local interneurons (Z = −2.95; p < 0.01, two-tailed), receptor cells and ascending neurons (Z = −4.95; p < 0.001, two-tailed) and between local interneurons and ascending neurons (Z = −4.31; p < 0.001, two-tailed). The low intrinsic noise of receptors resulted from the precise locking of spikes to the distinct pulse structure of the uncorrupted stimulus. Local neurons did not sustain this high level of precision, in spite of potentially sampling across a population of 60–80 receptors. In accordance with previously published results (Neuhofer et al. 2008), no significant differences between the two species in terms of the intrinsic noise were observed (see Online Resource 4A: REC: p = 0.231; LN: p = 0.119, AN: p = 0.695, Mann–Whitney U test).
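For illustration, the slope measure can be computed with an ordinary least-squares fit of mean distance against degradation level. This is a minimal sketch: the function name and the example values are hypothetical, and the choice to fit only the numeric degradation levels (leaving out the ‘orig’ point, which has no dB value) is an assumption.

```python
def regression_slope(levels_db, mean_distances):
    """Least-squares slope of mean spike train distance versus level (per dB)."""
    n = len(levels_db)
    mean_x = sum(levels_db) / n
    mean_y = sum(mean_distances) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(levels_db, mean_distances))
    sxx = sum((x - mean_x) ** 2 for x in levels_db)
    return sxy / sxx

# Hypothetical distance curve rising by 0.1 distance units per dB.
levels_db = [-9, -6, -3, 0, 3, 6, 9]
mean_distances = [2.0 + 0.1 * x for x in levels_db]
slope = regression_slope(levels_db, mean_distances)
```

A slope near zero corresponds to a neuron whose spike train distances barely change with external degradation, whereas a steep slope indicates high sensitivity to extrinsic noise.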

In the female song used here, sound syllables alternate with distinct pauses, and this syllable-pause structure is an important feature for song recognition (von Helversen and von Helversen 1997; Ronacher and Krahe 1998). In their spike trains many ascending neurons predominantly reflected the coarse syllable-pause structure, while not responding reliably to the sound pulses within a syllable. Given that the spikes in response to syllable onsets tend to be more reliable (Avissar et al. 2007), some of the increase in intrinsic variability may arise from the occasional, rather irregular spiking of ascending neurons during the later part of the syllables (see Fig. 1c). We therefore restricted the metric analysis to spikes produced in response to the first 20 ms of each song syllable. Remarkably, also under this restriction, the differences across computation levels remained significant (p < 0.0001; Kruskal–Wallis test; REC–LN: p = 0.003; REC–AN: p < 0.0001; LN–AN: p < 0.001; Mann–Whitney U post hoc comparison).

The slopes of the distance curves, which reflect the influence of external noise, decreased markedly from receptor cells to ascending neurons (Fig. 2d, χ² = 24.920; p < 0.001, Kruskal–Wallis test). There was no significant difference between the slopes of receptor cells and local interneurons (Z = −0.541; p = 0.589; Mann–Whitney post hoc comparison, two-tailed), but local neurons exhibited significantly higher slopes than ascending neurons (Z = −4.969; p < 0.001), and the difference between receptor cells and ascending neurons was significant as well (Z = −2.567, p < 0.05). Again, there were no significant differences between the two species investigated (see Online Resource 4B: REC: p = 0.513; LN: p = 0.187, AN: p = 0.217, Mann–Whitney U test). For the metric analysis, we decided to use a uniform τ = 5 ms, since we were interested in comparing the coding capacities of different computation stages in the presence of noise. A control analysis with τ = 3 and 10 ms again yielded similar results (Online Resource 5), confirming that a temporal resolution of τ = 5 ms was a reasonable compromise. These data indicate that the trial-to-trial variability increases at higher processing stages, whereas the relative influence of external signal degradation decreases.

Two factors contribute to the distance between two spike trains, each with different consequences for stimulus coding. First, the timing of spikes can shift between the spike trains; second, spikes in one train may not be matched by spikes in the other, i.e. some spikes are ‘missing’. Both sources of variation lead to non-zero distances between spike trains. If the metric distances are computed at a high temporal resolution (τ = 5 ms), both spike time jitter and differences in spike count contribute to the distance. As a next step, we focused on the potential effects of signal degradation on spike count. For this analysis, the original song was segmented into syllables and pauses (see Fig. 1a) and the mean spike rate for each segment was computed. Figure 3 shows how the strongest degradation (9 dB) changed the spike rate within syllables and pauses, relative to the original stimulus. For receptor cells and local neurons, the spike rate during the syllables remained unchanged (REC: t = −0.242, p = 0.813; LN: t = 0.391, p = 0.698, one sample t test; the Kolmogorov–Smirnov test showed no significant deviation from a normal distribution), whereas the spike count increased significantly within the pauses (REC: t = 8.224, p < 0.001, LN: t = 7.699, p < 0.001; one sample t test). For ascending neurons, we observed a small but significant decrease of spike count within the syllables (t = −2.489, p < 0.05) and additional spikes during the pauses (t = 4.057, p < 0.001).

Fig. 3

Influence of signal degradation on spike count during syllables and pauses. The box plots show the differences in firing rate (between responses to the call degraded with 9 dB of noise and the response to the original song) within the syllables and within the pauses. Positive changes in the mean firing rate reflect additional spikes, whereas negative changes reflect the suppression of spikes with the degraded signal

Behavior

Figure 4a shows the phonotaxis responses of seven males, representative of a total of 59 individuals. The uncorrupted song (labeled ‘orig’) reliably provoked a behavioral reaction, inducing phonotaxis every single time. Hence, if a male ceased to respond at a certain signal degradation level, we can safely conclude that the animal failed to recognize the corrupted female signal. For each male we determined a ‘critical degradation level’ as the noise level at which the response fell to 50% (see arrow in Fig. 4a). The behavioral critical degradation levels (bCDLs) of all males are shown in Fig. 4b: the majority of males (57.6%) ceased to respond between −6 and 0 dB, and the median critical degradation level was −1.6 dB. Using a less conservative threshold criterion of 25% response resulted in a shift of the median by 2 dB to higher degradation levels (data not shown).

Fig. 4

Robustness of signal recognition measured in behavioral tests with degraded female songs. a Phonotaxis responses of seven male grasshoppers. Abscissa: degradation levels in dB; ‘n’ indicates pure noise. Ordinate: the percentage of phonotactic turning responses; 100% means that the male reacted to every stimulus presentation. Behavioral critical degradation levels (bCDL) were determined at the intersection of each curve with the 50% response level (see arrow). b bCDL of 59 animals
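Reading the bCDL off a response curve can be sketched as follows. The sketch assumes linear interpolation between neighbouring test levels, which is one plausible way to find the intersection with the 50% line; the function name and the response values are hypothetical.

```python
def critical_degradation_level(levels_db, response_pct, criterion=50.0):
    """Degradation level (dB) at which the response curve first falls to
    `criterion` percent, linearly interpolated between test levels.
    Returns None if the curve never drops below the criterion."""
    points = list(zip(levels_db, response_pct))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 >= criterion > y1:
            return x0 + (y0 - criterion) * (x1 - x0) / (y0 - y1)
    return None

# Hypothetical response curve of one male (percent turning responses).
levels = [-9, -6, -3, 0, 3, 6, 9]
responses = [100, 100, 80, 40, 10, 0, 0]

bcdl_50 = critical_degradation_level(levels, responses)
bcdl_25 = critical_degradation_level(levels, responses, criterion=25.0)
```

For a monotonically falling curve, the 25% criterion always yields a critical level at or above that of the 50% criterion, which matches the direction of the shift described in the text.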

Comparing neuronal with behavioral performance

Neurons differed in their robustness to noise, typically maintaining the key features of the spike train patterns up to a certain degradation level. Even in response to the original song, considerable trial-to-trial variability was observed, which is reflected in the set of ‘intrinsic’ distances between the spike trains. As the level of signal degradation increases, a point will be reached at which the corresponding spike train distances differ significantly from these ‘intrinsic’ distances. At this point, the overlap between the distance distributions, taken from the outer right column of the distance matrix (see blue frame in Fig. 5a), falls below a fixed fraction. This fraction, typically 5%, defines threshold discriminability. Figure 5b shows the percentage of cells whose spike train patterns were distinguishable at p = 0.05 from the spike trains in response to the original stimulus, plotted as a function of degradation level. The number of receptor cells whose spike trains surpassed threshold discriminability increased steeply between −9 and −3 dB and then leveled off. The local neurons showed the strongest increase up to 0 dB. The behavioral data (bCDL 50%) also exhibit a steep increase of cumulative bCDL between −9 and 0 dB. In contrast, fewer than 35% of the ascending neurons exhibited a significant difference even at the highest degradation level (Fig. 5b, see also Fig. 2a). Figure 5a shows the distance matrix of a local interneuron (SN3), which did not achieve the strict criterion of 95% discrimination performance. Nevertheless, the distance matrix of this cell shows a distinct block-like structure, such that the distances between spike trains in response to −3 to 9 dB degradation group together. To take advantage of this distinct response property of many neurons, we applied a second, more sensitive classification procedure to determine critical classification boundaries, going beyond solely using the outer right-hand column of the distance matrix.
The unsupervised information-clustering algorithm (Slonim et al. 2005) uses the data of the whole matrix (see stippled frame in Fig. 5a) to cluster the spike train distances into two distinct classes. Figure 5c plots the mean probability, as determined by the clustering algorithm, that the spike trains of this SN3 neuron belong to the class of the uncorrupted stimulus (closed circles) or the ‘noise’ class (open circles). For these recordings, the mean probability of a spike train belonging to the first class dropped to 50% between −6 and −3 dB. Spike trains in response to the original song and the first and second degradation levels were assigned to the ‘orig’ class, whereas spike trains in response to higher degradation levels were assigned to the ‘noise’ class. For the local interneuron (SN3) in Fig. 5a and c, the intrinsic and extrinsic distance distributions overlapped considerably, even at the highest noise level, and simply comparing the distances failed to reach the 95% criterion in this case. Only the information-clustering algorithm managed to classify the spike trains reliably. Other examples that exhibited a simple boundary between the two classes are shown in Online Resource 6. There were, however, also cells that exhibited ambiguous classification boundaries; examples are shown in Online Resource 7. We performed a conservative analysis, for which stringent requirements were applied to eliminate cells with ambiguous boundaries (see “Materials and methods”). Figure 5d shows the percentage of neurons whose responses were assigned to the ‘noise’ class as a function of the degradation level. Between −6 and 3 dB there was a steep increase in the number of receptor cells; the same held true for ascending neurons. For local neurons, this steep increase was shifted by 3 dB to higher degradation levels.

Fig. 5

Two approaches to determine a neuronal critical degradation level (nCDL). a Distance matrix of the spike trains of a local neuron, SN3. Metric distances are color coded from blue to red (zero to maximum distance). b Cumulative nCDLs as revealed by the discrimination analysis. The colored curves (see inset) indicate the percentage of cells (ordinate) for which the distances x0 and yn (abscissa) could be discriminated with an accuracy rate of at least 95% (closed circles, solid lines) or 97% (open circles, dashed lines). For receptor cells there was no difference between 95 and 97%. Different processing stages (receptors, local and ascending neurons) are shown separately. Behavioral CDLs are drawn in gray (bCDL data from Fig. 4). c Result of the information-based clustering algorithm for the spike trains of the SN3 recording. Along the ordinate, the graph displays the mean probability that the responses to a distinct degradation level (abscissa) belong to the ‘orig’ class (containing the spike trains in response to the original song) or to the ‘noise’ class (containing the spike trains in response to the remaining degradation levels). The arrow indicates the neuronal critical degradation level (nCDL). d Frequency of assignments to the ‘noise’ class (ordinate) as a function of degradation level (abscissa). Different levels of computation are drawn separately. Sampling size for b and c: REC = 7, LN = 22, AN = 19
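The discrimination analysis of Fig. 5b can be sketched as follows. This is a simplified illustration, assuming that discriminability is scored as the best accuracy of a single distance threshold separating ‘intrinsic’ from ‘extrinsic’ distances; the exact procedure is described in “Materials and methods” and may differ. Function names and distance values are hypothetical.

```python
def discrimination_accuracy(intrinsic, extrinsic):
    """Best accuracy of a single threshold that labels distances at or
    below it as 'intrinsic' and distances above it as 'extrinsic'."""
    total = len(intrinsic) + len(extrinsic)
    best = 0.0
    for thr in sorted(set(intrinsic) | set(extrinsic)):
        correct = (sum(d <= thr for d in intrinsic)
                   + sum(d > thr for d in extrinsic))
        best = max(best, correct / total)
    return best

def neuronal_cdl(intrinsic, extrinsic_by_level, criterion=0.95):
    """Lowest degradation level (dB) whose distances are discriminated
    from the intrinsic distances with at least `criterion` accuracy.
    Returns None if no level reaches the criterion."""
    for level in sorted(extrinsic_by_level):
        accuracy = discrimination_accuracy(intrinsic, extrinsic_by_level[level])
        if accuracy >= criterion:
            return level
    return None

# Hypothetical distances for one cell.
intrinsic = [1.0, 1.1, 1.2, 1.3]
extrinsic_by_level = {
    -9: [1.0, 1.2, 1.1, 1.4],
    -3: [1.2, 1.5, 1.6, 1.7],
    3: [1.8, 1.9, 2.0, 2.1],
}
ncdl = neuronal_cdl(intrinsic, extrinsic_by_level)
```

A cell whose intrinsic and extrinsic distributions overlap at every level, like the SN3 example in panel a, would return `None` under this criterion.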

Animals may ignore many temporal details in sensory spike trains that nonetheless contribute to the metric spike train dissimilarity. The metric analysis does not select stimulus features that could be relevant to the animal and that could potentially be more robustly represented in the spike train, even at high degradation levels. To account for the possibility that a neuronal recognition network in the grasshopper’s brain may, in fact, discard some spikes when it attempts to match an incoming spike train to an internal template, we restricted the analysis to specific domains of the stimulus. Important cues for signal recognition are the pauses, syllable onsets, and syllables (Balakrishnan et al. 2001). Thus, we isolated the corresponding response components in time and repeated the analysis. This manipulation had a strong impact on the spike train distance matrices (shown in Online Resource 6). Isolating the pauses in particular revealed a stronger separation between the degradation levels in the distance matrix, with a clear demarcation line between −3 and 0 dB. Isolating the onsets yielded a border between 3 and 6 dB. According to this analysis, syllable and, most notably, syllable-onset responses were more robust to noise, on average, than responses to pauses, at least for receptors and local neurons (Fig. 6). For the pause response of the receptor cells there was a steep increase of nCDLs starting already at the lowest degradation level, whereas the onset response was shifted by at least 3 dB to higher degradation levels (Fig. 6a, d). For the local neurons, the difference between pause and onset responses was less pronounced (Fig. 6b, e). These results suggest that masking the syllable-pause structure might have been a critical factor for song recognition in noise. The discrimination performance based on response components of ascending neurons, however, did not differ from the total response (Fig. 6c).
Although the masking of pauses had a strong effect on spike train distances of receptor cells and local neurons, for ascending neurons the degradation of the syllable pauses yielded a minor contribution to spike train dissimilarities (Fig. 6f, compare with Fig. 3).

Fig. 6

Neuronal critical degradation levels for isolated response components (i.e., total response, syllable response, pause response and onset response; see inset). a–c Discrimination analysis. The colored curves (see inset) indicate the percentage of cells (ordinate) for which the distances could be discriminated with an accuracy rate of at least 95%. a Receptor cells, b local and c ascending neurons are shown separately. To ease comparison, the cumulative percentages of bCDLs are indicated in gray. d–f Cluster analysis. Frequency of assignments to the ‘noise’ class (ordinate) according to the information-based clustering algorithm as a function of degradation level (abscissa). d Receptor cells, e local and f ascending neurons. Sampling size for b and c: REC = 7, LN = 22, AN = 19

Stimulus reconstruction from noisy spike trains

As the spike train pattern becomes corrupted by noise, it should become more difficult to reconstruct the song from the spike trains. This was indeed the case. Figure 7a illustrates the coding fraction for four different cell types as a function of the noise level, where we compare the reconstruction of each degraded envelope with the attempt to reconstruct the original signal by itself. Noise changes the envelope significantly (Fig. 1a), progressively masking the original signal’s periodic structure and leading to lower coding fractions. The repetition of rhythmic elements in the stimulus facilitates reconstruction, which is why reconstructing the original stimulus in the presence of noise is as successful as reconstructing the true, noisy envelope, irrespective of the degradation level (compare stippled vs. continuous lines in Fig. 7a). The optimal linear reconstruction filters for the original and the corrupted envelope signal, while highly different across cell types, change little as the degradation level increases, as shown in Fig. 7b. Receptors and TN1 cells respond to each sound pulse in a syllable, which leads to a single, large positive peak in the filter before the spike occurred (dashed line). The AN12 responds to syllable onsets, so that the filter’s shape mirrors a single syllable. Correlating coding fractions with spike train distances in Fig. 7c reveals the impact of intrinsic variability on the coding fraction of the uncorrupted stimulus. At all processing levels reconstruction quality correlates negatively with intrinsic distance (REC: r = −0.575; n.s. (L.m.); LN: r = −0.873; p < 0.001 (C.b.), r = −0.821; p < 0.001 (L.m.); AN: r = −0.811; p < 0.001 (C.b.), r = −0.469; p > 0.05 (L.m.); Pearson’s correlation). Coding fractions declined from receptor cells (median = 0.78) through local interneurons (median = 0.66) to ascending neurons (median = 0.31); the difference was significant between local and ascending neurons (p < 0.001) but not between receptors and local neurons (n.s.).
These results demonstrate that the high intrinsic variability of many ascending neurons has a dramatic effect on the amount of information that can be linearly decoded from the spike trains.

Fig. 7

a Linear reconstruction of the uncorrupted song envelope (solid lines) or the noisy stimulus envelope (dashed lines). Reconstruction performance was quantified as the fraction of envelope variance explained by the reconstruction (coding fraction). Four cell types are illustrated with different colors and symbols (see inset). b The optimal reconstruction filters depended on the neuron. Red lines indicate the filter for the original song envelope, while green lines represent the filter for reconstructing the noisy envelope at a degradation level of 6 dB. c The intrinsic variability (abscissa) was inversely correlated with the coding fraction (ordinate). Linear regression lines for the different computation levels and the two species are shown in different colors
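The coding fraction used in Fig. 7 can be sketched as follows. The sketch assumes the common definition of one minus the mean squared reconstruction error divided by the stimulus variance, which matches “fraction of envelope variance explained”; the estimation of the optimal linear filter itself is not shown. The function name and the toy envelope are hypothetical.

```python
from statistics import fmean

def coding_fraction(stimulus, reconstruction):
    """Fraction of stimulus variance explained: 1 - MSE / variance."""
    m = fmean(stimulus)
    variance = fmean([(s - m) ** 2 for s in stimulus])
    mse = fmean([(s - r) ** 2 for s, r in zip(stimulus, reconstruction)])
    return 1.0 - mse / variance

# Hypothetical toy envelope and three candidate reconstructions.
envelope = [0.0, 1.0, 0.0, 1.0]
perfect = [0.0, 1.0, 0.0, 1.0]
partial = [0.25, 0.75, 0.25, 0.75]
mean_only = [0.5, 0.5, 0.5, 0.5]

cf_perfect = coding_fraction(envelope, perfect)
cf_partial = coding_fraction(envelope, partial)
cf_mean = coding_fraction(envelope, mean_only)
```

A reconstruction that only reproduces the stimulus mean explains none of the variance and yields a coding fraction of zero, while a perfect reconstruction yields one.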

Discussion

In this study we quantified the influence of spike train variability on signal recognition in a small neuronal network. In order to compare neuronal with behavioral performance we used a behaviorally relevant stimulus, the song of a C. biguttulus female. The crucial cues for recognizing grasshopper calls lie in the envelope of the auditory signal, so we specifically degraded the envelope of this stimulus. Using a spike train metric we quantified the trial-to-trial variability of the neuronal responses and compared this ‘intrinsic’ variability to the changes of spike trains caused by external signal degradation.

Impact of trial-to-trial variability and external signal degradation

The metric analysis yielded two surprising results: (1) a rather high trial-to-trial variability, even in the absence of external signal degradation (Fig. 2c), and (2) a small contribution of external signal degradation to the overall dissimilarity between spike trains (Fig. 2b).

The trial-to-trial variability increased at consecutive processing stages (Fig. 2c), reaching its highest level in ascending neurons—which is difficult to reconcile with their role as the information bottleneck towards the brain and the reliability of behavioral responses in the absence of external noise (see Fig. 4a). This increase persisted even after restricting the analysis to the spikes produced at syllable onsets. These results indicate that the addition of noise during synaptic transmission and variations in other cell intrinsic properties (Croner et al. 1993; Faisal et al. 2008) outweigh any noise reduction possibly achieved by averaging over a set of presynaptic neurons.

Experiments with artificial song models also reported a rise in spike train variability at each stage of processing (Vogel et al. 2005; Wohlgemuth and Ronacher 2007). This led to speculations that spike timing decreases in relevance between local and ascending interneurons, as has been suggested for some vertebrate sensory systems (Kara et al. 2000; Reich et al. 1997). This view may be oversimplified, however, as the set of ascending neurons in the grasshopper is heterogeneous (Stumpner and Ronacher 1991, 1994). Indeed, our results show that at least some ascending neurons show high discrimination performances, and that their spike train patterns allow the stimulus to be reconstructed with remarkable temporal precision, even though the reconstruction is based on many fewer spikes than for receptors or TN1 (see AN12 in Fig. 7a).

Unexpectedly, external signal degradation had a very small effect on spike train distances. For more than 90% of the local neurons and of the receptors, the spike train distances observed at the highest degradation level increased by less than twofold over those distances observed due to trial-to-trial variability. The distance ratio was even smaller at the level of ascending neurons (open columns in Fig. 2b). The small slopes of the distance curves (Fig. 2a, d) seem to suggest that the responses of ascending neurons are invariant to external noise. On the other hand, any such putative invariance must be reconciled with the high trial-to-trial variability of ascending neurons, which will occlude fine temporal features potentially present in an ascending neuron’s spike train pattern. A factor contributing to the ascending neurons’ relative insensitivity to external noise lies in their more specific response to distinct features of communication signals, such as onsets or gaps (Ronacher and Stumpner 1988). Some of these features may have been less affected by our degradation procedure. Indeed, most ascending neurons reflect only the coarse syllable-pause structure in their spike trains without resolving the fine envelope structure within syllables. We therefore performed an additional analysis in which we evaluated spike count differences separately for syllables and pauses (Fig. 3). According to this analysis, the spike count within syllables was hardly affected at all, while high degradation levels increased the spike count during the pauses, thus destroying the species-specific syllable-pause pattern. This loss of the syllable-pause pattern is probably a crucial factor for an animal when it decides to accept or reject a corrupted signal (von Helversen and von Helversen 1997; Ronacher and Krahe 1998). 
The pauses, however, were much more strongly affected by signal degradation in receptors and local neurons than in ascending neurons, as evidenced by both the spike counts in the syllable and pause segments and stimulus reconstruction from the spike trains. Since the grasshopper brain has to interpret the information distributed among ascending neurons (Krahe et al. 2002; Stumpner and Ronacher 1991) it remains to be seen whether the postsynaptic cells located in the brain maintain this invariance to external noise.

Comparison of neurometric and behavioral performance

A crucial question for signal recognition in noise is: how much can the spike patterns change, yet still represent the same object? Our experimental design allowed us to tackle this question by comparing the neuronal performance with the behavior of the animals towards the progressively degraded signals. We applied two types of data evaluation to measure the neuronal performance. To enable recognition, one can imagine that an incoming communication signal is compared within the brain to a stored innate template, based on the spike responses of a set of ascending neurons. We postulate that the trial-to-trial variability of spike trains of auditory neurons defines a broad ‘zone of acceptance’ around the innate template: given the high importance of mate identification, the animals should not discard spike trains that result from the correct signal but deviate from the template only because of the intrinsic noisiness of spike trains. Motivated by this hypothesis, in our first analysis we determined, separately for each neuron, the degradation level at which the spike train distances became statistically distinguishable from the ‘intrinsic’ distances produced by repeated presentations of the uncorrupted stimulus. For the majority of recorded receptor neurons, the difference between intrinsic and extrinsic distances became significant around −6 dB (57%; see Fig. 5b). Beyond this noise level, grasshoppers stopped performing phonotaxis (see Fig. 4). At −6 dB degradation, the probability of incorrect rejection of the original song became <5% based on a typical receptor spike train; summing over the set of receptor neurons could make this error rate even smaller. Local neurons do not just relay the spike trains from the receptors, but process the incoming information. Discrimination became reliable between −3 and 0 dB in more than 50% of local neurons (Fig. 5b). However, ascending neurons behaved differently.
For only about one in four ascending neurons could one reliably discriminate the noisiest stimulus from the original stimulus based on the metric comparison to a template spike train, and at 0 dB only ~10% of ascending neurons allowed for a significant discrimination from the original spike train (Fig. 5b).

If we consider again the hypothesis of the “zone of acceptance”, our results imply that:

  1. A zone of acceptance based on the trial-to-trial variability of receptor cells and local interneurons could explain the behavioral data relatively well (see Fig. 5b).

  2. In contrast, a zone of acceptance based on the trial-to-trial variability of the ascending neurons cannot explain the behavioral data. Only the spike trains from a minority of ascending neurons allow one to discriminate between extrinsic and trial-to-trial variability. Taking an average across ascending neurons would predict a critical degradation level for the behavior that is higher than what is actually observed.

There are two possible reasons for these contradictory results:

  1. The “zone of acceptance” is determined by only a few ascending neurons which exhibit relatively low trial-to-trial variability. However, considering the high variability between specimens of different cell types makes this explanation rather unlikely.

  2. At the level of the ascending neurons, the intrinsic variability is no longer crucial for signal recognition because at this processing level the system relies on other representational codes, for example a rate code or a sparse code for certain features.

Following this second possibility, we applied another, more sensitive analysis to define neuronal critical degradation levels, based on an information-based clustering of the whole distance matrix (Fig. 5c, d). This approach does not account for the trial-to-trial variability of a neuron, but rather relies on distance clusters throughout the distance matrices to determine a neurometric critical degradation level.

In this analysis, most receptor cells exhibited their nCDLs between −6 and 3 dB, while for local interneurons this range was shifted by more than 3 dB to higher degradation levels. A clear nCDL could be assigned to 68% of the ascending neurons. For these ascending neurons that did exhibit an unambiguous nCDL, the decision boundaries lay between −9 and 0 dB, approaching the distribution of behavioral CDLs. From the perspective of data analysis, the more complex, but also more sensitive, clustering approach proved to be better for discrimination purposes at the level of ascending neurons. However, even with this method, about 30% of the ascending neurons did not yield an unequivocal nCDL. This does not necessarily mean that these cells do not contribute to signal processing. In the primary auditory cortex (A1) of marmosets, deCharms and Merzenich (1996) found cells which showed, at the single cell level, almost no stimulus-specific responses. Nonetheless, the relative timing of action potentials between different cells revealed the occurrence of the stimulus. Such a population code could also exist between different ascending neurons, particularly as these cells are more strongly correlated in their firing patterns than interneurons or receptor neurons (Vogel and Ronacher 2007).

In vertebrates, the debate continues as to whether behavior can be explained by those single neurons that are optimally tuned to the task or to the stimulus, or whether behavior is encoded in an average across a neuronal population (Parker and Newsome 1998). Stüttgen and Schwarz (2008), for instance, argue that stimuli corrupted by noise cannot be decoded from single neurons, but that spike time coincidences across small numbers of neurons may be sufficient. Many psychophysical studies in humans and in animals put forward the ‘lower envelope principle’, which posits that the performance of the best neurons determines behavior. Wang et al. (2007) investigated the song discrimination performance of zebra finches and auditory neurons in field L, also using a spike train metric and a supervised cluster algorithm (van Rossum 2001; Machens et al. 2003). They found that the performance of the “best” field L neurons matched the behavioral performance. Similar results were obtained by Narayan et al. (2007), who investigated the song discrimination performance of zebra finches in the presence of noise. Avoiding the need to estimate the probability of correct discrimination and substituting an innate for a learned behavior, we ask a related question in the grasshopper C. biguttulus. The correspondence between behavioral and neuronal critical degradation levels observed in Fig. 5d suggests that it may not be the most noise-tolerant neuron or the entire set of all neurons, but an interplay of various ascending neurons that sets the behavioral response. This deviation from the lower envelope principle might be characteristic of neuronal networks of restricted size.