Introduction

Many natural sounds are harmonic in nature. When an object, or the air column within it, vibrates, the spectrum of the sound it makes is characterized by a series of partials which are at, or near, integer multiples of a single frequency, called the fundamental (F0; Kinsler et al. 2000). Such harmonic series normally have a very strong perceptual pitch corresponding to the F0. These pitch cues are used by animals and humans for grouping and segregation of sounds (Bregman 1990; Darwin 1997; Darwin and Carlyon 1995). Although harmonic stimuli are the most ubiquitous pitch-generating natural sounds, other stimuli, such as amplitude-modulated tones and iterated ripple noise, also generate pitch percepts, although these are weaker.

In this paper, we have two objectives. The first is to determine the extent to which harmonic stimuli are represented in the temporal response, and the extent to which they are represented in the rate response, of midbrain auditory neurons. The second is to determine the extent to which binaural signals are integrated before pitch cues are extracted.

We consider first the changes in coding of pitch stimuli between the auditory nerve and the inferior colliculus (IC). In general, below the IC, pitch information is encoded in the temporal patterns of responses but not in the mean discharge rate. The responses across the auditory nerve to pitch-generating stimuli are found to preserve the temporal features required for human pitch perception (in Dial-anesthetized cats; Cariani and Delgutte 1996a,b). Individual auditory nerve fibers phase lock to components of the stimulus near their characteristic frequency (CF; Horst et al. 1986, 1990; Sinex et al. 2003); however, the post stimulus time histograms contain a component at the F0 of the stimulus because of amplitude modulation of the near-CF responses (Horst et al. 1986). Similar responses are found in the cochlear nucleus (e.g., Palmer and Winter 1992; Sinex 2008), with primary-like units being most like auditory nerve and chopper neurons responding less to stimulus components and more to the envelope (Sinex 2008). In the IC, Palmer et al. (1990) found an ongoing response which was phase locked to the fundamental frequency of synthetic vowels, and phase locking to amplitude modulation has been recorded up to 600 Hz (Batra et al. 1989; Heil et al. 1995; Krishna and Semple 2000; Langner and Schreiner 1988; Nelson and Carney 2007; Rees and Moller 1987; Rees and Palmer 1989). The phase locking of IC units to pure tones appears to be limited to 600 Hz in cat (Kuwada et al. 1984) and 1 kHz in guinea pig (Liu et al. 2006).

Up to this point, we have been considering temporal responses. However, in the IC, rate tuning to modulation frequency emerges (e.g., Krishna and Semple 2000; Langner and Schreiner 1988; Rees and Moller 1983; 1987; Rees and Palmer 1989; Schreiner and Langner 1988). It has even been suggested that a modulation map exists in the IC (Langner et al. 2002; Langner and Schreiner 1988) which would allow a place map of pitch (or at least periodicity).

We consider now the evidence for binaural integration of stimuli before pitch processing. Humans have the ability to integrate harmonics which alternate between the ears into a single percept with a pitch corresponding to the fundamental of the entire complex if the harmonics are peripherally resolved (Bernstein and Oxenham 2003; Houtsma and Goldstein 1972). Additionally, it is possible to binaurally extract a pitch from dichotic stimuli, which have no pitch when the signal to either ear is presented alone (e.g., Akeroyd and Summerfield 2000a,b). These data suggest that pitch percepts might be generated after integration of the information from both ears (cf. Bilsen et al. 1998; Zurek 1979). However, if the harmonics are peripherally unresolved, then the pitch corresponds to the repetition period of the monaural envelope (Bernstein and Oxenham 2003; Carlyon et al. 2001), which suggests the pitch may be determined before binaural integration.

Bernstein and Oxenham (2003) used dichotic harmonic stimuli which would be expected to have different periodicities and hence pitches if they were integrated binaurally before or after extraction of the fundamental. The dichotic stimuli had even harmonics played to one ear and odd harmonics to the other. By definition, in the region of resolved harmonics, there will be little peripheral interaction between harmonics, so in order to determine a pitch, some form of across-channel combination of information is required (Goldstein 1973; Meddis and Hewitt 1991a,b; Terhardt 1974; Terhardt et al. 1982). If this happened monaurally, then a doubling of pitch might be expected since the components are spaced at 2F0. However, if the combination is based upon a binaural combination of the harmonics from each ear, then a pitch equal to the fundamental would be expected since the component spacing is F0. Psychophysically, it was found that the pitch equaled F0, thus suggesting that harmonics were combined binaurally before pitch was computed (Bernstein and Oxenham 2003).

In the region of unresolved harmonics, there will be significant interaction between components, and the repetition period of the envelope will provide a pitch cue (Assmann and Summerfield 1990; Licklider 1951; Meddis and Hewitt 1991a,b; Schouten 1940, 1970). Since the spacing of components at each ear is 2F0, the repetition rate of the envelope will also be 2F0 (a period of 1/2F0), so we would expect a pitch of 2F0 to be perceived whether or not binaural interaction occurred, as was found psychophysically (Bernstein and Oxenham 2003).
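The symmetry argument behind this prediction can be checked numerically. The following is a minimal sketch and not part of the original analysis: the 100 Hz F0 is an illustrative choice, the function name `complex_tone` is hypothetical, and the 50 kHz sampling rate and 10 kHz upper limit are simply borrowed from the stimulus description. It confirms that a complex of even harmonics repeats every 1/2F0, whereas a complex of odd harmonics inverts every 1/2F0, so that its magnitude (and hence its envelope) also repeats at a rate of 2F0.

```python
import numpy as np

FS = 50_000               # sampling rate in Hz (assumed, matching the synthesis rate)
F0 = 100.0                # illustrative fundamental in Hz
N_PERIOD = int(FS / F0)   # samples in one period of F0 (500)
HALF = N_PERIOD // 2      # samples in half a period, i.e. 1/(2*F0)

def complex_tone(harmonics, f0=F0, n_samples=N_PERIOD, fs=FS):
    """Sum equal-amplitude sine-phase harmonics over one period of f0."""
    t = np.arange(n_samples) / fs
    return sum(np.sin(2 * np.pi * h * f0 * t) for h in harmonics)

n_max = int(10_000 // F0)                      # highest harmonic at or below 10 kHz
even = complex_tone(range(2, n_max + 1, 2))    # signal at the "even" ear
odd = complex_tone(range(1, n_max + 1, 2))     # signal at the "odd" ear

# Even harmonics: waveform is identical after half a period of F0.
even_repeats = np.allclose(even[HALF:], even[:HALF], atol=1e-8)
# Odd harmonics: waveform is inverted after half a period of F0,
# so its magnitude repeats at 2*F0 even though the waveform period is 1/F0.
odd_inverts = np.allclose(odd[HALF:], -odd[:HALF], atol=1e-8)
```

Both flags evaluate to true, which is the sense in which each ear alone carries a 2F0 envelope cue in the dichotic condition.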

In this experiment, we compared the representation of diotic stimuli, comprising all harmonics played to both ears; dichotic stimuli, comprising even harmonics played to one ear and odd to the other; and alternating phase stimuli, where all harmonics were played to both ears, but the starting phase was alternated between sine and cosine. The perceived pitch of this last stimulus doubles as its constituent harmonics become unresolved (Shackleton and Carlyon 1994), so it provides a useful monaural comparison for any dichotic effect.

We studied processing in the IC because it is a site of convergence of lower pathways and thus a site at which integration of monaural and binaural pitch cues should be measurable. We found that peripherally unresolved harmonics tended to give a periodicity characteristic of the processing of the envelope of the waveform at a single ear consistent with the psychophysics of unresolved harmonics (Bernstein and Oxenham 2003; Carlyon et al. 2001). However, we cannot make any definitive statement about the processing of resolved harmonics.

Methods and stimuli

We recorded from the central nucleus of the right IC of six pigmented guinea pigs weighing 440 to 770 g. All experiments were carried out in accordance with the UK Animals (Scientific Procedures) Act 1986. Animals were anesthetized with urethane (1.3 g/kg i.p., in 20% solution in 0.9% saline) and Hypnorm (Janssen: 0.2 ml i.m., comprising fentanyl citrate 0.315 mg/ml and fluanisone 10 mg/ml). To reduce bronchial secretions, atropine sulfate (0.06 mg/kg s.c.) was administered at the start of the experiment. Anesthesia was supplemented with further doses of Hypnorm (0.2 ml i.m.), on indication by pedal withdrawal reflex. A tracheotomy was performed, and core temperature was maintained at 38°C via a heating blanket and rectal probe. Heart rate was monitored using a pair of electrodes in the skin on either side of the animal’s thorax. Animals were artificially ventilated with pure oxygen to keep the end-tidal partial pressure of CO2 between 24 and 36 mmHg. The animals were placed inside a sound-attenuating room in a stereotaxic frame in which hollow plastic specula replaced the ear bars, allowing sound presentation and direct visualization of the tympanic membrane. A craniotomy was performed over the position of the IC. The dura was reflected and the surface of the brain covered by a solution of 1.5% agar in 0.9% saline. Recordings were made using a linear array of eight glass-insulated tungsten electrodes (Bullock et al. 1988), nominally spaced at 200 μm, advanced through the intact cerebral cortex by a piezoelectric motor (Burleigh Inchworm IW-700/710, Scientifica, Uckfield, UK). Extracellular action potentials were amplified and filtered between 300 Hz and 3 kHz (RA16AC, RA16PA, 4xRA16BA, Tucker-Davis Technologies, Alachua, FL, USA). Responses were collected using Brainware (v7.43, Jan Schnupp, Oxford University).

The location of recordings in the central nucleus of IC (ICC) was indicated by the combination of stereotaxic coordinates, the physiological profiles of recordings on the approach to ICC, and the physiological profile of the recordings within ICC (well-tuned, short-latency responses with a monotonically increasing CF as depth increased). In a number of experiments, location was also confirmed by the recovery of electrolytic lesions.

Stimuli were delivered to each ear through sealed acoustic systems comprising custom-modified Radioshack 40-1377 tweeters joined via a conical section to a damped 2.5-mm diameter, 34-mm long tube (M. Ravicz, Eaton Peabody Laboratory, Boston, MA), which fitted into the hollow speculum. The output was calibrated a few millimeters from the tympanic membrane using a Brüel and Kjær 4134 microphone fitted with a calibrated 1-mm probe tube. The maximum output was within 2 dB of 120 dB sound pressure level (SPL) up to 1 kHz, and then varied smoothly between 100 and 120 dB up to 50 kHz. Stimuli were not corrected for the variation in calibration across frequency.

Stimuli were digitally synthesized (RP2.1, Tucker-Davis Technologies) at 50 kHz sampling rate and output through 24-bit sigma-delta digital-to-analog converters. Stimuli were of 100-ms duration, switched on and off simultaneously in the two ears with cosine-squared gates with 2 ms rise/fall times (10% to 90%). A response area was first obtained using tonal stimuli (0 to 100 dB in 5 dB steps, 200 Hz to 20 kHz in 0.1 octave steps) followed by a rate vs. level function (0 to 100 dB in 5 dB steps) for harmonic complexes with a 100 Hz F0, presented to the left, right, and both ears. The characteristic frequency and threshold were estimated from the response area as the frequency which elicited a response at the lowest level.
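The CF and threshold estimate described above can be sketched as a search through the response area from the lowest level upwards. This is only an illustration under stated assumptions: the text does not give the criterion used to decide that a response was present, so `criterion` is a hypothetical parameter, `estimate_cf` is a hypothetical helper, and the response area used here is synthetic.

```python
import numpy as np

# Tone grid taken from the text: 200 Hz upwards in 0.1 octave steps, 0-100 dB in 5 dB steps.
FREQS = 200.0 * 2.0 ** (0.1 * np.arange(67))   # 200 Hz to about 19.4 kHz
LEVELS = np.arange(0, 105, 5)                  # dB

def estimate_cf(rates, freqs, levels, criterion):
    """Return (CF, threshold): the frequency responding at the lowest level.

    rates     : array of shape (n_levels, n_freqs), e.g. driven spike rates
    criterion : rate above which a response is deemed present (assumed value)
    """
    rates = np.asarray(rates, dtype=float)
    for i in np.argsort(levels):                       # ascend through levels
        responding = np.flatnonzero(rates[i] > criterion)
        if responding.size:
            # If several frequencies respond at this level, take the strongest.
            best = responding[np.argmax(rates[i, responding])]
            return float(freqs[best]), float(levels[i])
    return None                                        # no response at any level

# Synthetic V-shaped response area with its tip at 1.6 kHz and 22 dB (illustrative only).
tip_threshold = 22.0 + 30.0 * np.abs(np.log2(FREQS / 1600.0))
synthetic_rates = np.maximum(0.0, LEVELS[:, None] - tip_threshold[None, :])
cf, threshold = estimate_cf(synthetic_rates, FREQS, LEVELS, criterion=0.0)
```

With the 5 dB level grid, the threshold returned for this synthetic area is the first tested level above the tip, which illustrates the resolution limit of such an estimate.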

Once these preliminary data had been obtained, the main experiment was run in a single block comprising 100 repeats of all seven fundamentals and six condition combinations in random order. Stimuli consisted of harmonic series containing all of the harmonics up to 10 kHz with F0s from 50 to 400 Hz in half-octave steps. Harmonics were summed in sine phase, unless otherwise stated. The level of individual harmonics was 50 dB SPL, yielding a total level of between 61 and 73 dB SPL depending upon condition. Six conditions were presented (Fig. 1): (1) contralateral, all harmonics in the left ear; (2) ipsilateral, all harmonics in the right ear; (3) diotic, all harmonics in both ears; (4) dichotic 1, even harmonics in the left ear and odd harmonics in the right; (5) dichotic 2, odd harmonics in the left ear and even harmonics in the right; and (6) alternating phase, harmonics alternating between sine (even harmonics) and cosine (odd) phase presented to both ears. In general, however, there was little response to the ipsilateral stimulus alone, and the responses to the dichotic 2 stimuli were very similar to those to the dichotic 1 stimuli, so these results will not be shown in this paper.
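The F0 values and overall levels quoted above follow from simple arithmetic, sketched below. The sketch assumes equal-amplitude components whose powers add, so that N components at 50 dB SPL give 50 + 10·log10(N) dB SPL; it ignores the uncorrected variation in calibration across frequency, and the helper names are hypothetical.

```python
import numpy as np

F0S = 50.0 * 2.0 ** (np.arange(7) / 2.0)   # seven F0s, 50 to 400 Hz in half-octave steps
F_MAX = 10_000.0                           # highest harmonic frequency in Hz
LEVEL_PER_HARMONIC = 50.0                  # dB SPL per component

def n_components(f0, which="all"):
    """Number of harmonics at one ear; which is 'all', 'even' or 'odd'."""
    n_max = int(np.floor(F_MAX / f0 + 1e-9))
    harmonics = np.arange(1, n_max + 1)
    if which == "even":
        harmonics = harmonics[harmonics % 2 == 0]
    elif which == "odd":
        harmonics = harmonics[harmonics % 2 == 1]
    return harmonics.size

def total_level(n):
    """Overall level in dB SPL of n equal-level components (power summation)."""
    return LEVEL_PER_HARMONIC + 10.0 * np.log10(n)

highest = total_level(n_components(50.0, "all"))    # 200 components
lowest = total_level(n_components(400.0, "even"))   # 12 components
```

Under these assumptions the extremes are about 73.0 and 60.8 dB SPL, in agreement with the stated range of 61 to 73 dB SPL, and the F0s round to 50, 70.7, 100, 141.4, 200, 282.8, and 400 Hz.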

Fig. 1

Waveforms (left column) and spectra (right column) of the stimuli in the six conditions used in the experiment. The waveform and spectrum presented to the contralateral (left) ear are shown in the top line of each panel and those presented to the ipsilateral (right) ear in the bottom line. All stimuli were in sine phase except for the alternating phase stimulus, where the alternation between sine and cosine phase is represented in the spectrum by displacements above and below the axis.

Data were analyzed using peristimulus time histograms (PSTHs) calculated between 5 and 105 ms after stimulus onset with 0.2 ms bins and the Fourier transform of the PSTH (yielding a bin width of 10 Hz and 2,500 Hz Nyquist frequency). Rate information was obtained from the zero frequency component of the Fourier transform, which is equal to the spike count obtained conventionally when suitably scaled. Vector strength (Goldberg and Brown 1969) was also obtained from the Fourier transform of the PSTH by dividing the amplitude at each Fourier frequency by the amplitude at zero frequency. In this paper, we report only vector strengths which were statistically significant (p < 0.001; Rayleigh test of uniformity; Buunen and Rhode 1978; Mardia 1972).
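The PSTH-based measures described above can be sketched as follows. This is a minimal illustration rather than the analysis code actually used: the function name `psth_spectrum` is hypothetical, the input is assumed to be spike times in seconds pooled across repeats, and the Rayleigh statistic is taken to be 2nR² (n spikes, vector strength R), for which p < 0.001 corresponds to a value above 13.8.

```python
import numpy as np

def psth_spectrum(spike_times, t_start=0.005, t_stop=0.105, bin_width=0.0002):
    """Vector strength and Rayleigh statistic at each Fourier frequency of the PSTH."""
    n_bins = int(round((t_stop - t_start) / bin_width))        # 500 bins
    psth, _ = np.histogram(spike_times, bins=n_bins, range=(t_start, t_stop))
    spectrum = np.abs(np.fft.rfft(psth))
    freqs = np.fft.rfftfreq(n_bins, d=bin_width)               # 0, 10, ..., 2500 Hz
    n_spikes = spectrum[0]               # zero-frequency component = spike count
    if n_spikes > 0:
        vector_strength = spectrum / n_spikes
    else:
        vector_strength = np.zeros_like(spectrum)
    rayleigh = 2.0 * n_spikes * vector_strength ** 2
    return freqs, vector_strength, rayleigh

# Synthetic example: 20 repeats of a train locked perfectly to 100 Hz.
one_repeat = 0.0061 + 0.01 * np.arange(10)
spikes = np.tile(one_repeat, 20)
freqs, vs, rayleigh = psth_spectrum(spikes)
k100 = int(np.argmin(np.abs(freqs - 100.0)))   # index of the 100 Hz component
significant = rayleigh > 13.8                  # p < 0.001 criterion
```

For this idealized train the vector strength at 100 Hz is 1 and is highly significant. The same train also gives a vector strength of 1 at 200 Hz, a reminder that components at multiples of the locking frequency arise from the shape of the PSTH and need not reflect locking to separate stimulus components.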

The autocorrelation function (ACF) of the spike trains or all-order interval histogram (e.g., Cariani and Delgutte 1996a,b) was calculated for spikes between 5 and 105 ms after stimulus onset with 0.2 ms bin widths. A shuffled autocorrelation function (SACF; Joris 2003; Joris et al. 2006; Louage et al. 2004) was computed over the same interval but with 0.04 ms bin widths. The SACF was normalized as suggested by Louage et al. (2004), so that a value of 1 would be expected in the absence of temporal structure. The SACF generally showed the same features as the ACF but was considerably smoother, despite the narrower bin widths, because of the larger number of intervals used in its construction. Because of this similarity, only the SACF will be shown in this paper.
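A shuffled autocorrelation of this kind can be sketched as a histogram of all spike-time differences between different repeats. The sketch below is illustrative only: the function name `sacf` and the 30 ms maximum lag are hypothetical, and the normalization factor N(N − 1)r²ΔτD (N repeats, mean rate r, bin width Δτ, analysis duration D) is our reading of Louage et al. (2004) and should be checked against that paper before use.

```python
import numpy as np

def sacf(trains, duration, bin_width=0.00004, max_lag=0.03):
    """Normalized shuffled autocorrelogram of a list of spike-time arrays (seconds)."""
    n_trains = len(trains)
    n_spikes = sum(len(t) for t in trains)
    rate = n_spikes / (n_trains * duration)            # mean firing rate r
    k = int(round(max_lag / bin_width))
    centers = np.arange(-k, k + 1) * bin_width         # lags, with one bin centered on zero
    edges = (np.arange(-k, k + 2) - 0.5) * bin_width
    counts = np.zeros(centers.size)
    for i, a in enumerate(trains):
        for j, b in enumerate(trains):
            if i == j:
                continue                               # skip within-train intervals
            diffs = np.subtract.outer(np.asarray(b), np.asarray(a)).ravel()
            counts += np.histogram(diffs, bins=edges)[0]
    # Assumed normalization: expected count per bin for unstructured firing.
    norm = n_trains * (n_trains - 1) * rate ** 2 * bin_width * duration
    return centers, counts / norm

# Synthetic example: three identical repeats locked to a 10 ms period.
trains = [0.006 + 0.01 * np.arange(10)] * 3
lags, values = sacf(trains, duration=0.1)
positive = lags > 0.005
first_peak_lag = lags[positive][np.argmax(values[positive])]
```

For this idealized input the largest peak at positive lags falls at 10 ms, the period of the locking. Because within-train intervals are excluded, the function is not distorted by refractoriness, which is one reason it is smoother than the ACF.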

Results

Complete data sets were obtained from 85 multi-unit clusters. Clusters ranged in CF from 0.2 to 14.7 kHz; 29 had CFs below 1 kHz. It was remarkable how homogeneous the responses of all clusters were to these stimuli. The most salient details of the responses of these clusters will be described briefly below for a single cluster, which can be taken as a description of all clusters’ responses. Examples of the responses to the diotic stimulus for a cluster with a CF of 1.6 kHz are shown in Figure 2A–D. There was a sustained response at low fundamentals but more adaptation for higher fundamentals. At the higher fundamentals, the population exhibited either a low sustained response, like Figure 2D, or responded only at the onset. During the sustained response, the cluster phase locked at F0. This is visible in the PSTHs of Figure 2 and is shown by the clear peaks at F0 in the Fourier spectra (arrows in Fig. 2E–H) and 1/F0 in the SACF (thick arrows in Fig. 2I–L).

Fig. 2

Response measures of a representative cluster with a CF of 1.6 kHz and spontaneous rate of six spikes per second. Peristimulus time histograms with a bin width of 0.2 ms are shown for the diotic stimulus for F0s of A 50 Hz, B 100 Hz, C 200 Hz, and D 400 Hz. E–H Fourier spectra of the PSTHs in A–D from 5 to 105 ms. The Nyquist frequency of the analysis was 2,500 Hz, but only components up to 1,000 Hz are shown. The magnitudes are normalized by dividing by the DC component, so the values are equivalent to vector strength measures. The dots show significant points (Rayleigh coefficient >13.8). The bold arrows show the peak corresponding to the frequency of the fundamental. I–L Normalized shuffled autocorrelation functions (SACF: Joris 2003; Joris et al. 2006; Louage et al. 2004) computed from spike intervals between 5 and 105 ms after stimulus onset. The bin width was 0.04 ms. The light arrows show the autocorrelation lag and corresponding frequency of subsidiary peaks.

Many clusters also showed evidence of a temporal response at frequencies higher than F0. There are a great many components at multiples of F0 in the Fourier spectra (Fig. 2E–H), although these should be interpreted with care. Because of nonlinearities in the generation of the PSTH, the Fourier transform of the PSTH will contain components which are not in the stimulus, so even if the cluster only responded to a single stimulus component, we would expect harmonics in the Fourier transform of the response.

The responses of the example cluster to stimuli with a fundamental of 50 Hz are shown in Figure 3. The responses to the stimulus played to the contralateral ear only are similar to those played diotically (compare the PSTHs in Fig. 3A, B; the spectra in Fig. 3E, F; and the SACFs in Fig. 3I, J). The dichotic and alternating phase stimuli, however, generate PSTHs with peaks at half the period of those to the diotic stimulus (compare Fig. 3C, D with Fig. 3B). This behavior is illustrated more clearly in the spectra (Fig. 3F–H), where the response at F0 (black arrow) drops relative to the nearly constant response at 2F0 (grey arrow) in the dichotic and alternating phase conditions (Fig. 3G, H). The SACFs (Fig. 3J–L) show the same effect but in a complementary form. The response to the diotic stimulus has many intervals corresponding to F0 (black arrow, Fig. 3J) but none corresponding to 2F0. However, the dichotic and alternating phase stimuli generate many intervals corresponding to 2F0 and an approximately equal number corresponding to F0. This behavior suggests that this cluster is responding primarily to the envelope of the stimulus at the contralateral ear.

Fig. 3

As Figure 2 but showing responses to the contralateral, diotic, dichotic, and alternating phase stimuli at an F0 of 50 Hz. The black arrows show the peak corresponding to F0. The grey arrows show the peaks corresponding to 2F0.

Temporal properties within the population

The PSTHs in Figure 2 showed phase locking which was maintained up to 400 Hz. This was true for about 40% of clusters in the population as a whole, whereas in most the phase locking declined with increasing F0. The highest F0 at which there was significant phase locking is plotted as a function of CF in Figure 4. For contralateral and diotic stimuli (Fig. 4A), the majority of clusters phase locked to F0s at and above 282.8 Hz, and only one did not phase lock to any F0 (plotted at zero F0). Many clusters phase locked significantly to the highest F0 we used, i.e., were at ceiling. Although the upper limit of locking was at ceiling for all CFs, the lower limit increased with increasing CF. For dichotic and alternating phase stimuli (Fig. 4B), the highest F0 locked to was generally lower, as indicated by far fewer points at the ceiling of 400 Hz.

Fig. 4

The highest F0 at which significant phase locking was measured for A contralateral and diotic stimuli; and B dichotic and alternating phase stimuli. The two curved lines represent limits of resolvability (see “Discussion”). To the upper-left, harmonics around the CF would be resolved, whereas to the lower-right, harmonics around the CF would be unresolved. Between the lines, the resolution would be ambiguous.

The extent to which the envelope of the stimulus is represented in the PSTH can be determined from the Fourier transform of the PSTH. The spectra in Figures 2 and 3 showed strong components at both F0 and 2F0 for the monaural and diotic stimuli. The strong component at 2F0 would be expected to inevitably accompany a response at F0 because the Fourier transform is a representation of the shape of the PSTH which is, to some extent, a half-wave rectified version of the band-pass filtered stimulus. However, for the dichotic and alternating phase stimuli, there was no response at F0 and a strong component at 2F0. This absence of a response at F0 cannot be explained in terms of distortions inherent in the formation of the spectrum of the PSTH and thus reflects a real effect. The vector strength of locking to these components is shown in Figure 5, with locking at F0 shown as filled circles and locking at 2F0 as open circles. There was a wide range of vector strengths, with the highest values and greatest range occurring in response to the contralateral and diotically presented lower F0s. The average vector strength decreased as F0 increased. There was no noticeable trend for vector strength to change as a function of CF. There was a significant overlap in the spread of phase locking and no apparent difference in mean phase locking between F0 and 2F0 in the contralateral and diotic conditions. However, for the dichotic and alternating phase conditions, the strength of locking at F0 (filled circles) was generally very low, while that at 2F0 (open circles) was generally higher except at the highest F0s. This is consistent with phase locking to the envelope at a single, dominant ear only, rather than to a binaurally integrated stimulus.

Fig. 5

Strength of phase locking (vector strength) at F0 (filled circles) and 2F0 (open circles) for four types of stimuli (rows) and F0 (columns). The vertical dashed lines mark the boundaries between resolved (R) and unresolved (U) harmonics around F0 (see Figure 4 and “Discussion”).

The Fourier transform of the PSTH shows the degree to which the envelope of the stimulus is represented in the envelope of the PSTH. However, a more direct measure of the temporal coding is provided by the autocorrelation of the spike trains. The SACFs shown in Figures 2 and 3 show peaks at 1/F0 for monaural and diotic stimuli and at both 1/F0 and 1/2F0 for dichotic and alternating phase stimuli. In an autocorrelation analysis of a stimulus with period 1/f, we expect intervals at all integer multiples of the period (n/f); so the peak at 1/F0 in dichotic and alternating phase conditions is the second-order response to a period of 1/2F0 (i.e., 2/2F0). Thus, the Fourier and autocorrelation analyses are complementary. If the response is predominantly at F0, then we expect components at both F0 and 2F0 in the Fourier analysis but only at 1/F0 in the SACF, whereas if the response is predominantly at 2F0, then we expect a component only at 2F0 in the Fourier analysis but at both 1/F0 and 1/2F0 in the SACF. The magnitudes of the peaks corresponding to F0 and 2F0 in the SACF are plotted in Figure 6. In Figure 6, a value of 1 corresponds to no temporal structure (Louage et al. 2004). For monaural and diotic stimuli in Figure 6, the magnitude of the peak corresponding to 2F0 (open circles) was generally very low, whereas for all F0s apart from 400 Hz, the peak corresponding to F0 (filled circles) was much higher. For dichotic and alternating phase stimuli, the magnitude of the two peak heights was more nearly equal. Taken together, all of these results are consistent with phase locking to the envelope at a single, dominant ear only rather than to a binaurally integrated stimulus.
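The complementary behavior of the two analyses can be illustrated with idealized responses. The sketch below is not the analysis applied to the data: it replaces the PSTH with a noiseless pulse train (one spike-bearing bin per response period), uses a circular autocorrelation in place of the SACF, and takes F0 = 50 Hz with the 0.2 ms bins and 100 ms window of the PSTH analysis. All function names are hypothetical.

```python
import numpy as np

N_BINS = 500                 # 100 ms window in 0.2 ms bins
BIN_WIDTH = 0.0002           # seconds
F0 = 50.0                    # Hz
PERIOD_F0 = int(round(1.0 / (F0 * BIN_WIDTH)))   # 100 bins = 1/F0
PERIOD_2F0 = PERIOD_F0 // 2                      # 50 bins = 1/(2*F0)

def pulse_train(period_bins):
    """Idealized PSTH: one unit pulse every period_bins bins."""
    x = np.zeros(N_BINS)
    x[::period_bins] = 1.0
    return x

def vs_at(x, freq):
    """Magnitude of the Fourier component at freq, normalized by the DC component."""
    spectrum = np.abs(np.fft.rfft(x))
    k = int(round(freq * N_BINS * BIN_WIDTH))
    return spectrum[k] / spectrum[0]

def circ_acf(x, lag_bins):
    """Circular autocorrelation of x at a lag given in bins."""
    return float(np.dot(x, np.roll(x, lag_bins)))

locked_f0 = pulse_train(PERIOD_F0)     # response predominantly at F0
locked_2f0 = pulse_train(PERIOD_2F0)   # response predominantly at 2*F0

fourier = {name: (vs_at(x, F0), vs_at(x, 2 * F0))
           for name, x in (("F0", locked_f0), ("2F0", locked_2f0))}
intervals = {name: (circ_acf(x, PERIOD_F0), circ_acf(x, PERIOD_2F0))
             for name, x in (("F0", locked_f0), ("2F0", locked_2f0))}
```

The train locked to F0 has Fourier components at both F0 and 2F0 but intervals only at 1/F0; the train locked to 2F0 has a Fourier component only at 2F0 but intervals at both 1/2F0 and 1/F0. This matches the pattern described in the text.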

Fig. 6

Magnitude of SACF peak at 1/F0 (filled circles) and 1/2F0 (open circles) for four types of stimuli (rows) and F0 (columns). The vertical dashed lines mark the boundaries between resolved (R) and unresolved (U) harmonics around F0 (see Fig. 4 and “Discussion”).

Tuning of the discharge rate to F0

In the preceding section, we considered the temporal responses to harmonic stimuli and showed evidence for a doubling in the frequency of response for dichotic stimuli. However, as discussed in the introduction, within the IC, a rate representation of periodicity begins to emerge. In this section, we describe how the rate response of clusters changes with changing F0 and between diotic and dichotic stimuli. The rate responses of the example cluster described earlier are shown in Figure 7. The response to diotic stimuli had a peak at an F0 of 282.8 Hz (defined as the best F0). The responses to the dichotic and alternating phase stimuli also showed a maximum firing rate, but the best F0 was shifted down an octave, to 141.4 Hz. These results are consistent with the response being determined by the envelope of the stimulus at the contralateral ear. If the frequency of the stimulus envelope determines the rate response rather than the F0, then the maximum of the rate response will be shifted down an octave when plotted against F0, since the envelope frequency in the dichotic conditions is 2F0.
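The octave shift follows directly if firing rate is assumed to depend only on the envelope repetition rate at the contralateral ear. The sketch below illustrates this with a hypothetical Gaussian tuning curve on a log-frequency axis; the tuning shape, its half-octave width, and the peak rate are assumptions made for illustration and are not fitted to the data.

```python
import numpy as np

F0S = 50.0 * 2.0 ** (np.arange(7) / 2.0)   # 50 to 400 Hz in half-octave steps

def rate_vs_envelope(f_env, best_env=F0S[5], width_oct=0.5, peak_rate=100.0):
    """Hypothetical rate tuning to envelope rate (Gaussian in octaves)."""
    return peak_rate * np.exp(-0.5 * (np.log2(f_env / best_env) / width_oct) ** 2)

# Diotic: envelope rate equals F0. Dichotic or alternating phase: envelope
# rate at a single ear is 2*F0 (for unresolved harmonics).
diotic_rates = rate_vs_envelope(F0S)
dichotic_rates = rate_vs_envelope(2.0 * F0S)

best_f0_diotic = F0S[np.argmax(diotic_rates)]
best_f0_dichotic = F0S[np.argmax(dichotic_rates)]
```

With the tuning peak placed at an envelope rate of 282.8 Hz, the best F0 is 282.8 Hz for the diotic stimulus and 141.4 Hz for the dichotic stimulus, reproducing the octave shift seen in the example cluster.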

Fig. 7

Firing rate computed from 5–105 ms after stimulus onset as a function of frequency for the diotic, dichotic and alternating phase stimuli. The dashed line shows the spontaneous rate. The cluster is the same as shown in Figures 2 and 3.

Figure 7 showed an example of tuning of the discharge rate to F0. Most clusters showed such clear tuning, with a best F0 between 70.7 and 282.8 Hz. The best F0 is plotted in Figure 8A for the contralateral and diotic stimuli and in Figure 8B for the dichotic and alternating phase stimuli. Clusters that had a best F0 at or below 50 Hz or at or above 400 Hz would not have been distinguishable from low-pass or high-pass clusters and are plotted at each end of the distribution. Some clusters responded best at both low and high F0s, with a dip between them; these band-reject clusters are plotted as BR in Figure 8. Some clusters showed negligible or nonsystematic modulation across F0; these are plotted as None. A number of clusters showed double peaks in the rate response systematically across conditions, with the higher frequency peak an octave above the lower. These clusters were plotted according to the lower frequency peak. For contralateral and diotic stimuli, the modal best F0 was at 141.4 Hz, whereas for dichotic and alternating phase stimuli, it was at 70.7 Hz. This finding demonstrates that, like the example in Figure 7, the best F0 for dichotic responses tended to be an octave lower than for diotic responses.

Fig. 8

Histograms of the best F0, defined as the frequency of the peak in the discharge rate vs. F0 plot (e.g., Fig. 7). If the cluster was low-pass, then it is shown as having a best F0 ≤50 Hz; if it was high-pass, then it is shown as ≥400 Hz. Clusters which had maxima at both low and high frequencies are represented as BR (band-reject), whereas clusters which showed no F0 tuning are represented as None. A Contralateral and diotic stimuli. B Dichotic and alternating phase stimuli.

It is conceivable (although unlikely) that, for individual clusters, the rate tuning for different stimuli may have been different. To check that this was not the case, the dichotic best F0s are plotted against the diotic best F0s for each cluster individually in Figure 9A. The diagonal solid line represents equality, whereas the dashed line represents the dichotic F0 being half the diotic F0. The points fall mostly on the dashed line, with a scattering of a few points away from it. In other words, the population trend described above, with dichotic responses being tuned to half the F0 of diotic responses, also tends to hold for individual clusters. A similar plot for the alternating phase stimulus is shown in Figure 9B. While there are more points falling away from the octave relation line, the majority are still on or near it. In other words, the response to alternating phase stimuli also tends to peak an octave below that for diotic stimuli for each cluster individually as well as across the population. In both plots, the cluster of points on the equal F0 line at 50 and 400 Hz are potentially an artifact of the analysis since we did not use a dichotic or alternating stimulus at 25 Hz F0 or a diotic stimulus at 800 Hz which would have been necessary to test for an octave relationship. There are no striking differences in best F0 as a function of CF (Fig. 10), although there is a very weak tendency for tuning to higher best F0s to occur at higher CFs (compare density of points in Fig. 10A). Since these data are the same as those plotted in Figure 8 in a different form, it is no surprise that Figure 10B looks very like Figure 10A plotted an octave lower in best F0, i.e., that tuning to lower F0s occurs more often for the dichotic and alternating phase stimuli.

Fig. 9

Comparison of the best F0 in the diotic condition with A the dichotic conditions and B the alternating phase condition. The solid diagonal line represents equality, whereas the dashed line represents the dichotic or alternating phase best F0 being half the diotic best F0. To prevent too many points being plotted on top of each other, the F0s are randomly jittered within a rectangular probability distribution of ±15 Hz. Note that points plotted at 50 Hz or 400 Hz represent clusters with best F0s at or below 50 Hz (or low-pass clusters) and at or above 400 Hz (or high-pass clusters), respectively.

Fig. 10

The frequency of the best F0 as a function of characteristic frequency for A contralateral and diotic stimuli; and B dichotic and alternating phase stimuli. The two curved lines represent limits of resolvability (see Fig. 4 and “Discussion”).

Monaural balance of responsiveness

It has already been mentioned that ipsilateral responses were generally weaker than those to contralateral stimuli. The ratios of the firing rate in response to ipsilateral stimuli relative to contralateral are shown in Figure 11. The contralateral response completely dominates for CFs greater than 2 kHz, whereas the balance is less extreme but still favoring the contralateral side for lower CFs. The marginal histogram in Figure 11 confirms that most clusters are more strongly driven by contralateral than ipsilateral stimuli.

Fig. 11

The ratio of the mean firing rate in response to harmonic ipsilateral and contralateral stimuli. Different F0s are marked by different symbols. To the right is a histogram of the ipsilateral/contralateral ratio averaged across all F0s and CFs.

Discussion

We have measured the responses of clusters of neurons in the IC to harmonic series where all the harmonics are played to each ear or alternate harmonics are played to alternate ears. In the first section of the discussion, we compare these results to earlier studies that predominantly used amplitude-modulated tones. In the second section, we contrast the rate and temporal representations of pitch cues in the IC. Finally, we compare the current results with the available psychophysics. We found no evidence for combination of binaural cues for pitch in the IC, but we did not sample enough conditions with resolved harmonics to determine whether there were differences in the processing of resolved and unresolved harmonics as seen in the psychophysics.

Comparison with earlier studies

Virtually all clusters phase locked diotically at F0, with some locking up to the highest F0 tested (400 Hz; Fig. 4). This is consistent with the ability of IC units to phase lock to pure tones up to 600 Hz in cat (Kuwada et al. 1984) and 1 kHz in guinea pig (Liu et al. 2006). It is also consistent with the frequency range of phase locking to the envelope of sinusoidally amplitude modulated (SAM) tones and noise (256–600 Hz; Batra et al. 1989; Heil et al. 1995; Krishna and Semple 2000; Langner and Schreiner 1988; Muller-Preuss et al. 1994; Nelson and Carney 2007; Rees and Moller 1987; Rees and Palmer 1989). We found that the best F0 in the rate tuning for contralateral and diotic presentation was 141 Hz. This is at the upper end of the range of best rate modulation frequency tuning found using SAM (30–160 Hz; Heil et al. 1995; Langner and Schreiner 1988; Nelson and Carney 2007; Rees and Moller 1987). The balance between low-pass and band-pass tuning changes as a function of level (Krishna and Semple 2000; Rees and Moller 1987; Rees and Palmer 1989) and age (Heil et al. 1995), so it is possible that a small disagreement between the modal best F0 for harmonic stimuli and SAM might be because of differences in effective level. Krishna and Semple (2000) argue that rate modulation transfer functions are shaped by inhibitory side bands, so the difference in number and amplitude of components between harmonic stimuli and amplitude modulation may account for the slight difference in modal best F0, although later modeling (Nelson and Carney 2007) downplays the role of inhibition in shaping rate modulation transfer functions.

Sinex et al. (2002, 2005) and Sinex and Li (2007) reported that IC units in chinchillas only responded strongly when one of the partials in a harmonic complex was mistuned or when two harmonic complexes with different F0s were presented. However, they did not report many responses to purely harmonic series, and they used a fixed F0 of 250 Hz, an F0 to which many of the clusters in our sample did not phase lock as well as to lower F0s. Additionally, they used long stimuli, up to 500 ms, and tended only to analyze the last 400 ms. Since our stimuli were 100 ms long, their studies and ours analyzed different portions of the response.

Pitch cue representation in the IC

In this paper, we are primarily concerned with how cues to the pitch of harmonic series are represented in the IC and with evidence for their integration across ears. Historically, models of pitch perception polarized into those which concentrated on determining the pitch from the pattern of components which were resolved in the auditory periphery (so-called place models; e.g., Goldstein 1973; Terhardt 1974; Terhardt et al. 1982) and those which extracted the pitch from the temporal envelope of unresolved harmonics (so-called temporal models; e.g., Schouten 1940, 1970). More recently, models have been proposed which combine information about the timing of spikes across frequency from both resolved and unresolved harmonics (e.g., Assmann and Summerfield 1990; Licklider 1951; Meddis and Hewitt 1991a,b; Slaney and Lyon 1990). Since the fidelity of phase locking declines as the auditory system is ascended, it is likely that at some point in the auditory system such a temporal representation will be converted into a place representation, with the spatial pattern of neuron firing rates representing the pitch. It has been claimed that such a map exists within the IC (Langner et al. 2002; Langner and Schreiner 1988). We found clusters which showed tuning to F0, so they are candidates for such a map (Fig. 7); however, the maximum of the distribution was around 140 Hz for diotic stimuli (Fig. 8), so it is unlikely that they can be involved in the representation of all musical pitches (the A above middle C is 440 Hz, with the highest pitches being about 5 kHz). Most of the analyses we have presented are of the temporal intervals present within individual frequency channels. These analyses have shown that F0s can be represented in the firing pattern of individual clusters up to 400 Hz. The representation may exist at higher F0s, but we have no data on this; even so, it is unlikely that the representation will extend beyond the 1-kHz limit of pure-tone phase locking in the guinea pig IC (Liu et al. 2006; see previous section).

We have not analyzed the across-CF combination of activity by combining temporal activity across clusters (e.g., as in a summary autocorrelation; Meddis and Hewitt 1991a,b). However, IC neurons exhibit excitatory and inhibitory receptive fields which are more complex and wider than those of auditory nerve fibers (e.g., Egorova et al. 2001; Ehret and Schreiner 2004; Le Beau et al. 1996, 2001; Ramachandran et al. 1999), so there is scope for across-CF and across-ear interaction within individual IC clusters. The degree to which we found across-ear interactions is discussed in the following sections.

Evidence of monaural dominance

The most striking feature of our results is the comparison between diotic stimuli and dichotic stimuli. The frequency of the temporal response of clusters to dichotic stimuli was consistently twice that of the response to diotic stimuli, and the best F0 of each cluster’s rate tuning for dichotic stimuli was consistently half that for diotic stimuli. Both of these results are consistent with the cluster responding predominantly to the envelope of the stimulus at a single ear. The response may be the result of the input to the IC cluster being identical from each ear (see below for a discussion of this), or it may be that the IC cluster is only receiving input from a single ear. Figure 11 shows that the balance of excitation is from the contralateral ear, so it is perhaps not surprising that the responses are characteristic of a single-ear input. However, it is possible that there may be inhibitory, facilitatory, or subthreshold inputs from the ipsilateral ear. The evidence for contralateral ear dominance would be more convincing if it were strongly correlated with the evidence for envelope processing. The ratio of phase locking to 2F0 relative to locking to F0 is plotted against the ratio of ipsilateral to contralateral firing rates in Figure 12. Since a weak F0 response (and hence a large 2F0/F0 ratio) indicates strong locking to the envelope at a single ear, we would expect a strong correlation with the ipsilateral/contralateral firing rate ratio if this effect could be explained purely in terms of monaural dominance. The correlation is not strong, so our results cannot be explained purely in terms of contralateral ear dominance.

Fig. 12. The ratio of 2F0/F0 phase locking is plotted against the ratio of the ipsilateral/contralateral firing rates (cf. Fig. 11) for different F0s. A Diotic stimuli; B dichotic stimuli.
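
As an illustration of the 2F0/F0 locking ratio, the following is a minimal sketch that assumes phase locking is quantified by vector strength; that metric, the function names, and the synthetic spike times are assumptions made for this example and are not taken from the analysis reported here. A spike train locked once per F0 period gives a ratio near 1, whereas a train locked to every half period (as expected for an envelope at 2F0) gives a very large ratio.

```python
import math


def vector_strength(spike_times_s, freq_hz):
    """Vector strength of spike times (in seconds) relative to freq_hz."""
    if not spike_times_s:
        return 0.0
    c = sum(math.cos(2 * math.pi * freq_hz * t) for t in spike_times_s)
    s = sum(math.sin(2 * math.pi * freq_hz * t) for t in spike_times_s)
    return math.hypot(c, s) / len(spike_times_s)


def locking_ratio(spike_times_s, f0_hz):
    """Ratio of locking at 2*F0 to locking at F0 (hypothetical helper)."""
    vs_f0 = vector_strength(spike_times_s, f0_hz)
    vs_2f0 = vector_strength(spike_times_s, 2 * f0_hz)
    return vs_2f0 / vs_f0 if vs_f0 > 0 else math.inf


f0 = 100.0  # Hz, illustrative value
# Synthetic, noise-free spike trains over 100 ms (not recorded data)
once_per_period = [k / f0 for k in range(10)]
twice_per_period = [k / (2 * f0) for k in range(20)]

print(locking_ratio(once_per_period, f0))
print(locking_ratio(twice_per_period, f0))
```

With real spike trains the ratio would fall between these two extremes, and its value for each cluster is what would be compared with that cluster's ipsilateral/contralateral firing rate ratio.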

Relationship to human psychophysics and resolution of harmonics

If components of a stimulus are unresolved, then several components will interact within an auditory nerve response area, and the output will be amplitude modulated. The frequency of this amplitude modulation will be equal to F0 for our diotic stimuli and will be equal to 2F0 for the dichotic and alternating phase stimuli. To a crude approximation, the auditory nerve outputs will resemble band-pass-filtered and half-wave-rectified versions of the waveforms shown in Figure 1. Importantly, the envelopes for even-harmonic and for odd-harmonic stimuli will be very similar. It is therefore unlikely that any processing could reconstruct a binaural response which did not reflect the periodicity of the auditory nerve outputs. In other words, for responses to unresolved harmonics, our results are largely predictable from filtering on the basilar membrane and auditory nerve firings. Whether there were binaural interactions or not would not affect this result. The question then becomes: to what extent are the responses reported here to resolved or unresolved harmonics?
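
The expected envelope periodicities can be checked numerically. The following is a minimal sketch, not the analysis used in this study. It assumes that a single hypothetical filter passes harmonics 10 to 20 with equal amplitude, that the dichotic condition delivers odd harmonics to one ear and even harmonics to the other, and that the alternating phase stimulus has odd harmonics in sine phase and even harmonics in cosine phase. The F0, the sample rate, and all function names are illustrative choices. The sketch computes the squared envelope of each stimulus and reports which multiple of F0 dominates the modulation.

```python
import cmath
import math

F0 = 100.0                     # fundamental frequency in Hz (illustrative)
FS = 40000.0                   # sample rate in Hz (assumption)
N = 4000                       # 100 ms of samples, a whole number of F0 periods
PASSBAND = range(10, 21)       # hypothetical filter passing harmonics 10..20
COS, SIN = 0.0, -math.pi / 2   # starting phases: sin(x) = cos(x - pi/2)


def squared_envelope(phases):
    """Squared envelope of a sum of equal-amplitude harmonics.

    phases maps harmonic number -> starting phase in radians.
    """
    env = []
    for i in range(N):
        t = i / FS
        z = sum(cmath.exp(1j * (2 * math.pi * k * F0 * t + ph))
                for k, ph in phases.items())
        env.append(abs(z) ** 2)
    return env


def dominant_multiple(env, max_multiple=6):
    """Return m such that m * F0 is the strongest envelope modulation."""
    def magnitude(m):
        w = 2 * math.pi * m * F0 / FS
        return abs(sum(e * cmath.exp(-1j * w * i) for i, e in enumerate(env)))
    return max(range(1, max_multiple + 1), key=magnitude)


stimuli = {
    "all harmonics (diotic)": {k: COS for k in PASSBAND},
    "odd harmonics only (one dichotic ear)": {k: COS for k in PASSBAND if k % 2},
    "even harmonics only (other dichotic ear)": {k: COS for k in PASSBAND if k % 2 == 0},
    "alternating phase": {k: (SIN if k % 2 else COS) for k in PASSBAND},
}

results = {name: dominant_multiple(squared_envelope(ph))
           for name, ph in stimuli.items()}
for name, m in results.items():
    print(f"{name}: envelope modulated at {m} x F0")
```

Under these assumptions, the complex containing all harmonics is modulated at F0, while the odd-only, even-only, and alternating phase complexes are all modulated at 2F0, which is why the envelope at either ear alone cannot distinguish binaural integration from monaural processing for unresolved harmonics.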

There are many definitions of harmonic resolution (see Bernstein and Oxenham 2003; Shackleton and Carlyon 1994 for a discussion of a few). However, for the purposes of this paper, if the individual components of a harmonic stimulus interact significantly within the response area of an auditory nerve fiber so that the firing rate becomes amplitude modulated, then the stimulus is unresolved at the CF of that fiber. This definition is consistent with that presented by Shackleton and Carlyon (1994), who derived a rule for determining whether harmonics were resolved or not, specifically that stimuli are unresolved when more than 3.25 harmonics occur within the 10-dB bandwidth of an auditory filter (1.8 times the equivalent rectangular bandwidth, ERB) and resolved when two or fewer are within the filter, with a region of ambiguity between. Guinea pig auditory nerve bandwidths and their behavioral tuning curves (Evans 2001; Evans et al. 1992) are wider than those obtained psychophysically from humans, so there is a potential problem in comparing our data with human psychophysics. However, Evans (2001) and Evans et al. (1992) derived a formula for the guinea pig ERB as a function of CF (ERB = 0.3 CF^0.56), which we can use in the rule given above to estimate whether stimuli are resolved at the CF of the clusters we recorded from. These resolution limits are plotted on Figures 4, 5, 6, and 10, where it can be seen that our data should encompass responses to both resolved and unresolved harmonics. There is no obvious difference in the results between resolved and unresolved regions. This result would not be expected from the human psychophysics (Bernstein and Oxenham 2003), where a doubling of pitch for unresolved relative to resolved harmonics was found, and it was suggested this was due to binaural integration for resolved harmonics but not for unresolved harmonics. The fact that we find no difference between resolved and unresolved harmonics suggests our results may not be due to resolvability but may well be due to a lack of binaural integration for pitch at the level of the auditory midbrain. However, it is only at an F0 of 400 Hz where this comparison can be made over a significant number of clusters, and at this F0, the phase locking to the stimulus is lower anyway. Therefore, it is not clear whether we have sufficient data to comment unequivocally upon whether there is a difference between the processing of resolved and unresolved harmonics within the IC.
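
The resolution rule described above can be sketched in a few lines. This is a minimal illustration rather than the code used to draw the limits on the figures. It assumes that CF and ERB in the formula ERB = 0.3 CF^0.56 are both expressed in kHz, which is not stated in the text, and it counts harmonics in the filter as the 10-dB bandwidth divided by F0. The function names are hypothetical.

```python
def erb_hz(cf_hz):
    """Guinea pig ERB in Hz from ERB = 0.3 * CF**0.56.

    Assumption: the formula takes CF in kHz and returns ERB in kHz.
    """
    return 0.3 * (cf_hz / 1000.0) ** 0.56 * 1000.0


def harmonics_in_filter(cf_hz, f0_hz):
    """Number of harmonics (spaced F0 apart) within the 10-dB bandwidth,
    taken as 1.8 times the ERB."""
    return 1.8 * erb_hz(cf_hz) / f0_hz


def resolution(cf_hz, f0_hz):
    """Classify a harmonic stimulus at a given CF using the 3.25 / 2 rule."""
    n = harmonics_in_filter(cf_hz, f0_hz)
    if n > 3.25:
        return "unresolved"
    if n <= 2.0:
        return "resolved"
    return "ambiguous"


# At a CF of 1 kHz the 10-dB bandwidth is 540 Hz under the stated assumption
for f0 in (100.0, 200.0, 400.0):
    print(f0, round(harmonics_in_filter(1000.0, f0), 2), resolution(1000.0, f0))
```

Under the kHz assumption, a cluster with a CF of 1 kHz would see about 5.4 harmonics in its filter at an F0 of 100 Hz (unresolved), 2.7 at 200 Hz (ambiguous), and 1.35 at 400 Hz (resolved).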

We therefore have no real answer to the question of whether resolved harmonics are processed monaurally and then combined or are processed after binaural integration. The psychophysical results for resolved harmonics (Bernstein and Oxenham 2003; Houtsma and Goldstein 1972) and the fact that pitch can be perceived in stimuli where the pitch cues are generated by binaural processing (e.g., Akeroyd and Summerfield 2000b) both lead to the conclusion that “binaural” and “monaural” pitch cues can be integrated into a single percept (Akeroyd and Summerfield 2000a). The data we have presented here provide no evidence to suggest that this occurs at the level of the IC. Further research is clearly required to determine how and where this occurs within the auditory system.