INTRODUCTION

Zwicker (1964) described an auditory illusion that can be induced by stimulation of the ear with a notched noise (NN), i.e., a broadband noise containing a suppressed frequency band. This auditory sensation is similar in quality to a sinusoidal tone and its pitch always falls within the suppressed frequency band of the noise. From this point of view, this auditory illusion, frequently called the “Zwicker tone” (ZT), has been considered to be an example of a “negative afterimage” in the auditory modality. Nevertheless, one can note that the term “negative afterimage” is improper in the sense that the ZT does not have a “noise quality” similar to a narrowband noise corresponding to the “image” of the spectral gap.

The physical characteristics of the noise required to induce a ZT have been well described, as well as the psychoacoustical properties of the ZT. Typically, a NN with a suppression band 1/2 octave wide induces a ZT (Zwicker 1964). However, Wiegrebe et al. (1996) showed that a ZT could also be induced with a NN containing a 1-octave-wide suppressed band centered on 4 kHz. For a Gaussian noise-like stimulus, a gap at least 1 critical band wide (1/4 octave) is needed near 1 kHz, while at a center frequency of 4 kHz, a less than 0.5-critical-band (Bark) gap width (1/8 octave) is sufficient (Zwicker and Fastl 1999). The psychoacoustical properties of the ZT strictly depend on the physical characteristics of the NN. For instance, the pitch increases with the level of the NN (Zwicker 1964; Zwicker and Fastl 1999). Taking into account that the slope of the masking pattern at the lower edge of the gap becomes flatter with increasing level, Zwicker and Fastl (1999) suggested that the pitch of the ZT corresponds to the crossing point of the masking pattern and the hearing threshold. Furthermore, the probability of hearing a ZT is a function of the overall noise level; it reaches an optimum level at about 43 dB SPL. Finally, the duration of the ZT is related to the duration of exposure to the inducer (Zwicker 1964; Zwicker and Fastl 1999). For a noise duration of 500 ms or 10 s, the ZT may last 200–300 ms or 2 s, respectively, while for a noise duration of 1 min, the ZT may last for about 10 s.

The ZT is an interesting phenomenon, provoking basic questions about the percept and its related neural activity in the central auditory system. For instance, the ZT provides a tool to understand how a neuronal signal can be generated and erroneously interpreted as a real sound. In addition, a ZT is perceived only after the NN, i.e., in silence. In other words, whereas the ZT is induced by the NN presentation, its perception is not related to a concurrent acoustic stimulus. In this context, it has been proposed that the ZT provides a model for transient tinnitus (Hoke et al. 1996, 1998; Noreña et al. 2000, 2002a). The hypothesis that the ZT is a model of transient tinnitus implicitly assumes that the NN presentation induces similar changes in the auditory system as those that are related to the generation of transient tinnitus. Noreña et al. (2000, 2002a) suggested that both illusory percepts could be generated by a discontinuity in the afferent inputs that induces an imbalance between excitation and inhibition at a central level. Such a discontinuity at the peripheral level could result from a hearing loss (associated with tinnitus in about 90% of cases; Sirimanna et al. 1996) or, in the case of the ZT, simply from the spectrum of the NN. In this sense, the NN is hypothesized to induce a “functional deafferentation” (Pantev et al. 1999; Noreña et al. 2000, 2002a).

The neurophysiological mechanisms underlying the ZT are not known. However, Wiegrebe et al. (1996) suggested that the ZT is likely not generated in the cochlea. They showed that a NN does not modify the amplitude of a spontaneous otoacoustic emission with frequency in the notch. This argues against the involvement of the cochlear amplifier in the generation of the ZT. Furthermore, the perception of the ZT is not modified by a concurrent very-low-frequency stimulus, which biases the position of the basilar membrane (Wiegrebe et al. 1996). On the other hand, the ZT cannot be induced when the low-pass component and the high-pass component of the NN stimulus are presented to different ears (Krump 1993, cited by Wiegrebe et al. 1996). This suggests that the ZT is generated either in the monaural stations of the central auditory system or before binaural integration has taken place.

Regardless of the level at which the ZT is generated and its neurophysiological origins, the ZT should be associated with an aberrant neural activity accounting for its perception. Few studies have focused on the potential electrophysiological correlates of the ZT at the cortical level. Hoke et al. (1996) investigated the mechanisms underlying the ZT percept in humans using magnetoencephalography (MEG). They found that the N1m-off response evoked by a white noise (WN) burst immediately followed by a tone pip with the same frequency as the ZT was similar to that elicited by the ZT-inducing NN alone. The authors suggested that the NN might have induced a decreased activity in neurons with characteristic frequencies (CFs) outside the notch as a result of neural adaptation. This would result in a relative enhancement of the activity of neurons with CFs within the notch that would be further enhanced by reduced lateral inhibition. The local and relative enhancement of these neurons following the presentation of the ZT-inducing NN would thus be comparable to that produced by an external tone and would induce the same kind of percept. This hypothesis assumes that there are neurons that respond only to differences in activity between the neuronal populations with CFs inside the notch and those with CFs outside the notch.

In another MEG study, Pantev et al. (1999) focused on the central changes induced by an exposure to a notched stimulus (filtered music) presented for a few hours a day for three days. The tone burst-evoked cortical response amplitude (N1m component) decreased when the frequency region corresponding to the gap was stimulated, whereas the response was unchanged when the frequency region close to the gap was stimulated. The authors suggested that the CF of neurons corresponding to the notch had shifted toward the edge frequencies of the notch so that fewer neurons could respond to the probe tone. These proposed central changes in tonotopic organization induced by a NN are supposed to be similar to those following cochlear damage (Rajan et al. 1993). The results of Pantev et al. (1999) then corroborate the assumption that long-lasting presentation of a NN induces a “functional deafferentation.”

On the other hand, some studies focused on the potential perceptual changes concomitant to the ZT. The behavioral thresholds for frequencies falling within the notch measured after a NN were significantly improved compared with those measured after WN (Wiegrebe et al. 1996; Noreña et al. 2000). However, thresholds for frequencies on the outer edges of the NN were increased (Noreña et al. 2000). In addition, it has been shown that the loudness of low-level tones with frequency falling within the notch increased after presentation of a NN compared with that obtained after a WN (Noreña et al. 2002a). It has been suggested that the amount of spontaneous activity in the central auditory system prevents the detection of a stimulus at near-threshold level (Siebert 1965) and decreases the loudness of a low-level stimulus (Zwislocki 1965). Thus, the internal noise or spontaneous activity (SA) might be decreased after a NN in the frequency region corresponding to the gap. Also, it is possible that the neurons with CFs falling inside the notch could be more responsive after the presentation of a NN accounting for the improved hearing threshold.

In summary, it is still unclear whether the ZT is mediated by the adaptation of neurons excited by the NN (Hoke et al. 1996, 1998), results from an imbalance between excitation and inhibition induced by the NN (Pantev et al. 1999; Noreña et al. 2000, 2002a), or is the consequence of both mechanisms. In addition, the psychoacoustical changes during the ZT perception (Wiegrebe et al. 1996; Noreña et al. 2000, 2002a) suggest the paradoxical conclusion that the ZT is associated with a decrease in central SA. The present study is the first attempt to characterize the response properties of neurons during and after the presentation of a NN in primary auditory cortex using multielectrode arrays. We focused on two aspects of a neuron’s response properties, namely, the firing rate and the temporal correlation of discharges of simultaneously recorded units, to gain insight into the neurophysiological mechanisms of this auditory illusion.

METHODS

The care and the use of animals reported in this study were approved (#BI 2001-021) by the Life and Environmental Sciences Animal Care Committee of the University of Calgary. All animals were maintained and handled according to the guidelines set by the Canadian Council of Animal Care.

Animal preparation

All animals were deeply anesthetized with the administration of 25 mg/kg of ketamine hydrochloride and 20 mg/kg of sodium pentobarbital, injected intramuscularly. A mixture of 0.2 ml of acepromazine (0.25 mg/ml) and 0.8 ml of atropine methyl nitrate (25 mg/ml) was administered subcutaneously at approximately 0.25 ml/kg body weight. The tissue on the right side of the skull overlying the temporal lobe was removed and two 8-mm holes (centered approximately 9 mm posterior and 21 mm ventral from bregma, and approximately 5 mm posterior and 17 mm ventral from bregma) were trephined in the skull. The holes were enlarged with small bone rongeurs where required to ensure that the primary auditory cortex (AI) was fully exposed. The dura was then cut back and a photo taken of the exposed cortical surface such that the brain surface vascular pattern could be used as spatial reference for locating electrode placements. The exposed cortical surface was covered with light mineral oil to prevent the tissue from drying. Throughout the experiment, light anesthesia (sufficient to ensure that pinna reflexes were absent) was maintained with ketamine hydrochloride (5–10 mg/kg/h). The acepromazine/atropine methyl nitrate mixture was administered approximately every 2 h to control fluid secretion in the airways. The body temperature was monitored and maintained around 37°C with a thermostatically controlled heating blanket. Following the experiment, the animals were administered a lethal dose of sodium pentobarbital.

Acoustic stimulus presentation

Stimuli were generated in MATLAB® (The Mathworks Inc., Natick, MA, USA) and transferred to the DSP boards of a TDT-2 (Tucker Davis Technologies, Gainesville, FL, USA) sound delivery system. Acoustic stimuli were presented in an anechoic room from a speaker (Fostex RM765, flat ≤12 kHz then 3 dB/octave roll off to 25 kHz, measured at the cat’s head) placed approximately 30° from the midline into the contralateral field, about 50 cm from the cat’s left ear. Calibration and monitoring of the sound field was accomplished with a condenser microphone (Bruel & Kjaer 4134) placed above the animal’s head, facing the speaker and a measuring amplifier (Bruel & Kjaer 2636).

The characteristic frequency and tuning properties of individual neurons were determined using gamma-tone pips (Eggermont 1998). These tone pips, with a half-peak-amplitude duration of 15 ms and a gamma-function-shaped envelope, were presented at a rate of 1/s in a pseudorandom frequency order at fixed intensity level. The stimulus ensemble consisted of five identical sequences of 81 tone pips covering 5 octaves (with a tone separation of 1/16 octave), from 625 Hz to 20 kHz or from 1250 Hz to 40 kHz. The intensity series generally covered the range from 75 dB SPL to threshold in 10 dB steps.
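As an illustration of the frequency grid described above, the following minimal sketch lists 81 logarithmically spaced pip frequencies over 5 octaves and shuffles them. The function names, the seed, and the use of Python's `random` module are assumptions for illustration only; the original stimuli were generated in MATLAB and the actual randomization scheme is not specified.

```python
import random


def pip_frequencies(f_low_hz, n_octaves=5, steps_per_octave=16):
    """Return tone-pip frequencies (Hz) spaced 1/steps_per_octave octave apart."""
    n_pips = n_octaves * steps_per_octave + 1  # 5 * 16 + 1 = 81 pips
    return [f_low_hz * 2 ** (k / steps_per_octave) for k in range(n_pips)]


def pseudorandom_sequence(freqs, seed=0):
    """Return a shuffled copy of freqs (hypothetical randomization, fixed seed)."""
    order = list(freqs)
    random.Random(seed).shuffle(order)
    return order


low_range = pip_frequencies(625.0)    # 625 Hz to 20 kHz
high_range = pip_frequencies(1250.0)  # 1250 Hz to 40 kHz
sequence = pseudorandom_sequence(low_range)
```

With a 1/16-octave step, 5 octaves give 80 intervals and therefore 81 frequencies, which matches the sequence length stated in the text.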

After the frequency-tuning properties of the cells at each electrode were determined, WN bursts and NN bursts were presented. The bursts of noise were each 1 s long, followed by 2 s of silence. Three NN bursts, differing by the width of their missing frequency band, were presented. The widths of the missing frequency band were 1/3 octave, 1/2 octave, and 2/3 octave for NN1, NN2, and NN3, respectively. The NNs were generated by digitally filtering a Gaussian noise with a finite impulse response filter (sampling rate = 100 kHz, order = 4096). A subset of the filtered stimulus (the steady-state portion) was windowed in such a way as to eliminate the transient effect of the filter. The noise bursts were presented at an intensity level between 45 and 65 dB SPL and in the following order: WN–NN1–NN2–NN3. This sequence was presented 50 times. The center frequency of the NNs was chosen after the determination of the characteristic frequency of the neurons.
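The band edges of the three notches can be sketched as follows. This assumes that each notch is centered on Nc on a logarithmic frequency axis, which the text does not state explicitly; the function name and the example value of Nc are illustrative only. The filtering step itself (FIR filter, order 4096, 100 kHz sampling rate) is not reproduced here.

```python
import math


def notch_edges(center_hz, width_octaves):
    """Return (low_edge, high_edge) in Hz of a notch centered on center_hz.

    Assumption: the notch is symmetric around the center on a log2 axis.
    """
    half = width_octaves / 2.0
    return center_hz * 2 ** (-half), center_hz * 2 ** half


# Notch widths taken from the text: NN1 = 1/3, NN2 = 1/2, NN3 = 2/3 octave.
NOTCH_WIDTHS = {"NN1": 1 / 3, "NN2": 1 / 2, "NN3": 2 / 3}

nc_hz = 3000.0  # example center frequency (the value used in Figure 1)
edges = {name: notch_edges(nc_hz, w) for name, w in NOTCH_WIDTHS.items()}
```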

Recording and spike separation procedure

Two arrays of 8 electrodes (Frederic Haer Corp., Bowdoinham, ME, USA), each with impedances between 1 and 2 MΩ, were used. The electrodes were arranged in a 4 × 2 configuration with interelectrode distance within rows and columns equal to 0.5 mm. Each electrode array was oriented such that all electrodes were touching the cortical surface and then were manually and independently advanced using a Narishige M101 hydraulic microdrive (one drive for each array). The signals were amplified 10,000 times using a Frederic Haer Corp. HiZ × 8 set of amplifiers with filter cutoff frequencies set at 300 Hz and 5 kHz. The amplified signals were processed by a DataWave multichannel data acquisition system. Spike sorting was done offline using a semiautomated procedure based on principal component analysis (Eggermont 1990) implemented in MATLAB. The spike times and waveforms were stored. The multiunit data presented in this article represent only well-separated single units that, because of their regular spike waveform, likely are dominantly from pyramidal cells (Eggermont 1996). Thus, contrary to the common use of the term multiunit as a cluster of not-separated units, here the separated single-unit spike trains were added again to form a multiunit spike train that likely consists of contributions from only one type of neuron. Our procedure thus eliminated potential contributions from thalamocortical afferents or interneurons.

Data analysis

To assess frequency-tuning properties, the peak number of action potentials in a 5-ms bin of the poststimulus time histogram (PSTH) over the first 100 ms for each gamma-tone presentation was estimated. The peak counts for three adjacent frequencies were combined in order to reduce variability and divided by number of stimuli and presented as a firing rate per stimulus. This resulted in 27 frequency bins covering 5 octaves so that the final frequency resolution for determining the CF was approximately 0.2 octave. The results were calculated per stimulus intensity and were combined into an intensity–frequency–rate profile from which tuning curves, rate–intensity functions, and isointensity rate–frequency contours could be derived (Eggermont 1996) using routines implemented in MATLAB. The frequency-tuning curve was defined for a firing rate at 25% of the maximum peak-firing rate. This was about 10–20% above the background firing rate, but, as the latter was dependent on the level of stimulus-induced suppression, the tuning curve criterion based on a percentage of peak firing rate was preferred over that based on increase over background activity.
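The pooling of three adjacent frequencies and the 25% tuning-curve criterion can be sketched as below. This is a minimal illustration, not the authors' MATLAB routines: it assumes that the input holds, for each of the 81 frequencies, the peak PSTH count summed over the five repetitions, and that "number of stimuli" means three frequencies times five repetitions. Function names are hypothetical.

```python
def pool_adjacent(peak_counts, n_repeats=5, group=3):
    """Pool peak counts of `group` adjacent frequencies into a rate per stimulus."""
    if len(peak_counts) % group != 0:
        raise ValueError("number of frequencies must be a multiple of group")
    rates = []
    for i in range(0, len(peak_counts), group):
        pooled = sum(peak_counts[i:i + group])
        # Assumed normalization: pooled count / (frequencies * repetitions).
        rates.append(pooled / (group * n_repeats))
    return rates


def tuning_criterion(rates, fraction=0.25):
    """Return the firing-rate criterion at a fraction of the maximum peak rate."""
    return fraction * max(rates)


rates = pool_adjacent([5] * 81)  # 81 frequencies -> 27 pooled bins
resolution_octaves = 3 / 16      # three 1/16-octave steps per bin, about 0.2 octave
```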

PSTH and cross-correlation coefficient functions were calculated during and after the presentation of noise bursts for time windows (TWs) of 500 ms. Such a large TW was chosen for two reasons. First, the spontaneous firing rate in auditory cortex in the ketamine-anesthetized cat is low, on average 1.5 spikes/s (Eggermont and Komiya 2000) to 4 spikes/s (Zurita et al. 1994). Second, whereas the neural activity is acutely enhanced at the onset of a stimulus, and potentially at the offset of the stimulus, the neural activity may be suppressed for several hundred milliseconds to below the SA level following the onset response. Consequently, and because we wanted to focus on the properties of discharges of neurons in auditory cortex during stimulation with a NN (or WN) and after this stimulation (namely, where the firing rate is potentially low), we used a relatively large analysis window of 500 ms. However, choosing a large TW is at the expense of temporal accuracy. Furthermore, because we presented the NN for a relatively short duration (1 s), the risk of missing an effect of a NN is high because the neural changes may fall into two consecutive TWs and be less detectable. In order to avoid this undesirable effect of data sampling, we calculated PSTHs and peak cross-correlation coefficients for 26 sliding TWs, each 500 ms wide, with the starting time of the window shifted in steps of 100 ms over the 3 s of recordings. It is important to consider that this method results in an oversampling of the data, but differences between the control condition and the NN condition became clearly visible in the figures. The oversampling effect was addressed in the statistical analysis by a Bonferroni correction.
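The sliding-window scheme can be illustrated with a short sketch. It assumes that spike times are given in seconds relative to stimulus onset and that the quantity of interest is the mean rate inside each window; the exact PSTH measure used by the authors may differ, and the function names are hypothetical.

```python
def window_starts(total_s=3.0, width_s=0.5, step_s=0.1):
    """Return the start times (s) of all windows that fit inside the recording."""
    n_windows = int(round((total_s - width_s) / step_s)) + 1
    # Rounding removes floating-point residue such as 0.30000000000000004.
    return [round(k * step_s, 10) for k in range(n_windows)]


def windowed_rate(spike_times_s, start_s, width_s=0.5, n_trials=1):
    """Return the mean firing rate (spikes/s) in [start_s, start_s + width_s)."""
    count = sum(1 for t in spike_times_s if start_s <= t < start_s + width_s)
    return count / (width_s * n_trials)


starts = window_starts()
rates = [windowed_rate([0.05, 0.2, 0.7], s) for s in starts]
```

Successive windows share 400 ms of their 500 ms, so neighbouring values are strongly correlated, which is why the oversampling has to be accounted for in the statistics.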

The peak cross-correlation coefficients were obtained from the following equation (Abeles 1982; Eggermont 1992):

$$\rho = \frac{R_{AB} - E}{\sqrt{N_A N_B}} \qquad (1)$$

$R_{AB}$ is the peak count in the cross-correlation histogram. The mathematical expectancy of a coincidence under the assumption of independence of two spike trains is given by $E = (N_A N_B)/N$, where $N_A$ and $N_B$ are the number of spikes in trains A and B, respectively, and $N$ is the number of bins. Under the assumption of Poisson-distributed spike trains, the standard deviation of the distribution is equal to the square root of $E$, so that the Z-score is $(R_{AB} - E)/\sqrt{E}$. Since ρ is equal to the Z-score divided by the square root of the number of bins, Eq. (1) results. Finally, the peaks of cross-correlation coefficients were considered significant if the value exceeded the baseline by 3 standard deviations (SD). For small firing rates (<10 sp/s) and small bin sizes, the standard deviation of ρ is equal to $N^{-0.5}$, where $N$ is the number of bins in the recording (Eggermont 1992). Only ρ values more than 3 SD above zero were used in this study.
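The definitions in this paragraph (expected coincidences equal to the product of the spike counts divided by the number of bins, a Poisson SD equal to the square root of that expectation, and ρ as the Z-score divided by the square root of the number of bins) can be turned into a minimal sketch. The binary binned spike trains, the lag range, and the function names are assumptions for illustration; this is not the authors' implementation.

```python
import math


def peak_rho(train_a, train_b, max_lag):
    """Return (rho, lag) at the peak of the cross-correlation histogram.

    train_a, train_b: equal-length lists with the spike count (0 or 1) per bin.
    """
    n_bins = len(train_a)
    n_a, n_b = sum(train_a), sum(train_b)
    if n_a == 0 or n_b == 0:
        return 0.0, 0
    expected = n_a * n_b / n_bins  # E, coincidences expected under independence
    best_count, best_lag = -1, 0
    for lag in range(-max_lag, max_lag + 1):
        count = sum(
            train_a[i] * train_b[i + lag]
            for i in range(n_bins)
            if 0 <= i + lag < n_bins
        )
        if count > best_count:
            best_count, best_lag = count, lag
    # Z = (R - E) / sqrt(E); rho = Z / sqrt(N) = (R - E) / sqrt(N_A * N_B)
    rho = (best_count - expected) / math.sqrt(n_a * n_b)
    return rho, best_lag


def is_significant(rho, n_bins, n_sd=3.0):
    """Apply the 3-SD criterion with SD(rho) = n_bins ** -0.5."""
    return rho > n_sd / math.sqrt(n_bins)


train = [1, 0, 0, 0] * 25  # 100 bins, 25 spikes (toy data)
rho, lag = peak_rho(train, train, max_lag=5)
```

For this toy case of two identical trains, the peak count is 25 and the expectation is 6.25, so ρ = (25 − 6.25)/25 = 0.75, well above the 3-SD criterion of 0.3 for 100 bins.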

In this study, as we have mentioned before, we were especially interested in the neurophysiological correlates of the ZT and assume those to be the differences in the neural responses following a NN compared with those following a WN (control condition). It is important to note that the interneuronal variability in terms of PSTH (and in cross-correlation coefficients) is relatively high, during the stimulus presentation as well as after it. For instance, the mean of the peak PSTH calculated for a 500 ms TW and for the WN condition is 6.93 sp/s (SD = 9.12 sp/s, range = 0.08–58.5 sp/s). If we consider the absolute values of PSTH and ρ, this variability could bias the detection of a potential difference in neural properties during and following a WN or a NN. Indeed, for a neuron with a low FR, for instance 2 sp/s for the WN condition, an increase in FR to 4 sp/s for the NN condition represents a change of a factor of 2. On the other hand, for a neuron with a high FR, for instance 10 sp/s for the WN condition, the same absolute amount of increase in FR, namely, 12 sp/s for the NN condition, represents only a change of a factor of 1.2. In averaging the absolute values of PSTH, the interneuronal variability (independent of the effect of NN) is neglected, and, consequently, the strong effect of the NN condition on the low-FR neuron is underestimated. In addition, the absolute values of ρ and PSTH data are not normally distributed. Consequently, for these two latter reasons, we used the ratio NN/WN for statistical comparison. These ratios were then statistically compared to a ratio of 1 with a two-tailed t-test. We expected an effect of the NN on neural response properties different from that induced by a WN during and after stimulation. First, the ratios for TWs with a start time between 0 and 0.5 s (six different TWs covering 1 s, namely, the entire stimulus duration) were statistically compared to 1. The data for TWs with start time between 0.6 and 0.9 s were not included in the statistical comparison because they correspond to conditions both during and after the stimulus presentation. Second, psychophysical studies in humans reported that the ZT lasts several hundred milliseconds after a presentation of a NN (Zwicker 1964). We then assumed that, in our study (NN is presented during 1 s), a potential effect of NN on cortical neurons should not last more than 1 s. Consequently, the ratios of data for TWs with start time between 1 and 1.5 s (six different TWs covering 1 s) were statistically compared to 1. As mentioned before, the oversampling effect related to our approach was addressed in the statistical analysis by a Bonferroni correction. The statistical threshold (p = 0.05) was divided by the number of comparisons, i.e., 12. As a result, the ratios were considered significantly different from 1 only when p < 0.0042. Because the amount of overlap between successive TWs is large (80%), the FR and ρ values for successive TWs are strongly correlated. The correction we applied (dividing the p value by the number of comparisons), assuming independence between comparisons, is then very conservative.
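The ratio approach and the corrected significance threshold can be illustrated with a brief sketch. It computes the NN/WN ratios for the two example neurons from the text, the Bonferroni-corrected threshold, and a one-sample t statistic against a ratio of 1. The function names and the sample ratios are hypothetical, and the p value itself is not computed here because that requires the t distribution from a statistics package.

```python
import math
from statistics import mean, stdev


def bonferroni_alpha(alpha=0.05, n_comparisons=12):
    """Return the per-comparison threshold (6 TWs during + 6 TWs after = 12)."""
    return alpha / n_comparisons


def one_sample_t(values, mu=1.0):
    """Return the t statistic for the mean of values against mu (df = n - 1)."""
    n = len(values)
    return (mean(values) - mu) / (stdev(values) / math.sqrt(n))


# Same absolute increase (2 sp/s), very different relative change.
low_fr_ratio = 4.0 / 2.0     # factor of 2
high_fr_ratio = 12.0 / 10.0  # factor of 1.2

alpha_corrected = bonferroni_alpha()                    # about 0.0042
t_stat = one_sample_t([1.1, 1.3, 1.2, 1.4])             # made-up NN/WN ratios
```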

At the perceptive level, the psychoacoustical properties of a ZT induced after the presentation of a NN are well defined (Zwicker 1964; Wiegrebe et al. 1996; Zwicker and Fastl 1999). The pitch of the ZT is always located in the frequency range of the notch (see Introduction). It is thus likely that a NN affects principally the neurons with CF falling in or nearby the frequency range of the notch. In this study, the neurons were then grouped according to whether their CF was in the notch or in the frequency band outside the notch. However, the different noises were presented at an intensity level above the threshold of neurons, where the tuning curve is more or less broadened at the level of the noise. Consequently, while a neuron can have a CF outside the notch, the excitatory tuning curve can overlap with the notch of the noise at the intensity level of the noise; these neurons could then be affected by the NN. Thus, we considered as “In” the notch those neurons for which the CF is within a 1-octave-wide frequency band around the notch center frequency (Nc). Neurons with CFs outside this frequency band are considered as “Out” of the notch. Finally, the group “Out” was divided into two groups according to the frequency distance between the CF of neurons and Nc. The “Outfar” neurons have their CF more distant from Nc than 1.5 octaves, whereas the “Outclose” neurons have their CF within 1.5 octaves from Nc. This classification of neurons according to the relation of their CF and Nc was independent of the notch width. 
In the case of cross-correlation calculations and because this calculation implies two separate electrodes, we have three different groups: (1) “In–In group,” where neurons of both electrodes have their CF within a 1-octave-wide frequency band centered on Nc; (2) “In–Out,” where units of one electrode have a CF within the 1-octave-wide frequency band centered on Nc and the other units of the other electrode have a CF outside the latter frequency band; (3) “Out–Out,” where neurons of both electrodes have their CF outside the frequency band centered on Nc.
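The grouping rules can be summarized in a minimal sketch based on the octave distance between a unit's CF and Nc. It assumes that the 1-octave-wide "In" band extends half an octave on either side of Nc and that boundary values are inclusive; neither detail about the boundaries is spelled out in the text. Group labels use ASCII underscores and hyphens, and the function names are hypothetical.

```python
import math


def octave_distance(cf_hz, nc_hz):
    """Return the absolute distance between CF and Nc in octaves."""
    return abs(math.log2(cf_hz / nc_hz))


def classify_unit(cf_hz, nc_hz):
    """Classify a recording as 'In', 'Out_close', or 'Out_far'."""
    distance = octave_distance(cf_hz, nc_hz)
    if distance <= 0.5:   # inside the 1-octave-wide band around Nc
        return "In"
    if distance <= 1.5:   # outside the band, within 1.5 octaves of Nc
        return "Out_close"
    return "Out_far"


def classify_pair(cf_a_hz, cf_b_hz, nc_hz):
    """Classify an electrode pair as 'In-In', 'In-Out', or 'Out-Out'."""
    n_in = sum(classify_unit(cf, nc_hz) == "In" for cf in (cf_a_hz, cf_b_hz))
    return {2: "In-In", 1: "In-Out", 0: "Out-Out"}[n_in]


# Example values: Nc = 3 kHz with CFs of 3104, 2102, and 1621 Hz.
groups = [classify_unit(cf, 3000.0) for cf in (3104.0, 2102.0, 1621.0)]
```

With Nc = 3 kHz, a CF of 3104 Hz lies about 0.05 octave from Nc and is classified as "In", whereas 2102 Hz lies about 0.51 octave below Nc and falls just outside the band.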

All statistical analyses were performed using Statview 5® (SAS Institute Inc., Cary, NC). Illustrations were made with SigmaPlot® and Powerpoint® (Microsoft, Redmond, WA).

RESULTS

The data presented in this study are based on a set of 77 different recordings (different locations) from the primary auditory cortex of ten cats (age 4–9 months, mean = 5.9 months, SD = 1.5 months).

Poststimulus time histograms

Figure 1 shows representative dot displays of spiking activity at four different electrodes (same array) in the primary auditory cortex of one cat during and following a stimulus presentation. Each dot display shows multiunit activity (MU) for the four noise conditions, i.e., WN, NN1, NN2, and NN3, from the bottom to the top (as indicated at the right of the figure) and separated by dotted lines, respectively. In this example, the notch center frequency (Nc) was equal to 3 kHz, and the CFs of the multiunit recordings were equal to 3104, 2102, 3104, and 1621 Hz for electrodes 3, 4, 7, and 8, respectively. Units for electrodes 3 and 7 then corresponded to the “In” group, whereas units for electrodes 4 and 8 corresponded to the “Out” group. The four noise stimuli, responses separated by dotted lines, lasted until the 1-s mark and were followed by 2 s of silence. One notices the clear onset response followed by a suppression of activity compared with the spontaneous level for electrodes 3, 4, and 7, that resumes about 250 ms after the end of the noise burst (at the 1-s mark). The activity for electrodes 3 and 4 is largely single unit, whereas that for electrodes 7 and 8 has two equally responsive units, indicated by different symbols.

Figure 1

Multiunit dot displays for WN and NN presentation from four different electrodes inserted in the primary auditory cortex of the same cat. Different units are indicated by different symbols. Units for electrodes 3 and 7 had CFs corresponding to the “In” group, whereas units for electrodes 4 and 8 had CFs corresponding to the “Out” group. The noise burst duration is 1 s followed by 2 s of silence. Each dot display shows the neural responses during and after the noise presentation for the four noise conditions, i.e., WN, NN1, NN2, and NN3, from the bottom to the top and separated by dotted lines, respectively. For each electrode there is an acute onset response and a decrease of neural activity thereafter. A normal amount of SA is recovered about 500 ms after the end of the noise (electrodes 3, 4, and 7).

Figure 2 shows the corresponding PSTH calculated in 20-ms bins for each electrode shown in Figure 1. The sharp onset response occurs at about 20 ms after stimulus onset. Following the onset response, a decrease in firing rate (FR) is clearly visible for electrodes 3, 4, and 7. About 0.5 s after the end of the stimulus the FR has recovered to normal baseline level. This observation is consistent with previous results of Eggermont (1994) who reported the same time constant for the neurons to recover a normal value of FR after the presentation of a broadband noise. For electrode 8, there is a short decrease of FR after the onset response and then a faster recovery compared with the other electrodes. Finally, one notes that there was no offset response in these examples.

Figure 2

Poststimulus time histograms (PSTHs) for the data presented in Figure 1. The organization is similar to that in Figure 1. Bin width is 20 ms. The vertical axis of each histogram represents the firing rate from 0 to 30 spikes/s.

The CFs of the neurons were broadly distributed between 681 and 33,636 Hz (mean = 6328 Hz), and so were the Nc values, which covered a frequency range of about 3.3 octaves (range = 500–5000 Hz, mean = 2729 Hz). Figure 3 shows the PSTH data averaged across all the recordings, for each of the 26 TWs and for the four conditions of noise, for the groups “In” (Fig. 3a), “Outclose” (Fig. 3b), and “Outfar” (Fig. 3c), respectively. Note that the “Outfar” and “Outclose” groups have higher FRs compared with the “In” group. On average, and for all conditions of noise, the FR strongly decreases during stimulus presentation (after an acute onset response). Furthermore, a slight increase in FR compared with spontaneous activity is visible around 500 ms after the end of stimulation, i.e., around the 1.5-s mark.

Figure 3

Peak PSTH values averaged across all the recordings for each of the 26 TWs and for the 4 conditions of noise, i.e., WN (open circles), NN1 (filled circles), NN2 (open squares), and NN3 (filled triangles), for the groups “In” (a), “Outclose” (b) and “Outfar” (c), respectively. In each panel, n indicates the number of recordings.

Figure 4 shows the ratios of peak PSTH values (NN/WN), for the “In” group (Fig. 4a), the “Outclose” group (Fig. 4b), and the “Outfar” group (Fig. 4c). For the “In” group one notes that the ratios are generally enhanced during and after NN presentation. However, the ratios of peak PSTH values statistically increased during the stimulus presentation for only NN3, for TWs starting at 0.3, 0.4, and 0.5 s. For the NN1 and NN2 conditions, because our Bonferroni correction of the p value is very conservative, the significance level is not reached. The ratios also increased after the stimulus presentation for TWs with a starting time between 1 and 1.3 s for NN2 and NN3 but not for NN1. On the other hand, for the “Out” group the changes in PSTH are very small compared with those for the “In” group. Consequently, we did not notice any ratio that was significantly different from 1. However, when the “Out” group was divided into “Outclose” and “Outfar” groups (Fig. 4b,c), t-tests revealed that some ratios for the NN2 condition were significantly different from 1. That is, in the “Outclose” group, the ratio for the TW with a start time at 1.3 s was significantly less than 1 after stimulus presentation. In contrast, in the “Outfar” group, the ratio at the same TW (starting at 1.3 s) significantly increased after stimulation.

Figure 4

Ratios of the PSTH data (NN/WN) averaged across all the recordings for each of the 26 TWs and for the 3 conditions of NN, i.e., NN1 (filled circles), NN2 (open squares), and NN3 (filled triangles), for the groups “In” (a), “Outclose” (b), and “Outfar” (c), respectively. Error bars indicate the standard error of the mean.

Finally, in order to find a potential neural correlate for the distinctly tonal pitch of the ZT, we separated the “In” neurons into “Inpitch” and “Inno-pitch” groups. Indeed, it has been demonstrated that the ZT pitch was located at about 0.2 octave — about 1 critical band — from the low-frequency edge of the NN, independent of the notch bandwidth (Zwicker 1964; Wiegrebe et al. 1996; Zwicker and Fastl 1999). Consequently, the CFs of the “Inpitch” neurons correspond to the frequency band between the low-frequency border of the “In” frequency band (one-half octave below Nc) and Nc. The “Inno-pitch” neurons correspond to “In” neurons which have their CF above Nc. Figure 5 shows the ratios of peak PSTH values (NN/WN) for the “Inpitch” group (Fig. 5a) and the “Inno-pitch” group (Fig. 5b). For the “Inpitch” group, one notices that the standard errors of the mean are large, which is a result of the relatively small number of recordings for this group (n = 21). However, some ratios are significantly different from 1. After the stimulation, the ratios differed statistically from 1 in the NN2 condition for the TW starting at 1 s. For the “Inno-pitch” group, the ratios were different from 1 in the NN2 condition for the TWs between 1 and 1.3 s. For NN3, the ratios were different from 1 for the TWs between 0.2 s and 0.5 s and for those between 1 and 1.2 s.

Figure 5

Ratios of the PSTH data (NN/WN) averaged across all the recordings for each of the 26 TWs and for the 3 conditions of NN, i.e., NN1 (filled circles), NN2 (open squares), and NN3 (filled triangles), for the groups “Inpitch” (a) and “Inno-pitch” (b), respectively. Error bars indicate the standard error of the mean.

Cross-correlation

Figure 6 shows the cross-correlation histograms (CCH) calculated for a 500-ms time window (with a starting time at 1.2 s, bin = 2 ms) and for each pairwise combination of electrodes shown in Figures 1 and 2. Note the variability in the shape of CCHs as well as in the peak values of CCHs. A clear peak, however, is visible for all combinations of electrodes and for all conditions of noise. Furthermore, all the peaks of CCHs shown in Figure 6 were relatively close to the time lag equal to 0. One observes in the CCH between electrodes 3 and 7, i.e., between units having their CF falling in the notch, that the peaks are higher for each NN condition than for the WN condition. In contrast, for units having both their CFs outside the notch (electrodes 4 and 8), the peak of the cross-correlogram is higher for the WN condition than for each NN condition. Finally, when one unit has a CF in the notch and the other one has its CF outside the notch (group “In–Out”), the peaks of the cross-correlogram are usually higher for the NN conditions than those for the WN condition (between electrodes 3 and 4, 4 and 7, 7 and 8). The ρ values (incorporating firing-rate-based expected values and SDs, shown in Fig. 6 at the right of each panel) give the same pattern of results.

Figure 6

Cross-correlation histograms (CCH) calculated for a 500-ms time window (with a starting time at 1.2 s, bin size = 2 ms) and for each pairwise combination of electrodes shown in Figures 1 and 2. Peak cross-correlation coefficients are indicated at the right-hand side of each panel.

Peak cross-correlation coefficients were calculated for 150, 99, and 209 electrode pairs for “In–In,” “In–Out,” and “Out–Out” groups, respectively. Figure 7 shows the ρ values averaged across all the recordings for each of the 26 TWs and for the four conditions of noise, for the groups “In–In” (Fig. 7a), “In–Out” (Fig. 7b), and “Out–Out” (Fig. 7c), respectively. The averaged geometric mean of the firing rates of the two recordings used in the calculation of cross-correlation coefficient is also shown for the groups “In–In” (Fig. 7d), “In–Out” (Fig. 7e), and “Out–Out” (Fig. 7f). For the groups “In–In” and “Out–Out” (Fig. 7a,c), one can see that the curves representing the cross-correlation peaks for each condition of noise show an increase, peaking at about 350 ms after stimulus onset. This increase in ρ is much less clear in the “In–Out” group, probably because the ρs are much smaller (Fig. 7b). The curves representing the geometric mean of the firing rate (Fig. 7d–f) show that the FR, after an acute increase at the onset of the stimulus, is strongly suppressed during stimulation. One can assume that TWs starting 1 s after the end of the noise, i.e., well after any expected effect of broadband noise on neural activity (Eggermont 1994), and finishing 2 s after the end of the stimulus represent spontaneous activity of neurons. One can see in Figure 7d–f that the averaged neural activity drastically decreases after the onset and well below the spontaneous activity level. Initially, the result that ρ increases while the firing rate decreases is surprising. Indeed, it is commonly expected that ρ is dependent on the firing rate (Melssen and Epping 1987; Eggermont 1994; Das and Gilbert 1995). 
However, recent studies reported the same result (deCharms and Merzenich 1996; Eggermont 1997), i.e., whereas the neural activity during stimulation is not higher than that during a “nonstimulation condition” (spontaneous activity), the peaks of cross-correlation were found to be enhanced. It has been suggested that this increase in the temporal correlation between neurons codes for the presence of a steady-state stimulus (deCharms and Merzenich 1996; Eggermont 1997). Here we further demonstrate that this increase in ρ, during the presentation of a broadband noise, is related to the difference between the CFs of the two recordings used in the calculation of ρ. Indeed, in groups “In–In” and “Out–Out,” the CFs are relatively close (mean CF difference: “In–In” = 0.2 octave, SD = 0.22 octave, and “Out–Out” = 0.5 octave, SD = 0.46 octave). On the other hand, in the “In–Out” group, the CFs are, on average, relatively far away from each other (mean CF difference = 0.93 octave, SD = 0.48 octave).

Figure 7

Peak cross-correlation coefficients averaged across all the recordings for each of the 26 TWs and for the 4 conditions of noise, i.e., WN (open circles), NN1 (filled circles), NN2 (open squares), and NN3 (filled triangles), for the groups “In–In” (a), “In–Out” (b), and “Out–Out” (c), respectively. The averaged geometric mean of the firing rate of the two neurons used in the calculation of cross-correlation coefficient is also shown for the groups “In–In” (d), “In–Out” (e), and “Out–Out” (f).

In Figure 7a, one can see that the ρ values are relatively similar for the different noise conditions, except for NN2. For NN2, the ρ values are clearly enhanced during stimulation (between 100 and 700 ms after onset) compared with those of the other conditions, with a maximum increase at around 300 ms. The geometric mean of the FR is also changed (Fig. 7d); however, this does not account for the changes in ρ since the geometric mean changes much later. Indeed, in the TWs where the ρ values are increased for NN2, the FR is very similar across noise conditions. On the other hand, while the FR is enhanced from about 600 ms to 1.4 s for NN2 and NN3 with a maximum at around the offset of the stimulus, the ρ values are very similar across the different noise conditions.

In Figure 7b, where ρ is plotted for the “In–Out” group according to the starting time of the TW, notice that the ρ values tend to be, on average, higher for the three NNs compared with those of the WN. This increase occurs during stimulation (for a TW roughly between 100 and 800 ms) and after the end of the presentation of the noise stimulus (to about 1.5 s). In the FR (Fig. 7e), no clear pattern of differences between NN and WN is visible. Consequently, once again, the changes in FR do not account for the changes in ρ. Finally, for the “Out–Out” group (Fig. 7c,f), ρ and FR are relatively similar across the different noise conditions.

Figure 8 shows the ratios of ρ (NN/WN) for “In–In” (Fig. 8a), “In–Out” (Fig. 8b), and “Out–Out” groups (Fig. 8c). In Figure 8d–f, the ratios of the geometric mean of the firing rates are plotted. Note in Figure 8a (“In–In” group) and 8b (“In–Out” group) that the ratios have a tendency to be enhanced during and after stimulus presentation.

Figure 8

Ratios of ρ for the 26 TWs and for the 3 conditions of NN, i.e., NN1 (filled circles), NN2 (open squares), and NN3 (filled triangles), for “In–In” (a), “In–Out” (b), and “Out–Out” groups (c). The ratios of the geometric mean of the firing rate are also plotted (d,e,f). Error bars indicate the standard error of the mean.

The ρ ratios were significantly different from 1 in the “In–In” group for the NN2 condition, during stimulus presentation, for TWs with starting time between 0 and 0.5 s (six different TWs). The changes in ρ during stimulation were much smaller for NN1 and NN3, with a significant increase only at 0.5 s for NN1 and at 0.2 s for NN3. After the stimulus presentation, ρ was significantly increased for all the NN conditions for the TW starting at 1.1 s.

In the “In–Out” group, the ρ ratios were significantly different from 1 during stimulus presentation for NN2 at 0.5 s and NN3 at 0.2, 0.4, and 0.5 s, as well as after stimulus presentation for NN2 at all the TWs between 1 and 1.5 s and NN3 at 1, 1.1, 1.2, and 1.4 s.

In the “Out–Out” group, the ρ ratios were significantly increased during the stimulus presentation for the NN1 condition at all the TWs between 0 and 0.5 s, and for the NN2 condition at 0, 0.2, and 0.5 s. After stimulus presentation, the ρ ratios were increased for NN2 at 1.1, 1.2, and 1.3 s and for NN3 at 1.3, 1.4, and 1.5 s.

It is important to note that the ratios concerning the geometric mean of the FR are also largely enhanced in the “In–In” group and the “In–Out” group. However, it is unlikely that the enhanced FR accounts for the increase in ρ. Indeed, in the “In–In” group, there was no correlation between ρ ratios and FR ratios, either for TWs between 0 and 0.5 s (linear regression analysis, R² = 0.007, 0.027, and 0.001 for NN1, NN2, and NN3, respectively) or for TWs between 1 and 1.5 s (R² = 0.002, 0.001, and 0.00011 for NN1, NN2, and NN3, respectively). Similarly, in the “In–Out” group, there was no dependence of the FR ratios on the ρ ratios either during stimulation (R² = 0.0002, 0.003, and 0.003 for NN1, NN2, and NN3, respectively) or after stimulation (R² = 0.003, 0.006, and 2 × 10⁻⁵ for NN1, NN2, and NN3, respectively). Furthermore, linear regression analysis revealed that there was no dependence of the FR ratios on the ρ ratios for the “Out–Out” group, either during stimulation (R² = 0.03, 0.002, and 0.002 for NN1, NN2, and NN3, respectively) or after it (R² = 0.007, 1.9 × 10⁻⁶, and 0.009 for NN1, NN2, and NN3, respectively).
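The R² values reported above come from a simple linear regression between two sets of ratios. A minimal sketch of that statistic is given below. The function name and the toy numbers are hypothetical, and the exact pairing of data points used by the authors (for instance, one point per electrode pair and TW) is not specified here and is assumed.

```python
def r_squared(x, y):
    """Coefficient of determination of a simple linear regression of y on x.

    For a single predictor this equals the squared Pearson correlation.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable explains no variance
    return (sxy * sxy) / (sxx * syy)

# Toy ratios (made-up numbers, not data from the study)
rho_ratios = [1.0, 2.0, 3.0, 4.0]
fr_ratios = [1.0, -1.0, -1.0, 1.0]
print(r_squared(rho_ratios, fr_ratios))
```

An R² close to 0, as in the values reported above, means that the variation of one ratio explains almost none of the variation of the other.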

Figure 9 shows a summary of the results for all the NN conditions, in terms of both FR ratios (Fig. 9a–c) and ρ ratios (Fig. 9d–f), as a function of the distance between the CF and Nc (see above). The values plotted in Figure 9 correspond to the averaged data of the six TWs during stimulation and the averaged data of the six TWs after stimulation. The “During” group was defined as TWs with starting time between 0 and 0.5 s (filled circles) and the “After” group was defined as having TWs with starting time between 1 and 1.5 s (open circles). During stimulation, notice that the pattern of results for FR is similar across the three NN conditions, with an enhanced ratio for the “In” group and smaller changes for the “Outclose” and “Outfar” groups. On average, one can see a tendency for the “Inno-pitch” neurons to show a greater increase in FR compared with “Inpitch” neurons. After stimulation, the pattern of results is slightly different. The FR for “In” neurons still increases compared with the “Outclose” and “Outfar” groups, but this time the increase is maximal for the “Inpitch” neurons. This pattern of results after stimulation is reminiscent of a lateral inhibition process.
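The averaging over the "During" and "After" time windows can be sketched as follows. The 0.1-s spacing of TW starting times is inferred from the statement that six TWs start between 0 and 0.5 s. Together with the helper names and the toy ratios, it is an assumption made for illustration.

```python
def window_starts(n_windows=26, step_s=0.1):
    """Starting times (s) of the TWs, assuming a constant 0.1-s step."""
    return [round(i * step_s, 3) for i in range(n_windows)]

def mean_over_windows(values_by_start, first_s, last_s):
    """Average the values of all TWs whose start lies in [first_s, last_s]."""
    eps = 1e-9  # tolerance for floating-point start times
    selected = [v for s, v in values_by_start.items()
                if first_s - eps <= s <= last_s + eps]
    if not selected:
        raise ValueError("no time window in the requested range")
    return sum(selected) / len(selected)

# Toy ratios (made-up): enhanced during stimulation, unchanged elsewhere
starts = window_starts()
ratios = {s: (2.0 if s <= 0.5 else 1.0) for s in starts}
during = mean_over_windows(ratios, 0.0, 0.5)   # "During" group
after = mean_over_windows(ratios, 1.0, 1.5)    # "After" group
print(during, after)
```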

Figure 9

Summary of the results for all the NN conditions in terms of both FR ratios (a,b,c) and ρ ratios (d,e,f), as a function of the distance between a neuron’s CF and Nc (see text). The “During” values correspond to the averaged data of the six TWs starting between 0 and 0.5 s (filled circles), and the “After” values correspond to averaged data of the six TWs starting between 1 and 1.5 s (open circles). Error bars indicate the standard error of the mean.

Concerning the ρ ratios, during stimulation, there is a maximal increase for NN2 for the “In–In” group compared with the “In–Out” and “Out–Out” groups. On the other hand, for NN1 and NN3, no clear difference is noted between the “In–In” and “In–Out” groups. After stimulation, the pattern of results follows the same tendency across the different NN conditions: i.e., the ρ ratios are maximally increased for the “In–Out” group, whereas they are similar (small changes) between the “In–In” and “Out–Out” groups. It is interesting to note that for NN2 the changes in ρ ratios are different during and after stimulation: ρ maximally increases in the “In–In” group during the exposure and in the “In–Out” group after the exposure (Fig. 9e). This latter increase in ρ in the “In–Out” group for the NN2 condition after the NN exposure is paralleled by a maximal increase in FR for the “Inpitch” group (Fig. 9b).

DISCUSSION

The major findings of our study can be summarized as follows (Tables 1 and 2): First, the firing rate of neurons with CFs “In” the notch is enhanced during (for NN3 only) and after (NN2 and NN3) the NN presentation compared with the WN condition. Second, ρ values are also significantly increased in the group “In–In” during and after NN presentation (for all NN conditions) as well as in the group “In–Out” during and after NN presentation (NN2 and NN3). Finally, in the “Out–Out” group, the ρ values are enhanced during the NN1 and NN2 presentation and after the NN2 and NN3 presentation.

Table 1 Summary of the significant time windows for changes in firing rate
Table 2 Summary of the significant time windows for increased peak cross-correlation coefficient (ρ)

Neural correlates of the psychoacoustical changes during ZT perception

In previous studies, it has been demonstrated that absolute thresholds and loudness were changed concomitant to the ZT perception (Wiegrebe et al. 1996; Noreña et al. 2000, 2002a). Noreña et al. (2002a) suggested that the improved hearing sensitivity as well as the increase in loudness during the perception of the ZT might be related to a decrease of “internal noise,” i.e., the spontaneous activity (SA) in the auditory centers. The present study does not corroborate this assumption. Instead, it suggests that the psychophysical changes during ZT perception at frequencies falling in the notch are associated with an increase of SA compared with the control condition (WN presentation). It is then possible that an increase in driven FR of neurons might be related to the improved detection thresholds concomitant to the ZT perception. This hypothesis is consistent with that of Viemeister and Bacon (1982) to account for the increase in forward masking induced by deleted components of a harmonic complex, that is, an increase in “gain” — responsiveness — of neurons in regions corresponding to the deleted component could account for the latter result.

Potential neurophysiological mechanisms of the ZT

Our study is the first one to focus on the potential electrophysiological correlates of a ZT using multielectrode, multiunit recordings. To our knowledge, only three previous studies (Hoke et al. 1996, 1998; Pantev et al. 1999), all using MEG, investigated the effects of a NN exposure on the auditory cortex. Hoke et al. (1996) suggested that the neurons excited by the NN are progressively adapted while the neurons having their CF in the notch are not. Consequently, because the FR of neurons out of the notch could be decreased, the FR of neurons in the notch is relatively increased, accounting for the perception of the ZT.

Our finding of a relative increase in FR for “In” neurons compared with “Out” neurons could also be explained by synaptic depression. Indeed, considering that “In” neurons receive fewer excitatory inputs in the NN condition than in the WN condition, less depression is expected for the NN condition. Furthermore, as the bandwidth of the notch increases, the FR of the “In” neurons also increases. Indeed, for the NN1 condition, the notch bandwidth is likely too narrow to cause any significant difference in terms of FR in comparison to the WN condition. In contrast, for the condition with the broadest notch (NN3), the increase in FR is greater than that for NN1 and NN2, during and after stimulation. Finally, taking account of the shape of the pattern of peripheral excitation (Moore 1992) induced by the NN, one could expect more adaptation of “Inpitch” neurons (close to the low-frequency border of the NN) compared with “Inno-pitch” neurons. On the other hand, after the NN stimulation, there is a tendency for the FR to be greater for “Inpitch” neurons compared with “Inno-pitch” neurons. After the end of stimulation, another mechanism may be involved in causing the FR changes.

As suggested in the Introduction, a notched stimulus can also be considered as simulating a functional deafferentation (Pantev et al. 1999; Noreña et al. 2000, 2002a). The NN simulates the background noise in “everyday life” conditions, and its missing frequency band simulates localized peripheral damage. It has been suggested that a NN could induce an imbalance between excitation and inhibition — through a decrease in inhibition — thereby unmasking previously inhibited inputs (Pantev et al. 1999; Noreña et al. 2000, 2002a). One possibility for decreasing central inhibition would be that intracortical excitation as well as inhibition adapt over time. If the lateral inhibition is more adapted than intracortical and/or thalamocortical excitation, then the excitatory inputs should dominate over the inhibitory ones. However, it has been demonstrated in visual cortex that inhibitory synapses show less overall adaptation than excitatory synapses (Galaretta and Hestrin 1998; Varela et al. 1999). Alternatively, if the activity of thalamic neurons that are excited by the NN decreases, then the lateral inhibition (Shamma and Symmes 1985) from neurons “Out” of the notch toward the neurons “In” the notch might also decrease. The net effect of such a mechanism could be enhanced excitation, resulting from unmasking excitatory inputs (Phillips and Hall 1992). This unmasking is expected to be greater near the low-frequency border of the notch, where the largest imbalance between excitation and inhibition is found. The maximal enhancement of FR for “Inpitch” neurons after the presentation of the NN is consistent with this hypothesis.

If a decrease of lateral inhibition results in an unmasking of previously nonfunctional connections, then the functional connectivity between cortical neurons should increase. Consequently, the peak correlation coefficient calculated in our study should increase (Das and Gilbert 1995). Our results are strongly in agreement with this: The group of “In” neurons shows an increase in ρ. Interestingly, the greatest effect is seen for NN2, which is reported to be the optimal NN to induce a ZT (Zwicker 1964). The increase in ρ for the “In–In” group may be induced by unmasked excitatory connections coming from both “Out” and “In” neurons. The increase in ρ for the “In–Out” group is in accordance with the hypothesis that “Out” neurons may excite “In” neurons through lateral connections. On the other hand, the increase in ρ for the “Out–Out” group is more surprising and an explanation is not evident. Assuming that the net effect of intracortical fibers running across the isofrequency sheets in normal auditory cortex is inhibitory, the reduction in output from the region falling in the notch (compared with WN) might result in a reduction of this long-range inhibition. This, in turn, might increase ρ (all NN) and FR (NN2) in neurons with CFs distant from the notch.

The changes in ρ suggest that neural changes related to a NN presentation take place at thalamocortical or corticocortical synapses. However, the observed changes in ρ do not rule out the possibility that they are the result of changes at lower levels in the auditory system. For instance, we hypothesized that neural adaptation was the key factor of the subsequent central changes (decrease of inhibition and unmasking). It is well known that neural adaptation occurs at all stages of the auditory system, starting in the auditory nerve (Smith 1977; Harris and Dallos 1979). Moreover, we suggested that this neural adaptation could induce a change in the strength of lateral inhibition at more central auditory nuclei. This suggests that subcortical mechanisms may be involved in the changes of cortical neural properties related to a NN presentation.

We have suggested above that a decrease in inhibition could play a key role in modifying neural properties and could account for our data in terms of both FR and ρ. Nevertheless, one could envision direct facilitation of neural connections through changes in efficacy of excitatory synapses. For instance, a mechanism such as long-term potentiation (LTP), involving NMDA receptors, is often invoked to account for central reorganization (Hesse and Donoghue 1994; Calford 2002; Chen et al. 2002). However, it is unlikely that an NMDA-based mechanism is involved in the changes that we observed in our study because the anesthetic ketamine is an NMDA channel blocker, likely preventing NMDA-based plasticity (Cotman and Monaghan 1987; Foutz et al. 1988). On the other hand, it has been proposed that some mechanisms that temporarily enhance synaptic efficacy do not depend on NMDA receptors (Fisher et al. 1997; Malenka 1991). This type of synaptic change is transitory and depends on an increase in calcium concentration in presynaptic terminals (Fisher et al. 1997). Our results do not permit a definite conclusion about the mechanisms involved in the changes we observed.

Electrophysiological correlate of the ZT

A simple neural code for the ZT perception might be represented by an absolute increase in spontaneous activity for the “In” neurons compared to a hypothetical baseline value. However, it is unlikely that a perception is coded by a comparison between an instantaneous and a baseline firing rate. First, the spontaneous activity is variable and strongly dependent on the immediate stimulation history: We observed on average a decrease in SA after the stimulation. Second, one observes in our study that FR tends to be lower than SA after stimulation with all noise types. Consequently, an absolute increase of FR compared with a baseline value does not seem to be the appropriate code for the ZT perception.

On the other hand, a relative increase in FR for “In” neurons compared with that for “Out” neurons might code for the ZT perception. Unfortunately, the variability of cortical neurons in terms of FR is large, and for this reason it is irrelevant to compare the absolute FR of “In” neurons with those of “Out” neurons. Furthermore, the FR for the “In” group on average was lower than that of the “Outclose” and “Outfar” groups. Thus, normalization is needed to assess the effect of NN on neural activity. One solution would have been to normalize the values of FR (or ρ) by dividing them by those obtained during spontaneous activity. However, this approach does not solve the problem linked to the noted variability in the neural activity during stimulation as well as the rebound response after stimulation. Furthermore, some of these changes may not be relevant because they could reflect effects of anesthesia.

We propose that a better way to assess the changes in neural properties related to the NN presentation is to normalize the values obtained under the NN condition by the values obtained under the WN condition. The changes in FR and ρ are dependent on the relation between CF and the center frequency of the notched noise (Fig. 9). After the presentation of NN2 and NN3, the FR relatively increases for “In” neurons compared with that of “Outclose” and “Outfar” neurons. In contrast, ρ maximally increases for the “In–Out” group compared with that of the “In–In” and “Out–Out” groups. Taking into account that the pitch of the ZT is located within the notch, the relative increase in FR for “In” neurons might then represent an electrophysiological correlate of the ZT. Furthermore, after stimulation, there is a tendency for the “Inpitch” neurons to have a greater increase in FR compared with “Inno-pitch” neurons. In contrast, during stimulation, the FR ratios are similar between the “Inpitch” and “Inno-pitch” neurons, or even smaller for the “Inpitch” neurons. Thus, this maximal increase in FR of the “Inpitch” neurons might even account for the pitch of the ZT located at about 1 critical band from the low-frequency corner of the notch (Zwicker 1964; Wiegrebe et al. 1996; Zwicker and Fastl 1999).
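The normalization by the WN condition can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study. It assumes that one ratio is computed per recording (or per electrode pair) and that the ratios are then averaged, that recordings with a zero WN value are skipped, and that all names and numbers are hypothetical.

```python
import math

def nn_over_wn_ratios(nn_values, wn_values):
    """Per-recording NN/WN ratios for one TW.

    Recordings with a WN value of 0 are skipped (assumption), because the
    ratio is undefined for them.
    """
    return [nn / wn for nn, wn in zip(nn_values, wn_values) if wn > 0]

def mean_and_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed for the SEM")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)

# Toy firing rates (made-up, spikes/s) for three recordings in one TW
fr_nn = [12.0, 9.0, 20.0]
fr_wn = [10.0, 9.0, 16.0]
ratios = nn_over_wn_ratios(fr_nn, fr_wn)
print(ratios)
print(mean_and_sem(ratios))
```

A mean ratio above 1 indicates an enhancement under the NN condition relative to WN. The SEM gives the error bars of the kind shown in the ratio figures.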

The functional significance of the increase in ρ after stimulus presentation for the “In–Out” neurons is unclear. Potentially, a combination of increased ρ and increased FR might be a prerequisite for the emergence of the ZT.

Similarities with tinnitus

A mechanism involving unmasked connections after deafferentation has previously been proposed to explain the emergence of tinnitus and to account for its pitch located in the deafferented region (Rauschecker 1999; Noreña et al. 2002b). Interestingly, it has been shown in a tinnitus subject with a notch in the audiogram that the most prominent spectral component of tinnitus was located at about 1 critical band from the low cutoff frequency of the hearing loss (Noreña et al. 2002b). In the case of an audiogram notch, the pitch of tinnitus is similar to that of a ZT induced by a corresponding NN (with a spectrum matching the shape of the hearing loss).

There are some similarities between our findings and those obtained after a noise-induced hearing loss that are potentially linked to tinnitus. Indeed, it has been found that after a chronic noise trauma, the SA increased in the reorganized region (Eggermont and Komiya 2000) and the percentage of correlations that differed significantly from zero was higher in the reorganized area than in the normal area (Komiya and Eggermont 2000).

However, it is important to point out that tinnitus is often associated with a hearing loss (Sirimanna et al. 1996), whereas the ZT is concomitant with an improvement of threshold. This difference can be explained by assuming that a NN simulates a hearing loss in such a way that it only mimics its central effects, whereas the peripheral auditory system remains intact. An increase in driven FR of neurons within the notch could then account for the improvement in threshold. On the other hand, the peripheral damage associated with a hearing loss might prevent an improvement in threshold potentially induced by similar central changes.