Introduction

All volitional movements initiate sensory events. Compared to externally generated sensory events, self-generated sensory events (reafference) are reduced in perceptual salience. This reduction is thought to be due to the processing of an internal representation, the so-called efference copy, of the motor command, which is used to predict the sensory consequences of a self-generated movement. Efference copy is crucial in the visual domain as it provides the information necessary to create the forward predictions that ensure that the stability of visual images is maintained during eye/head movements (Sperry 1950). Another role of efference copy is to attenuate reafference, the sensory consequences of one’s own movements. For example, during gait, central processing of sensory inflow from the legs is reduced around the time of heelstrike (Duysens et al. 1995) and similarly, during masticatory movements, sensitivity to stimulation of the teeth is reduced during jaw closing (Sowman et al. 2010). Unmodified, these large reafference signals would inhibit actions by interfering with smooth movements.

Disturbance of the normal functioning of efference copy mechanisms has been implicated in a number of psychiatric and neurological disorders. For example, disruption of efference copy has been implicated in the disturbed sense of agency that is a symptom of schizophrenia (Ford et al. 2008; Synofzik et al. 2010).

Efference copy mechanisms have also garnered much attention in the study of speech motor control. During speech, every utterance produces reafference in the form of auditory stimulation. Electrophysiology has shown that auto-stimulation of the cortical auditory areas by self-generated speech produces a significantly reduced response compared to when the same sounds are played back from a recording (Curio et al. 2000; Ford et al. 2001; Houde et al. 2002; Heinks-Maldonado et al. 2005; Flinker et al. 2010).

Such studies demonstrate that an accurate mapping of the auditory consequences of speech production is an integral part of the speech motor control mechanism. Furthermore, a disturbance of this mechanism has been implicated in stuttering where it is thought that reduced suppression of auditory reafference could interfere with ongoing speech, disrupting fluency (Maraist and Hutton 1957).

In practice, it is difficult to directly study efference copy suppression of reafference during speech because a significant proportion of self-generated auditory stimulation comes via direct conduction of sound through the skull and jaw bones. It is a very complex task to match the quality and intensity of auditory stimulation produced by self-generated versus externally generated sound, and any discrepancy is likely to be a significant confound. Fortunately, the motor-to-sensory mapping that occurs during speech seems to be a generalisable motor control property, in that sensory events evoked by sounds that are generated by other motor actions, even those that are tool mediated (e.g. the bang of a hammer strike), are also attenuated by efference copy mechanisms. In a laboratory setting, efference copy modulation of auditory cortical activity can be demonstrated using a simpler paradigm, which compares auditory cortex activity elicited by a tone that is self-initiated by a button press with that elicited by identical tones that are externally generated. Such studies have demonstrated a reduced auditory-evoked response to self-initiated stimuli (Schafer and Marcus 1973; McCarthy and Donchin 1976; Martikainen et al. 2005; Baess et al. 2008, 2009; Aliu et al. 2009).

The current study used this paradigm to investigate two aspects of the auditory suppression by efference copy phenomenon. The first is the extent to which reductions of the auditory-evoked potential might reflect the effects of temporal certainty of stimulation as opposed to efference copy mechanisms per se. While there have been a number of recent studies that use self-stimulation paradigms to illustrate efference-copy suppression, the contribution that temporal certainty has in mediating the effects of efference-copy suppression is less well studied. A self-initiated stimulus inherently contains powerful timing cues, creating temporal certainty about the onset of the stimulus. Furthermore, the response mapping that occurs with repetitive stimulation creates spatial and magnitude certainty. Stimulus certainty, especially in the temporal domain, is known to reduce the subjective magnitude of sounds (Weiss et al. 2011) and the magnitude of auditory evoked responses measured with EEG (Schafer et al. 1981).

Two notable studies that have considered these two processes in parallel generate somewhat conflicting results. A recent study by Lange (2011) suggests that temporal cueing does not mediate the suppression of N1 seen during self-initiation while earlier work by Ford et al. (2007) indicates that temporal warning can cause a reduction of the N1 that is smaller but similar in amplitude to the reduction caused by self-initiation.

The second aim was to assess the extent to which efference copy suppression in the auditory system is lateralised. In the somatosensory domain, it is thought that efference copy suppression is contralateralised such that the suppression of an evoked response to self-stimulation is largely manifest in the contralateral hemisphere (Rossini et al. 1999). However, the arrangement of the auditory system is such that unilateral inputs project bilaterally, and it is unknown how unilateral motor activations interact with this bilateral sensory representation. With regard to using efference copy suppression as a measure of abnormal auditory motor integration in disorder populations, knowledge of the extent of its lateralisation is essential: a common feature of a range of neurological disorders, including stuttering and schizophrenia, is a reduction of functional lateralisation in the brain (Gur 1977; Foundas et al. 2003).

Method

Participants

Fifteen subjects (9 female and 6 male) participated in this experiment. Their ages ranged from 20 to 31 years (M = 24, SD = 3.7). All participants were right handed (Oldfield handedness inventory score M = 82.3, SD = 23.4). All participants provided written consent and were paid for their participation. Experiments were approved by the Macquarie University Human Ethics Committee.

Stimuli and apparatus

Participants were tested in a dimly lit, quiet room. They were seated in a comfortable chair that did not rotate or swivel. They were then fitted with the EEG cap, which was positioned such that the electrode Cz was over the vertex. After being fitted with EEG electrodes, participants were shown the written instructions for the task and were given the opportunity to ask any questions they may have had. They held, using both hands, a button box that had two buttons, one for each thumb. EEG activity was recorded using a BioSemi active electrode EEG system connected by optic fibre cable to a Dell Precision T3400 computer. Continuous EEG was acquired at 2,048 Hz through a 64-channel BioSemi ActiveTwo AD-box. EEG electrode placement conformed to the international 10/20 standard. The BioSemi EEG system is a so-called zero-reference system whereby two extra electrodes, the common mode sense (CMS) and driven right leg (DRL), replace the ground electrode of conventional systems. Experimental presentation and stimulation were controlled by Presentation software (Presentation 14.4, Neurobehavioral Systems, Albany, USA). The visual stimuli were presented on a 45.5-cm Viewsonic monitor with a refresh rate of 100 Hz. Auditory stimuli were presented via Etymotic ER-2 insert earphones.

The experiment consisted of four different conditions (Fig. 1).

Fig. 1

Experimental procedure. a Cue Motor Tone (CMT) condition. Each trial was initiated with a fixation cross which after a variable delay (VT1) changed to a cue to press the left (L) button (50 % of trials were left and 50 % right). After the duration of the participant’s response time (RT), the button was depressed by the thumb, and a tone was played to the ear that corresponded to the cued side. b Cue Motor (CM) condition. The CM condition was identical to the CMT condition except for the absence of the tone. c Cue Tone (CT) condition. This condition was identical to the CMT condition except that the participant was instructed at the beginning of the block not to respond to the cue. The RT (time between the cue and the tone) was set to be the average of the RT in the preceding CMT block. d Tone Only (TO) condition. Tones were played at variable intervals (VT2) in the presence of the fixation cross

The first condition, which we will henceforth refer to as ‘Cue Motor Tone’, consisted of a fixation cross, followed by a cue to press either the right or left button. The duration of the fixation cross was randomly varied between 1,000 and 3,500 ms after which time, a black letter (either L or R) against a white background appeared in the centre of the screen. The letter R appeared on half the trials and the letter L appeared on the other half and their order was randomised. When subjects pressed the button in response to the cue, a tone [1 kHz sinusoid, 80 dB SPL (sound pressure level), 400 ms] was played monaurally to the ear corresponding to the cue. This first condition was utilised in order to determine the extent to which the ERPs are suppressed as a result of self-initiation combined with temporal certainty.

The second condition, which we will henceforth refer to as ‘Cue Tone’, consisted of the same visual stimuli as condition one. However, prior to the start of this condition, subjects were instructed not to respond to the cue. Following the cue, a tone, as in condition one, was then played monaurally; the mean response time gathered from the preceding Cue Motor Tone condition was used as the time of the auditory stimulus onset. This second condition was used to determine the extent to which the ERPs are suppressed as a result of temporal cueing in the absence of self-initiation.
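The yoking of the Cue Tone delay to the preceding Cue Motor Tone block can be illustrated with a minimal sketch. The function name and the example response times are hypothetical and serve only to show the arithmetic; the experiment itself was run in Presentation software, not Python.

```python
from statistics import mean


def cue_tone_delay_ms(cmt_response_times_ms):
    """Return the cue-to-tone delay (ms) for a Cue Tone block.

    The delay is the mean response time of the preceding
    Cue Motor Tone block, as described in the text.
    """
    if not cmt_response_times_ms:
        raise ValueError("need at least one response time from the preceding block")
    return mean(cmt_response_times_ms)


# Hypothetical response times (ms) from one Cue Motor Tone block.
example_rts = [650, 700, 720, 690]
delay = cue_tone_delay_ms(example_rts)
print(delay)
```

Using the block mean keeps the average cue-to-tone interval matched across the two conditions, although it removes the trial-to-trial variability present in the self-initiated condition.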

The third condition, which we will henceforth refer to as ‘Cue Motor’, was the same as condition one except that the amplitude of the auditory stimulation associated with the button press was set to zero. This third condition was utilised in order to determine the extent to which the ERPs seen in the (first) self-initiation condition are affected by the motor response of button pressing.

The fourth condition, which we will henceforth refer to as ‘Tone Only’, was the same as the second condition except that there was no cue. This last condition functioned as the control condition, that is, it revealed the ERP amplitude to auditory stimulation when there was no temporal cue or self-initiation. Subjects were presented with a continuous fixation cross and had auditory stimuli played to them with the same timing as in condition 2.

Each condition block consisted of 50 trials, and each block was repeated four times in the above sequence, giving a total of 800 trials per experiment. Each participant completed all sixteen blocks in a single testing session that lasted approximately 90 min.
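The session structure described above can be sketched as follows. This is an illustration only: the dictionary field names, the fixed random seed and the use of a uniform distribution for the fixation duration are assumptions, since the text states only that the duration was randomly varied between 1,000 and 3,500 ms.

```python
import random

# Condition order follows the sequence in which the conditions are described.
CONDITIONS = ["Cue Motor Tone", "Cue Tone", "Cue Motor", "Tone Only"]
TRIALS_PER_BLOCK = 50
REPEATS = 4
FIXATION_RANGE_MS = (1000, 3500)


def build_session(seed=0):
    """Build a list of 16 blocks, each a list of 50 trial dictionaries."""
    rng = random.Random(seed)  # seed is an assumption, used for reproducibility
    session = []
    for repeat in range(REPEATS):
        for condition in CONDITIONS:
            # Half left, half right, in randomised order.
            sides = ["L", "R"] * (TRIALS_PER_BLOCK // 2)
            rng.shuffle(sides)
            block = [
                {
                    "condition": condition,
                    "repeat": repeat + 1,
                    "side": side,
                    # Assumed uniform; the text gives only the range.
                    "fixation_ms": rng.uniform(*FIXATION_RANGE_MS),
                }
                for side in sides
            ]
            session.append(block)
    return session


session = build_session()
n_blocks = len(session)
n_trials = sum(len(block) for block in session)
print(n_blocks, n_trials)
```

The counts follow directly from the design: 4 conditions × 4 repeats gives 16 blocks, and 16 blocks × 50 trials gives 800 trials.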

EEG data processing

ERPs were analysed using SPM8 (Wellcome Institute, London, UK) running on Matlab R2010a (The Mathworks, Natick, USA). Data were re-referenced to averaged mastoids, downsampled to 250 Hz and bandpass filtered at 0.1–40 Hz. For all trials, the analysis epoch was −100 to 500 ms relative to stimulus onset.
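As a rough illustration of the epoching step, the sketch below converts the −100 to 500 ms window into sample indices at the 250 Hz analysis rate. The function names are hypothetical and the half-open (exclusive stop) convention is an assumption; the actual analysis was performed in SPM8. Note that 2,048/250 is not an integer (8.192), so the downsampling step requires resampling rather than simple decimation.

```python
def epoch_sample_range(onset_sample, fs_hz=250, tmin_ms=-100, tmax_ms=500):
    """Return (start, stop) sample indices for one epoch; stop is exclusive."""
    start = onset_sample + round(tmin_ms * fs_hz / 1000)
    stop = onset_sample + round(tmax_ms * fs_hz / 1000)
    return start, stop


def cut_epoch(signal, onset_sample, **kwargs):
    """Slice one epoch out of a single-channel signal (a plain sequence)."""
    start, stop = epoch_sample_range(onset_sample, **kwargs)
    if start < 0 or stop > len(signal):
        raise ValueError("epoch extends beyond the recording")
    return signal[start:stop]


# Hypothetical stimulus onset at sample 1000 of a downsampled recording.
start, stop = epoch_sample_range(onset_sample=1000)
print(start, stop, stop - start)
```

At 250 Hz the epoch spans 25 pre-stimulus and 125 post-stimulus samples, i.e. 150 samples under the exclusive-stop convention assumed here.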

Data reduction

In order to reduce the dimensionality of the data and to investigate the extent to which suppression of auditory responses was lateralised, a source waveform was extracted from each cortical auditory area using a tangentially oriented dipole spatial filter (Ponton et al. 2002). This procedure resulted in two source waveforms for each subject for each condition, one in the right auditory cortex and one in the left.

To validate the locations of these sources, a distributed sources inversion of the grand mean ERPs for the tone-containing conditions (i.e. all conditions except Cue Motor) was performed using the Greedy Search method (SPM8). Source power was then averaged across these conditions, and the point of peak power for each of the two largest clusters (Fig. 2) was used as the source extraction location.

Fig. 2

Average source map for the grand mean event-related potentials for all the tone-containing conditions. The auditory-evoked response for the Cue Motor Tone, Cue Tone and Tone Only conditions was inverted into source space using SPM8, and the resultant source power maps averaged across conditions. Left panels show sources overlaid on a glass brain. Right panels show sources overlaid on a template brain. The red arrowhead and blue cross hairs indicate the voxel of peak intensity for the inversion. This was located in Brodmann area 42 of the superior temporal gyrus in the left hemisphere. These locations were used for the extraction of lateralised source waveforms

To remove the effect of the slow motor potential on the source waveform from the Cue Motor Tone condition, the Cue Motor condition was subtracted from the Cue Motor Tone condition. All subsequent references to the Cue Motor Tone condition refer to the Cue Motor Tone after the Cue Motor condition was subtracted.
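The correction described above is a point-by-point subtraction of two averaged waveforms. A minimal sketch, with a hypothetical function name and made-up sample values, assuming both waveforms share the same epoch and sampling rate:

```python
def subtract_motor_potential(cmt_waveform, cm_waveform):
    """Return Cue Motor Tone minus Cue Motor, sample by sample."""
    if len(cmt_waveform) != len(cm_waveform):
        raise ValueError("waveforms must have the same number of samples")
    return [cmt - cm for cmt, cm in zip(cmt_waveform, cm_waveform)]


# Made-up values (arbitrary units): a slow negative drift is present in both
# conditions, while the tone-evoked deflections appear only in the first.
cmt = [-1.0, -3.0, -2.0, 0.5]
cm = [-1.0, -1.5, -2.0, -2.5]
corrected = subtract_motor_potential(cmt, cm)
print(corrected)
```

The subtraction assumes that the motor potential is the same with and without the tone, so that the difference isolates the auditory-evoked response.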

Statistical analysis

For each of the three conditions [Cue Motor Tone (after correction), Cue Tone and Tone Only], the baseline to peak amplitude of both the N1 and P2 was extracted from the source waveforms for each subject. This value was then submitted to a 3-factor repeated measures ANOVA, the factors being side of stimulation (left or right), hemisphere (ipsilateral or contralateral to the stimulus) and condition (Cue Motor Tone, Cue Tone and Tone Only). For analysis of the Cz ERP, both the N1 and P2 amplitudes were submitted to a 2-factor repeated measures ANOVA, the factors being side of stimulation (left or right), and condition (Cue Motor Tone, Cue Tone and Tone Only). Post hoc tests to compare means were corrected by the Bonferroni method for multiple comparisons. All statistical analysis was performed using IBM© SPSS© Statistics (v19). Results are presented as mean ± SEM.
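The baseline-to-peak measure can be sketched as below. The latency windows used for the N1 and P2 are hypothetical, as the text does not report them, and taking the mean of the pre-stimulus samples as the baseline is likewise an assumption about how the baseline was defined.

```python
def baseline_to_peak(waveform, times_ms, window_ms, polarity):
    """Return the peak amplitude within window_ms relative to the pre-stimulus mean.

    polarity is "negative" for the N1 (most negative value in the window)
    or "positive" for the P2 (most positive value in the window).
    """
    baseline_values = [v for v, t in zip(waveform, times_ms) if t < 0]
    in_window = [v for v, t in zip(waveform, times_ms)
                 if window_ms[0] <= t <= window_ms[1]]
    if not baseline_values or not in_window:
        raise ValueError("baseline or peak window contains no samples")
    baseline = sum(baseline_values) / len(baseline_values)
    peak = min(in_window) if polarity == "negative" else max(in_window)
    return peak - baseline


# Hypothetical windows (ms); the text does not specify the ones actually used.
N1_WINDOW_MS = (70, 150)
P2_WINDOW_MS = (150, 250)

# Synthetic epoch: -100 to 496 ms in 4 ms steps (250 Hz), flat except for
# a negative deflection at 100 ms and a positive deflection at 200 ms.
times_ms = [-100 + 4 * i for i in range(150)]
waveform = [0.0] * len(times_ms)
waveform[times_ms.index(100)] = -2.0
waveform[times_ms.index(200)] = 1.5

n1 = baseline_to_peak(waveform, times_ms, N1_WINDOW_MS, "negative")
p2 = baseline_to_peak(waveform, times_ms, P2_WINDOW_MS, "positive")
print(n1, p2)
```

One such value per subject, condition, side of stimulation and hemisphere would then form the input to the repeated measures ANOVA.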

Results

Response times

On average, participants responded 692 ± 19 ms after the cue with their right hands in the CMT condition and after 685 ± 19 ms with their left hands. There was no significant difference in response time between hands.

Source localisation

Inversion of the grand mean waveforms resulted in an average source map that consisted of two primary sources. The peak intensity of the source in the left hemisphere was situated in the Superior Temporal Gyrus (STG), BA42 (MNI coordinates: −62, −32, 10). The right hemisphere source was also located in the STG (MNI coordinates: 52, −32, 6) though in a slightly inferior location (Fig. 2).

Source waveforms

Extraction of the source waveforms returned ERPs that were characterised by the classic auditory event-related P1–N1–P2 morphology (Bressler and Ding 2006). However, the ERP for the Cue Motor Tone condition was superimposed on a slow motor potential. The waveform representing the ERP for the Cue Motor condition closely matched the overall negative trend evident in the compound Cue Motor Tone ERP (Fig. 3).

Fig. 3

Grand mean source ERPs for the conditions containing motor activity resulting from button presses. Left column contains source waveforms extracted from the left hemisphere auditory source. Top row contains source waveforms for auditory stimulation of the right ear and the bottom row the same waveforms for auditory stimulation of the left ear. A slow motor shift is evident in all traces. Superimposed on this shift is the auditory-evoked response from the Cue Motor Tone condition. Cue Motor was subtracted from the Cue Motor Tone condition prior to subsequent statistical analysis of N1/P2 amplitudes. N1 and P2 peaks are shown for the Cue Motor Tone condition in the first panel

After subtraction of the slow motor wave, the effect of the ERP conditioning by self-initiation and temporal cueing is evident (Fig. 4). The N1/P2 complex was largest for the Tone only condition and smallest in the Cue Motor Tone condition. The Cue Tone condition evoked an intermediate state between the other two. Also evident in the grand mean source waveforms is an amplitude bias towards the contralateral hemisphere.

Fig. 4

Average source waveforms for the 3 conditions tested. Small panels Left column contains source waveforms extracted from the right hemisphere maximum. Top row contains source waveforms for auditory stimulation of the right ear. Cue Motor Tone (CMT) condition has had Cue Motor subtracted from it. A stepwise reduction in the N1/P2 amplitude is evident across the conditions from the largest response evoked by the Tone Only condition (black trace), through to the smallest response evoked in the CMT condition (red trace). In general, the Cue Tone condition (blue trace) evoked an N1/P2 that was intermediate in amplitude. Large panel Mean source waveforms for the 3 conditions tested averaged across the 4 condition/side combinations

N1

Statistical analysis of the N1 amplitude revealed that there was a significant main effect of condition (F (2,13) = 4.0, p = 0.043). While the amplitude of both the Cue Tone and Cue Motor Tone N1 was reduced compared to Tone Only, post hoc tests show that only the difference between Cue Motor Tone and Tone Only was significant (p = 0.041). There was no significant difference between the N1 amplitudes of Cue Motor Tone and Cue Tone (p = 0.26). Additionally, there was a significant main effect of hemisphere (F (1,14) = 7.0, p = 0.019) on the amplitude of the N1; the contralateral N1 was on average 19 % larger than the ipsilateral N1. There were no statistically significant interactions between factors.

P2

Statistical analysis of the P2 amplitude revealed that there was a significant main effect of condition (F (2,13) = 4.1, p = 0.041). While the amplitude of both the Cue Tone and Cue Motor Tone P2 was reduced compared to Tone Only, post hoc tests show that only the difference between Cue Tone and Tone Only was significant (p = 0.047). There was no significant difference between the P2 amplitudes of Cue Motor Tone and Cue Tone (p = 1.0). There was no statistically significant effect of side or hemisphere or interaction between factors (Fig. 5).

Fig. 5

Effects of condition on N1 and P2 amplitudes. There was a significant main effect of condition on both the N1 and the P2 amplitude. Post hoc comparisons showed that the N1 was reduced significantly by self-initiation (Cue Motor Tone) compared to Tone only, whereas the largest reduction of the P2 was caused by temporal cueing (Cue Tone). Statistically significant post hoc comparisons are indicated by the horizontal bar. Amplitudes for source waveforms reported in arbitrary units. TO Tone only, CMT Cue Motor Tone, CT Cue Tone

ERP

The auditory N1/P2 complex is maximally measured at the vertex (Goff et al. 1977; Naatanen and Picton 1987). In order to verify the validity of the morphology of the N1/P2 source waveforms in the current study, we calculated the grand mean ERPs at the vertex (Fig. 6).

Fig. 6

Event-related potentials from the vertex electrode (Cz). Upper panel shows the response to auditory stimulation of the right ear, lower panel left ear. The largest N1/P2 complex is evident for the Tone only condition (black trace) for both ears. The Cue Motor Tone condition (red trace) and Cue Tone (blue trace) are both reduced in comparison

Discussion

The current study investigated the effects of self-initiation and temporal cueing on auditory-evoked responses. We have demonstrated that a temporal cueing condition that mimics the conditions of cued self-initiation produces N1 and P2 responses that are consistently smaller than responses to tones that are not temporally predictable. These reductions are similar in magnitude to those produced by self-initiated sounds.

The effects of self-initiation and temporal cueing

Recent theoretical formulations of forward modelling processes in motor control, and in particular in the control of speech processes (Rauschecker 2011), have given rise to a resurgence of interest in the process of efference copy suppression of self-generated afference. A number of recent studies have demonstrated that suppression of auditory-evoked responses occurs when the onset of the stimulus is self-initiated (Martikainen et al. 2005; Baess et al. 2008, 2009; Aliu et al. 2009; Horvath et al. 2012; Knolle et al. 2012). This phenomenon has been interpreted as evidence for motor-related inhibition of predicted sensory processes (Lange 2011). However, none of these studies has ruled out a role for temporal cueing, which is known to be a powerful suppressor of auditory-evoked responses (Schafer et al. 1981; Clementz et al. 2002; Lange 2009).

Recently, the hypothesis that efference copy suppression might in part be due to temporal cueing effects has been directly investigated by Lange (2011) using a button press auditory self-stimulation paradigm. This author showed the auditory N1 to be significantly smaller for self-initiated tones that occurred at either predictable or unpredictable delays from the time of the button press than for tones that occurred at these same delays but were generated externally. Lange concluded that motoric suppression of ERPs evoked by self-initiated tones occurs independently of temporal cueing.

While Lange demonstrated a significant amplitude difference between ERPs to self-initiated and cued tones, her lack of a tone only control makes it difficult to evaluate the extent to which embedded cueing effects might contribute to the suppression effect that occurs with self-initiation. Furthermore, the lack of comparison to a control condition makes it impossible to ascertain whether her cueing task was causing the expected suppression effect. In Lange’s study, one would expect that the cued condition, in which the tone followed an unpredictable delay, would be equivalent to a tone only control, in which case a significant suppression relative to this condition should have been observed in the cued condition where the tone followed at a predictable delay. However, the lack of an interaction between the conditions of ‘Source’ (cued or self-initiated) and ‘Accuracy’ (predictable or non-predictable delay) suggests that little cueing-induced suppression was present in her study (Lange 2011). Additionally, without a control condition, it remains possible that the difference effect was due to an enhancement of the cued ERP relative to the self-initiated ERP. Given that attentional manipulations that closely resemble the catch-trial structure used by Lange (2011) have been shown to enhance the auditory N1 (Teder et al. 1993), this is a distinct possibility.

Our results with regard to the N1 reduction caused by temporal cueing are largely in agreement with those of Ford et al. (2007). Their study, comparing temporal warning and self-initiation, showed similar reductions in N1 amplitude for both conditions in control subjects. Similar to Ford et al., the N1 evoked with temporal cueing in our study was not significantly different from that evoked by uncued tones, but there was a trend towards a reduction. When viewed in the context of the findings of Ford et al., this indicates the likelihood of a contribution to efference copy-mediated N1 suppression by temporal certainty. Furthermore, we did find a significant reduction in the P2 amplitude with temporal cueing. Interestingly, a recent study by Knolle et al. (2012) found a significant reduction in both the N1 and the P2 during a self-initiation task. Furthermore, their comparison between controls and patients with cerebellar lesions showed that while N1 suppression was diminished in the presence of cerebellar lesion, P2 suppression was not. Given that the current data suggest that temporal certainty plays a significant role in P2 suppression, it might be hypothesised that N1 suppression better reflects ‘true’ efference copy effects, whereas P2 suppression is a better correlate of the suppressive effects of self-initiation contingent temporal certainty.

With the novel temporal cueing control used in the present study, our results suggest that both temporal cueing and self-initiation produce markedly similar effects, albeit slightly larger in absolute magnitude for the self-initiated tones. In the current study, when the auditory stimulus was applied to the right ear, there was a significant reduction in the amplitude of the N1/P2 complex for both cueing and self-initiation as compared to the control, Tone Only, condition. We found no significant difference between the cued and self-initiated responses, though it must be noted that when comparing the two conditioned N1 responses to the Tone Only condition in the main effect analysis, only the Cue Motor Tone effect was significantly different from the Tone Only condition. These results are consistent with the explanation that the efference copy suppression effect is mediated, at least in part, by temporal predictability effects. Other recent evidence takes this conclusion a step further, suggesting that N1 reduction by non-speech self-initiated sounds may not be caused by the efference copy mechanism at all; rather, it may be brought about by action-sound coincidence (Horvath et al. 2012).

The present results cannot exclude the possibility that suppression of the ERPs by self-initiation and cueing occurs via two independent processes, in which case it might be argued that we have demonstrated separate processes with coincident actions and magnitudes. However, given the inherent temporal quality of the efference copy process, it seems intuitively plausible that temporal prediction is an intertwined process. We therefore suggest that while there may be some motor-only aspect of the efference copy suppression effect, as evidenced by the slightly stronger effect of self-initiation over temporal cueing, any model of the efference copy suppression effect must account for non-specific temporal cueing mechanisms.

Lateralisation of efference copy suppression

In the current study, we have demonstrated that, for the auditory system, efference copy suppression is bilaterally equal. While there was a main effect of hemisphere that agreed with the well-known contralateral advantage (Connolly 1985), there was no differential suppression of either hemisphere (ipsilateral or contralateral) as evidenced by the lack of a condition by hemisphere interaction.

Limitations

A methodological consideration that might in future be investigated directly is the effect of proactive withholding of a prepotent response in the CT condition. While studies have shown effects of reactive response inhibition on ERP amplitudes (Dimoska et al. 2006), the effect of proactive inhibitory tone on sensory-evoked responses is unknown. We are confident that, in the current study, the block design we have used would have negated any ongoing inhibitory influences, though this remains to be confirmed.

Summary

The current study has shown that self-initiated tones evoke auditory ERPs that are suppressed in amplitude relative to matched, externally initiated tones. We additionally show that temporal cueing similarly reduces auditory ERPs, albeit to a lesser degree for the N1. This effect is bilaterally symmetrical.