Slip or fallacy? Effects of error severity on own and observed pitch error processing in pianists

Albrecht, Christine; Bellebaum, Christian

doi:10.3758/s13415-023-01097-1

Slip or fallacy? Effects of error severity on own and observed pitch error processing in pianists

Research Article
Open access
Published: 17 May 2023

Volume 23, pages 1076–1094, (2023)
Cite this article

Download PDF

You have full access to this open access article

Cognitive, Affective, & Behavioral Neuroscience Aims and scope Submit manuscript

Slip or fallacy? Effects of error severity on own and observed pitch error processing in pianists

Download PDF

Christine Albrecht¹ &
Christian Bellebaum¹

1205 Accesses
1 Citation
Explore all metrics

Abstract

Errors elicit a negative, mediofrontal, event-related potential (ERP), for both own errors (error-related negativity; ERN) and observed errors (here referred to as observer mediofrontal negativity; oMN). It is unclear, however, if the action-monitoring system codes action valence as an all-or-nothing phenomenon or if the system differentiates between errors of different severity. We investigated this question by recording electroencephalography (EEG) data of pianists playing themselves (Experiment 1) or watching others playing (Experiment 2). Piano pieces designed to elicit large errors were used. While active participants’ ERN amplitudes differed between small and large errors, observers’ oMN amplitudes did not. The different pattern in the two groups of participants was confirmed in an exploratory analysis comparing ERN and oMN directly. We suspect that both prediction and action mismatches can be coded in action monitoring systems, depending on the task, and a need-to-adapt signal is sent whenever mismatches happen to indicate the magnitude of the needed adaptation.

Common mechanisms in error monitoring and action effect monitoring

Article 01 August 2018

Contrasting time and frequency domains: ERN and induced theta oscillations differentially predict post-error behavior

Article 17 April 2020

Mindfulness Meditators Do Not Show Differences in Electrophysiological Measures of Error Processing

Article 08 February 2019

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

During the past 30 years, researchers have intensely investigated the neural correlates of error processing (Falkenstein et al., 1991; Gehring et al., 1993; Jessup et al., 2010; Ullsperger et al., 2014). Contrasting errors versus correct actions showed that error processing involves several areas at the medial wall of the prefrontal cortex (this region will be referred to as medial prefrontal cortex or mPFC), including the anterior cingulate cortex (ACC; Debener et al., 2005; Ullsperger et al., 2014). An event-related potential (ERP) component investigated in the context of error processing is the error-related negativity (ERN), a negative-going frontocentral deflection that peaks around 100 ms after an erroneous response (Falkenstein et al., 1991; Falkenstein et al., 2000; Gehring et al., 1993, Gehring et al., 2012; Holroyd and Coles, 2002). The ERN appears to be generated in the mPFC, probably the ACC (Debener et al., 2005; Dehaene et al., 1994; Ridderinkhof et al., 2004; Taylor et al., 2007).

Not all researchers agree that the mPFC is primarily involved in error processing. Thus, it also has been questioned whether the ERN reflects error processing per se. Apart from a conception in terms of conflict monitoring (Botvinick et al., 2001; Carter et al., 1998; Yeung et al., 2004), Holroyd and Coles (2002) suggest that the ERN represents a reinforcement learning signal that is used to optimize performance. This signal would not only include the information if an action is right or wrong, but, in case, also the extent of the error as well as whether the event was more or less unexpected—representing a signed prediction error. The more recent predicted-response outcome model (PRO model; Alexander and Brown, 2011) states that mPFC activity reflects unexpected events, e.g., outcomes and actions, rather than errors, i.e., an unsigned prediction error (Gawlowska et al., 2018; Jessup et al., 2010; Wessel et al., 2012). Although there is initial evidence supporting the PRO model in the sense that error and surprise processing rely on similar neural mechanisms (Jessup et al., 2010; Wessel et al., 2012), there also is reason to believe that the mPFC codes information that is particularly relevant for error processing (Hajcak et al., 2005; Maier et al., 2012; Maier and Steinhauser, 2016), which is more in line with the reinforcement learning theory (Holroyd and Coles, 2002).

Both the reinforcement learning theory (Holroyd and Coles, 2002) and the PRO model (Alexander and Brown, 2011) imply that an ACC-driven response-locked neural signal is sensitive to (prediction) error size. Indeed, previous empirical studies have found that ERN amplitude can vary in different contexts and conditions. For example, ERN amplitude is enhanced when errors are particularly significant or participants are more motivated (Ganushchak and Schiller, 2008; Gehring et al., 1993; Hajcak et al., 2005). The actions themselves in these studies were, however, always classified in a binary fashion as either right or wrong. Action valence can vary more gradually than just distinguishing right versus wrong. In sports, music or many other motor-cognitive tasks, people can diverge from the correct movement on a scale from “perfect” to “completely wrong.” In everyday language, we use terms such as slip or fallacy, which also suggests that we distinguish between errors of different severity. The ACC receives input from both motor and cognitive brain areas and is supposedly involved in the planning and regulation of behavior (Devinsky et al., 1995), making it a crossroad for correction and adaptation (Holroyd and Coles, 2002). For this function, the system needs to know how much adaptation is needed: for example, when a pianist hits a key one or two notes amiss and must adapt their hand position within milliseconds to hit the next note. Taking into account the function of the ACC, the variability of ERN amplitudes in different contexts (Ganushchak and Schiller, 2008; Gehring et al., 1993; Hajcak et al., 2005) and the early processing needed for error (severity) detection in order to adapt behavior, it is conceivable that error severity is processed early after error commission in the time window of the ERN. We thus assume that the ERN as a fast indicator of information related to error processing codes action valence on a spectrum and not as an all-or-nothing phenomenon, thus reflecting error severity. Although the PRO model (Alexander and Brown, 2011) assumes effects of error severity in the sense of the magnitude of a prediction error, the previously mentioned findings would support the view that the mPFC/ACC is, at least partially, involved in representing performance accuracy and not entirely driven by event expectancy, as stated by the PRO model (see for example Maier and Steinhauser, 2016 for conflicting results regarding the model), which is more in line with the reinforcement learning theory (Holroyd and Coles, 2002), stating that the ERN reflects a learning signal to optimize performance.

Initial evidence supporting the assumption of a continuous encoding of error severity stems from studies comparing different types of responses yielding different error types (under-reach vs. over-reach, Murata and Katayama, 2005; hand vs. finger, Falkenstein et al., 2000; corrected vs. uncorrected, Paas et al., 2021). An effect of error size has been described in two paradigms in which wrong actions in either one (single error) or two (double error) dimensions were possible (Bernstein et al., 1995; Maier et al., 2008, 2012; Maier and Steinhauser, 2016): double errors led to significantly larger ERN amplitudes than single errors. These results, however, also may be explained by two parallel action monitoring processes for both dimensions, each coding accuracy in a binary fashion, that add up to an increased ERN. It has yet to be investigated whether different degrees of deviations from the aspired action indeed lead to correspondingly increased neural responses in action monitoring regions.

The reinforcement learning theory (Holroyd and Coles, 2002) implies that the ACC acts as a motor control unit, and therefore, an ERN should only occur when the person has acted in some way. In contrast, the processing of observed actions has been suggested to involve similar brain areas as the processing of self-actions, such as the mPFC, specifically the ACC (Yoshida et al., 2012; Koban and Pourtois, 2014) and presupplementary and supplementary motor areas (Scangos et al., 2013), with additional activity, inter alia, in the superior temporal sulcus (Ninomiya et al., 2018), inferior frontal gyrus (Shane et al., 2008), and anterior insula (Cracco et al., 2016; Koban and Pourtois, 2014). Accordingly, observed errors have been reported to elicit an ERP component corresponding to the ERN, the observer error-related negativity (oERN) at frontocentral sites (Bates et al., 2005; de Bruijn and von Rhein, 2012; Miltner et al., 2004; van Schie et al., 2004). Source localization suggests the origin of the oERN also in the mPFC (van Schie et al., 2004), probably in the ACC (Miltner et al., 2004). Compared with the ERN, the oERN displays smaller amplitudes and peaks later relative to the eliciting event, which is an observed action and thus a visual stimulus rather than an own motor response, with the latency depending on the task (Bates et al., 2005; de Bruijn and von Rhein, 2012; van Schie et al., 2004). Research in observed error processing, as in own error processing, has mostly focused on binary response classifications in terms of accuracy (Bates et al., 2005; de Bruijn & von Rhein, 2012; Kobza and Bellebaum, 2013). Recent evidence from our lab indicated, however, that observed responses are processed primarily based on their expectancy and not their accuracy (Albrecht and Bellebaum, 2021a, 2021b; Desmet et al., 2014), which might lead to differences compared to active responding with respect to effects of error severity. For observed action monitoring, the PRO model (Alexander and Brown, 2011) seems to fit empirical results better than the reinforcement learning theory (Holroyd and Coles, 2002).

To investigate whether the ERN does indeed reflect a signal for action adaptation, and to compare effects on own and observed action monitoring, we investigated the effects of error severity in both an active and observation condition. So-called sequential tasks, such as typing or playing the piano (Herrojo Ruiz et al., 2009; Kalfaoğlu et al., 2018; Maidhof et al., 2009; Maidhof et al., 2013; Paas et al., 2021), appear to be particularly suitable to study error severity effects. In these tasks, errors are frequent and participants stay seated while performing a (highly practiced) everyday motor task that is ecologically valid and not dependent on feedback (Herrojo Ruiz et al., 2009). Typically, the ERN occurs 20-100 ms before the response in sequential tasks (Herrojo Ruiz et al., 2009; Kalfaoğlu et al., 2018; Maidhof et al., 2009; Paas et al., 2021) and thus earlier than in tasks involving a single response (Falkenstein et al., 1991; Gehring et al., 1993). Maidhof et al. (2013) showed that potential errors are noticed earlier with regard to the registered keypress (probably due to earlier movement onset compared with nonsequential tasks), and earlier error registration is associated with shorter ERN latencies (Di Gregorio et al., 2022). Furthermore, error monitoring and error severity processing are especially important for adaptation during sequential tasks.

In the present study, we thus conducted two experiments with pianists. In Experiment 1, participants played piano pieces which included frequent changes of hand positions, thereby provoking small and large errors. While participants played, both EEG and behavioral data were assessed. Videos recorded during Experiment 1 served as stimuli for Experiment 2, in which participants watched videos of other pianists performing while EEG data were assessed in the observers. With these experiments we aimed to investigate two main questions: First, are ERN amplitudes enhanced for larger compared to smaller errors? Second, is a similar effect found also for observed errors?

Experiment 1

In Experiment 1, we studied effects of error severity on error processing during active piano playing. Apart from the neural processing of errors, the piano-playing paradigm allows investigation of relevant behavioral variables. First, post-event reaction times can be assessed. A relative slowing of reaction times after errors is a well-studied phenomenon (Rabbitt, 1966, 1969), possibly linked to an attentional shift towards the error (or unexpected event), resulting in an attention reorienting process back to the task that underlies the longer reaction times (Notebaert et al., 2009; Núñez Castellar et al., 2010). Post-error slowing is presumably modulated by activity in the ACC (Danielmeier et al., 2011; Debener et al., 2005; Fu et al., 2019), but findings on the relationship between ERN and post-error slowing are mixed (Chang et al., 2014; Debener et al., 2005; Gehring et al., 1993; Hajcak et al., 2003). Possibly, some factors influence post-error slowing and the ERN differently (such as expertise, Jentzsch et al., 2014, or error awareness, Nieuwenhuis et al., 2001), leading to a dissociation in respective tasks. Post-error slowing also has been observed in piano-playing tasks (Herrojo Ruiz et al., 2009; Paas et al., 2021). A second variable of interest is keypress volume (assessed as velocity), as error notes were played significantly more quietly than correct notes in previous piano-playing studies (Herrojo Ruiz et al., 2009; Maidhof et al., 2009; Maidhof et al., 2013; Paas et al., 2021). Because larger errors might lead to a larger focus of attention on the error, enhanced post-error slowing was expected for large compared to small errors. Additionally, quieter keypress volumes of error keypresses compared with correct keypresses were expected, but as the processes behind the volume reduction are not yet established, we refrain from predicting differences between small and large errors regarding volume.

Method

Participants

We recruited experienced pianists to take part in the study via social media, person-to-person recruiting, and flyers distributed at the university, music conservatory, and music schools. Because the pieces included large steps between keys to induce errors and the pieces were thus difficult to learn, we suggested a minimum experience of 1,500 hours spent with the instrument, although participants were allowed to take part with less experience if they were able to play the pieces fluently. We aimed for a sample size of at least 20 participants, because this sample size seems to be adequate for sequential tasks (Herrojo-Ruiz et al., 2009; Kalfaoğlu et al., 2018; Maidhof et al., 2013). Expecting a 30% dropout-rate for fulfilling one or more exclusion criteria or due to technical problems, we originally recruited 30 participants. Of these, five were excluded due to previous neurological or psychological diseases, so data from 25 participants were recorded. Of these, one had to be excluded due to technical problems during data acquisition. Another three were excluded because they made less than ten large errors, which was especially problematic for the analysis of the ERP data (see below). The remaining sample of 21 participants consisted of 12 cis-gender women and 9 cis-gender men between 17 and 34 years (mean [M] = 23.1 years, standard deviation [SD] = 4.2 years). Twenty of them were right-handed, one person was left-handed. Please note that pianists usually play melodies with their right hand and accompaniment with their left hand, regardless of handedness, so left-handed participants should be able to perform the task as well as right-handed participants, as was the case for the left-handed participant that took part in our study. All participants reported no previous neurological or psychiatric illnesses and no intake of medication that affected the nervous system. All participants took part voluntarily. The study is in compliance with the declaration of Helsinki and was approved by the ethics committee of the Faculty of Mathematics and Natural Sciences at Heinrich-Heine-University, Düsseldorf.

Material

We designed six pieces to be played with only the right hand. All pieces consisted of 96 sixteenth notes in 6 bars and ended with a seventh bar that consisted of a single whole note. To keep the physical distance between played keys constant, all pieces were written in C major and thus only played on white keys. The pieces kept to a general harmonic structure and the highest notes played could be interpreted as a melody, the remaining notes as accompaniment. The pieces were designed to require large hand movements to induce errors. The lowest key throughout the pieces was E3, the highest key was A5. Consecutive notes could differ between 1 and 10 white keys; the average difference was 4.98 white keys (SD = 1.88 keys). The pieces were written in MuseScore 3 (version 3.6.2, MuseScore BVBA, 2021). They are included in the Supplementary Material (Figure S1).

An automatically created recording was generated for each of the pieces (created with MuseScore 3, version 3.6.2, MuseScore BVBA, 2021) in which the melody parts of the pieces were pronounced. In the recording, pieces were played at 60 beats per minute (one beat = one quarter note), and tempo at the top of the score notation also was stated as 60 quarter notes per minute.

Experimental task and setup

The pieces as well as the recording were sent to each participant 2 weeks before testing. Participants were instructed to study the pieces in the next 14 days. They were told that they should be able to play the pieces with the right hand quite fluently but that they should not strive for perfect sound and that occasional errors during play were acceptable. Participants also were told to practice in whatever tempo they felt comfortable. They were given an instruction to practice approximately 15 minutes a day (distributed as they saw fit). According to self-reports, the participants practiced the pieces 204.1 minutes on average (SD = 188.9 minutes, 45-840 minutes).

For data acquisition during the experiment, participants used a digital piano (Casio LK-S450 for most participants, two participants used a Yamaha YDP-144 R Arius). During the experiment, the keyboard was set on mute, so that participants could not hear themselves play. The piano was positioned in front of a desktop monitor (1,920 x 1,080 px) that served for visual stimulation. Participants could navigate through the experiment with their left hand and the lowest note on the keyboard. A Logitech BRIO webcam was connected to an additional laptop for recording the participants’ hand from above during play for the videos used in Experiment 2. A picture of the setup can be seen in Fig. 1. We recorded the Musical Instrument Digital Interface (MIDI) information of the played segments on the experiment computer. MIDI refers to the signal used by digital instruments to generate and communicate tones including note on- and offset, key and velocity (in piano playing, this corresponds to volume). Stimulus presentation, EEG trigger timing and MIDI recording was controlled with Python 3.7.5 using the packages psychopy (version 3.2.3; Peirce et al., 2019) and mido (version 1.2.9, Ole Martin Bjørndalen 2021, mido.readthedocs.io).

The experiment consisted of 60 sequences in total, 10 for each piece. Each sequence started with a score notation preview of the piece that was to be played (a picture of the first two bars, i.e. the first line, of the respective piece score notation, including the piece number). Participants could then start the recording which began with 4 metronome beats (1,000-hz beeps) accompanied by the numbers 1 to 4 displayed on the screen. Subsequently, the score notation of the whole piece was displayed on the screen to allow participants to play from sheet. After they finished playing the piece, participants ended the recording and proceeded to the next sequence. A display of the sequence structure can be seen in Fig. 2.

Before the experiment, participants were asked in what tempo they had practiced the pieces. Accordingly, the metronome beats were set for each participant individually to a tempo slightly faster than the tempo in which they had practiced to increase difficulty. Participants were instructed to start playing right after the last metronome beat had been presented. They were further asked to keep to one tempo (loosely that of the metronome) during each sequence and to put emphasis on playing fluently, even if that meant making errors.

The 60 sequences were preceded by 3 practice sequences in which participants could get to know the procedure of a sequence, but in which they were shown a mock preview and no actual score notation during play. They were instructed to get familiar with the instrument and the procedure during these practice sequences, and to play whatever came to their mind.

Assessment of expertise

As a measure of Piano Playing Expertise we assessed the experience participants had with their instrument, because a certain level of experience was defined as inclusion criterion (see above). Expertise was defined as total hours spent with the instrument, calculated by multiplying the self-reported number of years of piano experience with the self-reported average hours of practice per week times 52 (number of weeks per year).

EEG recording

We recorded EEG signals at a 1,000-Hz sampling rate with a 32-channel actiCap electrode cap (ActiCAP; Brain Products GmbH, Germany) with the software Brain Vision Recorder (version 1.20, Brain Products, Munich, Germany). The active silver/silver-chloride electrodes were attached according to the 10-20 system on 29 scalp sites, i.e., FCz (which was used as online reference), F7, F3, Fz, F4, F8, FC5, FC1, FC2, FC6, T7, C3, Cz, C4, T8, CP5, CP1, CP2, CP6, P7, P3, Pz, P4, P8, PO9, P1, Pz, P2, and PO10. Additionally, we recorded the signal from both mastoids to use as offline reference. The ground electrode was placed at AFz. For electrooculogram (EOG) data, we placed two horizontal EOG (hEOG) electrodes at F9 and F10, respectively, and two vertical EOG (vEOG) electrodes at Fp2 and below the right eye. All impedances were kept below 10 kΩ.

An EEG marker was sent every fifth keystroke to avoid a possible overlap of markers (Maidhof et al., 2009). The MIDI data allowed offline determination of markers for the remaining keystrokes. We conducted a pilot test for a possible delay between key press and marker by using a Tektronix TDS 210 oscilloscope. Key presses are transformed to audio signals by the digital instrument in real-time. In the test, we therefore compared onset times between the audio and marker signal. The markers were sent consistently 1.6 ms before tone onset across all tests.

Procedure

Participants received the piano pieces 2 weeks before the actual experiment in the lab. After arrival, participants gave informed written consent to take part in the study and completed a demographic questionnaire and an expertise self-report measure.

Subsequently, EEG electrodes were attached to the scalp and participants started the experiment. Participants received written instructions and the experimenters were present during three practice sequences for questions and further explanations. At the start of the experiment, recordings of video, MIDI and EEG were started. The experiment lasted between 35 and 75 minutes, depending on the speed in which participants played. After completion of the experiment, participants received compensation in the form of either course credit or 40 €.

Data analyses

Behavioural data preprocessing and definition of event types

All following analysis steps were performed in MATLAB, version R2017b (Mathworks, Natick, MA). We employed the MATLAB MIDI Toolbox (Eerola and Toiviainen, 2004) and a dynamic score matcher algorithm created by Large (1993; see also Palmer and van de Sande, 1993; Rankin et al., 2009) to compare the recorded MIDI signal with the correct score notation the participants had been asked to play. This procedure was used to determine the different types of trials for which ERP and behavioral data were compared (see below). The algorithm finds a so-called optimal match between two MIDI sequences and assigns every played note an attribute: match, substitution (a score notation note was replaced in the performance), addition (there was an added note in the performance that could not be matched to any notation note), and miss. All substitution events were defined as “uncorrected” errors (see also below).

We then calculated the interval in white keys for substitution events between the correct score notation note and the corresponding performance note. Black keypresses were not considered in the analysis.

In the analyses, we included the event types correct, small error (one-note errors that were not corrected), and large error (two-note errors that were not corrected). All errors larger than two-note errors were excluded. Moreover, we only included error and correct events that were preceded and followed by a correctly played note, which also excludes correct notes played before or after miss events. Each of the 97 notes included in the score notation of each piece was played 10 times in the course of the experiment, which allowed us to calculate the note accuracy for every note as the percentage of times the note was played correctly. Only notes that had an accuracy higher or equal to 40% were considered in the analysis, to exclude notes that were played systematically wrong. Additionally, we only included notes for which at least one error trial and one correct trial was included to avoid confounds of note selection.

Behavioral Dependent Variables

Two behavioral measures served as dependent variables, which possibly differed between event types (correct, small error, large error). To investigate potential behavioral effects of error severity, namely on keypress volume and post-event slowing, the behavioral dependent variables Volume and Inter-Keypress-Interval (IKI) were assessed. Volume was defined as the recorded velocity in the MIDI signal of each note. IKI was defined as the difference between note onset time of the current and of the following note (see Paas et al., 2021). This maps the time delay between the event (correct, small or large error) and the subsequent correct keypress and serves as a measure of post-event reaction time, which is used to calculate post-error-slowing.

Behavioral data statistical analysis

For all statistical analyses, if not stated differently, we conducted single-trial linear mixed models (LME) analyses in R (version 3.5.3) using the package lme4 (version 1.1-23). According to best practice (Meteyard and Davies, 2020), all models should include all within-subject main and interaction effects as random effects, if this is possible without leading to model fit errors. For all subsequently described analyses, we performed an iterative process: all within-subject main and interaction effects were first included as random factors. If this led to model fit errors (singular fit or overfitting), we tested which random effect led to this error and removed this from the model. As most of our models included only the main effect Event Type, for some models this factor is included as random effect factor and for others not, depending on the model fit.

We conducted LME analyses, calculating separate models for dependent variables IKI (post-event reaction time) and volume (velocity). As independent variable, we set the three-level factor Event Type (correct, small error, large error). Small error was set as baseline condition to determine both the difference between correct and (small) errors and between small and large errors. Consequently, we created the design matrix depicted in Table 1 based on simple coding. We included random intercepts and slopes for Event Type per participant into each model.

Table 1 Design matrix of the factor event type

Full size table

With Cook’s Distance outlier detection (using the “influence” function of the package stats, version 4.02, in R) based on the calculated models (with a cutoff value of 4/(n-number of predictors-1)), we removed 4 participants from the IKI analysis (remaining n = 17, 17–34 years, M = 22.8 years, SD = 4.4 years, 9 women, 8 men) and two participants from the volume analysis (remaining n = 19, 17–34 years, M = 23.2 years, SD = 4.4 years, 11 women, 8 men). Subsequently, the models were recalculated with the new sample.

EEG data preprocessing

We recoded the EEG marker files offline by synchronizing the markers sent every five notes with the recorded MIDI data using MATLAB. The new markers were then written into new marker files, which were loaded into Brain Vision Analyzer (Brain Products, Munich, Germany). Subsequently, we applied a 0.5-Hz high-pass and 30-Hz low-pass filter to the data (as suggested by Luck, 2014). As participants read score notations while they played and were not prevented from looking down on their hand (both to obtain maximum ecological validity), vertical and horizontal eye movements occurred frequently during the experiment and the corresponding artefacts had to be removed from the EEG data. For this, we used the Gratton and Coles ocular correction algorithm (Gratton et al., 1983). The respective hEOG and vEOG channels were used as reference for eye artefact detection. The data were segmented into 900-ms–long epochs starting 300 ms before note onset. Subsequently, an automatic artifact rejection based on the signal from the electrodes of interest Fz, FCz, and Cz was performed. The artifact rejection removed all segments that included voltage steps larger than 50 μV/ms, for which the difference between highest and lowest amplitude was more than 100 μV, in which amplitudes were lower than −100 μV or higher than 100 μV, and for which activity was less than 0.1 μV. On average, 12.1 segments per participant were removed (0-146 segments, SD = 31.7 segments). This left enough segments per participant and condition for the following analyses (see also Table S3 in the Supplementary Material).

The interval between 300 and 200 ms before the event was used for baseline correction (for similar procedures in sequential task paradigms, see Herrojo Ruiz et al., 2009; Maidhof et al., 2013). Single-trial data as well as averages for each Event Type were exported per participant.

In our statistical analysis of the ERN amplitude, we used ERP data from single trials. To determine the time points for data extraction in the single trials, we took the individual participants’ averages in each condition into account, thereby applying a combination of average- and single-trial-based analyses. Thus, only participants were included who had at least ten trials in each experimental condition. The EEG signal was first pooled at Fz, FCz, and Cz, because at these sites the ERN is typically maximally pronounced, which also was the case in the present study. Because participants were allowed to play in their individual tempo, and the latencies of ERNs in sequential tasks are related to movement onset (Maidhof et al., 2013) and thus indirectly to tempo, we expected large peak latency variations between participants which were visible in single-participant data inspection (for a display of single-subject ERPs, see Figure S5 in the Supplementary Material). To determine the typical ERN latencies in each participant, we considered the participants’ averages for each Event Type and searched for the maximum negative peak in in a time window between 130 ms pre- and 130 ms post-event. Likewise, we determined the latencies of the preceding maximum positive peak in a time window between 180 ms pre-event and the negative peak (for a similar procedure, see Maier et al., 2012). We subsequently calculated the single-trial amplitude measures corresponding to the peaks in the averages as the mean signal in the time window 10 ms before to 10 ms after the negative and positive peak latency in the average, respectively. Single-trial ERN measures corresponding to an average-based peak-to-peak measure were then calculated as the difference between the two values derived for each segment. We used difference measures (amplitude around negative peak – amplitude around positive peak), as segments might partly overlap in a sequential task and subtracting the preceding positivity can partially account for differences in baseline activity which can indeed be seen in Fig. 3. In two control analyses we used only single trial values corresponding to the negative peak in the average (without subtraction of the preceding positivity) or mean amplitude values in a time window from 50 ms before to 50 ms after keypress, because the relative negativity for errors was most pronounced in this time window. This analysis yielded comparable results (see Section S9 in the Supplementary Material).

EEG data statistical analyses

We defined an LME model with ERN amplitude as dependent variable (see above for the general procedure for defining LME models). Event Type served as independent variable, coded as in the behavioral analyses (Table 1). Random intercepts per participant were included (adding Event Type as random factor led to singular fit error). No participant was excluded based on Cook’s outlier detection.

Results

Additional statistical results for all models can be found in the Supplementary Material (Tables S6, S7, and S8).

Expertise

Participants spent 7760.38 h on average playing the piano during their lifetime (range 520 h – 24,960 h, SD = 7,871.76 h, Median = 4,368.0 h; see Figure S4 for a histogram).

Behavioural data

On average, correct keypresses occurred in 90.70% of all keypresses, small errors in 3.03%, and large errors in 1.44%. All included participants made at least ten large errors. For detailed information, see Table S3.

IKI

There was a significant effect of Event Type on IKIs, F(2,15566) = 27.44, p < 0.001. Contrast comparisons revealed a significant difference between small and large errors (p = 0.001, b = 22.14), and between correct responses and small errors (p = 0.036, b = -4.71). After a large error keypress, participants took significantly longer (M = 369.66 ms, SD = 29.68 ms) to press the next key compared to after a small error keypress (M = 361.68 ms, SD = 16.30 ms), and the IKI after correct actions was shorter than after a small error keypress (M = 358.63 ms, SD = 4.99 ms).

Volume

We found a significant effect of Event Type, F(2,17509) = 88.23, p < 0.001. Both correct events (p < 0.001, b = 4.10) and large errors (p < 0.001, b = 3.96) resulted in significantly higher volume levels (M = 69.49 velocity, SD = 0.50 velocity; and M = 68.30 velocity, SD = 2.21 velocity, respectively) compared with small errors (M = 64.14 velocity, SD = 1.68 velocity).

ERN

For a display of ERPs in the three Event Type conditions, see Fig. 3 (Figure S5 in the Supplementary Material shows single subject averages). There was a significant effect of Event Type, F(2,2079.10) = 21.61, p < 0.001. Contrasts revealed significantly lower amplitudes for correct responses (M = −1.07 μV, SD = 0.34 μV) compared with small errors (M = −1.87 μV, SD = 0.96 μV; p < 0.001, b = 0.88) and significantly higher amplitudes for large errors (M = −3.16 μV, SD = 1.68 μV) compared with small errors (p = 0.004, b = −1.21). A significant difference between small and large errors remained if the single-trial ERN was quantified just based on the maximum negative peak in the average in a time window between –130 ms and 130 ms per participant and condition, and if the ERN was quantified as the amplitude mean between −50 ms and 50-ms relative to the button press, but no significant difference between small errors and correct events was found for these analyses (see Section S9 in the Supplementary Material).

Conclusion for experiment 1

In Experiment 1, we compared the processing of different error types in a piano-playing paradigm. Our results show that ERN amplitudes as well as behavioral measures vary depending on the type of error. Larger ERN amplitudes were observed for large compared with small errors, whereas all errors were accompanied by a larger ERN relative to correct responses. Post-error-slowing was seen after all error types but was largest for large errors, whereas small errors were played in a lower volume than large errors and correct keypresses. The results indicate that the action monitoring system does not only differentiate between right and wrong but also between different degrees of erroneous actions. In a post-hoc analysis on measures that might represent expectancy, we found that error severity explained the effects better than the frequency of the event, the difficulty of the respective note, and the insecurity before and during the respective keypress (see Supplementary Material, Section S2).

Experiment 2

Observing errors can be just as important as monitoring one’s own errors, for example, when musicians play together or teach others. As established above, the mechanisms of processing vicarious actions appear to be similar, albeit not completely identical, compared with those involved in the processing of own actions. Researchers observed a corresponding ERP component, the oERN (Bates et al., 2005; Miltner et al., 2004; van Schie et al., 2004), and increased activity in the mPFC for observed others’ errors (Koban and Pourtois, 2014).

As outlined for own responses above, also the neural response to observed actions can be modulated by surprise and expectancy (Alexander and Brown, 2011), as has been shown for mPFC activity (Schiffer et al., 2014) and the amplitude of a frontocentral oERN-like ERP component (Albrecht and Bellebaum, 2021a; Kobza and Bellebaum, 2013). Recent studies from our lab even suggest that previously observed valence effects for observed actions on this component can be completely attributed to expectancies (Albrecht and Bellebaum, 2021a, 2021b). The occurrence of an oERN-like component in action observation is contrary to the assumptions of the reinforcement learning theory (Holroyd und Coles, 2002), which assigns the ACC a role as motor control unit. In addition, the component seems to be primarily driven by expectancies, not valence, whereas the theory expects the signal to resemble a signed, rather than an unsigned, prediction error. Based on the empirical findings concerning the ERP component after observed actions, it is questionable whether the component is related to observed error processing at all. We will thus subsequently refer to it as observer mediofrontal negativity (oMN). The strong expectancy effect on the oMN amplitude may suggest a functional dissociation between ERN and oMN, with potentially differing effects of error severity on the two components.

As with active error processing, research on observed error processing has so far focused on a binary classification of response accuracy (Bates et al., 2005; de Bruijn and von Rhein, 2012; Kobza and Bellebaum, 2013). The observational data used in this study were taken from the actively performing participants of Experiment 1. We expected to see higher oMN amplitudes for errors than for correct keypresses, because errors were less frequent and thus more unexpected. Because there was only a slight difference between small and large error frequency in the videos for Experiment 2 and because we assumed that the oMN was mainly driven by the expectancy of the observed response, we suspected to find no difference in oMN amplitude between the error types and thus a different pattern as for own responses in Experiment 1.

To directly compare the processing of own and observed actions, we also conducted an analysis including the ERPs from experiments 1 and 2 with factors agency and event type. In this exploratory analysis, amplitude differences between the components ERN and oMN were eliminated via z-standardization. Because we hypothesized to find differences between small and large errors in the ERN, but not in the oMN, we expected to find a significant interaction between agency and event type.