What controls gain in gain control? Mismatch negativity (MMN), priors and system biases
- Cite this article as:
- Todd, J., Heathcote, A., Mullens, D. et al. Brain Topogr (2014) 27: 578. doi:10.1007/s10548-013-0344-4
Repetitious patterns enable the auditory system to form prediction models specifying the most likely characteristics of subsequent sounds. Pattern deviations elicit mismatch negativity (MMN), the amplitude of which is modulated by the size of the deviation and confidence in the model. Todd et al. (Neuropsychologia 49:3399–3405, 2011; J Neurophysiol 109:99–105, 2013) demonstrated that a multi-timescale sequence reveals a bias that profoundly distorts the impact of local sound statistics on the MMN amplitude. Two sounds alternate roles as repetitious “standard” and rare “deviant” rapidly (every 0.8 min) or slowly (every 2.4 min). The bias manifests as larger MMN to the sound first encountered as deviant in slow compared to fast changing sequences, but no difference for the sound first encountered as a standard. We propose that the bias is due to how Bayesian priors shape filters of sound relevance. By examining the time-course of change in MMN amplitude we show that the bias manifests immediately after roles change but rapidly disappears thereafter. The bias was reflected in the response to deviant sounds only (not in response to standards), consistent with precision estimates extracted from second order patterns modulating gain differentially for the two sounds. Evoked responses to deviants suggest that pattern extraction and reactivation of priors can operate over tens of minutes or longer. Both MMN and deviant responses establish that: (1) priors are defined by the most proximally encountered probability distribution when one exists; (2) when no prior exists, one is instantiated by sequence onset characteristics; and (3) priors require context interruption to be updated.
Keywords: Auditory evoked potential (AERP); Mismatch negativity (MMN); Predictive modelling; Priors
The auditory mismatch negativity (MMN) component is elicited when a sound violates some established patterning in the acoustic stream (Winkler et al. 1996; Näätänen 1992). There are many ways to conceptualise the system producing MMN, and the way it is conceptualised is important because it defines the questions we ask and the methods we use to address them. In this paper we detail how one theory-driven design—the multi-timescale paradigm—has produced results that expose a bias in the system. The bias demonstrates a powerful influence over MMN amplitude that is not a simple function of the probabilistic rarity or physical difference of the eliciting sound (Todd et al. 2011, 2013). By examining the change in MMN amplitude over time we test the way in which the multi-timescale sequence exposes the influence of prior experience on how the acoustic stream is filtered.
A prior in Bayesian statistics encapsulates pre-existing assumptions about the probability distribution defining the likelihood of events (Griffiths et al. 2008). Although some applications of MMN explicitly assume an influence of prior knowledge (such as knowledge about language stored in long term memory, e.g. Hahne et al. 2002; Pulvermüller et al. 2008), most experimental designs assume a kind of tabula rasa: that MMN is responsive to sound purely as a function of the way a sequence has been constructed and the probabilistic information it contains. Furthermore, sequences used to elicit MMN are generally unattended, with participants’ attention directed toward a concurrent activity (such as watching a subtitled movie, Kujala et al. 2007). Studied this way, MMN reveals exquisite efficiency in the brain’s ability to identify local patterns in the acoustic environment based on as few as 2–3 repetitions (Sams et al. 1993; Cowan et al. 1993; Bendixen et al. 2007). By learning the pattern, the probable characteristics of future states can be extrapolated. Learning creates a “prediction model”, which is expressed in altered responsiveness to new sounds as a function of past exposure (Winkler et al. 1996; Winkler 2007, 2010). MMN reflects the outcome of a process that compares current sensory input with expected sensory activation based on the prediction model (Winkler et al. 1996). As a component of the auditory evoked potential, it is quantifiable as peak negativity in a difference waveform comparing the response to a repetitive and a deviating sound, which reaches maximal amplitude approximately 100–250 ms from the point of pattern violation (Näätänen 1992; Näätänen and Winkler 1999; Kujala et al. 2007).
Due to the local sensitivity of the system, a sound initially encountered as a rare pattern deviation (referred to as “deviant”) can rapidly form the basis of a new prediction model if it subsequently becomes a repetitious and predictable sound (referred to as “standard”; see e.g. Winkler et al. 1996). Todd et al. (2011, 2013) exploit this feature to construct sequences in which two sounds alternate roles as the highly predictable standard and rare, MMN-eliciting deviant. The multi-timescale element of this paradigm refers to the fact that the rate at which roles alternate is varied across sequences, creating a different second-order structure with some sequences being more volatile than others. Although Todd et al. observed MMN to both sounds when encountered as the rare deviant, the bias observed refers to the volatility having a different effect on MMN amplitude depending on whether the sound was first encountered as the standard sound or the initial rare deviation (i.e. the initial role biases susceptibility to second-order structure). In the present paper we conduct a closer investigation of how MMN amplitude changes as a function of role stability within sequences to gain new insights into why this bias occurs and how it relates to second-order learning. First, however, we review information about the system underlying MMN that is critical to understanding the chosen methodology and the subsequent findings.
To elicit significant MMN, a deviation must represent a discriminable difference from the repetitive patterning within a sequence and be statistically rare (estimates suggest a deviant probability p ≤ 0.30 is required, Kujala et al. 2007). MMN is used widely to answer a variety of questions about auditory system function and perceptual inference. Although MMN latency (time-to-peak or time-to-onset) can be very informative (Näätänen et al. 1978; Schröger and Winkler 1995; Richter et al. 2009), it is the MMN amplitude that is generally the main quantifier of interest, whether comparing groups or experimental conditions. MMN amplitude can be construed as an index of the amount of surprise generated by the stimulus based on two main factors: (1) how far the sound falls from predictions reflecting the auditory system’s assessment of the most likely continuations of the sound sequence; and (2) the system’s confidence in the violated prediction (Winkler 2007, 2010).
Friston (2005) proposed that the MMN process is a basic exemplar of a broader principle of brain function—that prediction facilitates the reduction of entropy (free-energy), which in turn optimises the distribution of limited resources. Events that can be predicted provide limited new knowledge, and it is advantageous to minimise their impact (e.g. by reduced responsiveness to predicted properties of sounds). This conserves resources for events that indicate that the world is different than anticipated and thus requires the acquisition of new learning and knowledge. Within this framework MMN can be considered a prediction-error signal (for a discussion of the relation of MMN to predictive coding theories, see Winkler and Czigler 2012). MMN indicates that an auditory input differs from predicted causes and signals the need for two related actions: firstly, the model has proven inaccurate and therefore requires updating (Winkler et al. 1996; Winkler 2007), and second, the environment has changed in some way that might require a change in ongoing behaviour (Näätänen 1990; Schröger 1997; Näätänen et al. 2011). The former action is indicated by research demonstrating that activity contributing to MMN is dependent on what model(s) the auditory system possesses at the time the deviant is encountered as opposed to the characteristics of the deviant itself (see Winkler et al. 1996; Winkler and Czigler 1998). The latter action reflects an interpretation of MMN as part of a relevance filtering process that operates as the first stage in an orienting response to the deviating event (Escera et al. 2000; Friston 2005; Näätänen et al. 2011).
Reduced responsiveness to repetition and accompanying changes in sensitivity to violations is an example of “gain control” in the sensory system. Gain control processes enable the brain to produce informative signals within a limited dynamic range by making context dependent adjustments in response (see Butler et al. 2012 for review). MMN amplitude, therefore, can be thought to reflect the degree of gain (i.e. difference in response level) to a predicted versus unexpected event. It is important to note, however, that the traditional or classic computation of MMN (a deviant minus standard difference waveform, referred from here on as MMNc) captures the “real time” differences in cortical response to the repetitive and deviating element within a sequence. This includes what can be termed MMN proper (the true memory-based comparison output; also often referred to as “genuine MMN”) but also any differential amplitude in the event-related potentials (ERPs) reflecting effects on other components that may arise, for example, from physical differences in the eliciting tones or differential refractoriness due to differences in probability (for a thorough discussion see Ruhnau et al. 2012). Isolation of MMN proper requires considerable experimental control (for a review, see Kujala et al. 2007); the majority of the research we refer to below pertains to MMNc, and the degree to which it applies to MMN proper is, for the most part, unknown.
MMNc will be affected by the degree to which the properties of the next-state prediction differ from the properties encountered. A deviation that entails a large physical difference from the predicted state will produce a larger discrepancy between the predicted and encountered neural states, and so constitutes a greater error that elicits a larger MMNc. Furthermore, a brain representing sound properties with high precision will register a more substantial discrepancy between encountered and predicted states, and so will produce a larger MMNc than a brain with lower representational precision (Näätänen et al. 1987; Näätänen 1990; Näätänen and Alho 1997). These factors reflect the quantification of the error in a literal sense (see Todd et al. 2012 for discussion).
The influence a prediction model has over responsiveness is also a function of the degree to which it has demonstrated accuracy and been reinforced (Winkler 2007, 2010). Each time a sound conforms to the next state predicted by the model, confidence in the model is increased. This assumption is formalised in dynamic causal models, where changes in intrinsic (local intra-cortical) and extrinsic (inter-cortical) connectivity explain the altered responsiveness giving rise to MMNc (Friston 2005; Garrido et al. 2009; Wacongne et al. 2011, 2012; Lieder et al. 2013). Increments in model influence are reflected by changes in predictions (changes in extrinsic connectivity) and changes in precision (changes in intrinsic connectivity). Changes in intrinsic or recurrent activity affect cortical gain control and may underlie modulation of the MMN by modulating the response of the superficial pyramidal cells generating the ERP (Friston 2005).
Put simply, MMN can be regarded as a precision weighted prediction error, where the expected precision operates as a cortical gain control. In predictive coding, this is thought to be reflected by the post-synaptic sensitivity of superficial pyramidal cells that report prediction error and are thought to be the primary contributors to the ERP. Heuristically, the detection of a violation or deviant can be regarded as something like performing a t test to reject the null hypothesis (of a standard). Crucially, one needs both the difference in means (prediction error) and a measure of variability (the standard error) to make a reliable inference. In this setting, the t-statistic (cf. MMN) is the prediction error times the expected precision, where precision is the inverse of the standard error.
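The t test heuristic can be made concrete with a small numeric sketch. The helper name `precision_weighted_error` and both sets of observations are hypothetical and chosen purely for illustration; they are not data or code from the study. The sketch follows the heuristic as stated here, taking precision to be the inverse of the standard error (formal predictive coding treatments often define precision as inverse variance instead).

```python
import math
import statistics


def precision_weighted_error(samples, predicted_mean):
    """Return (prediction_error, precision, t) for a one-sample t test.

    Precision is taken as 1 / standard error, as in the heuristic above.
    """
    n = len(samples)
    prediction_error = statistics.mean(samples) - predicted_mean
    standard_error = statistics.stdev(samples) / math.sqrt(n)
    precision = 1.0 / standard_error
    return prediction_error, precision, prediction_error * precision


# Hypothetical observations in arbitrary units (illustration only).
noisy = [1.0, 3.0, -1.0, 5.0, 2.0]     # mean 2.0, high variability
reliable = [1.8, 2.2, 1.9, 2.1, 2.0]   # mean 2.0, low variability

err_noisy, prec_noisy, t_noisy = precision_weighted_error(noisy, 0.0)
err_reliable, prec_reliable, t_reliable = precision_weighted_error(reliable, 0.0)
```

Both sets have the same prediction error, but the less variable set has the higher precision and therefore the larger t. This mirrors the idea that the same physical deviation can produce a larger response when confidence in the prediction is higher.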
MMNc amplitude is, therefore, highly sensitive to sound probability, because reinforcement of the model is proportional to the probability of pattern repetition and inversely proportional to the probability of deviation (Sams et al. 1983). The influence of this kind of local reinforcement is evident in the strong linear association between deviant probability and MMNc size (Näätänen et al. 1987). The influence of reinforcement has been elegantly demonstrated in paradigms that include a roving standard design, where one property of the standard alters after each deviant (Cowan et al. 1993; Baldeweg 2006). In such paradigms, ERPs elicited by sounds conforming to the repeating pattern show incremental suppression with extended repetition between deviants. For example, the ERP to a standard after 36 repeats is more suppressed than that after 24 repeats (Costa-Faidella et al. 2011), reflecting an increased confidence or precision that is afforded to the prediction errors. There is also a concurrent increment in the additional negativity present in the ERP to the deviating event, such that the ERP to a deviant after a steady train of 36 repetitions is more negative than that after 24 repetitions. Hence, the increment in MMNc size with increased confidence in the prediction model results from changed responsiveness to both the standard pattern (repetition in the oddball paradigm) and the deviating event (change).
Longer-term influences on MMN
Strictly local reinforcement is not the only influence on predictive confidence. One of the most important factors influencing MMN size is contextual; whether or not the system processes the current sound with reference to a particular prior prediction model. It had been thought the sensory memory underlying MMN generation decays after 10 s (Cowan 1984; for an MMN demonstration, see Sams et al. 1993). However, this idea has been challenged by a number of experiments demonstrating that MMN displays the characteristics of a longer-term form of memory (for a review, see Winkler and Cowan 2005). For example, Cowan et al. (1993) showed that memory for a specific regularity within a sequence may become dormant then later be reactivated. They presented trains containing probable standard and rare deviant sounds with different inter-train intervals. The principle of reactivation (also called reinstatement by Ritter et al. 2002) was evident in MMN elicited to a deviant in position 2 of a second train of sounds presented after 11–15 s of silence following an initial train. MMN in position 2 was only observed when the tone in position 1 was identical to the standard in the first train (for a stronger test of this issue, see Winkler et al. 2002). Hence, the re-presentation of the identical standard sound in the latter train acts as a reminder, prompting the reinstatement of the prediction model established in the previous train.
The complexity of the memory system storing information about regularities is revealed by Ritter et al. (1998) who demonstrated that exact tone repetition is not necessary for reactivation. For example, when the pitch of the repetitious 70 dB standard was altered each time a new train commenced, MMN was still elicited to an intensity deviant in position 2. This finding reflects both the feature specificity of MMN and its relational nature. The first standard of a new string of frequencies still replicated the prior standard intensity, and this was enough to activate a more parsimonious rule (e.g. that sounds of any frequency should be 70 dB in intensity). Further studies of this kind have demonstrated that reactivation can occur for inter-train intervals up to 30 s (Winkler et al. 2002), and also for repeated inter-sound relations, as opposed to absolute sound features (Korzyukov et al. 2003).
In the present paper we examine what appears to be an even longer-term effect on predictive confidence, the primacy bias affecting MMNc amplitude described by Todd et al. (2011, 2013). As noted above, the bias is evident when MMNc is elicited in a sequence in which two tones of different lengths (e.g. 60 vs. 30 ms) alternate roles as a highly probable standard (p = 0.875) and a low probability deviant (p = 0.125). Each sequence in these studies contained equal numbers of the two tones, but the rate at which roles alternated varied from fast changing (12 blocks in total, changing every 160 tones = every 0.8 min) to slow changing blocks (four blocks in total, changing every 480 tones = every 2.4 min). The overall probability of deviance (p = 0.125) was the same in any block of the sequences and the physical difference was also the same (long vs. short, short vs. long), but the different speed in role alternation created standard/deviant ratios that were more stable in the slow changing sequences (i.e. a difference in sequence volatility). If the auditory system is influenced by contextual effects that assign more precision (gain) to less volatile contingencies, MMNc should be larger in slower changing sequences (where predictive confidence can build up over the longer period of role stability) and smaller in faster changing sequences (where the roles are more unstable). In contrast, if the system is governed by local probability only, MMNc should be more-or-less equivalent in both sequences.
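The arithmetic behind the two sequence types can be checked with a short sketch. The function name `describe_sequence` and the dictionary keys are hypothetical; the block counts, tones per block, block durations and deviant probability are the figures quoted above. The onset interval is only implied by the rounded block durations, so it should be read as approximate.

```python
def describe_sequence(n_blocks, tones_per_block, block_minutes, p_deviant=0.125):
    """Summarise one multi-timescale sequence from the figures quoted in the text."""
    deviants_per_block = round(tones_per_block * p_deviant)
    # The two tones swap roles every block, so each is the deviant in half the blocks.
    blocks_as_deviant = n_blocks // 2
    return {
        "total_tones": n_blocks * tones_per_block,
        "deviants_per_block": deviants_per_block,
        "deviants_per_tone": blocks_as_deviant * deviants_per_block,
        # Derived from rounded block durations, so approximate.
        "implied_onset_interval_ms": block_minutes * 60_000 / tones_per_block,
    }


fast = describe_sequence(n_blocks=12, tones_per_block=160, block_minutes=0.8)
slow = describe_sequence(n_blocks=4, tones_per_block=480, block_minutes=2.4)
```

On these figures the fast and slow sequences contain the same total number of tones and the same number of deviants per tone. They differ only in how long each standard/deviant configuration persists, which is the volatility manipulation.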
Todd et al. (2011) found that sensitivity to role volatility (block alternation speed) depended on whether a sound had been first encountered as a deviant or a standard (i.e. it depended on initial role assignment). For the sound that was first encountered as a deviant (i.e. it rarely occurred in the first block of each sequence), MMNc was much larger in less volatile sequences. In contrast, if the sound was first encountered as a standard (i.e. it commonly occurred in the first block of each sequence), MMNc to its later occurrence as a deviant was equivalent in slow and fast changing sequences. This result was not dependent on tone characteristics (the same result occurred regardless of whether sequences began with the short or the long sound as first deviant) and was not dependent on sequence order, as different role alternation speeds (five in total) were counterbalanced across participants. These very long-acting order-dependent effects substantially extend the estimated time span over which prior knowledge influences MMNc amplitude from the 30 s demonstrated by Winkler et al. (2002) to a span of several minutes. Furthermore, the results suggested an order-driven bias whereby role volatility only affects gain for the sound first encountered as deviant.
Todd et al. (2013) demonstrated an even longer-term influence by exploring how the bias was affected by alternating initial role assignment across sequence pairs. Participants heard three sets of slow followed by fast changing sequences with each pair separated by 5 min of silence. For the first sequence pair, sequences began with the long tone as the first encountered deviant, in the second pair, sequences began with the short tone as the first encountered deviant, and the final pair replicated the first pair. The results for the first two pairs replicated the bias observed in Todd et al. (2011), with volatility only modulating MMNc amplitude for the sound first-encountered as a deviant (i.e. the long tone in the first pair and the short in the second pair). However, in the third pair the bias disappeared, with MMNc larger in slow than fast changing sequences for both tones. That is, the initial role assignment no longer affected how MMNc amplitude differed in the slow and fast sequences. The disappearance of the bias for the final pair was attributed to a type of second-order learning that the bias could operate in both directions (favouring both the long sound and the short sound), indicative of a long-term memory like influence on MMNc lasting over tens of minutes.
The differential sampling hypothesis
Todd et al. (2011, 2013) suggested that the bias might be explained by differential probability sampling—that is, long term probability sensitivity for MMNc to the first deviant and local shorter-term sensitivity for MMNc to the second deviant. Differential sampling has testable implications for how MMN amplitude might change over time within blocks of the multi-timescale sequences. The equal amplitude for slow- and fast-sequence MMNc for the second deviant (those first encountered as standard) implies that confidence in the model reaches asymptote within 0.8 min (the fast sequence block length). Hence, the second deviant MMNc should be the same in the first and second half of a slow sequence block, as the first half of this block allows sufficient time (1.2 min) for confidence to reach asymptote. However, the second deviant MMNc should grow from the first to the second half of fast-sequence blocks; although with very short probability sensitivity (i.e. at asymptote within 0.4 min), the increment from the first to the second half may be quite small. The first-deviant MMNc should grow from the first to second half of fast-sequence blocks. However, due to the long term probability sensitivity for this deviant, it should then increase again for the first half of a slow sequence block (as it is longer than the entire fast block), with the largest MMNc amplitude occurring in the second half of slow blocks. In this paper, we provide a test of these predictions of the differential sampling hypothesis by re-analysing data from the first sequence pair in Todd et al. (2013) using ERP data separately extracted from the first and second half of each block within slow and fast sequences.
Participants were 14 healthy adults (seven women, seven men; 18–31 years, mean = 25 years, SD = 4 years), community volunteers and first-year undergraduate psychology students at the University of Newcastle. Participants were excluded if they were diagnosed with or being treated for mental illness, had a first-degree relative with schizophrenia, regularly used recreational drugs, or had history of neurological disorder, head injury or surgery, hearing impairments, or heavy alcohol use. Course credit was offered for participation to students and cash reimbursement to community volunteers. Written informed consent was obtained from all participants to complete the protocol as approved by the Human Research Ethics committee, University of Newcastle.
Stimuli and sequence
Each block within both fast and slow sequences was broken into halves to separate the responses elicited after points of transition between the two standard/deviant configurations (first half) from later points in the block where the new repetition had time to become established (second half). This division is illustrated in Fig. 1b, where all periods within a sequence marked ‘1’ were averaged together to provide an index of first-half ERPs for slow and fast sequences separately, and all periods within a sequence marked ‘2’ were averaged together to provide an index of second-half ERPs. Halves of the fast and slow sequence blocks are necessarily different in length (first half fast = 0–0.4 min, second half fast = 0.4–0.8 min, first half slow = 0–1.2 min and second half slow = 1.2–2.4 min). These ERPs were then used to generate four difference waveforms per tone type: first-half and second-half slow sequence MMNc and first-half and second-half fast sequence MMNc. In each case, the computation always involved the subtraction of the ERP to that sound as a standard in that sequence period (first or second half) from the ERP to that sound as a deviant in the same sequence and period. For example, the second-half fast sequence MMNc to the 30 ms tone was computed by subtracting the ERP to 30 ms standards in the second half of fast sequence blocks from the ERP to 30 ms deviants in the second half of fast sequence blocks. The breakdown into first and second halves was based on trial numbers within blocks, which led to approximately 60 deviants in each half. The minimum number of sweeps contributing to an average for any participant was 47, with the mean between 57 and 58 for all deviant waveforms.
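The half-block split and the same-tone subtraction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis pipeline: the trial dictionaries, their keys, the function names and the three-sample toy epochs are all hypothetical, and real epochs would be preprocessed EEG segments.

```python
def block_half(position_in_block, tones_per_block):
    """Return 1 for the first half of a block, 2 for the second (0-based position)."""
    return 1 if position_in_block < tones_per_block // 2 else 2


def average_waveform(epochs):
    """Point-by-point mean across equally long single-trial epochs."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]


def mmnc(trials, tone_ms, speed, half):
    """Deviant-minus-standard difference wave for one tone, speed and block half."""
    def select(role):
        return [t["epoch"] for t in trials
                if t["tone_ms"] == tone_ms and t["speed"] == speed
                and t["half"] == half and t["role"] == role]

    deviant = average_waveform(select("deviant"))
    standard = average_waveform(select("standard"))
    return [d - s for d, s in zip(deviant, standard)]


# Toy trials (hypothetical values, arbitrary units).
trials = [
    {"tone_ms": 30, "speed": "fast", "half": 2, "role": "deviant", "epoch": [0.0, -4.0, -2.0]},
    {"tone_ms": 30, "speed": "fast", "half": 2, "role": "deviant", "epoch": [0.0, -6.0, -2.0]},
    {"tone_ms": 30, "speed": "fast", "half": 2, "role": "standard", "epoch": [0.0, -1.0, 1.0]},
    {"tone_ms": 30, "speed": "fast", "half": 2, "role": "standard", "epoch": [0.0, -1.0, -1.0]},
    # A first-half trial that the second-half selection must ignore.
    {"tone_ms": 30, "speed": "fast", "half": 1, "role": "deviant", "epoch": [9.0, 9.0, 9.0]},
]

second_half_fast_30 = mmnc(trials, tone_ms=30, speed="fast", half=2)
```

Because the deviant and standard ERPs come from the same physical tone, the subtraction controls for physical stimulus differences. The standards for a given tone are necessarily drawn from different blocks than its deviants.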
Differences evident in the ERPs were analysed for mean amplitudes (for standards) or mean-peak amplitudes (for deviants and difference waveforms). Mean amplitude was used instead of mean-peak amplitude for the standard ERP because the period assessed comprised the influence of multiple ERP components. Mean amplitude was extracted over two periods for the standard ERPs: the first was commensurate with the period commonly used to define repetition positivity (50–200 ms, Baldeweg 2006) and the second was chosen to capture the main visible differences over the P2 period (140–170 ms). The maximum negativity in deviant and difference waveforms was quantified by extracting the mean amplitude over a 20-ms period centred on the most negative point within the 70–270 ms post-stimulus interval (facilitating a stable measure capturing the peak amplitude, Kujala et al. 2007). ERP measures were compared using repeated measures ANOVA with within-subjects factors of Speed (slow versus fast), Tone (first [60 ms] versus second [30 ms] deviant), and block Half (first versus second). Paired (two-tailed) t tests were performed for simple effects with α = 0.025 (Bonferroni corrected).
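The mean-peak measure can be illustrated with a short sketch. The function name, the 10 ms sampling step and the toy waveform are assumptions made for the example, since the sampling rate is not stated here. Whether the window edges are inclusive is also an assumption.

```python
def mean_peak_amplitude(waveform, times_ms, search=(70.0, 270.0), window_ms=20.0):
    """Mean amplitude in a window centred on the most negative sample in `search`.

    Returns (peak_latency_ms, mean_amplitude). Window edges are inclusive.
    """
    candidates = [i for i, t in enumerate(times_ms) if search[0] <= t <= search[1]]
    peak = min(candidates, key=lambda i: waveform[i])
    half_width = window_ms / 2.0
    in_window = [v for v, t in zip(waveform, times_ms)
                 if abs(t - times_ms[peak]) <= half_width]
    return times_ms[peak], sum(in_window) / len(in_window)


# Toy difference waveform sampled every 10 ms from 0 to 400 ms (hypothetical values).
times_ms = list(range(0, 401, 10))
waveform = [0.0] * len(times_ms)
waveform[times_ms.index(140)] = -2.0
waveform[times_ms.index(150)] = -4.0   # most negative point inside 70-270 ms
waveform[times_ms.index(160)] = -2.0
waveform[times_ms.index(300)] = -10.0  # larger negativity outside the search interval

latency_ms, amplitude = mean_peak_amplitude(waveform, times_ms)
```

Averaging over a short window around the peak, instead of taking the single most negative sample, makes the measure less sensitive to noise in any one sample. Restricting the search interval keeps later negativities from being mistaken for the peak of interest.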
Analysis within sequences revealed a significant Tone by Half interaction for the slow sequence only (F(1,13) = 4.72, p < 0.05). MMNc amplitude in the slow sequence incremented (became more negative) across halves for the second deviant (30 ms) (t(13) = 2.96, p < 0.025) but did not change significantly for the first deviant (p = 0.66) as illustrated in Fig. 2a. These findings are again inconsistent with the differential sampling account, as a longer time scale for probability sensitivity predicts the opposite pattern (i.e. if anything the first deviant should increment).
Standard and deviant analyses
Under predictive coding models, MMN is ascribed to a prediction error whose gain reflects precision or confidence about predictions that is established by the stability in repetitive patterning. Crucially, confidence has to be learned and this depends upon the volatility of probabilistic contingencies. The present study explored an apparent order-dependent bias in gain in Todd et al.’s (2013) two-tone multi-timescale sequences where the two tones alternate roles as a highly repetitive standard and rare deviant. The bias refers to the fact that volatility in a sequence appears to affect precision estimates for only one of the two sounds—the sound first encountered as a rare-deviant tone. The differential effect of volatility on precision estimates is inferred from the patterns of MMN amplitude when each sound occurs as a contextually-rare deviant stimulus. In stable sequences where roles alternate slowly there should be high confidence/precision in predictions leading to larger MMN than in the context of low confidence/precision when roles alternate rapidly.
The fact that this difference as a function of speed of sequence alternation only occurs for the sound first-encountered as a deviant appears to be a peculiar asymmetry in second order learning. The asymmetry or primacy bias was quantified by a Speed by Tone interaction in Todd et al. (2013) that had a quite large effect size (Cohen’s d = 1.4; Cohen 1992). Here we demonstrate that the primacy bias is anchored to what is happening at transition points in the sequence where one regularity switches to another. The Speed by Tone interaction characterising the primacy bias is only present in data obtained from the first half of blocks immediately following role transitions in the slow and fast changing sequences, and the effect is much larger in first half data (d = 2.4) than for data analysed across the whole block as performed by Todd et al. (2013). MMNc obtained from the second half data exhibited neither main effects nor interactions involving Tone or Speed factors. These data provide evidence that the influence of the prior that drives the bias is strong at block onsets but diminishes over time.
The results of the re-analysis clearly contradict the differential sampling account of Todd et al. (2011, 2013). The differential sampling account is most clearly ruled out by MMNc to the stimulus first serving as a standard and later experienced as deviant. This MMNc reaches a significantly higher amplitude in the first half of the fast sequence blocks (0.4 min of stability) than it does in the first half of slow sequence blocks (1.2 min of stability). Neither shorter nor longer sampling periods can explain larger MMNc when there is less time to accumulate predictive confidence. On the differential sampling hypothesis, only a smaller, or at best constant, amplitude can be expected.
Given that MMN is a difference waveform, the bias we report could conceivably be due to changes in either the response to tones as repetitive standards or as rare deviants. Examination of changes in standard and deviant ERPs replicated previous findings (Todd et al. 2011, 2013) that the bias is reflected in the response to the sounds when presented as deviants only. In fact, the data indicate that the bias is most powerfully evident in analyses of deviant ERPs, where the Speed by Tone interaction in first half data had a very large effect size of d = 4.0 (larger than that for the MMN in difference waveforms reported above). Whatever creates the bias, therefore, has a very substantial influence over how gain accumulates in the system, changing the relative “importance” of the two sounds by altering response to pattern deviations. Overall, the pattern of MMNc and deviant ERP results is consistent with the involvement of some form of reactivation. However, previous theories about what is reactivated must be refined because: (1) in the present study differential reactivation is only evident in deviant ERPs; and (2) there is a reactivation of confidence in tone “roles” (or reactivation of the whole context) rather than only for the regularities. We explain each of these refinements in turn.
The response to repetition shows prolonged sensitivity to pattern stability but no bias
ERPs to repetitive standard tones are characterised by a decrease in negativity from the first to second half of blocks for both fast and slow sequences, and this decrease is generally larger in slower changing than faster changing sequences. The suppression in responsiveness continues to increase with stability of the standard (i.e. the period of time over which it is the most likely tone) despite equivalent local probability (i.e. the likelihood of interruption by a deviant averages at 1/8 in both halves and sequences). Each time a string of standards is interrupted by a deviant the expected precision associated with a prediction model will fall initially and then recover. Recent research delineates effects of repetition suppression from what has been termed expectation suppression (Todorovic and de Lange 2012). The former, linked to processes such as local neuronal stimulus specific adaptation (Ulanovsky et al. 2003; for a recent review, see Ayala and Malmierca 2013), requires replication of a sound over time and leads to reduction in early components of the ERP (e.g. 40–60 ms; Slabu et al. 2010). In contrast, expectation suppression is related to expectation of a pattern continuation (Summerfield et al. 2008; Alink et al. 2010; den Ouden et al. 2010; Todorovic et al. 2011), with an effect that is present in later ERP components (e.g. 100–200 ms).
In the present data, the suppressed response to standard tones in the slow versus fast sequence is only evident in later portions of the ERP and may therefore be the result of increased confidence in the prediction model due to its prolonged utility in estimating the most probable (as opposed to the exact) stimulus characteristics in the current context. However, to the extent that suppressed responsiveness reflects predictive confidence, the data suggest an equivalent increment for both tones as standards. It is clear that the standard ERPs provide no evidence of differential reinstatement of confidence in the first versus second encountered standard, as only the response to deviants reveals the bias. This is to be expected under predictive coding because the increase in precision or gain is only disclosed by the prediction errors (i.e. the response to deviations) that are modulated by that gain.
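The last point can be illustrated with a deliberately simplified sketch in which the evoked response is a gain-weighted prediction error. The function names, the tone values and the gain values are illustrative assumptions, not quantities estimated from the data, and a perfectly predicted standard is assumed to produce zero error.

```python
# Minimal sketch of a gain-weighted (precision-weighted) prediction error.
# All names and numbers are illustrative assumptions, not fitted values.

SHORT, LONG = 30.0, 60.0  # hypothetical tone durations in ms


def evoked_response(observed, predicted, gain):
    """Response modelled as gain times the absolute prediction error."""
    return gain * abs(observed - predicted)


def responses(standard, deviant, gain):
    """Responses to both tones when the model predicts the standard."""
    return {
        # The standard matches the prediction, so its error is zero.
        "standard": evoked_response(standard, standard, gain),
        # The deviant violates the prediction, so gain scales its error.
        "deviant": evoked_response(deviant, standard, gain),
    }


low_gain = responses(SHORT, LONG, gain=0.5)
high_gain = responses(SHORT, LONG, gain=2.0)
```

Under these assumptions a change in gain leaves the response to the standard untouched and appears only in the response to the deviant, which is the qualitative pattern described above.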
The bias is reflected in response to deviations
To explain the primacy bias in terms of differential reactivation of a prediction model, one has to assume a decoupling of the two components of precision/predictive confidence—the suppression of the anticipated properties and sensitisation to alternate properties. The former (as noted) provides no support for differential processing of two tones as standards but the latter does. The paradigm begins with the slow sequence and the data demonstrate that MMNc to deviants in the first block type (initial roles) is large in the first half data and stays large over the second half. This implies high predictive confidence over both periods. In contrast, MMNc to the tone first encountered as standard when it is a later deviant (new roles) is small in the first half of the blocks. MMNc then increases in the second half of the blocks, consistent with the notion of accumulating predictive confidence through experience. So, while certainty in the standard (as per standard ERP) appears equivalent for the two tones, responsiveness to deviations is biased.
When the sounds conform to their first encountered roles (a reactivated context), the deviant appears to be rapidly recognised as rare/unexpected/potentially-relevant (hence MMNc is large in both halves) but when the roles reverse, evidence must be accumulated over time for the tone that was the first standard stimulus to be later accepted as a rare deviant stimulus (hence the growth in MMNc across halves). The data are, therefore, consistent with previous explanations in that the bias is due to initial role differences assigned to the two sounds, but inconsistent with the previous assertion that this leads to longer-term probability sensitivity for the first deviant and short-term sensitivity for the second (Todd et al. 2011, 2013). The absence of any change across halves for MMNc to the first deviant suggests that response to this sound is unaffected by its recent role as a highly probable standard. On the other hand, MMNc data for the first standard as a later deviant are consistent with the notion that its initial role as a high probability standard affects how it is later processed as a deviant. However, if this were due to a difference in the time scale of sound monitoring, one might also expect to see differential effects on the ERP to this sound as a standard—but the present analyses reveal no evidence of this.
A role-reactivation interpretation implies that the auditory system maintains and reinstates a full context. The slow sequence data suggest the bias we see may initially reflect the first encountered roles. Our data suggest that, when roles alternate, there is an initial reluctance to accept significance/rarity of a first-encountered standard, although this is overcome with sufficient countermanding evidence, and a readiness to accept the significance/rarity of the first encountered deviant. This interpretation is consistent with an earlier observation of Sussman et al. (2003), who found that no sound could act as a deviant and as a standard at the same time. This result implies that each sound has a fixed role within a given context, which is implicitly assumed by the current interpretation. This interpretation thus integrates two foci in MMN research: whether MMN indexes processing of the deviant sound (Näätänen 1990; Schröger 1997) or the representation of the standard (Winkler 2007). It appears that prediction models underlying this process include both standard and deviant affecting the processing of both sounds.
Not entirely an order-driven phenomenon
A caveat for our prior assumption that the bias is solely an order-driven phenomenon (Todd et al. 2011, 2013) is the stark contrast between the pattern of results in the fast sequence and that in the slow sequence. Despite commencing with the same structure as the slow sequence, the fast sequence data demonstrate the opposite pattern. In this sequence it is the MMNc to the second deviant (the tone first encountered as a standard and later experienced as a deviant) that starts and remains large across block halves, whereas MMNc to the stimulus first experienced as deviant begins small and builds with increasing role stability. To explain the bias we offer two potential descriptions for how the second sequence becomes “filtered” by the opposite role reactivation: the first by assuming a proximal prior; and the second by assuming superordinate pattern extraction.
An important difference between the two sequences becomes apparent when we consider the presence of priors—the slow sequence starts without prior exposure to the two sounds while the fast sequence begins shortly (~1 s on average) after the slow sequence. One explanation for these data is that the processing of both sequences is biased by an automatically generated assumption regarding the contextual roles of the two tones: for the slow sequence, the assumption is driven by the initial probability difference encountered (high probability short sound and low probability long sound) while the bias for the fast sequence is driven by an updated prior reflecting the most proximal probability distribution (the high probability long sound and low probability short sound as per the block concluding the slow sequence, see Fig. 1a).
This interpretation has interesting implications for how priors interact with incoming (bottom-up) stimuli and when they can be updated. The disappearance of bias in the second half of blocks indicates that its influence can be over-written by new learning. The curious observation here is that this new learning does not remove nor get integrated into the prior during an unbroken sequence, as the prior appears to be reactivated again when the blocks change. The only time we see evidence of the prior being updated is after the period of silence between the slow and fast sequences. When a new sequence begins the prior appears to switch to adopt the most proximal distribution—which then continues to influence how the subsequent fast sequence is processed.
One problem for this account is that it implies that the order of the sequences should be crucial to the effect: a slow sequence following a fast sequence should lead to a smaller MMN for the first slow-sequence deviant (because in the last block of the preceding fast sequence this stimulus was the frequent one). Data from Todd et al. (2011) indicate that this is not the case. In their paper, MMN to the first encountered deviant (but not the second) was larger in slow than fast sequences regardless of the order in which they were presented. So, while a proximal prior provides a parsimonious explanation of the present data, it seems unsatisfactory as a general account of primacy bias.
An alternative explanation arises if we consider that the auditory system is able to form predictions based on the superordinate structure of the sequences. In the design examined here, the first superordinate pattern is that roles of the tones reverse every 2.4 min. Assuming this regularity is learned, a major violation of this pattern occurs when the first block of the fast sequence stops early (see Fig. 1a). This violation would presumably cause a second order prediction error leading the system to re-evaluate assumptions. The re-evaluation would occur at the time when the initial roles are reversed (short deviant, long standard) providing an explanation for why the role-reactivation phenomenon reversed.
Unlike the proximal prior, superordinate pattern learning can explain the sequence order independence observed by Todd et al. (2011). Whenever a faster changing sequence was followed by a slower changing sequence, the second order pattern violation would occur when the initial block exceeded the expected block length. Relearning would, therefore, be prompted during the first block, leading to favouring initial roles of the tones (because the first slow block has the same roles for the two tones as the very first fast block), causing a larger MMNc to the first deviant. Whenever a faster changing sequence followed a slower changing sequence the second order violation would occur when the initial block finished early, prompting relearning during the second block in which roles are reversed, thus explaining smaller MMNc to the first deviant.
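The timing argument in the two preceding paragraphs can be sketched as follows. This is a toy illustration under stated assumptions: the function names are hypothetical, the block-length expectation is assumed to be fully learned from the preceding sequence, and the initial roles (short standard, long deviant) follow the design described above.

```python
# Toy sketch of when a second-order (block-length) violation is registered.
# Function names are hypothetical; block lengths are taken from the design.

SLOW_MIN, FAST_MIN = 2.4, 0.8  # block lengths in minutes


def violation_block(expected_len, new_len):
    """Block (1-based) of the new sequence in which the violation is detected."""
    if new_len < expected_len:
        # The block ends early: the premature reversal marks block 2 onset.
        return 2
    if new_len > expected_len:
        # The block runs long: the expected reversal fails to occur in block 1.
        return 1
    return None  # block length matches expectation, no violation


def roles_in_block(block_index, initial_roles=("short", "long")):
    """Return (standard, deviant); roles alternate on every block."""
    standard, deviant = initial_roles
    if block_index % 2 == 1:
        return (standard, deviant)
    return (deviant, standard)


# Slow sequence followed by fast: relearning occurs under reversed roles.
slow_then_fast = roles_in_block(violation_block(SLOW_MIN, FAST_MIN))
# Fast sequence followed by slow: relearning occurs under the initial roles.
fast_then_slow = roles_in_block(violation_block(FAST_MIN, SLOW_MIN))
```

In this sketch, relearning after a slow-to-fast transition takes place while the long tone is the standard and the short tone is the deviant, whereas relearning after a fast-to-slow transition takes place under the initial roles, in line with the account given above.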
Evidence of very long timescale effects
Regardless of the mechanisms, these data imply that the bias affecting gain in MMN is caused by a prior that specifies the roles assigned to tones at the onset of a sound sequence. The bias is characterised by a perseverance of a prior within a sequence that is then imposed upon sound processing even though its influence can be counteracted over time. Reactivation of the prior leads to its marked influence on first half data while the lack of effect on the second half attests to the diminishing influence of the prior farther from the transition point. Reactivation of the prior appears to continue until something alerts the system to a kind of context-shift. The proximal prior explanation assumes the context-shift is caused by a silent break in the sequence while superordinate pattern learning assumes it is the break in regular timing of role reversals. The long-acting influences observed here extend over a time period greatly exceeding the duration of sensory memory and even the longest period after which reactivation has been previously observed (30 s, Winkler et al. 2002).
Reactivation of a former standard/deviant role 2.4 min later (as per slow sequence data) falls into the time-frame of pattern extraction proposed for more rostral brain areas such as prefrontal cortex (Kiebel et al. 2008). Given that the auditory cortical response ultimately overwrites the prior, and yet it is still reactivated or re-imposed on the system, it is possible that the prior is stored in a different location (perhaps at a higher level in the processing hierarchy as suggested in computational models, Friston 2005; Garrido et al. 2009) and it is only updated when a marked contextual break occurs. The possibility of superordinate pattern learning extends the time periods of regularity extraction even further as the system would need to store regularities that emerge over tens of minutes, which is necessarily reliant on prefrontal cortex (see Kiebel et al. 2008 for review). Indeed, it may even be possible that the disappearance of the bias observed by Todd et al. (2013) was due to a tertiary level pattern extraction or third order learning. Once two repetitions of slow-fast sequence pairings had been presented (i.e. the long-first deviant order followed by the short-first deviant order), the repetition of the first pair would represent a familiar sequence and the third change from a slow to fast pattern may have been anticipated. This experience with all tone patterns may be sufficient to drop the tone bias.
The primacy bias is a good example of how the way we challenge a system will determine what we learn about it. The system contains a bias favouring maintenance of a prior, distorting the effects of actual sound statistics in the process. The memory underlying MMN appears to store the full context (standard and deviant) or at least the assignment of roles. The bias demonstrates the system’s reluctance to reassign roles: deviant ERPs suggest both a readiness to accept the first encountered deviant as a rare/important sound when it occurs again in the presence of the first encountered standard, and an initial reluctance to accept the role reversal. The present analysis indicates that the type of gain control reflected in MMNc is sophisticated, reflecting a long time course of learning contained in a contextual memory. While learning clearly shapes our priors, priors also appear to shape our learning. Finally, our findings about hierarchical learning and inference, and its bias by initial exposure, relate closely to hierarchical models of learning in the computational neurosciences and neuro-economics. A nice example here is the hierarchical Bayesian formulation of classical learning schemes, in which the precision at higher hierarchical levels encodes confidence and is directly related to the volatility of the current environment (e.g. Mathys et al. 2011). Here we observe compatible findings in task-independent perceptual learning that speak to the likely generality of the underlying principles.
This research was supported by Project Grant 1002995 from the National Health and Medical Research Council of Australia. István Winkler was supported by the Hungarian Academy of Sciences (“Lendület” LP2012-36/2012) and Andrew Heathcote by an Australian Research Council Professorial Fellowship. We offer special thanks to Gavin Cooper for programming support in recoding original sequences.