High performance pressure can cause people to perform below their actual abilities, a phenomenon called “choking under pressure,” which may have serious consequences for people’s lives (Beilock, 2010). Examining the relationship between choking under pressure and individual differences in working memory capacity (WMC), Beilock and Carr (2005) concluded that the individuals who are most likely to fail under pressure are those who, in the absence of pressure, have the highest potential for success (the higher-WMC individuals). DeCaro, Thomas, Albert, and Beilock (2011) suggested two different processes leading to poor performance in high-pressure situations. According to the first process, choking occurs because task-irrelevant thoughts and worries distract executive attention (a key component of WMC; Engle, 2002) away from task execution (e.g., Beilock & Carr, 2005; Beilock & DeCaro, 2007; Gimmig, Huguet, Caverni, & Cury, 2006; Markman, Maddox, & Worthy, 2006), which is problematic in difficult tasks requiring attentional control. The second process implies just the opposite: pressure shifts too much executive attention toward the task at hand, which may cause poor performance in routine (non-attention-demanding) tasks relying on skill processes and procedures that normally run best outside of conscious awareness (e.g., Beilock, Bertenthal, McCoy, & Carr, 2004; Beilock, Carr, MacMahon, & Starkes, 2002; Beilock, Jellison, Rydell, McConnell, & Carr, 2006).

DeCaro et al. (2011) found evidence that the first process (distraction) is most likely in situations associated with “outcome pressure” (the prospect of an incentive if a certain outcome is achieved), whereas the second process results from being watched by others (“monitoring pressure”), particularly when one’s performance is being evaluated in some manner (e.g., being watched by a teacher or video camera). By relating reduced executive attention to outcome pressure, and the counterproductive allocation of attention to skill processes to monitoring pressure, DeCaro et al. took an important step toward reconciling seemingly disparate theories of skill failures.

However, this integrative approach does not fit with independent research on distraction/conflict theory (Baron, 1986), indicating that being watched by others can create distraction (the consequence of outcome pressures, according to DeCaro et al., 2011), especially when the observers are unpredictable and/or a source of evaluation. When attending to others is incompatible with the task demands, attentional conflict may ensue, a form of response conflict regarding what attentional response one should make (Baron, 1986; Huguet, Barbet, Belletier, Monteil, & Fagot, 2014; Huguet, Dumas, & Monteil, 2004; Huguet, Galvaing, Monteil, & Dumas, 1999; Muller & Butera, 2007; Normand, Bouquet, & Croizet, 2014; Sharma, Booth, Brown, & Huguet, 2010). This conflict, in turn, may lead to poor performance on difficult or attention-demanding tasks. Likewise, early research on the biopsychosocial model of challenge and threat (Blascovich, Mendes, Hunter, & Solomon, 1999) showed evidence that being watched by an evaluative audience (the experimenter) harms performance on the rule-based category-learning tasks also used by DeCaro et al. to test the detrimental effects of outcome pressures. Thus, it seems that the nature of the pressure at stake does not necessarily predict the type of processes that take place.

Here, we tested whether being watched by others can lead individuals with higher WMC to choke on a classic measure of executive control. Assuming that monitoring pressures are more likely when one’s performance is being evaluated in some manner, we distinguished the presence of the experimenter from the presence of peers, a classic distinction in the literature on social presence effects (for a review, see Guerin, 2009). There is indeed longstanding evidence that experimenters are spontaneously viewed by participants as experts, and therefore as probably evaluative (Sasfy & Okun, 1974; Stotland & Zander, 1958). This characteristic makes distraction/conflict more likely in the presence of the experimenter than in the presence of peers (Baron, 1986). A peer presence condition was also required, to make sure that any impaired performance was due to the evaluation potential of the person present rather than to social presence per se. As DeCaro et al. remind us, however, choking under pressure is not just poor performance; it is performing more poorly than expected given one’s skill level, a key criterion that has been neglected in earlier studies on distraction/conflict or challenge and threat. Ironically, the individuals with higher WMC are precisely those who are most able to attend simultaneously to both the focal task and the experimenter’s presence (i.e., to experience distraction/conflict), and thus the most susceptible to choking on tasks requiring executive attention when the person present is the experimenter. This counterintuitive hypothesis was at the core of the present research.



The participants were 54 undergraduate students (33 female, 21 male; mean age = 20.46, SD = 2.14) from Aix-Marseille University. All participants received financial compensation (€10) for their participation, were naive concerning the purpose of the experiment (which was presented as a study on visual perception), and reported normal or corrected-to-normal vision. The sample size was determined, as recommended by Tabachnick and Fidell (2007), on the basis of the desired power (.80), alpha level (.05), number of predictors (five in the main analysis), and anticipated size of the choking effect (f² = .30) in the Simon task (see the Method section). Daniel Soper’s online sample size calculator gave a minimum required sample size of 49.


Working memory capacity

We first measured participants’ WMC using a computer-based version of the classic Reading Span Task (RSPAN; Daneman & Carpenter, 1980). Each display included a meaningful or meaningless sentence that the participants read aloud while verifying whether or not it made sense, and a to-be-remembered letter, which they also read aloud (e.g., “We were fifty lawns out at sea before we lost sight of land. ? X”). The sentences (each composed of 12–17 words, M = 14.4 words, SD = 1.2) were taken from the French version of the RSPAN (Desmette, Hupet, Schelstraete, & Van der Linden, 1995). The meaningless sentences were created by changing only one word (miles to lawns, in the previous example). The set size ranged from two to five sentence–letter problems per trial, with three trials per set size, for a total of 12 trials. At the end of each set, participants had to write down the sequence of letters in the correct order. An item was scored as correct when it was recalled in the correct serial position. RSPAN scores were equal to the percentage of correct answers on each trial weighted by the number of sentences in the trial. This first measure lasted no longer than 15 min, followed by a rest period of 2 min.
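The scoring rule described above can be illustrated with a short sketch. This is one possible reading of the rule, not the authors' scoring script: it assumes that each trial's proportion of letters recalled in the correct serial position is weighted by that trial's set size. The function name `rspan_score` and the sample letter sequences are hypothetical.

```python
def rspan_score(trials):
    """Return the set-size-weighted mean of per-trial recall accuracy.

    `trials` is a list of (presented, recalled) letter sequences.
    Assumed reading of the scoring rule: each trial contributes its
    proportion of correctly positioned letters, weighted by set size.
    """
    weighted_sum = 0.0
    total_weight = 0
    for presented, recalled in trials:
        set_size = len(presented)
        # A letter counts only if it is recalled in its original serial position.
        correct = sum(1 for p, r in zip(presented, recalled) if p == r)
        weighted_sum += set_size * (correct / set_size)
        total_weight += set_size
    return weighted_sum / total_weight


# Hypothetical data: three trials with set sizes 2, 3, and 4.
example_trials = [("XK", "XK"), ("RTL", "RLT"), ("QJMB", "QJM")]
example_score = rspan_score(example_trials)
```

Under this reading, the score is the total number of correctly positioned letters divided by the total number of letters presented, which yields a proportion between 0 and 1.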

Executive control task

Executive control was measured directly using a conflict task (as recommended by Engle, 2002), rather than indirectly from a task not specifically designed to measure executive attention (e.g., math equations—Beilock & Carr, 2005; rule-based category learning—DeCaro et al., 2011; Raven’s matrices—Gimmig et al., 2006). Participants were trained on a standard Simon task (Simon, 1990), which provided a direct and sound measure of executive attention (van den Wildenberg et al., 2010). They were told that they should press a button on the right-hand side when a red light appeared on a given device, and the button on the left-hand side when a green light appeared (or vice versa). They were also asked to ignore the location of the stimulus and to respond as quickly as possible (each trial ended 1 s after stimulus onset) while minimizing errors. In the Simon task, the irrelevant stimulus information (spatial location) nonetheless elicits a strong response impulse that interferes with goal-directed action (responding to the color of the stimulus; see Fig. 1a). When the stimulus occurs on the opposite side from the correct response button (incompatible [INCOMP] trial), the reaction time (RT) is typically longer and the error rate higher than when the stimulus occurs on the same side as its response button (compatible [COMP] trial). This difference in performance between INCOMP and COMP trials is termed the “Simon effect” (Proctor & Vu, 2006; Simon, 1990). According to dual-route models (De Jong, Liang, & Lauber, 1994; Kornblum, Hasbroucq, & Osman, 1990), the location of the stimulus automatically activates the spatially corresponding response through preexisting stimulus–response associations, which are independent from instructions (Barber & O’Leary, 1997). Consequently, when a stimulus appears, two routes are activated: a fast unconditional (or automatic) route, and a slow conditional (or controlled) route. 
In COMP trials, the same response as the one indicated by the relevant stimulus feature is activated automatically. In contrast, in INCOMP trials, the erroneous response is automatically activated, slowing the correct RT and increasing the error rate. The mean compatibility (or interference) effect is taken to reflect the extra demands and time required to overcome the incorrect response activation that occurs on INCOMP trials but not on COMP trials, on which direct-route processing facilitates the correct response (Ridderinkhof, van den Wildenberg, Wijnen, & Burle, 2004). The training phase included eight blocks of 96 trials (768 trials overall), with 50% INCOMP trials and 50% COMP trials that were delivered according to a pseudorandom sequence. This intensive training was required to make sure that all participants had mastered the task before the manipulation (see the supplemental materials, Text S1). This phase lasted no longer than 45 min, during which all participants were left alone (experimenter outside the room).

Fig. 1

In the Simon task (a), participants have to respond to the color of stimuli while ignoring their spatial location. Spatial location nonetheless elicits a strong response impulse (represented by the solid arrow) that interferes with goal-directed action (represented by the dashed arrows) on incompatible (INCOMP) trials, but not on compatible (COMP) trials. (b) Linear model of the Simon effect (y-axis: RT on INCOMP trials minus RT on COMP trials), including participants’ (mean-centered) working memory span scores (x-axis) and social presence conditions. (c and d) Simon effects (y-axis) as a function of the RT distribution (x-axis: INCOMP and COMP trials averaged) and social presence for participants with higher (c) and lower (d) working memory span. The beta values (β) indicate standardized (beta) regression coefficients. * p < .05, ** p < .01, *** p < .001

Forty-eight hours after training (to minimize fatigue), participants performed the same eight blocks of 96 trials (randomized differently than in training) while being randomly assigned to one of three social presence conditions. In the “alone” condition, the participants performed the Simon task alone (experimenter outside the room). In the “peer presence” condition, the participants performed the task in the apparently incidental presence of a confederate (seemingly because of technical problems in the adjacent room). The confederate was positioned so as not to see the participant’s ongoing task (i.e., opposite the participant), but watched the participant 60% of the time. As in previous research on mere presence (Guerin, 2009), the experimenter was outside the room. In the “experimenter presence” condition, the experimenter behaved as had the confederate in the peer presence condition (sat opposite the participants and watched them 60% of the time).

To eliminate any possibility of outcome pressure and worries related to the performance situation and consequences, all participants were told as early as the first Simon session that (a) no incentive was contingent on their performance and (b) they would do the Simon task twice “in order to collect enough data” (the first Simon session, hence, could not be assimilated to a training session associated with the necessity to reach a given performance standard). Each participant was also encouraged to work “as hard as possible” throughout the Simon session. All instructions except instruction “b” were repeated right before the second Simon session. No performance feedback was given before, during, or after the Simon sessions.


At the end of the experiment, all participants filled out a questionnaire mainly consisting of self-report items for the measurement of distraction during the Simon task (two items taken from Baron, 1986), task difficulty (two items), task-specific effort (three items from Geiselman, Woodward, & Beatty, 1982), task-related anxiety (eight items from Morris, Davis, & Hutchings, 1981), and self-evaluation (two items).


Four participants were excluded from the analyses: one who did not follow the Simon task instructions and did not complete the task, and three who completed the task but with abnormal RTs (more than 2 SDs from the group mean). However, the inclusion of the three participants who completed the task did not change the main findings. As recommended by Aiken and West (1991) for an interaction between a nominal (social presence) and a continuous (RSPAN scores) variable, we centered the continuous variable and dummy-coded the nominal one. In dummy coding, one group is designated as the reference group (here, the alone condition) and is assigned a value of 0 for every code variable. Each other group (the peer presence and experimenter presence conditions) is given a value of 1 on the dummy-coded variable that will contrast it with the reference group in the regression analysis, and a value of 0 on the other dummy-coded variable.
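The coding scheme described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: the function name `build_design_rows`, the column names, and the sample values are hypothetical. It mean-centers the RSPAN scores, dummy-codes social presence with “alone” as the reference group, and forms the two interaction terms, which gives the five predictors of the main analysis.

```python
CONDITIONS = ("alone", "peer", "experimenter")  # "alone" is the reference group


def build_design_rows(participants):
    """Build one predictor row per participant.

    `participants` is a list of (condition, rspan) pairs, where
    `condition` is one of CONDITIONS and `rspan` is the raw span score.
    """
    for condition, _ in participants:
        if condition not in CONDITIONS:
            raise ValueError(f"unknown condition: {condition!r}")

    mean_rspan = sum(rspan for _, rspan in participants) / len(participants)
    rows = []
    for condition, rspan in participants:
        centered = rspan - mean_rspan  # mean-centered continuous predictor
        peer = 1 if condition == "peer" else 0
        experimenter = 1 if condition == "experimenter" else 0
        rows.append({
            "peer": peer,                    # peer presence vs. alone
            "experimenter": experimenter,    # experimenter presence vs. alone
            "rspan_c": centered,
            "peer_x_rspan": peer * centered,
            "exp_x_rspan": experimenter * centered,
        })
    return rows


# Hypothetical sample: one participant per condition.
design_rows = build_design_rows(
    [("alone", 0.50), ("peer", 0.70), ("experimenter", 0.90)]
)
```

With this coding, the coefficient for `rspan_c` estimates the WMC slope in the alone condition, and each interaction coefficient estimates how much that slope differs in the corresponding presence condition.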

RT data

All RTs less than 150 ms (anticipations, <1%) were excluded. We then regressed the Simon effect (correct RTs on INCOMP minus COMP trials) on social presence condition (dummy-coded using “alone” as the reference category), the RSPAN scores (ranging from .40 to .90), and their interaction. The whole model was significant, F(5, 44) = 3.14, p < .02 (R² = .26). Whereas the peer presence condition (M = 22.67, 95% confidence interval [CI] = [16.27, 29.08]) did not change the size of the Simon effect, relative to the control condition (M = 21.09, 95% CI = [14.75, 27.43]; b = 0.06, t = 0.40, p = .69), the experimenter presence condition (M = 29.91, 95% CI = [22.34, 37.48]) induced greater interference, b = .333, t = 2.20, p < .04. More importantly, the experimenter presence condition had a dramatic impact on the relationship between WMC (RSPAN scores) and the size of the Simon interference (Fig. 1b). In the alone condition, this relationship was negative, b = −.44, t = −1.97, p = .05, unstandardized CI = [−101.34, 1.25]: The higher the RSPAN scores, the lower the interference. In the peer presence condition, this same relationship did not differ from that found in the alone condition, b = .15, t = 0.84, p = .41, unstandardized CI = [−43.41, 105.57], whereas it was significantly reversed in the experimenter presence condition, b = .60, t = 3.15, p < .003, unstandardized CI = [39.57, 180.33]. In this critical condition, the higher the RSPAN scores, the higher the Simon interference. Because this reversal may seem counterintuitive, we tested whether it remained significant when increasing sample size (although the present sample size was sufficient according to Tabachnick and Fidell’s, 2007, requirements, as noted previously). It did, as expected (see the supplemental materials, Text S2). Thus, it seems that the presence of the experimenter impaired executive control in participants with higher WMC.
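The dependent variable of this regression can be sketched in a few lines. This is a minimal illustration and not the authors' code: the trial format, the function name `simon_effect`, and the sample trials are hypothetical. Only the 150-ms anticipation cutoff and the definition of the Simon effect (correct RTs on INCOMP minus COMP trials) come from the text.

```python
from statistics import mean

MIN_RT_MS = 150  # RTs below this value are treated as anticipations


def simon_effect(trials):
    """Return mean correct RT on INCOMP trials minus mean correct RT on COMP trials.

    `trials` is an iterable of (compatibility, rt_ms, correct) tuples,
    with `compatibility` equal to "COMP" or "INCOMP".
    """
    kept = {"COMP": [], "INCOMP": []}
    for compatibility, rt_ms, correct in trials:
        # Keep only correct responses that are not anticipations.
        if correct and rt_ms >= MIN_RT_MS:
            kept[compatibility].append(rt_ms)
    return mean(kept["INCOMP"]) - mean(kept["COMP"])


# Hypothetical trials for one participant.
sample_trials = [
    ("COMP", 400, True),
    ("COMP", 420, True),
    ("COMP", 120, True),     # anticipation, excluded
    ("INCOMP", 440, True),
    ("INCOMP", 460, True),
    ("INCOMP", 500, False),  # error, excluded from the RT analysis
]
sample_effect = simon_effect(sample_trials)
```

Each participant contributes one such value, which is then regressed on the dummy-coded presence conditions, the centered RSPAN scores, and their interactions.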

Consistent with this argument, the present findings were restricted to the slower segment of the RT distribution. There is evidence in the Simon task that top-down suppression of incorrect automatic responses takes time to build up, and is therefore most effective on this segment (Ridderinkhof, 2002; van den Wildenberg et al., 2010), except when the suppression mechanism is disrupted (as was demonstrated in Parkinson Disease patients by Wylie, Ridderinkhof, Bashore, & van den Wildenberg, 2010). Here the reversed relationship found between participants’ WMC and the Simon effect was observed exclusively on the slower responses (see the supplemental materials, Fig. S1), indicating an impairment of the suppression mechanism in the higher-WMC participants in the presence of the experimenter. Another way to look at this disruption of executive processes would be to calculate the slope value that quantified the reductions of interference on the slower responses for the high- and low-WMC participants separately in each presence condition. Participants were assigned (as in Beilock & Carr, 2005) to low- and high-WMC groups (LWMs, M = .58, SE = .06; HWMs, M = .78, SE = .07), using a median split (Mdn = .67). The slope values were significant in both groups of participants in all conditions (linear trends ranged from –.05 to –.12, ps < .03), except among the HWM participants in the presence of the experimenter (Fig. 1c and d). This pattern can be taken as further evidence that monitoring pressure may cause executive control (i.e., the top-down suppression of incorrect automatic responses) to operate less efficiently.
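The slope analysis described above can be illustrated with a delta-plot sketch. The number of bins and the exact slope definition are not stated in this section, so the sketch assumes quartile bins and takes the slope between the two slowest bins. The function names and the sample RTs are hypothetical.

```python
from statistics import mean


def quantile_bin_means(rts, n_bins):
    """Sort RTs, split them into n_bins consecutive bins, and return each bin's mean.

    Requires at least n_bins RTs; leftover trials go into the slowest bin.
    """
    ordered = sorted(rts)
    size = len(ordered) // n_bins
    bins = [ordered[i * size:(i + 1) * size] for i in range(n_bins - 1)]
    bins.append(ordered[(n_bins - 1) * size:])
    return [mean(b) for b in bins]


def delta_plot(comp_rts, incomp_rts, n_bins=4):
    """Return (mean RT, Simon effect) points, one per bin, from fastest to slowest."""
    comp = quantile_bin_means(comp_rts, n_bins)
    incomp = quantile_bin_means(incomp_rts, n_bins)
    return [((c + i) / 2, i - c) for c, i in zip(comp, incomp)]


def last_segment_slope(points):
    """Slope of the Simon effect between the two slowest bins."""
    (x1, y1), (x2, y2) = points[-2], points[-1]
    return (y2 - y1) / (x2 - x1)


# Hypothetical correct RTs (ms) for one participant.
comp_rts = [300, 320, 340, 360, 380, 400, 420, 440]
incomp_rts = [330, 352, 372, 392, 410, 428, 440, 456]
points = delta_plot(comp_rts, incomp_rts)
slope = last_segment_slope(points)
```

A negative slope indicates that interference shrinks on the slowest responses, the pattern attributed in the text to top-down suppression. A slope near zero would correspond to the pattern reported for the HWM participants in the presence of the experimenter.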


We regressed participants’ errors on INCOMP trials (on which the suppression mechanism was required) on social presence condition (dummy-coded using “alone” as the reference category), the RSPAN scores (mean-centered), and their interaction. The whole model approached significance, F(5, 44) = 2.35, p = .06 (R² = .21). In the alone condition, the relationship between RSPAN scores and errors was negative (the higher the RSPAN score, the lower the error rate), but not significant, b = −.36, t = −1.52, p = .14. In the peer presence condition, this relationship did not differ from that found in the alone condition, b = .05, t = −0.26, p = .80, whereas it did differ significantly in the experimenter presence condition, b = .57, t = 2.89, p < .01: the higher the RSPAN scores, the higher the error rate (see the supplemental materials, Fig. S2). The same analysis conducted on COMP trials did not show any significant effect.


The self-reports were analyzed in a series of 2 (WMC: low vs. high) × 3 (Social Presence Condition: alone, peer presence, experimenter presence) ANOVAs. No effects were found (Fs < 1), with the exception of the self-reports of distraction (ranging from 1 to 6), for which the WMC × Social Presence interaction was significant, F(2, 44) = 4.46, p < .02, ηp² = .17. As expected, in the experimenter presence condition, the HWM participants (M = 3.81, SD = 1.16) reported more distraction (p < .01) than did their LWM counterparts (M = 2.25, SD = 0.98), whereas no difference occurred between the two groups in the control (alone) and peer presence conditions (ps > .22), which did not differ from one another (p = .30).


The present findings provide the first evidence that simply being watched by evaluative others (monitoring pressure) leads individuals with higher WMC to choke on a classic measure of executive control. Higher WMC is usually associated with better executive control (Engle, 2002), which is what we found in the absence of an evaluative audience (alone or peer presence condition). However, this relationship was clearly reversed in the evaluative presence of the experimenter: the higher a participant’s WMC, the worse the executive control. This is exactly what would be expected from our approach based on distraction/conflict theory (Baron, 1986). We assumed that the individuals with higher WMC were those most able to attend simultaneously to both the focal task and the presence of evaluative others, and therefore the ones most susceptible to experiencing distraction/conflict and to choke when being watched by the experimenter on tasks requiring executive attention. Consistent with this reasoning, the correlation between WMC and the size of Simon interference was negative in the alone and peer presence conditions, but positive in the experimenter presence condition, a reversal that remained significant when we increased the sample sizes in the two most critical conditions (alone vs. experimenter presence). This pattern actually goes beyond the classic findings in the choking literature indicating the lack of any relationship between WMC and performance on attention-demanding tasks under evaluative pressure. However, our research has been the first to investigate choking while combining experimenter presence with a direct measure of executive control. This combination proved to be very informative, because our findings suggest that the relationships between the type of pressure and the related processes underlying choking may be more complex than had been thought previously. 
Not only can monitoring pressure related to the presence of evaluative others reduce executive control (previously thought to be exclusively associated with outcome pressure), but the magnitude of this reduction under monitoring pressure seems to be positively related to WMC. Furthermore, the HWM participants in our research reported more distraction than did their LWM counterparts when being watched by the experimenter (and only in this condition), with no effects on the self-reports of anxiety. This suggests that the distraction in the HWM participants being watched by the experimenter did not necessarily take the form of worries about the experimenter, situation, and consequences. Of course, we admit that being watched by someone of consequence or power may induce worries (even in the absence of outcome pressure), but as our data suggest, this does not always seem to be the case.

Taken together, therefore, our findings suggest that we should not restrict choking resulting from distraction to outcome pressures (i.e., the prospect of an incentive if a certain outcome is achieved) and related worries. One may argue that there must still be an element of outcome pressure inherent in the presence of an evaluative other, which could perhaps best be framed as an intrinsic incentive to perform well while under observation. This would lead one to expect increased self-reports of task-specific efforts when being watched by the experimenter. Again, however, the experimenter condition affected only the self-reports of distraction. Thus, when combined with the performance findings, participants’ self-reports strengthen our confidence that being watched by an evaluative audience (monitoring pressure, as defined by DeCaro et al., 2011) may lead to choking by shifting a portion of executive attention away from task execution, even in the absence of any worries related to the performance situation.

The present results do not invalidate DeCaro et al.’s (2011) finding that being watched by evaluative others (monitoring pressure) may increase counterproductive attention to skill processes. Instead, they indicate that this effect is not the only possible consequence of monitoring pressures, and that another consequence may actually be one that so far has been associated exclusively with outcome pressures (i.e., reduced executive attention). DeCaro et al. considered that choking may result from multifaceted high-pressure situations (combining both types of pressure and their related processes), but they did not anticipate that reduced executive attention might result from monitoring pressure alone.

This possibility is consistent with the literature on social presence effects (Baron, 1986; Conty, Gimmig, Belletier, Georges, & Huguet, 2010; Huguet et al., 2004). However, more than one century of research in this area (for a review, see Guerin, 2009) has failed to consider that the negative effects of being watched by evaluative others on difficult or attention-demanding tasks may actually be restricted to the individuals with higher WMC. By revealing this boundary condition, our findings also suggest that many (probably not all) of these effects actually reflect choking, and also highlight the importance of individual differences such as those related to WMC in social presence effects. Individual differences are still largely overlooked in this area (Uziel, 2007). Our findings also indicate the importance of distributional analyses—still largely overlooked—in research on social presence effects. Since Bond and Titus’s (1983) influential meta-analysis of these effects, their magnitude has been thought to be small. However, research in this area has focused on mean or median RTs, which by concealing the temporal dynamics of behavior may have led to erroneous conclusions about effect sizes. Here, we have shown that social presence effects are much stronger at the slowest segment of the RT distribution than for faster responses, indicating how important analyzing the RT distribution can be to conclusions about effect sizes (an argument that also applies to the choking literature). Thus, not only do the present findings help refine our understanding of the processes underlying choking in high-monitoring-pressure circumstances, but they also lead us to conceive the negative effects associated with the presence of evaluative others in an entirely new light.

Finally, the present research also has practical implications for experimental psychology in general. If being watched by the experimenter leads the individuals with higher WMC to choke on tasks relying on executive attention, then even subtle variations in the experimenter’s behaviors from one study to another (if not within the same study) may cause dramatic changes in attention-demanding tasks, resulting in contradictory findings.