The results of Experiment 1 reconfirm previous studies suggesting that confidence ratings are reactive (Birney et al., 2017; Double & Birney, 2017a). Furthermore, the findings suggest that reactivity to confidence ratings is a consequence of repeatedly presenting the word ‘confident’ to participants. The relationship between PCA and performance was positive only in the CR and prime groups, suggesting that this relationship was exaggerated whenever the word ‘confident’ was present in intertrial ratings. Based on this finding, in Experiment 2 we assess whether reactivity to confidence ratings can be reduced or even eliminated by rephrasing the confidence ratings to remove the word ‘confident’.
Method
Participants and materials
Sample size was determined using the same method as in Experiment 1, indicating a desired sample size of 111. Due to the larger than expected number of participants who had to be excluded in Experiment 1, we recruited a slightly larger number of participants to ensure that power was adequate in the final sample. One hundred and sixty-two participants (55.6% female) were recruited using Amazon’s Mechanical Turk (Mage = 35.85 years, SD = 11.48 years). Participants for whom two or fewer correct responses were recorded were automatically discarded (n = 41). All participants completed RPM in the same fashion described in Experiment 1. After each RPM item, participants provided a particular rating. The CR group (n = 40) again had to provide confidence ratings (‘How confident are you that you answered the previous item correctly?’). The likelihood group (n = 42) rated how likely it was that their previous answer was correct (‘How likely is it that you answered the previous item correctly?’). Finally, the control group (n = 39) again rated the extent to which two squares were the same colour. PCA was assessed in the same manner as in Experiment 1.
Results
All data analysis was performed in the same fashion as Experiment 1. Descriptive statistics are available in Table 2. We utilised a linear regression model to examine the extent to which PCA moderated the effect of experimental group. The dummy coded group effect and PCA (mean centred) as well as their interaction were entered as predictors. The task-irrelevant rating (control group) was entered as the reference group. In addition, sex and age were entered as covariates in the model to control for demographic effects.
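The model described above can be written out as a single regression equation. This is a sketch of the specification as we read it from the text, not a formula taken from the original analysis: the coefficient labels b_0 to b_7, the dummy variable names CR and Lik (each coded 1 for that group and 0 otherwise, with the control group as reference), and the superscript c marking mean-centred PCA are notational assumptions.

```latex
\begin{equation*}
\begin{aligned}
\mathrm{RPM}_i = {} & b_0 + b_1\,\mathrm{CR}_i + b_2\,\mathrm{Lik}_i + b_3\,\mathrm{PCA}^{c}_i \\
& + b_4\,(\mathrm{CR}_i \times \mathrm{PCA}^{c}_i) + b_5\,(\mathrm{Lik}_i \times \mathrm{PCA}^{c}_i) \\
& + b_6\,\mathrm{sex}_i + b_7\,\mathrm{age}_i + \varepsilon_i
\end{aligned}
\end{equation*}
```

Under this coding, b_1 and b_2 correspond to the group versus control contrasts at the mean level of PCA, and b_4 and b_5 correspond to the moderation of those contrasts by PCA.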
Table 2 Descriptive statistics for Experiment 2
Density plots of RPM score are presented in Fig. 2b. The results suggest that there was no significant overall difference between the CR group (M = 5.98) and the control group (M = 6.33); β = −.10, t = 0.92, p = .361. Similarly, there was no overall difference between the likelihood group (M = 6.52) and the control group; β = .17, t = 0.47, p = .641. PCA was not a significant predictor of RPM score; β = −.05, t = .28, p = .780. This was qualified by a significant interaction between PCA and the CR versus control effect; β = 0.30, t = 2.22, p = .029. As shown in Fig. 4, this effect was largely driven by impaired performance of low PCA participants in the CR group. There was no significant interaction between PCA and the likelihood versus control group effect; β = .17, t = 1.19, p = .237. For completeness, we also reran the analysis with the CR group as the reference group. The results suggest that the CR group did not differ from the likelihood group in terms of overall performance, β = .15, t = 1.41, p = .160. The interaction between PCA and the CR versus likelihood effect was not significant; β = −.16, t = 1.19, p = .236.
As a follow-up analysis, we again probed the moderation using the PROCESS macro in SPSS. As with Experiment 1, we examined group differences for low, moderate, and high PCA participants (25th, 50th, and 75th percentiles). There was a significant difference between the CR group and the control group for low PCA participants (t = −2.09, p = .039), whereas the difference between the likelihood group and the control group was not significant (t = −.473, p = .637). There were no significant group differences for moderate or high PCA participants (all ps > .10). While this finding is different from Experiment 1, where the group differences were largely in high PCA participants, the pattern of results is similar between studies in that the effect of confidence on performance is exaggerated in conditions where the word ‘confident’ is presented to participants.
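The logic of probing a moderation at the 25th, 50th, and 75th percentiles can be illustrated with a short Python sketch. It is not the PROCESS macro and does not reproduce its significance tests; it only shows how a conditional group effect is obtained from a group coefficient and a group-by-moderator interaction coefficient. The function names, the example scores, and the coefficient values are hypothetical.

```python
from statistics import mean, quantiles


def conditional_group_effect(b_group, b_interaction, moderator_value):
    """Group-vs-reference effect at a given value of the centred moderator."""
    return b_group + b_interaction * moderator_value


def probe_at_quartiles(pca_scores, b_group, b_interaction):
    """Evaluate the conditional group effect at the 25th, 50th and 75th
    percentiles of the mean-centred moderator (here, PCA)."""
    centre = mean(pca_scores)
    centred = [score - centre for score in pca_scores]
    # quantiles(..., n=4) returns the three quartile cut points.
    q25, q50, q75 = quantiles(centred, n=4, method="inclusive")
    levels = (("low", q25), ("moderate", q50), ("high", q75))
    return {
        label: conditional_group_effect(b_group, b_interaction, value)
        for label, value in levels
    }


# Hypothetical moderator scores and coefficients, for illustration only.
effects = probe_at_quartiles([1, 2, 3, 4, 5], b_group=-0.5, b_interaction=0.25)
for label, effect in effects.items():
    print(label, effect)
```

With a positive interaction coefficient, as in this toy input, the group effect is most negative at low values of the moderator and shrinks towards zero as the moderator increases, which is the qualitative pattern described for the CR versus control comparison.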
Metacognitive accuracy
As both the CR and likelihood groups provided task-relevant judgements of their performance, it was possible to examine both the effect of judgement type on metacognitive accuracy and whether participants’ metacognitive accuracy interacted with the group effect. We operationalised metacognitive accuracy using a within-person Goodman–Kruskal gamma correlation, as is typically done in the metacognition literature (e.g. Koriat, Ackerman, Lockl, & Schneider, 2009; Son & Metcalfe, 2000). The gamma correlation is the correlation between performance and confidence for participants across trials. Firstly, the CR group had a significantly higher gamma correlation (gamma = .68, 95% CI [.60, .76]) compared with the likelihood group (gamma = .44, 95% CI [.34, .54]). This suggests that the relationship between confidence and performance is stronger in the CR group. While there are a number of possible interpretations of this finding, it is in keeping with our suggestion that presenting the word ‘confident’ increases the impact of participants’ confidence on performance.
Secondly, we calculated a gamma correlation for each participant individually as a measure of metacognitive accuracy, then examined the interaction between the gamma correlation term and experimental group. As the control group did not make task-relevant ratings, only the CR and likelihood groups could be compared in this way. The results suggested that the difference between the CR group and the likelihood group did not interact with metacognitive monitoring accuracy (β = −.33, t = 1.44, p = .155). However, given that this result was marginally significant and used a reduced sample, a more powerful replication is necessary to rule out an interactive effect between monitoring accuracy and rating type.
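For readers unfamiliar with the measure, the within-person Goodman–Kruskal gamma can be computed from one participant's trial-level ratings and accuracy by counting concordant and discordant pairs of trials. The following Python sketch uses the standard definition, gamma = (C − D) / (C + D); the function name and the example data are hypothetical, and it is not the code used for the reported analyses.

```python
from itertools import combinations


def goodman_kruskal_gamma(ratings, accuracy):
    """Gamma correlation between per-trial ratings and per-trial accuracy.

    A pair of trials is concordant when the trial with the higher rating
    also has the higher accuracy, and discordant when the ordering is
    reversed. Pairs tied on either variable are ignored.
    Returns None when there are no untied pairs (gamma is undefined).
    """
    if len(ratings) != len(accuracy):
        raise ValueError("ratings and accuracy must have the same length")

    concordant = 0
    discordant = 0
    for (r1, a1), (r2, a2) in combinations(zip(ratings, accuracy), 2):
        product = (r1 - r2) * (a1 - a2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1

    untied = concordant + discordant
    if untied == 0:
        return None
    return (concordant - discordant) / untied


# Hypothetical participant: higher ratings on the two correct trials.
print(goodman_kruskal_gamma([90, 40, 70, 20], [1, 0, 1, 0]))
```

Note that gamma is undefined for a participant who answers every item correctly (or every item incorrectly), because every pair of trials is then tied on accuracy; the sketch returns None in that case rather than guessing a value.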
General discussion
The current study examined reactivity to confidence ratings and the extent to which reactivity is moderated by confidence. First, there was no overall reactivity effect for either the CR or prime conditions. However, using a measure of confidence as a moderator (PCA), we showed that high confidence participants tended to experience positive reactivity to confidence ratings (Experiment 1), while low confidence participants tended to show negative reactivity effects (Experiment 2). This is consistent with previous research that establishes the moderating effect of self-confidence on reactivity to confidence ratings (Double & Birney, 2017a, 2017b; Double et al., 2018). In addition, this study was the first to specifically examine the mechanism for reactivity to confidence ratings. We evaluated two distinct hypothesised mechanisms: a priming mechanism, where reactivity is driven by the repeated presentation of the word ‘confident’, and a metacognitive introspection mechanism, which proposed that task-relevant introspection prompted reactivity. Our results provided support for a priming mechanism driving reactivity.
Reactivity to metacognitive ratings has shown inconsistent effects, with some authors observing positive reactivity (e.g. Double & Birney, 2017a; Double et al., 2018; Soderstrom, Clark, Halamish, & Bjork, 2015; Witherby & Tauber, 2017), others observing negative reactivity (Birney et al., 2017; Mitchum, Kelley, & Fox, 2016), and still others finding no reactivity effects (Kelemen & Weaver, 1997; Tauber, Dunlosky, & Rawson, 2015). It has been proposed elsewhere that the direction and magnitude of reactivity is in part determined by task characteristics (Double et al., 2018) or participant characteristics, such as self-confidence (Double & Birney, 2017a). The current findings support such individual differences models of reactivity by showing that self-confidence (measured using PCA) moderates the direction of reactivity to confidence ratings. This is an important finding from a methodological view, because it suggests not only that confidence ratings cannot be considered an innocuous self-report measure when collected during an experiment, but also that eliciting confidence ratings may exaggerate confidence-related differences in cognitive performance.
The current results suggest that, regardless of whether a rating is task relevant, if the word ‘confident’ is included, then self-confidence-related reactivity is observed. This supports the notion that reactivity is driven by priming participants’ self-confidence, brought about by the repeated presentation of the word ‘confident’. This finding provides an important insight into the nature of reactivity to confidence ratings, in suggesting that reactivity is a specific response to the language of the rating. It also points to an obvious remedy for reactivity effects: using more neutral language (i.e. not including the word ‘confident’), which we demonstrated to be somewhat effective in Experiment 2, to the extent that the likelihood ratings group did not show any significant difference from the control group. Therefore, it is advisable that researchers interested in eliciting confidence ratings adopt a more neutral phrasing in order to reduce unintentional reactivity effects. Furthermore, it suggests that cognitive performance can be enhanced in high self-confidence individuals by priming these self-confidence-related beliefs. This finding is congruent with earlier evidence that suggests that goals and motivation can be unconsciously primed (e.g. Dijksterhuis & Aarts, 2010) and that proximally primed self-confidence can affect performance on an intelligence test (Steele & Aronson, 1995).
Many theories of self-regulated learning espouse the benefits of metacognitive introspection to learning outcomes (Carver & Scheier, 2001; Efklides, 2011). Furthermore, evidence suggests that metacognitive prompts can have a beneficial effect on learning (Bannert, 2006; Bannert, Hildebrand, & Mengelkamp, 2009; Bannert, Sonnenberg, Mengelkamp, & Pieger, 2015). Previous studies that have shown positive reactivity to metacognitive ratings have posited that this may be a result of the metacognitive introspection demanded by such ratings (Double & Birney, 2017a). However, the present findings suggest that there is little benefit to the metacognitive reflection provided by confidence ratings; instead, reactivity effects can be found even when task-irrelevant ratings are made, so long as confidence is primed. This suggests that a more controlled approach to the examination of reactivity and the evaluation of metacognitive prompts is needed, as the effects may well be driven by the specific wording of the prompt, rather than the introspection produced, as is often assumed.
In adding to the body of research establishing reactivity to confidence ratings, the current findings have raised methodological issues for the measurement of metacognition. However, the fact that the specific wording of the rating may drive reactivity to confidence ratings provides a clear avenue to reduce reactivity effects by modifying the language used in confidence ratings. In addition, it remains unclear to what extent these priming effects depend on the accuracy of self-confidence judgements. While our results suggest that metacognitive monitoring accuracy (as measured using the gamma correlation) did not interact with the difference between the CR and likelihood groups, the analysis was somewhat limited by the reduced power. Furthermore, the likelihood group cannot be considered a true control group if one wants to examine the effect of monitoring accuracy on reactivity (while reactivity appears reduced in the likelihood group, it is unlikely to be completely negated). While it is possible that priming the confidence of individuals influences their cognitive performance, regardless of whether they are, in fact, over- or underconfident in their abilities, this question is deserving of further research.
The present study has provided further support for the notion of reactivity to confidence ratings and replicated previous findings showing that the magnitude and direction of reactivity to confidence ratings is, at least in part, determined by the self-confidence of participants. Furthermore, this was the first study to show evidence that reactivity to confidence ratings occurs due to a priming effect driven by using the word ‘confident’ in the rating. These findings are important for researchers who intend to assess metacognition using confidence ratings and suggest that the use of more neutral language in confidence ratings is an effective way to reduce unintentional reactivity effects.