Introduction

Metacognition is the ability to be conscious of one’s own thoughts, actions, and cognitions (Flavell, 1979; Stuss, 1991). This monitoring is necessary to adapt behavior and achieve successful goal-directed behavior. There are many ways to define metacognition. Most models are based on a distinction between stable, reflective knowledge of one’s cognitive capacities and a more “on the fly”, online metacognition (Fleur et al., 2021). Another distinction can be made between local metacognition, such as confidence in isolated decisions, and global metacognition, such as self-efficacy beliefs (Seow et al., 2021). Fleur et al. (2021) stress the need for more studies on protocols to measure the different constructs of metacognition and the relationships between these constructs.

Two main elements of metacognition, as described by a neuropsychological model of metacognition, are metacognitive knowledge and online awareness (Fig. 1; Toglia & Kirk, 2000). Metacognitive knowledge comprises knowledge and beliefs about one’s cognitive processes and strategies that are stored in long-term memory. This knowledge base is built up over time and is rather stable, although it can be influenced by successes and failures (Toglia & Kirk, 2000). Online awareness refers to the evaluation of performance within the context of a task. It is more flexible, as it depends on task characteristics, such as the complexity and familiarity of the task, as well as personal factors, such as the motivation for and meaningfulness of the task (Toglia & Kirk, 2000). This resembles “on the fly” metacognition (Fleur et al., 2021) or metacognitive experiences (Efklides, 2009). In this model, online awareness consists of two elements. The first is monitoring: awareness of performance within the context of a task. This includes anticipating performance after appraisal of task demands but before task completion, as well as error recognition during a task. The second is self-regulation: the ability to change strategies and adjust performance in response to changing task demands and experience. These self-regulatory processes depend on accurate self-monitoring, but also on motivation and socio-emotional processes. Different measurement methods exist to assess these elements of metacognition (see Fig. 1).

Fig. 1

Conceptual overview of metacognition based on previous literature. Note. This figure provides a conceptual overview of metacognition. The arrows represent the measurement methods used to assess the different elements of metacognition, which are underlined. AUROC2 = type two area under the receiver operating characteristic curve

Common ways of assessing metacognition are questionnaires and performance-based measures. Metacognitive questionnaires typically address situations in the past or the future, which trigger pre-existing knowledge and beliefs about one’s cognition stored in long-term memory; they therefore quantify metacognitive knowledge. Online awareness can be quantified with performance-based measures. One way is to use discrepancy scores between performance accuracy and confidence in one’s performance assessed before and after performing the task (Fleming et al., 2016). Confidence judgments made before a task (prospective confidence) are based on an analytic process of evaluating task demands and experiences (Koriat, 2000), for which one might employ metacognitive knowledge. Confidence judgments made after a task (retrospective confidence) are experience-based judgments that rely on a feeling of knowing (Koriat, 2000) and constitute an overall judgment of performance (e.g., ‘this was a difficult task for me; I think I have not done well’).

Computational measures of metacognition have also developed over time. For example, the Goodman-Kruskal gamma correlation was one of the first recommended measures of feeling-of-knowing accuracy (Nelson, 1984). Others have investigated metacognition using signal detection theory (Benjamin & Diaz, 2008) or have examined the relative and absolute accuracy of feeling-of-knowing judgments (Schwartz et al., 2016). Furthermore, there are non-parametric measures of metacognitive sensitivity (Fleming & Lau, 2014). Metacognitive sensitivity can be measured by asking participants during the task, after each trial, to indicate how confident they are that their answer is correct. Someone who reports high confidence for correct answers and low confidence for incorrect answers is considered to have good metacognitive sensitivity (Fleming & Lau, 2014).
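To make the gamma measure concrete, it can be computed over all pairs of trials from trial-by-trial accuracy and confidence ratings. The following is a minimal Python sketch; the function name and the pairwise implementation are illustrative, not taken from Nelson (1984):

```python
import numpy as np

def goodman_kruskal_gamma(accuracy, confidence):
    """Goodman-Kruskal gamma: (concordant - discordant) / (concordant + discordant),
    computed over all trial pairs; tied pairs are ignored."""
    a, c = np.asarray(accuracy), np.asarray(confidence)
    da = np.sign(a[:, None] - a[None, :])  # pairwise ordering of accuracy
    dc = np.sign(c[:, None] - c[None, :])  # pairwise ordering of confidence
    s = da * dc                            # +1 concordant, -1 discordant, 0 tied
    concordant = np.sum(s > 0) / 2         # each pair appears twice in the matrix
    discordant = np.sum(s < 0) / 2
    return (concordant - discordant) / (concordant + discordant)
```

A gamma of +1 means confidence perfectly tracks accuracy across trials, 0 means no relation, and −1 a perfectly inverted relation.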

The assessment of metacognition remains difficult, and there is a demand for empirical evidence to support theoretical frameworks of metacognition and its different elements (Seow et al., 2021). Moreover, there is a gap in the literature concerning the associations between the different constructs of metacognition (Fleur et al., 2021). Therefore, the aim of the current study was to explore how metacognitive sensitivity can be predicted by the different elements of metacognition, as assessed by different measurement methods. The hypotheses were that smaller prospective and retrospective discrepancy scores (i.e., better metacognition) would predict better metacognitive sensitivity. In addition, metacognitive knowledge was expected to be positively associated with metacognitive sensitivity.

Methods

Participants

Participants were 128 healthy volunteers (100 female, 27 male, 1 non-binary) with a mean age of 20.6 years (SD = 1.6). Most were not native English speakers (N = 116). Participants had to (1) be between 19 and 24 years old (inclusive), (2) be able to give informed consent, and (3) have a good comprehension of the English language. Participants were excluded when they (1) had a history of, or a current, psychiatric illness or neurodegenerative disease, or (2) had a history of, or were currently under treatment for, alcohol or substance abuse. Participants were recruited through flyers at Maastricht University and through social media. To reach a power of 0.80, with a significance level of 0.05 and a medium effect size of f² = 0.10, the aim was to include a minimum of 124 participants. After 37 exclusions due to invalid answers, the final sample consisted of 128 participants.
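For reference, this sample-size target can be reproduced numerically. The sketch below uses scipy and assumes four predictors (matching the regression reported in the Results) and the common convention that the noncentrality parameter of the F-test equals f² × N; these assumptions are ours, as the original power software is not stated:

```python
from scipy.stats import f, ncf

alpha, target_power, f2, k = 0.05, 0.80, 0.10, 4  # k = number of predictors
n = k + 2
while True:
    df_num, df_den = k, n - k - 1
    f_crit = f.ppf(1 - alpha, df_num, df_den)               # critical F under H0
    achieved = 1 - ncf.cdf(f_crit, df_num, df_den, f2 * n)  # power from noncentral F
    if achieved >= target_power:
        break
    n += 1
print(n)  # should land near 124 under these assumptions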

Procedure

Participants completed the study from home in a single online session of about an hour, using Qualtrics (see Fig. 2). Participants first read the information letter and signed the informed consent form online. After filling out a short demographics questionnaire, they completed the self-report metacognitive questionnaire. Subsequently, they performed a memory task and an abstract reasoning task (see Supplementary materials). Validity questions were included throughout the session to check that participants took part seriously. Ethical approval was obtained from the Ethics Review Committee Psychology and Neuroscience at Maastricht University (reference number OZL_233_21_02_2021).

Fig. 2

Schematic overview of study procedure and materials. Note. MAI = Metacognitive Awareness Inventory

Materials

Metacognitive Awareness Inventory (MAI)

This self-report questionnaire, developed by Schraw and Dennison (1994), consists of 52 items measuring metacognitive knowledge. Ratings for each item were made on a continuous scale ranging from 0 (completely false) to 100 (completely true). Psychometric analyses show that it is a reliable instrument with high internal consistency (coefficient α = 0.88 per subscale; Schraw & Dennison, 1994).
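For reference, the internal consistency of such a scale is typically computed with the standard Cronbach’s alpha formula; a minimal Python sketch (the function name is ours):

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of sum scores)."""
    x = np.asarray(item_scores, dtype=float)
    k = x.shape[1]
    item_variances = x.var(axis=0, ddof=1).sum()  # sum of per-item variances
    total_variance = x.sum(axis=1).var(ddof=1)    # variance of total scores
    return k / (k - 1) * (1 - item_variances / total_variance)
```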

Prospective and retrospective confidence judgments

For each task, participants made prospective confidence judgments about their performance (after reading the instructions but before starting the task) and retrospective confidence judgments (directly after completing the task). Participants indicated how well they thought they would perform (prospective) or had performed (retrospective) on a scale from 0 to 10, where 0 corresponded to ‘very bad’ and 10 to ‘very good’.

Memory task

The memory task was a word-pair learning task consisting of two phases: an encoding phase and a recognition phase (Inquisit 6; Millisecond, 2006). During the encoding phase, participants were presented with 40 unrelated word pairs (cue word + target word) previously used and validated by Payne et al. (2012; List 3). Each word pair was displayed for 5 seconds, and the order of presentation was randomized for each participant. The recognition phase consisted of a two-alternative forced-choice test in which participants were presented with 40 word pairs again. Cue words from the learned list were paired either with the correct target word (in 50% of trials) or with a new distractor word. Participants had to determine whether the word pair was in the list they had previously learned; answer options were ‘yes’ or ‘no’. The order of presentation was random, there was no time limit for answering, and participants were asked to be as accurate as possible. On every trial, the answer was followed by a confidence rating: participants indicated how confident they were that their answer was correct, with six options ranging from ‘completely uncertain’ to ‘completely certain’. Participants were instructed that ‘completely uncertain’ meant that their answer was based completely on guessing.

Outcome measures

Error recognition

Metacognitive sensitivity

Whether a person can differentiate correct from incorrect answers was measured using the trial-by-trial confidence ratings and the accuracy of the answers given. Metacognitive sensitivity was quantified with the type two area under the receiver operating characteristic curve (AUROC2), using the code from Fleming and Lau (2014) in MATLAB R2021b (The MathWorks, Inc.). Scores can range from 0 to 1. A score of 0.5 is at chance and corresponds to no metacognitive sensitivity. Scores above 0.5 indicate higher metacognitive sensitivity, with 1 representing perfect metacognitive sensitivity (high confidence on correct trials and low confidence on incorrect trials). Scores below 0.5 are atypical and indicate reversed metacognitive sensitivity (high confidence on incorrect trials and low confidence on correct trials).
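The original analysis used the published MATLAB code; the following is an illustrative Python sketch of a nonparametric type-2 AUROC following the same logic. The function name, the +0.5 padding of empty cells, and the explicit trapezoid step are our implementation choices:

```python
import numpy as np

def type2_auroc(correct, confidence, n_ratings=6):
    """Nonparametric type-2 AUROC in the spirit of Fleming and Lau (2014).

    correct: 0/1 accuracy per trial; confidence: ratings 1..n_ratings per trial.
    Returns a value in [0, 1]; 0.5 = chance, 1 = perfect sensitivity.
    """
    correct = np.asarray(correct)
    confidence = np.asarray(confidence)
    levels = range(n_ratings, 0, -1)  # most to least confident
    # trial counts per confidence level, split by accuracy;
    # +0.5 pads empty cells to avoid division by zero (a common regularization)
    h = np.array([np.sum((confidence == c) & (correct == 1)) for c in levels]) + 0.5
    fa = np.array([np.sum((confidence == c) & (correct == 0)) for c in levels]) + 0.5
    # cumulative type-2 hit and false-alarm rates at each confidence criterion
    hit_rate = np.concatenate(([0.0], np.cumsum(h) / h.sum()))
    fa_rate = np.concatenate(([0.0], np.cumsum(fa) / fa.sum()))
    # trapezoidal area under the type-2 ROC curve
    return float(np.sum(np.diff(fa_rate) * (hit_rate[1:] + hit_rate[:-1]) / 2))
```

Intuitively, each confidence level acts as a criterion for classifying trials as correct; the more the cumulative hit rate outpaces the false-alarm rate, the larger the area.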

Retrospective discrepancy

Discrepancy scores were calculated between the confidence judgments made after the tasks (ranging from 0 to 10) and performance accuracy (number of correct trials / total number of trials × 10; ranging from 0 to 10). Discrepancy scores can range from −10 to +10. Scores close to zero represent good metacognition; the further the score is from zero (in either direction), the more this type of metacognition is impaired.
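The calculation is straightforward; as an illustrative sketch (function name ours), applying equally to the prospective scores described below:

```python
def discrepancy_score(confidence_0_10, n_correct, n_trials):
    """Confidence judgment (0-10) minus accuracy rescaled to the same 0-10 range."""
    return confidence_0_10 - (n_correct / n_trials) * 10

# e.g. a retrospective confidence of 7 with 32/40 trials correct:
# discrepancy_score(7, 32, 40) -> -1.0 (slight under-confidence)
```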

Metacognitive knowledge

Metacognitive Awareness Inventory (MAI)

The knowledge of cognition subscale of the MAI consists of questions covering knowledge about oneself and about strategies, knowledge about how to use strategies, and knowledge about when and why to use strategies (Schraw & Dennison, 1994). Mean scores range from 0 to 100, with higher scores indicating a higher level of perceived knowledge of cognition.

The regulation of cognition subscale of the MAI consists of questions covering the control aspect of learning: planning, information management strategies, comprehension monitoring, debugging strategies, and evaluation strategies (Schraw & Dennison, 1994). Mean scores range from 0 to 100, with higher scores indicating a higher level of perceived regulation of cognition.

Anticipation of performance

Prospective discrepancy

Discrepancy scores were calculated between the confidence judgments made before the tasks (ranging from 0 to 10) and performance accuracy (number of correct trials / total number of trials × 10; ranging from 0 to 10). Discrepancy scores can range from −10 to +10. Scores close to zero represent good metacognition; the further the score is from zero (in either direction), the more this type of metacognition is impaired. Negative scores represent under-confidence and positive scores represent over-confidence.

Data analysis

Statistical analyses were conducted with IBM SPSS Statistics version 27 for Windows. All measures showed high reliability. Cronbach’s alpha was α = 0.827 for the MAI knowledge of cognition subscale and α = 0.859 for the regulation of cognition subscale. Odd-even split-half correlation analyses showed high reliability on the memory task (r = .828 for accuracy; r = .935 for confidence ratings). Descriptive analyses were performed to examine the distribution of scores on the different metacognitive measures (see Supplementary materials). Multiple regression analyses were conducted to investigate whether scores on the questionnaire and the accuracy of confidence judgments could predict metacognitive sensitivity (AUROC2). For these analyses, the prospective and retrospective discrepancy scores were transformed into absolute values, because both positive and negative values indicate poor metacognition. The raw data are available at https://doi.org/10.34894/U1YTW8. The experiment was not preregistered.
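The analyses were run in SPSS; the following Python sketch shows an equivalent specification of the main regression, with the absolute-value transformation made explicit. File path and column names are hypothetical:

```python
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("metacognition_data.csv")  # hypothetical file and column names
predictors = pd.DataFrame({
    # absolute values: deviations in either direction indicate poor metacognition
    "abs_prospective_disc": df["prospective_discrepancy"].abs(),
    "abs_retrospective_disc": df["retrospective_discrepancy"].abs(),
    "mai_knowledge": df["mai_knowledge"],
    "mai_regulation": df["mai_regulation"],
})
model = sm.OLS(df["auroc2"], sm.add_constant(predictors)).fit()
print(model.summary())  # overall F-test, R-squared, per-predictor coefficients
```

Z-scoring the outcome and predictors before fitting would yield standardized coefficients comparable to the βs reported below.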

Results

The distribution of scores on the metacognitive measures for the memory task can be found in Fig. 3. Performance scores can be found in Table 1. Performance accuracy on the abstract reasoning task was close to chance level (M = 6.1, SD = 2.1). Therefore, the metacognitive measures on this task cannot be reliably interpreted and are not discussed further (but can be found in the Supplementary materials).

Fig. 3

Distribution of scores on the different metacognitive measures. Note. The raincloud plots visualize the raw data and show the distribution of the dataset on the memory task. The boxplots show medians and quartiles. The dots are the raw data points. AUROC2 = type two area under the receiver operating characteristic curve

Table 1 Accuracy scores on the memory task

Prospective and retrospective discrepancy scores predict metacognitive sensitivity

The multiple regression model significantly predicted metacognitive sensitivity (F(4,123) = 2.637, p < .05, R² = 0.079; see Table 2). Prospective discrepancy scores were a significant positive predictor (β = 0.233, p = .014) and retrospective discrepancy scores a significant negative predictor (β = −0.185, p = .038). The MAI knowledge of cognition and regulation of cognition subscales did not significantly predict metacognitive sensitivity (all p > .703).

Table 2 Multiple regression analysis predicting metacognitive sensitivity

Discussion

The aim of this study was to explore whether metacognitive knowledge, anticipation of performance, and evaluation of performance predict metacognitive sensitivity. In this healthy population, there was large variability in scores on all the metacognitive measures. The results show that metacognitive sensitivity on the memory task was predicted by poor anticipation of performance (a large prospective discrepancy score) and good retrospective evaluation of performance (a small retrospective discrepancy score). In contrast, metacognitive knowledge did not predict metacognitive sensitivity.

The hypothesis of a negative association between metacognitive sensitivity and discrepancy scores was confirmed for the retrospective discrepancy score but not for the prospective discrepancy score. Higher metacognitive sensitivity during the task was associated with a more accurate judgment of performance after the task. This suggests that people use metacognitive information gathered during the task to shape their evaluation after the task, as has been suggested in previous research (Akama & Yamauchi, 2004). The confidence judgment after the task differs from the one before the task, and this might be due to metacognitive experiences during the task, such as familiarity, difficulty, satisfaction, and effort (Efklides, 2006). The shift from a positive association between metacognitive sensitivity and prospective discrepancy to a negative association with retrospective discrepancy supports this idea. The proportion of explained variance was small, however, indicating that there are likely other important predictors of metacognitive sensitivity, such as domain-specific self-concept (Händel et al., 2020).

While the online metacognitive measures appear to be associated with each other, the offline measure does not appear to be associated with them: metacognitive knowledge did not predict metacognitive sensitivity. This supports the idea that metacognition can be split into metacognitive knowledge and online awareness, as described in different models (e.g., Flavell, 1979; Toglia & Kirk, 2000). The questionnaire items cover a broad range of metacognitive behaviors and strategies, relate to situations more distant in time, and might be more abstract than the in-the-moment confidence judgments made after every trial. This is in line with a recent review concluding that self-report measures can be unclear and can differ from metacognitive behavior (Craig et al., 2020).

The strengths of this study are its large sample size and the inclusion of different measures of metacognition, which allows for a proper investigation of the associations between the different elements of metacognition. However, there are also some limitations. The first is that the study was conducted fully online, making it difficult to check whether participants took part seriously or took long breaks. In a replication in a more controlled environment, such as a lab, this could be monitored. Here, we attempted to control for this by adding validity questions; these led to the exclusion of five participants who indicated they had ‘just clicked through’ the study. Secondly, the sample was very homogeneous, consisting mostly of highly educated healthy people within a small age range, which limits the generalizability of the results. Thirdly, the metacognitive sensitivity measure (AUROC2) might not be very meaningful when there is a floor or ceiling effect in performance accuracy: there must be enough correct and incorrect trials for people to be able to distinguish the two types of trials with their confidence judgments. Similarly, if the task is too difficult and people do not know whether their answers are correct, it is difficult to give a confidence judgment, and not enough metacognitive information can be gained during the task. Because performance was at chance level, the abstract reasoning task was disregarded in further interpretation. Future research should therefore adopt a task that is neither too easy (people need both correct and incorrect trials) nor too difficult (people need to know to some degree whether their answer was correct); an adaptive task could be a solution. Furthermore, if replication studies include multiple tasks, these should be counterbalanced or randomized to control for practice effects and fatigue.

The current study shows that task-specific online measures of metacognition can predict metacognitive sensitivity, but more general self-report measures of metacognition cannot. Moreover, it provides a framework to investigate the associations between different elements of metacognition within persons. An interesting application of this paradigm would be to investigate these associations in populations with metacognitive impairments, such as after brain disorders. This could give an indication of where the problem arises and, possibly, what should be targeted to improve metacognition. Moreover, for a more comprehensive understanding of metacognition, it would be interesting to investigate which brain areas or networks are involved in the different elements of metacognition.