
The role of metacognition in recognition of the content of statistical learning

ABSTRACT

Despite theoretical debate on the extent to which statistical learning is incidental or modulated by explicit instructions and conscious awareness of the content of statistical learning, no study has ever investigated the metacognition of statistical learning. We used an artificial language-learning paradigm and a segmentation task that required splitting a continuous stream of syllables into discrete recurrent constituents. During this task, statistical learning potentially produces knowledge of discrete constituents as well as of the statistical regularities embodied in the familiarization input. We used hierarchical Bayesian modelling to estimate metacognitive sensitivity and efficiency, probing the role of conscious awareness in the recognition of constituents extracted from the familiarization input and in the recognition of novel constituents embodying the same statistical regularities as the extracted ones. Novel constituents are conceptualized as indexing recognition of statistical structure rather than recognition of items retrieved from memory as whole constituents. We found that participants are equally sensitive to both types of learning products, yet subject them to varying degrees of conscious processing during the postfamiliarization recognition test. The data point to the contribution of conscious awareness to at least some types of statistical learning content.

Statistical learning is a process for extracting statistical regularities from the environment that enables efficient processing of continuous sensory inputs. One of the tasks that relies on statistical learning is segmenting continuous inputs into discrete constituents (Baldwin, Andersson, Saffran, & Meyer, 2008; Gómez, Bion, & Mehler, 2011; Hard, Meyer, & Baldwin, 2019; Siegelman, 2019; Siegelman, Bogaerts, Armstrong, & Frost, 2019). It is generally assumed that statistical learning is incidental and happens without awareness and across modalities (Arciuli, von Koss Torkildsen, Stevens, & Simpson, 2014; Aslin & Newport, 2012; Dienes, Broadbent, & Berry, 1991). However, some empirical evidence suggests that performance can be modulated by attention (Fernandes, Kolinsky, & Ventura, 2010; Toro, Sinnett, & Soto-Faraco, 2005), and that conscious focus on a task improves performance (Alamia & Zenon, 2016; Reber, Kassin, Lewis, & Cantor, 1980). Surprisingly, despite theoretical tension regarding how conscious statistical learning may be, studies on metacognition in statistical learning are exceptionally rare.

Metacognition is cognition about cognition. It helps humans to evaluate past decisions, make better future decisions, and monitor their own cognitive processes and cognitive states (e.g., Flavell, 1979; Nelson, 1996). Flavell (1979) and Schraw (1998) suggested that metacognition includes two aspects: knowledge about cognition and regulation of cognition. These components are served by different cognitive processes, referred to as metacognitive monitoring and metacognitive control (Dunlosky, Serra, & Baker, 2007; Nelson & Narens, 1990). Monitoring processes track decisions, cognitive states, and behaviour in uncertain situations and estimate retrospective confidence associated with cognitive states and past decisions (Kepecs, Uchida, Zariwala, & Mainen, 2008). Control processes guide future behaviour, taking current cognitive states and available evidence about the current environment and past outcomes into account. In this study, we will focus on metacognitive monitoring in statistical learning tasks by exploring the retrospective confidence that humans assigned to their decisions in an artificial language-learning paradigm.

Efficient metacognitive monitoring is associated with accurate estimation of the degree of uncertainty associated with past and future decisions (Kepecs et al., 2008). It manifests as an individual’s ability to estimate the likelihood of error for each conscious decision they make. This estimate is reflected in assigning higher confidence ratings to decisions when the estimated likelihood of making an error is lower, and lower confidence ratings when the estimated likelihood of making an error is higher. Individuals whose confidence ratings accurately discriminate between correct and incorrect responses are considered to exhibit higher metacognitive sensitivity. The same logic applies to different types of decisions (e.g., in different domains, modalities, tasks) made by the same individual. If, for example, a person’s confidence ratings better discriminate between correct and incorrect decisions in the visual than in the auditory modality, we can say that their metacognitive sensitivity is higher in the visual modality.

In this research, we explore statistical learning mechanisms that operate on a subtype of statistical relations—that is, conditional statistics. Conditional statistics measure the predictive relationships between consecutive events (Harris, 1955; Saffran, Aslin, & Newport, 1996). The strength of a predictive relationship is measured as the likelihood that an Element or Event B will occur given that Element or Event A has just happened (i.e., the transitional probability [TP] between A and B). Frequently, this kind of statistical learning mechanism is explored in the context of splitting a sequence of syllables into word-like units (Saffran, 2001): TPs between syllables within a discrete word are higher than between syllables straddling word boundaries. However, imagine a sequence of syllables, in which syllable triplets ROSENU and PASETI are embedded and recurrently experienced by an individual. Statistical regularities predict SE given either RO or PA with equal probability, and the transition from SE to either TI or NU also happens with equal probability. Thus, while triplets ROSETI and PASENU have not been embedded as whole units in the familiarization input, they could nevertheless be endorsed as eligible constituents because they are statistically congruent with the collective exemplars from the learning input (Endress & Langus, 2017; Endress & Mehler, 2009; Nosofsky & Zaki, 2002; Ordin, Polyanskaya, & Soto, 2020a; Ordin, Polyanskaya, Soto, & Molinaro, 2020; Roediger & McDermott, 1995). The ability to extrapolate the eligibility of these units from learned input would be useful; it would allow the system to learn regularities from a limited set of exemplars, then generalize these to previously unencountered items, or to novel situations that exhibit the previously encountered statistical features. 
Here, we try to understand the role of awareness and metacognitive processes in the recognition of old items (encountered and learnt during familiarization) as well as novel items that are statistically congruent with the old ones.
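The computation of conditional statistics described above can be sketched in a few lines of Python. This is a minimal illustration (not the authors' code), using syllables from the ROSENU/PASETI example:

```python
from collections import Counter

def transitional_probabilities(stream):
    """Estimate TP(A -> B) = P(B | A) for every adjacent syllable pair."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

# Toy stream in which SE follows RO and PA equally often,
# and is itself followed by NU and TI equally often.
stream = ["RO", "SE", "NU", "PA", "SE", "TI",
          "RO", "SE", "TI", "PA", "SE", "NU"]
tps = transitional_probabilities(stream)
print(tps[("RO", "SE")])  # 1.0 (SE always follows RO in this toy stream)
print(tps[("SE", "NU")])  # 0.5 (SE is followed by NU half of the time)
```

On such a stream, both the attested triplet ROSENU and the unattested ROSETI receive the same TP profile, which is exactly why phantoms can be endorsed as eligible constituents.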

METHOD

Material, participants, and experimental procedure

Here, we use the same data set as Ordin et al. (2020), where the participant sample, experimental procedure, and materials are detailed. Below, we provide only the information essential for this empirical report.

Participants listened to a stream of 12 recurrent, randomly concatenated triplets (ROSENU, PASETI, etc.). TPs were set to 0.5 between syllables within triplets and to 0.16 between syllables at the boundaries of triplets. Participants were familiarized with this stream for 18 minutes and were explicitly instructed to detect and memorize the “words” of this “alien language”. The syllables in the embedded triplets (words) were also used to construct 12 novel triplets (phantoms, e.g., ROSETI, PASENU) that embodied the same TPs as the words. After familiarization, participants performed a two-alternative forced-choice test, in which they listened to a pair of possible word candidates and had to indicate whether the first or the second candidate was a word from the alien language they had been exposed to. Participants were not informed that the words were composed according to rules. After choosing one of the two candidates, participants were asked to indicate, on a 4-point scale, how sure they were of their response. In the test pairs, we pitted words against phantoms, words against nonwords (concatenations of syllables that had never co-occurred sequentially in the familiarization stream, thus violating the statistical regularities embedded in both words and phantoms), and phantoms against nonwords (the order of token types was counterbalanced in the test pairs). In the original study, EEG was recorded throughout the experiment. However, here we analyze only behavioral data, and therefore also include responses from participants who were discarded from the original analysis due to technical issues related to the EEG signal. This brings the total number of participants to 38.
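A familiarization stream of this kind can be sketched as follows. This is a simplified illustration with a hypothetical four-triplet inventory; the actual experiment used 12 triplets whose shared syllables produced the reported within-triplet and boundary TPs of 0.5 and 0.16:

```python
import random

# Hypothetical triplet inventory (the real stimulus set is described
# in Ordin et al., 2020). Shared middle syllables (SE, KO) give the
# within-triplet TP of 0.5 illustrated in the text.
triplets = [("RO", "SE", "NU"), ("PA", "SE", "TI"),
            ("BI", "KO", "GA"), ("DU", "KO", "ME")]

def make_stream(n_tokens, seed=0):
    """Randomly concatenate triplets, avoiding immediate repetition."""
    rng = random.Random(seed)
    stream, prev = [], None
    for _ in range(n_tokens):
        t = rng.choice([x for x in triplets if x is not prev])
        stream.extend(t)
        prev = t
    return stream

stream = make_stream(200)
print(len(stream))  # 600 syllables (200 triplet tokens x 3)
```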

Analytical approach

To estimate metacognition, we used a signal detection (SDT) approach (Galvin, Podd, Drga, & Whitmore, 2003; Maniscalco & Lau, 2012). Correct responses that also receive high confidence ratings are considered to be metacognitive (Type 2) hits; incorrect responses given with high confidence are conceptualized as meta-false alarms; incorrect responses with low confidence ratings are analyzed as meta-correct rejections; and correct responses with low confidence are taken as meta-misses. This logic is applied to binary confidence choices. The ability to estimate the likelihood of making an error and thus to discriminate between correct and incorrect responses by assigning different confidence ratings on a wider scale can be quantified by Type 2 receiver operating characteristic (ROC) analysis (Galvin et al., 2003). Maniscalco and Lau (2012) proposed a modeling approach to quantify Type 2 ROC area in units of Type 1 d'. The basic idea behind this approach is to estimate the pseudo d' measure that would perfectly fit confidence ratings, not Type 1 decisions. This estimated value, which accounts for confidence ratings instead of cognitive decisions, provides a measure of subject-specific metacognitive sensitivity (i.e., meta-d'). Meta-d' estimates the reliability of confidence ratings and may theoretically reflect fluctuations in confidence between groups or conditions even when Type 1 performances do not differ. Importantly, meta-d' is independent of metacognitive bias (i.e., an individual tendency to assign higher or lower ratings overall). However, meta-d' also reflects the quality of the Type 1 information that is subject to metacognitive processing, and in practice may scale with performance (Fleming & Lau, 2014). Thus, it sometimes makes sense to estimate metacognitive efficiency relative to Type 1 performance. This is represented by the meta-d' to d' ratio (M-ratio). Meta-d' reflects the presence or absence of metacognition in a particular task. 
However, the M-ratio is more informative when the goal is to compare metacognition across groups or conditions, especially if Type 1 performances differ; it is even more useful when Type 1 differences across conditions and groups are caused by unequal variance in the number of participants and trials per group/condition; by potentially different neural and cognitive mechanisms underlying metacognitive judgments across different tasks; or by differences (including purposefully modulated differences) in the perceptual salience of the signal, individually adjusted perceptual or performance thresholds, or individual differences in encoding information at different presentation rates or different modalities. Meta-d' can easily be compared with d' (because they are measured in the same units on the same scale), but comparing across conditions/domains/tasks/experiments may be challenging. Thus, M-ratio—the measure of metacognitive efficiency, or a subject-specific level of metacognitive sensitivity that takes individual level of Type 1 performance into account—can be more informative.
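The Type 2 classification described above can be sketched as follows. This is a minimal illustration with binarized confidence (not the authors' code); meta-d' itself additionally requires fitting the SDT model to the full confidence distribution:

```python
def type2_counts(correct, confident):
    """Cross-tabulate Type 1 accuracy with binarized confidence.

    The Type 2 'signal' is that the response was correct; a high
    confidence rating counts as a 'yes' response to that signal.
    """
    hits   = sum(c and h for c, h in zip(correct, confident))           # correct, high conf
    misses = sum(c and not h for c, h in zip(correct, confident))       # correct, low conf
    fas    = sum((not c) and h for c, h in zip(correct, confident))     # error, high conf
    crs    = sum((not c) and not h for c, h in zip(correct, confident)) # error, low conf
    return hits, misses, fas, crs

correct   = [True, True, True, False, False, True]
confident = [True, True, False, True, False, True]
print(type2_counts(correct, confident))  # (3, 1, 1, 1)
```

Once meta-d' has been fitted from such data, metacognitive efficiency is simply `m_ratio = meta_d / d_prime`, which is why the two quantities are directly comparable in the analyses below.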

Fleming (2017) developed a method for hierarchical Bayesian estimation of metacognitive sensitivity and efficiency, which we have adopted for our analysis here. Bayesian estimation is superior to maximum likelihood estimation (MLE) and sum-of-squared error (SSE) methods (Barrett, Dienes, & Seth, 2013; Maniscalco & Lau, 2012) for several reasons: It does not require data “padding” when participants give zero responses with a particular (usually the maximum and minimum) confidence level; it naturally accounts for situations when the number of trials for the levels of confidence differs; and it reduces the influence of a single outlying participant on group results without requiring that this outlier be corrected or removed. It thus allows for all data to be included in the analysis and decreases subjectivity in sampling. Finally—although this is not applicable to the current analysis—Bayesian estimation allows groups with different numbers of participants and different numbers of trials per participant/condition to be compared. This can be important in studies comparing metacognition in adults and children at different ages (it is not always possible to collect an equal number of trials from young children and adult participants), or comparing patients and healthy individuals (patient samples often depend on “convenience,” whereas healthy adult participants are easier to recruit). For hierarchical Bayesian modelling, we used the code developed by Fleming (2017; available at https://github.com/metacoglab/Hmeta-d).

A limitation of this method is that data need to be amended to accommodate a two-choice SDT framework—that is, a 2 (stimulus) × 2 (response) × N (confidence rating) matrix. We adopted the algorithm applied by Ordin, Polyanskaya, and Soto (2020b) for this purpose (essential details are given below):

Word versus nonword pairs (i.e., test trials in which words were paired with nonwords; words appeared an equal number of times in the first and in the second positions of the test pairs): When the word appeared in Position 1 and the nonword in Position 2, the response was considered a hit if the participant responded “1” (i.e., chose a word over a nonword), but a miss if the participant responded “2” (i.e., missing a word in Position 1). If the word occurred in Position 2, the response was defined as a correct rejection if the participant responded “2” (i.e., rejected the nonword in Position 1) and a false alarm if the participant responded “1” (falsely identified Position 1 as containing the target).

Phantoms versus nonwords: A phantom was operationalized as a signal to be detected, so choosing a phantom over a nonword was considered a correct response. When a phantom appeared in Position 1 and the nonword in Position 2, the response was considered a hit if the participant responded “1”, but a miss if the participant responded “2” (missing the phantom in Position 1). When a phantom appeared in Position 2, the response was defined as a correct rejection if the participant responded “2” (i.e., rejecting the option that the phantom was in Position 1) and a false alarm if the participant responded “1” (falsely identified Position 1 as containing the target).

Words versus phantoms: In a similar fashion, when a word appeared in Position 1 and a phantom in Position 2, the trial was considered a hit if the participant responded “1”, but a miss if the participant responded “2” (i.e., missing a target in Position 1). When a word appeared in Position 2, the trial was considered a false alarm if the participant responded “1” (falsely identifying Position 1 as containing the target), and a correct rejection if the participant responded “2” (i.e., correctly rejecting the possibility of the target in Position 1).
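The three mappings above all amount to tallying trials into two count vectors, one per stimulus class (target in Position 1 vs. Position 2). A minimal sketch follows; the ordering convention is our assumption, modelled on the response-by-confidence count format used by SDT toolboxes such as HMeta-d:

```python
def build_counts(trials, n_ratings=4):
    """Tally 2AFC trials into nR_S1 / nR_S2 count vectors.

    Each trial is (target_pos, response_pos, confidence), with the
    positions in {1, 2} and confidence in 1..n_ratings. Counts are
    ordered from 'responded 1, highest confidence' (index 0) down to
    'responded 2, highest confidence' (last index).
    """
    nR_S1 = [0] * (2 * n_ratings)  # trials where the target was in Position 1
    nR_S2 = [0] * (2 * n_ratings)  # trials where the target was in Position 2
    for target, response, conf in trials:
        if response == 1:
            idx = n_ratings - conf        # conf 4 -> index 0
        else:
            idx = n_ratings - 1 + conf    # conf 4 -> last index
        (nR_S1 if target == 1 else nR_S2)[idx] += 1
    return nR_S1, nR_S2

# A max-confidence hit (target in Position 1, chose "1"), a low-confidence
# correct rejection, and a mid-confidence false alarm:
nR_S1, nR_S2 = build_counts([(1, 1, 4), (2, 2, 1), (2, 1, 3)])
print(nR_S1)  # [1, 0, 0, 0, 0, 0, 0, 0]
print(nR_S2)  # [0, 1, 0, 0, 1, 0, 0, 0]
```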

We estimated Type 1 sensitivity and bias, metacognitive sensitivity, efficiency, and bias separately for these three pair types to explore metacognitive processes when people select old versus statistically congruent novel items (words vs. phantoms), novel statistically congruent items versus novel statistically incongruent items (phantoms vs. nonwords), and old versus novel statistically incongruent items (words vs. nonwords).

RESULTS

We calculated bias (a tendency to select the first or the second item in the test pair), d', meta-d', M-ratio (see Fig. 1a), and metacognitive bias as mean confidence (see Fig. 1b) for the three pair types. We applied a repeated-measures analysis of variance (ANOVA) to each measure to determine whether the scores differed significantly between pair types. In all cases, we used Mauchly’s test to check the sphericity assumption. Where necessary, we corrected degrees of freedom and p values using the Greenhouse–Geisser epsilon, reporting uncorrected degrees of freedom and corrected p values. Each significant effect was then decomposed into a series of pairwise comparisons using paired two-tailed t tests, applying Bonferroni correction and reporting corrected p values.
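The pairwise follow-up step can be sketched with standard-library Python. This is a hedged illustration with made-up data: we compute the paired t statistic directly (p values would come from the t distribution with df = n − 1, e.g., via scipy) and apply the Bonferroni correction to the resulting p values:

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev

def paired_t(x, y):
    """t statistic for a paired two-tailed t test (df = n - 1)."""
    d = [a - b for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / sqrt(len(d)))

def bonferroni(p_values):
    """Multiply each p value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical per-participant d' scores for two of the pair types
w_nw = [1.2, 0.9, 1.4]
w_ph = [0.4, 0.2, 0.6]
t = paired_t(w_nw, w_ph)
print(round(t, 1))  # ~23.0 for these toy data

# With three conditions there are 3 pairwise tests, so each raw p
# value is multiplied by 3:
print(bonferroni([0.02, 0.4, 0.01]))
```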

Fig. 1

a Type 1 (cognitive) and Type 2 (metacognitive) performance in the test trials pairing words against nonwords (w_nw), phantoms against nonwords (ph_nw), and words against phantoms (w_ph). Type 1 performance is shown in the upper panels (perceptual-cognitive sensitivity on the left and bias on the right; negative sensitivity indicates below-chance individual performance; positive bias scores indicate an overall individual tendency to prefer the first of the two options in the forced-choice test, and negative values a tendency to choose the second token). Type 2 performance is shown in the lower panels (metacognitive sensitivity on the left and metacognitive efficiency on the right). In both a and b, the dots represent the scores of individual participants, the boxes contain 50% of all data (second and third quartiles), and the whiskers span the upper and lower (first and fourth) quartiles. The horizontal lines inside the boxes represent medians, and filled dots (red in the online version) show the means. b Metacognitive bias expressed as the mean confidence rating assigned to the trials pairing words versus nonwords, words versus phantoms, and phantoms versus nonwords

D-prime

The analysis showed a significant effect of pair type, F(2, 74) = 9.328, p < .0005, ηp2 = .201. Pairwise comparisons revealed that sensitivity was significantly higher for words over nonwords than for words over phantoms, M = .479 (.1), [.278, .681], t(37) = 4.817, p < .0005. The difference in sensitivity between phantoms over nonwords and words over phantoms was also significant, M = .329 (.126), [.073, .586], t(37) = 2.607, p = .039. Sensitivity to words over nonwords did not differ significantly from sensitivity to phantoms over nonwords, M = .15 (SE of the difference .113), 95% CI of the difference [−.08, .38], t(37) = 1.325, p = .579. To estimate the evidence for absence of a difference in the latter contrast, we calculated the Bayes factor using a Bayesian t test (two-tailed, with the prior that both outcomes are equally likely). The result, BF10 = .39, shows that, given the data, the null hypothesis (no difference) is 2.55 times more likely than the alternative hypothesis, which provides some (although weak) evidence for the absence of a difference.

Bias (C’)

The difference in bias between conditions was not significant, F(2, 74) = 2.07, p = .133, ηp2 = .053, suggesting that people did not have different preferences across pair types in endorsing the first or the second item.

Meta-d'

The analysis showed a significant effect of pair type, F(2, 74) = 27.52, p < .0005, ηp2 = .427. Pairwise comparisons revealed that metacognitive sensitivity to words over nonwords was significantly different from metacognitive sensitivity to phantoms over nonwords, M = .208 (.07), [.067, .349], t(37) = 2.99, p = .015. Meta-d' was significantly higher on words over nonwords than on words over phantoms, M = .482 (.07), [.342, .623], t(37) = 6.95, p < .0005. Finally, the difference in meta-sensitivity to phantoms over nonwords and words over phantoms was also significant, M = .275 (.06), [.161, .388], t(37) = 4.92, p < .0005.

Overall, this shows that even though people select both words and phantoms over nonwords with equal sensitivity, their confidence ratings better discriminate between correct and incorrect responses in word versus nonword test trials. This suggests higher metacognitive sensitivity to old items, which can potentially be retrieved from memory and thus benefit from memory representations during the recognition test.

M-ratio

The analysis showed a significant effect of pair type, F(2, 74) = 1079.415, p < .0005, ηp2 = .967. Pairwise comparisons revealed that metacognitive efficiency for words over nonwords was significantly different from that for phantoms over nonwords, M = .156 (.011), [.133, .179], t(37) = 13.85, p < .0005. M-ratio was significantly higher on words over nonwords than on words over phantoms, M = .491 (.008), [.474, .508], t(37) = 58.56, p < .0005. Finally, the difference in M-ratios on phantoms over nonwords and words over phantoms was also significant, M = .334 (.012), [.31, .361], t(37) = 27.09, p < .0005.

Metacognitive bias

Overall, mean confidence did not differ across pair types, F(2, 74) = 2.653, p = .085, ηp2 = .067. This suggests that participants used similar confidence criteria across all conditions and reveals no tendency to assign higher or lower confidence to any of the pair types.

Importantly, differences in discrimination performance across the three pair types could have influenced mean confidence, meaning that this commonly used measure of metacognitive bias would not be bias free in our experiment. To verify that people had no tendency to assign overall higher or lower confidence ratings across conditions, we explored the relations between Type 1 sensitivity and mean confidence ratings using across-subject correlations. Lower correlations would signify overconfidence (low performance and high confidence). Comparison of correlations could then be used to compare the bias across conditions. We compared correlations between sensitivity and mean confidence in word versus nonword trials (r = .197) and word versus phantom trials (r = −.052), controlling for the correlations in phantom versus nonword trials (r = −.05). The correlations were not significantly different, z = 1.026, p = .152. The difference in correlations between Type 1 sensitivity and mean confidence in word versus phantom and phantom versus nonword pairs was not significant either, z = −.009, p = .496 (controlling for the correlations in the third pair type). Finally, the difference in correlations for word versus nonword and phantom versus nonword trials was not significant, z = 1.017, p = .155. This analysis converges with the results of the ANOVA on mean confidence ratings. There was no evidence that participants tended to assign higher or lower confidence ratings to any pair type.
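The correlation comparison can be sketched as follows. This is a simplified, standard-library illustration: the Fisher r-to-z difference below treats the two correlations as independent samples of size n, whereas the analysis reported above additionally controlled for the correlation in the third pair type (a dependent-correlations test), which this sketch omits:

```python
from math import sqrt, atanh

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    varx = sum((a - mx) ** 2 for a in x)
    vary = sum((b - my) ** 2 for b in y)
    return cov / sqrt(varx * vary)

def fisher_z_diff(r1, r2, n):
    """z statistic for the difference between two correlations
    (Fisher r-to-z transform), assuming independence and equal n."""
    return (atanh(r1) - atanh(r2)) / sqrt(2 / (n - 3))

# Using the reported correlations for word-nonword (r = .197) and
# word-phantom (r = -.052) trials with n = 38 participants:
z = fisher_z_diff(0.197, -0.052, 38)
print(round(z, 2))  # ~1.05, in the vicinity of the reported z = 1.026
```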

DISCUSSION

Overall, the results show that people easily discriminate between tokens that violate statistical regularities and tokens that conform to statistical regularities, whether the statistically eligible tokens are old (i.e., embedded in the familiarization stream) or novel. Sensitivity to words over nonwords and phantoms over nonwords appears to be similar, although direct evidence for the absence of a difference is weak (given the data, the no-difference scenario is only 2.6 times more likely than the alternative). Discrimination between old and novel statistically congruent items is more challenging and less accurate, yet still possible (people preferred words to phantoms when these two types of tokens were presented as word candidates during the recognition test). This pattern of results suggests that when people make a choice between two word candidates, deviations from learnt structures are more salient than the potential to retrieve the token from memory as a whole constituent. However, relying on memory representations may still occur when people need to choose between two candidates that are both congruent with regularities. Memory representations of recurrent constituents extracted from continuous sensory input during familiarization do modulate confidence judgments: Metacognitive sensitivity is higher in trials in which words are paired with nonwords than in trials in which phantoms are paired with nonwords. Despite the nonsignificant difference in d' scores between these conditions, M-ratio is higher when words are paired with nonwords, pointing to a facilitatory effect of memory representations for old items.

Although metacognitive processing is indeed possible without conscious awareness (Jachs, Blanco, Grantham-Hill, & Soto, 2015; Kentridge & Heywood, 2000), in this experiment participants were explicitly told to detect and memorize the words of “an alien language” during the familiarization stage. During the recognition test, participants also made conscious choices. Thus, we assume that metacognitive processing in our experiment is strongly coupled with conscious awareness of knowledge or conscious assessment of the feeling of familiarity with respect to the content of statistical learning (Kunimoto, Miller, & Pashler, 2001; Persaud et al., 2011; Persaud, McLeod, & Cowey, 2007; Shimamura, 2008). If participants were not consciously aware of the content of their knowledge, their confidence ratings would not be informative with respect to the correctness of their responses, even if recognition performance were above chance. Following this logic (Ko & Lau, 2012; Maniscalco & Lau, 2012; Persaud et al., 2007; Shimamura, 2008), we believe that (at least in the current experimental setup) postdecision confidence judgments objectively measure conscious awareness in the choice of words over nonwords and phantoms, and phantoms over nonwords. Thus, our results suggest that people are more aware of statistical regularities (and violations of such regularities) than of memorized triplets, although they can still discriminate these old triplets from novel items that conform to acquired regularities. This means that the statistical learning process produces both discrete constituents and statistical rules (e.g., TPs), yet these are subjected to varying degrees of conscious processing. In this regard, it is important to emphasize that explicit instructions equipped participants with the information that the familiarization stream contained discrete constituents (i.e., words in an “alien language”), but the existence of statistical rules was not mentioned to the participants. 
We propose that, over the course of learning, attention shifted from constituents to structures, subjecting the latter to greater conscious awareness. This could support the extraction of rules from a small number of cases, and the generalization of these extracted rules to previously unencountered situations. It would be interesting to explore how conscious awareness is modulated by drawing attention (via explicit instructions) not only to the presence of discrete constituents in the sensory input but also to the regularities embedded in that input. A further question is what metacognitive processes are elicited by an identification task (a yes/no recognition task, where the response requires deciding whether a presented token is a word constituent from the sensory input one has been exposed to) as opposed to a discrimination task.

Our data leave many questions unanswered. We do not know whether the observed differences in metacognitive processing were based on different strategies for assigning confidence ratings for the pair types, or on reduced error detection (people might not have perceived rejected phantoms as errors in word–phantom pairs, which would prevent error detection mechanisms from functioning). Furthermore, it is important to note that d' provides a bias-free measure of perceptual and cognitive sensitivity to the signal (in this case, presence of structure, presence of memory representations, or both), without supporting any conclusions regarding the processing architecture, and meta-d' provides an estimate of metacognitive sensitivity, also without pointing to a particular type of processing mechanism. Both measures can be driven by multiple, sometimes confounding factors. Studies regarding Type 1 sensitivity in statistical learning have already spawned a rich body of literature. Siegelman, Bogaerts, Christiansen, and Frost (2017) and Frost, Armstrong, Siegelman, and Christiansen (2015) have summarized that individual differences in statistical learning efficiency can be modulated by (a) computational ability (i.e., how well an individual registers statistical regularities and employs them to detect boundaries between discrete constituents); (b) individual differences related to experience with a particular domain, task, or type of material (more expertise makes an individual better at dealing with statistical regularities with higher degrees of complexity); and (c) psychophysical differences related to individual ability to encode information in a particular modality at a particular rate with a particular just-noticeable detection or discrimination threshold. Each of these factors can also affect metacognitive sensitivity and feelings of confidence. 
Additionally, metacognitive sensitivity can be influenced by (a) information (usually a subjective feeling for which the participant has no logical explanation because the information is not subject to conscious processing) that is taken into account when making metacognitive judgments, but not cognitive decisions (Jachs et al., 2015; Scott, Dienes, Barret, Bor, & Seth, 2014); (b) further processing of the stimulus information after the decision is already taken, leading to meta-d' scores being higher than d' scores (Rabbitt & Vyas, 1981); or (c) self-evaluation of one’s feeling of confidence (Fleming & Daw, 2017). Further research is needed to explore these multiple drivers of metacognitive sensitivity, efficiency, and bias in statistical learning tasks, which are reflected in the meta-d' and M-ratio measures.

REFERENCES

  • Alamia, A., & Zenon, A. (2016). Statistical regularities attract attention when task-relevant. Frontiers in Human Neuroscience, 10, 42.

  • Arciuli, J., von Koss Torkildsen, J., Stevens, D. J., & Simpson, I. C. (2014). Statistical learning under incidental versus intentional conditions. Frontiers in Psychology, 5, 747. https://doi.org/10.3389/fpsyg.2014.00747

  • Aslin, R. N., & Newport, E. L. (2012). Statistical learning: From acquiring specific items to forming general rules. Current Directions in Psychological Science, 21, 170–176. https://doi.org/10.1177/0963721412436806

  • Baldwin, D., Andersson, A., Saffran, J., & Meyer, M. (2008). Segmenting dynamic human action via statistical structure. Cognition, 106, 1382–1407.

  • Barrett, A. B., Dienes, Z., & Seth, A. K. (2013). Measures of metacognition on signal-detection theoretic models. Psychological Methods, 18, 535–552.

  • Dienes, Z., Broadbent, D., & Berry, D. C. (1991). Implicit and explicit knowledge bases in artificial grammar. Journal of Experimental Psychology: Learning, Memory, and Cognition, 17, 875–887.

  • Dunlosky, J., Serra, M. J., & Baker, J. M. C. (2007). Metamemory. In F. T. Durso, R. S. Nickerson, S. T. Dumais, S. Lewandowsky, & T. J. Perfect (Eds.), Handbook of applied cognition (pp. 137–160). New York, NY: John Wiley & Sons.

  • Endress, A. D., & Langus, A. (2017). Transitional probabilities count more than frequency, but might not be used for memorization. Cognitive Psychology, 92, 37–64.

  • Endress, A. D., & Mehler, J. (2009). The surprising power of statistical learning: When fragment knowledge leads to false memories of unheard words. Journal of Memory and Language 60(3), 351–367.

    Article  Google Scholar 

  • Fernandes, T., Kolinsky, R., & Ventura, P. (2010). The impact of attention load on the use of statistical information and co-articulation as speech segmentation cues. Attention, Perception, & Psychophysics 72, 1522–1532.

    Article  Google Scholar 

  • Flavell, J. H. (1979). “Metacognition and cognitive monitoring: A new area of cognitive-development inquiry. American Psychologist 34(10), 906–911.

    Article  Google Scholar 

  • Fleming, S. (2017). HMeta-d: hierarchical Bayesian estimation of metacognitive efficiency from confidence ratings. Neuroscience of Consciousness, 2017(1). https://doi.org/10.1093/nc/nix007

  • Fleming, S., & Daw, N. D. (2017). Self-evaluation of decision-making: A general Bayesian framework for metacognitive computation. Psychological Review 124(1), 91–114.

    Article  Google Scholar 

  • Fleming, S., & Lau, H. (2014). How to measure metacognition. Frontiers in Human Neuroscience, 8, 443. https://doi.org/10.3389/fnhum.2014.00443

    Article  PubMed  PubMed Central  Google Scholar 

  • Frost, R., Armstrong, B., Siegelman, N., & Christiansen, M. (2015). Domain generality versus modality specificity: The paradox of statistical learning. Trends in Cognitive Science 19(3), 117–125.

    Article  Google Scholar 

  • Galvin, S. J., Podd, J. V., Drga, V., & Whitmore, J. (2003). Type 2 tasks in the theory of signal detectability: Discrimination between correct and incorrect decisions. Psychonomic Bulletin & Review, 10, 843–876.

    Article  Google Scholar 

  • Gómez, D. M., Bion, R., & Mehler, J. (2011). The word segmentation process as revealed by click detection. Language and Cognitive Processes, 26(2), 212–223.

    Article  Google Scholar 

  • Hard, B. M., Meyer, M., & Baldwin, D. (2019). Attention reorganizes as structure is detected in dynamic action. Memory & Cognition, 47, 17–32.

    Article  Google Scholar 

  • Harris, Z. (1955). From phoneme to morpheme. Language, 31, 190–222.

    Article  Google Scholar 

  • Jachs, B., Blanco, M., Grantham-Hill, S., & Soto, D. (2015). On the independence of visual awareness and metacognition: A signal detection theoretic analysis. Journal of Experimental Psychology: Human Perception and Performance 41(2), 269–276.

    PubMed  Google Scholar 

  • Kentridge, R.W. & Heywood, C.A. (2000). Metacognition and awareness. Consciousness and Cognition, 9, 308–312.

    Article  Google Scholar 

  • Kepecs, A., Uchida, N., Zariwala, H., & Mainen, Z. (2008). Neural correlates, computation and behavioural impact of decision confidence. Nature, 455, 227–231.

    Article  Google Scholar 

  • Ko, Y., & Lau, H. (2012). A detection theoretic explanation of blindsight suggests a link between conscious perception and metacognition. Philosophical Transactions of the Royal Society B: Biological Sciences, 367, 1401–1411.

    Article  Google Scholar 

  • Kunimoto, C., Miller, J., & Pashler, H. (2001). Confidence and accuracy of near-threshold discrimination responses. Consciousness and Cognition, 10(3), 294–340.

    Article  Google Scholar 

  • Maniscalco, B., & Lau, H. (2012). A signal detection theoretic approach for estimating metacognitive sensitivity from confidence ratings. Consciousness and Cognition, 21(1), 422-430.

    Article  Google Scholar 

  • Nelson, T. (1996). Consciousness and Metacognition. American Psychologist 51, 102-116.

    Article  Google Scholar 

  • Nelson, T., & Narens, L. (1990). Metamemory: A theoretical framework and new findings. Psychology of Learning and Motivation 26, 125–173.

    Article  Google Scholar 

  • Nosofsky, R. M., & Zaki, S. R. (2002). Exemplar and prototype models revisited: Response strategies, selective attention, and stimulus generalization. Journal of Experimental Psychology: Learning Memory and Cognition 28(5), 924–940.

    Google Scholar 

  • Ordin, M., Polyanskaya, L., & Soto, D. (2020a). Neural bases of learning and recognition of statistical regularities. Annals of the New York Academy of Sciences, 1467, 60–76.

    Article  Google Scholar 

  • Ordin, M., Polyanskaya, L., & Soto, D. (2020b). Metacognitive processing in language learning tasks is affected by bilingualism. Journal of Experimental Psychology: Learning Memory and Cognition, 46(3), 529–538.

    Google Scholar 

  • Ordin, M., Polyanskaya, L., Soto, D., & Molinaro, N. (2020). Electrophysiology of statistical learning: Exploring the online learning process and offline learning product. European Journal of Neuroscience, 51(9), 2008–2022.

    Article  Google Scholar 

  • Persaud, N., Davidson, M., Maniscalco, B., Mobbs, D., Passingham, R. E., Cowey, A., & Lau, H. (2011). Awareness-related activity in prefrontal and parietal cortices in blindsight reflects more than superior visual performance. NeuroImage, 58, 605–611.

    Article  Google Scholar 

  • Persaud, N., McLeod, P., & Cowey, A. (2007). Post-decision wagering objectively measures awareness. Nature Neuroscience, 10, 257–261.

    Article  Google Scholar 

  • Rabbit, P., & Vyas, S. (1981). Processing a display even after you make a response to it. How perceptual errors can be corrected. The Quarterly Journal of Experimental Psychology, A, 33(3), 223–239.

    Article  Google Scholar 

  • Reber, A. S., Kassin, S. M., Lewis, S., & Cantor, G. (1980). On the relationship between implicit and explicit modes in the learning of a complex rule structure. Journal of Experimental Psychology: Learning, Memory, and Cognition, 6, 492–502.

    Google Scholar 

  • Roediger, H. L., & McDermott, K. B. (1995). Creating false memories: Remembering words not presented in lists. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 803–814.

    Google Scholar 

  • Saffran, J. (2001). Words in a sea of sounds: The output of infant statistical learning. Cognition, 81(2), 149–169.

    Article  Google Scholar 

  • Saffran, J., Aslin, R., & Newport, E. (1996). Statistical learning by 8-month old infants. Science, 274, 1926–1928.

    Article  Google Scholar 

  • Schraw, G. (1998). Promoting general metacognitive awareness. Instructional Science, 26, 113–125.

    Article  Google Scholar 

  • Scott, R., Dienes, Z., Barret, A., Bor, D., & Seth, A. (2014). Blind insight: Metacognitive discrimination despite chance task performance. Psychological Science, 25(12), 2199–2208.

    Article  Google Scholar 

  • Shimamura, A. P. (2008). A neurocognitive approach to metacognitive monitoring and control. In J. Dunlosky & R. A. Bjork (Eds.), Handbook of metamemory and memory (pp. 373–390). New York, NY: Psychology Press.

    Google Scholar 

  • Siegelman, N. (2019). Statistical learning abilities and their relation to language. Language and Linguistics Compass, 14(3). https://doi.org/10.1111/lnc3.12365

  • Siegelman, N., Bogaerts, L., Armstrong, B. C., & Frost, R. (2019). What exactly is learned in visual statistical learning? Insights from Bayesian modeling. Cognition, 192, 104002. https://doi.org/10.1016/j.cognition.2019.06.014

    Article  PubMed  Google Scholar 

  • Siegelman, N., Bogaerts, L., Christiansen, M. H., & Frost, R. (2017). Towards a theory of individual differences in statistical learning. Philosophical Transactions of the Royal Society of London: Series B, Biological Sciences, 372(1711). https://doi.org/10.1098/rstb.2016.0059

  • Toro, J. M., Sinnett, S., & Soto-Faraco, S. (2005). Speech segmentation by statistical learning depends on attention. Cognition, 97, B25–B34.

    Article  Google Scholar 

Download references

ACKNOWLEDGEMENT

We are thankful to Magda Altman for editing and proofreading the manuscript.

Funding

This study was supported by the European Commission via the H2020 Marie Skłodowska-Curie Actions (Grant Number DLV-792331) and the Spanish Ministerio de Economía y Competitividad (Grant Number RTI2018-098317-B-I00). The research institute is supported by the Spanish Ministry of Economy and Competitiveness through the “Severo Ochoa” Programme for Centres/Units of Excellence in Research and Development (SEV-2015-490).

Author information

Corresponding author

Correspondence to Mikhail Ordin.

Additional information

OPEN PRACTICES STATEMENT

We used the same data set reported in Ordin, Polyanskaya, Soto, and Molinaro (2020). The raw data are available from the corresponding author on request (mordin@bcbl.eu) and are deposited online on Figshare (doi:10.6084/m9.figshare.9989000 and doi:10.6084/m9.figshare.9988610). The analysis scripts for hierarchical Bayesian modelling of metacognitive sensitivity were developed by Prof. Stephen Fleming, who has made them publicly available (https://github.com/metacoglab/Hmeta-d; last accessed 25/04/2020).

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Ordin, M., Polyanskaya, L. The role of metacognition in recognition of the content of statistical learning. Psychon Bull Rev 28, 333–340 (2021). https://doi.org/10.3758/s13423-020-01800-0


Keywords

  • Statistical learning
  • Awareness
  • Metacognition
  • Confidence