Numerous studies have investigated the association between self-assessed and objectively measured performance, which is commonly called calibration accuracy (Gignac & Zajenkowski, 2020; Wollenschläger et al., 2016). It is widely acknowledged that individuals tend to overestimate or underestimate their performance in various contexts (Coutinho et al., 2021; De Bruin et al., 2017; Dunning et al., 2003; Sanchez & Dunning, 2018) and that confidence judgement is linked to multiple factors (Christopher et al., 2023; Efklides, 2011; Geraci et al., 2023; Serra & DeMarree, 2016; Stankov et al., 2012). However, misalignment between confidence and performance can have long-term consequences for people’s lives (Geraci et al., 2023). Within the educational context, students’ ability to accurately evaluate their performance may allow them to self-regulate their learning (Andrade, 2019; Hosein & Harle, 2018). Conversely, an inaccurate self-assessment could hamper this process, leading them to underutilise their capabilities (Vonkova et al., 2021) or even make suboptimal career choices because of underestimating their abilities (Ehrlinger & Dunning, 2003). For example, an overly critical self-assessment of one’s math abilities significantly contributes to girls’ lower motivation in STEM subjects and their limited consideration of STEM careers (OECD, 2015).

The Dunning–Kruger effect (DKE) refers to the cognitive bias whereby individuals with low abilities or knowledge overestimate their skills or expertise, while those with high abilities or knowledge tend to underestimate their skills relative to others (Dunning, 2011, 2015; Kruger & Dunning, 1999). Although the DKE is pervasive across domains (Erickson & Heit, 2015; Ernst et al., 2023; Holling & Preckel, 2005; Kruger & Dunning, 1999; Sheldon et al., 2014), most research has focused on upper-secondary or university students (Christopher et al., 2023; Geraci et al., 2023; Vonkova et al., 2021) and single-country settings (Coutinho et al., 2021; Christopher et al., 2023; Vonkova et al., 2022). Limited attention has been given to younger populations (Boekaerts & Rozendaal, 2010; Dapp & Roebers, 2021) and to exploring confidence–performance dynamics across cultures (Morony et al., 2013; Williams & Williams, 2010) or between similar cultures, that is, within Europe.

The present study examines the relationship between students' mathematics performance and their confidence judgements of their item-specific mathematics competence. We then investigate cross-country DKE patterns and the influence of demographic factors and of motivational and identity constructs in connection with the DKE in primary mathematics classrooms.

The role of self-assessment and confidence judgements

Competent decision-making is essential in daily life and any profession (Kleitman et al., 2019), where accuracy and confidence judgement are key ingredients. Accurate assessment of one’s performance is crucial for suitable self-regulation, such as in learning processes (Andrade, 2019; Hosein & Harle, 2018) or career decision-making (Ehrlinger & Dunning, 2003). On the contrary, inaccurate self-assessment could hinder these processes, preventing individuals from utilising their capabilities to their full potential (Vonkova et al., 2021).

Andrade (2019) acknowledges a broad spectrum of self-assessment definitions in education that encompass an assessment of one's abilities (Brown & Harris, 2013), processes (Panadero et al., 2016), and products (Epstein et al., 2008). Despite this variety, the definitions cohere because each object of assessment (competence, process and product) is informed by self-generated feedback. At the same time, much of the research on self-assessment is oriented towards the accuracy of students' judgements (Andrade, 2019; Brown & Harris, 2013), yielding mixed results regarding the relationship between student self-ratings and objective measures, with correlations ranging from weak (0.20) to strongly positive (0.80).

Another critical component of decision-making is confidence judgement, a subjective metacognitive experience rooted in theories of accuracy judgement (see Efklides, 2006, 2008, 2011; Stankov, 1999, 2019; Stankov et al., 2012; Stankov et al., 2014). Stankov et al. (2012) suggest that confidence is a robust individual difference situated somewhere between ability and personality (Stankov & Lee, 2008), encompassing not only processes related to task performance but also certainty in beliefs about hypothetical events (Kleitman & Stankov, 2007).

Confidence judgements serve as a metacognitive monitoring measure (Stankov, 2019), and calibration is essential for their effectiveness (Efklides, 2008). Confidence judgements also act as the ‘gatekeeper’ in the decision-making process (Gilovich et al., 2002), with greater confidence expected to be associated with more competent performance (Kleitman et al., 2019; Stankov et al., 2014). However, individuals can make incorrect judgements of their ability despite high confidence, making overconfidence one of the most significant cognitive biases. Nevertheless, overconfidence can be an adaptive response to certain situations (Parker et al., 2012), warranting a thorough examination of individual differences in confidence and performance and their potential miscalibration (Kleitman et al., 2019).

The Dunning–Kruger effect—to be or not to be?

The Dunning–Kruger effect arises when an individual’s relatively low competence prevents them from accurately assessing their performance level (Kruger & Dunning, 1999) because low performers are ‘operating from incomplete and corrupted knowledge, they would make many mistakes and not recognise those mistakes as they made them’ (Dunning, 2011, p. 260). Conversely, those who excel in a given area tend to underestimate their skills compared with others, thinking the task is simple for everyone. Thus, the DKE can be viewed as the problem of systematic individual differences in meta-cognition (Schlösser et al., 2013).

The manifestation of the DKE varies across different domains (Ernst et al., 2023; Holling & Preckel, 2005) or tasks (Tashiro et al., 2021), but its pervasive influence extends to the cognitive, social and emotional domains (Erickson & Heit, 2015; Kruger & Dunning, 1999; Sheldon et al., 2014). Although capturing the magnitude of the DKE is challenging, a recent meta-analysis reports a mean correlation of 0.29 between self-assessed performance and objectively measured performance (Zell & Krizan, 2014). Some general approaches have been applied to detect the presence of the DKE. One method involves using objective measures of performance or knowledge, such as standardised tests or evaluations by domain experts. By comparing self-assessed competence with these objective measures, it is possible to identify cases in which people overestimate their abilities or knowledge. However, criticism has also been raised, suggesting that the DKE may be a statistical artefact (Gignac & Zajenkowski, 2020; Magnus & Peresetsky, 2022; Sullivan et al., 2019). Nevertheless, studies refuting the DKE can explain only a portion of the effect (see Ehrlinger et al., 2008). Conversely, the evidence supporting the DKE suggests the presence of metacognitive differences in online monitoring between low and high performers (McIntosh et al., 2019). Even so, calls for more robust methodologies in assessing the DKE have led to greater diversity in methodological approaches and more thorough screening of different DKE correlates (Gignac & Zajenkowski, 2020).

The confidence–competence relationship and its correlates

The confidence–competence relationship has attracted much attention, not only in examining the association itself but also in exploring the factors behind this relationship. Stankov and colleagues have consistently demonstrated that confidence is the best predictor of mathematics performance, explaining a substantial portion of the predictive variance of other self-beliefs, such as self-efficacy, self-concept and perceived competence (Morony et al., 2013; Stankov, 1999, 2019; Stankov et al., 2012; Stankov et al., 2014; Stankov & Lee, 2014, 2017). Confidence accounts for approximately 45% or more of the total variance in mathematics performance (Stankov & Lee, 2014, 2017). In general, students’ accuracy in self-assessment is linked to their prior mathematical success and overall (mathematics) confidence (Hosein & Harle, 2018). Komarraju and Nadler (2013) also show that students with high confidence levels believe intelligence is malleable, whereas those with low confidence tend to view intelligence as fixed. For the latter, a fixed mindset may discourage engagement in metacognitive monitoring because of this belief in competence predetermination; thus, possible feedback is disregarded, contributing to further misestimation.

Furthermore, little difference is observed between regions when investigating cross-cultural (in)variance in confidence, and confidence remains the primary predictor of mathematics scores (Morony et al., 2013; Stankov & Lee, 2014). Morony et al. (2013) examine two world regions: Confucian Asia (i.e., Hong Kong, Singapore, South Korea and Taiwan) and Europe (Denmark, the Netherlands, Finland, Latvia and Serbia), finding significant differences in calibration but not in the construct of confidence (i.e., the construct retained the same meaning across contexts). Europeans display more overconfidence, probably because of the lower overall mathematics scores of students from Serbia and Latvia, while Confucian Asia shows far more cautious estimations. However, differences between the European countries are also noted: the mean confidence level among the European countries ranges from 48.29 in Finland to 64.24 in the Netherlands. In a similar study focused on spatial navigation, Walkowiak et al. (2023) show that cultural clusters of countries tend to be similarly confident; however, when observing Europe alone, Germanic and East European countries show the most overconfidence when compared with the Nordic countries.

In mathematics, compared with other academic subjects, students are prone to show a higher degree of overconfidence (Erickson & Heit, 2015). Task difficulty has been found to significantly affect students' self-assessments (Tashiro et al., 2021), with students more likely to underestimate their competence after engaging in a more difficult task and to overestimate their own competence following an easier one, independent of performance. Boekaerts and Rozendaal (2010) show that fifth graders might struggle with accurately assessing their performance and that calibration differs depending on the subject (e.g., application problems vs. computation problems). They further argue that repeated underestimation of performance may indicate a lack of confidence, ultimately affecting other aspects of motivation and performance. Conversely, students who overestimate their performance may experience higher short-term motivation but may also become frustrated by the long-term outcomes of their efforts.

Likewise, Tashiro et al. (2021) find that how students perceive their understanding, compared with their peers, significantly affects their assessment. For example, students who later reported lower understanding than their peers were more likely to downgrade their self-assessed understanding. Similarly, we can postulate that perceived competence (i.e., student’s perception of competence over achievement-related tasks; Pekrun, 2006) will affect students’ confidence judgements, especially given that, for the latter to be present, one should also hold positive perceptions of competence. Furthermore, mathematics identity (i.e., an individual’s sense of self in the mathematics domain, Darragh, 2016) is argued to be a critical factor related to the development of student attainment, affect and involvement in mathematics (Eccles, 2009; Miller & Wang, 2019), providing new insights into why some students may underachieve, underestimate their own competence or disengage from mathematics, despite their prior abilities.

Regarding gender differences, the findings are inconsistent. Although some studies show no difference (Hosein & Harle, 2018; Tashiro et al., 2021), others indicate that males consistently overestimate their performance (Boekaerts & Rozendaal, 2010; Marks et al., 2018), including in cross-country comparisons (Morony et al., 2013). Among the countries examined in Morony et al.'s (2013) study, minimal gender differences are found only in Singapore, while they are far more prominent in the other observed countries. Regarding age, it is argued that older learners tend to display greater consistency in their judgements (Guillory & Blankson, 2017; Nagel & Lindsey, 2018) and that consistency improves with experience (Nagel & Lindsey, 2018). The findings regarding the contribution of feedback have been inconsistent (Tashiro et al., 2021; Thawabieh, 2017). Similarly, socioeconomic status (SES) is often included as a covariate when examining the complex confidence–competence relationship and possible influences on students' self-assessment and performance in mathematics. Stankov and Lee (2014, 2017) show that SES accounts for around 30% of the variance in performance.

Current study

Previous research has underscored the importance of students’ accurate self-assessment of their performance, particularly because of its crucial role in the self-regulation of learning (Andrade, 2019; Hosein & Harle, 2018). Moreover, in other domains of life, competent decision-making holds substantial importance, affecting different life choices (Kleitman et al., 2019).

From an early stage, children’s self-assessments reflect their objective performance (Dapp & Roebers, 2021). However, multiple factors affect the process, namely, confidence (e.g., Morony et al., 2013; Stankov, 2019; Stankov et al., 2012), task features (Tashiro et al., 2021), gender (Boekaerts & Rozendaal, 2010; Marks et al., 2018), SES (Stankov & Lee, 2014, 2017) and motivation (Boekaerts & Rozendaal, 2010). These factors can hinder students’ capacity to calibrate their judgements. Unfortunately, a cross-cultural examination of this process related to mathematics remains limited (Morony et al., 2013).

The current study focuses on primary students in six European countries, exploring the following research questions:

(1) What is the relationship between students' mathematics performance and confidence judgements of their item-specific mathematics competence? That is, can the DKE be established in younger populations? Given the prevalence of studies in older age groups and university populations (Christopher et al., 2023; Geraci et al., 2023; Vonkova et al., 2021), we expect the phenomenon to be somewhat visible across the observed population.

(2) Are there cross-country differences in the DKE? Based on cross-cultural studies on confidence–performance dynamics (Morony et al., 2013), we expect the DKE to be detected across all the examined European countries. At the same time, similar to Morony et al. (2013), Stankov and Lee (2014) and Walkowiak et al. (2023), we postulate that country-specific patterns will be detected in the younger age groups.

(3) Which demographic factors contribute to the DKE? Based on previous studies on the effect of gender (Boekaerts & Rozendaal, 2010; Marks et al., 2018) and SES's contribution to explaining performance (Stankov & Lee, 2014, 2017), we examine their contribution in connection with the DKE.

(4) How does the DKE align with one's mathematics identity and perceived competence in mathematics? This research question is more exploratory. However, based on prior studies observing these concepts and their mutual relationships (e.g., Darragh, 2016; Eccles, 2009; Pekrun, 2006), we expect students who exhibit higher levels of perceived competence and mathematics identity to be less susceptible to the DKE.

Methods

The current study used the first-wave data collected within an international longitudinal research project focused on the development of mathematics motivation in primary education—Co-Constructing Mathematics Motivation in Primary Education–A Longitudinal Study in Six European Countries (MATHMot for short)—funded by the Research Council of Norway (grant number 301033). All students from the six participating countries (Estonia, Finland, Norway, Portugal, Serbia and Sweden) who took part in the first wave were included in the investigation. The distribution of participants across countries and grades is presented in Table 1. The sample sizes ranged from 843 in Estonia to 1093 in Norway in grade 3 and from 851 in Estonia to 1255 in Portugal in grade 4. In total, 6073 third graders and 5696 fourth graders were included in the analysis.

Table 1 Number of students in grade 3 and grade 4 in all six countries

Variables

In the first wave of the MATHMot data collection, students in both grades 3 and 4 were asked to answer questions related to, among other things, their background (i.e., number of books at home and language spoken at home), their motivation in learning mathematics (i.e., including perceived competence in mathematics), confidence judgement and mathematics identity. Table 2 shows the variables used in the current study and the variable properties of each grade and country.

Table 2 Information on the variables involved in the analyses

Perceived competence in mathematics (PCM) was measured by how frequently students endorsed five statements about their perception of their general mathematics competence, such as 'Math is easy for me'. PCM was administered as a subscale of the Expectancy Value Scale (Peixoto et al., 2023), which is grounded in Eccles' situated expectancy value theory (Eccles & Wigfield, 2020). Each statement had four response alternatives, ranging from 'very often' to 'never'. The same items were used for both grades. Cronbach's alphas for PCM were satisfactory across grades and countries, with a trend of higher values in grade 4 (see Table 2 and the Appendix for details).

The construct of students' mathematics identity (MI) was measured by six statements capturing either students' personal perceptions (e.g., 'I think I am a math person') or their perceptions of how others see them (e.g., 'My family thinks of me as a math person'). The MI scale was adapted and further developed based on the prior work of Vincent-Ruz et al. (2018) and Miller and Wang (2019). Each of the six items had four response alternatives, ranging from 'very often' to 'never'. Cronbach's alpha was above 0.75 for both grades in almost all countries. Generally, the grade 4 samples showed higher reliability for this construct, ranging from 0.780 in Serbia to 0.832 in Estonia. For the grade 3 samples, Finland had the highest reliability at 0.810 and Portugal the lowest at 0.744. Earlier analyses of the validation of the EVS scale (Peixoto et al., 2023) have shown a correlation of 0.55 between the PCM and MI measures.
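For readers who wish to see how reliabilities of this kind can be reproduced, the sketch below computes Cronbach's alpha from raw item responses. It is a minimal illustration, assuming a pandas data frame `df` with one row per student and hypothetical column names for the six MI items; it is not the project's actual analysis code.

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a set of scale items (rows = students, columns = items)."""
    items = items.dropna()
    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)        # variance of the total score
    return (k / (k - 1)) * (1 - sum_item_var / total_var)

# Hypothetical item columns; alpha could be computed per country and grade, e.g.:
# mi_items = ["mi_1", "mi_2", "mi_3", "mi_4", "mi_5", "mi_6"]
# alphas = df.groupby(["country", "grade"]).apply(lambda g: cronbach_alpha(g[mi_items]))
```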

Each student also completed a mathematics test with 12 items in grade 3 and 14 items in grade 4. The test was timed (25 min in grade 3 and 30 min in grade 4). The tasks were selected from previous TIMSS assessments (Approval IEA-22–022), covering the curriculum span of the participating countries. Students received one point for each correct answer, resulting in a maximum score of 12 points in grade 3 and 14 in grade 4. Math performance scores were estimated using the Rasch model based on all item responses. The item pool included seven items administered in both grades as linking items. In the present study, the students' estimated math scores were rescaled within each grade to a mean of 0 and a standard deviation of 1. After each mathematics test item, students were asked how confident they were in solving the task (Kleitman & Stankov, 2007), responding on a 4-point scale (ranging from 'not at all confident' to 'very confident'). These responses formed the construct of confidence judgement, which in the present study captures item-specific self-confidence in mathematics.
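As an illustration of the scoring step, the sketch below rescales Rasch ability estimates within each grade and forms a simple student-level confidence score. The column names ("theta", "grade", "conf_item_*") are assumptions, and the confidence score is shown as a plain mean of the item ratings, whereas the study itself used factor scores; this is a sketch under those assumptions, not the project's code.

```python
import pandas as pd

def standardise_within_grade(df: pd.DataFrame, score_col: str = "theta") -> pd.Series:
    """Rescale Rasch-estimated math scores to mean 0 and SD 1 within each grade."""
    grouped = df.groupby("grade")[score_col]
    return (df[score_col] - grouped.transform("mean")) / grouped.transform("std")

# df["math_z"] = standardise_within_grade(df)
#
# Simplified confidence judgement: the mean of the item-specific 1-4 ratings
# (the study estimated a factor score rather than a simple mean).
# conf_cols = [c for c in df.columns if c.startswith("conf_item_")]
# df["confidence"] = df[conf_cols].mean(axis=1)
```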

The dummy variable language used at home differentiated native from non-native students, indicating the student's immigrant background. Students' sex was also a dummy variable, while the number of books at home, a proxy for SES, was a 5-point categorical variable (see Table 2).

Table 3 presents the percentage of students in each variable category across countries and grades. The distribution of boys and girls was similar across all subgroups (i.e., grades and countries). Ethnicity, as measured by the frequency of speaking the test language at home, captured the students' cultural diversity; Norway had the largest proportion of language diversity, while Serbia had the smallest. In all countries, the largest proportion of students reported having about 26 to 100 books at home, ranging from 34% in Portugal to about 40% in Estonia.

Table 3 Distribution of variables in each grade and country (values are given in %)

Analytical approach

Several steps were taken to investigate the four research questions. First, we tested the structure of the PCM and MI measures, and unidimensionality was established for each scale. Second, factor scores were calculated for the item-specific confidence judgement, PCM and MI. The factor scores were estimated within each grade using the pooled data of all countries. Like the estimated math scores, the factor scores for the three constructs were thus based on the pooled data of all six countries, with an overall mean of 0 and a standard deviation of 1. In this way, the country-specific means and standard deviations were placed on a common scale, supporting cross-country comparisons. In the next step, the mean values of the mathematics test scores and the item-specific confidence judgement factor scores were plotted at each quintile of mathematics performance to study how the two variables related to each other. Finally, the correlation between mathematics performance scores and confidence judgements was estimated for each quintile. The proportions of boys and native students, together with SES (indicated by the number of books at home), PCM and MI, were used to describe the characteristics of individuals in each mathematics quintile in both grades and all countries.
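The quintile analysis described above can be sketched as follows. The snippet assumes the standardised scores from the earlier sketch ("math_z", "confidence") plus "country" and "grade" columns, and it forms performance quintiles within each country and grade; these choices, and all variable names, are illustrative assumptions rather than the study's actual code.

```python
import pandas as pd
from scipy import stats

def add_quintiles(df: pd.DataFrame) -> pd.DataFrame:
    """Assign each student to a mathematics-performance quintile (1 = lowest)."""
    out = df.copy()
    out["quintile"] = out.groupby(["country", "grade"])["math_z"].transform(
        lambda s: pd.qcut(s, 5, labels=False) + 1
    )
    return out

def quintile_means(df: pd.DataFrame) -> pd.DataFrame:
    """Mean math score and mean confidence judgement per quintile (for the line plots)."""
    return df.groupby(["country", "grade", "quintile"])[["math_z", "confidence"]].mean()

def within_quintile_correlations(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation between performance and confidence within each quintile."""
    rows = []
    for (country, grade, q), g in df.groupby(["country", "grade", "quintile"]):
        r, p = stats.pearsonr(g["math_z"], g["confidence"])
        rows.append({"country": country, "grade": grade, "quintile": q, "r": r, "p": p})
    return pd.DataFrame(rows)

# df = add_quintiles(df)
# means = quintile_means(df)                 # plotted as in Figs. 1 and 2
# corrs = within_quintile_correlations(df)   # reported as in Tables 4 and 5
```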

Results

We start this section by examining the patterns between students' mathematics performance (i.e., total math score), confidence judgement of their item-specific mathematics competence, MI and PCM. The section then examines the relationship between students' mathematics performance and confidence judgement of their item-specific mathematics competence, that is, the DKE. This was done by plotting the mean values of the two construct measures on the mathematics quintile scale for each country and grade. The bivariate and partial correlations between the two constructs are also presented for each mathematics performance quintile. The results section concludes by examining cross-country DKE differences in the six European countries and the factors contributing to a more substantial DKE.

Comparison of mathematics performance, confidence judgement, math identity and perceived competence

As shown in the Appendix, Swedish third graders achieved the highest math performance scores (0.30) even though they exhibited the lowest confidence judgement (−0.32) and a relatively low level of MI (−0.11). A similar pattern was observed among Estonian third graders, who achieved a math performance of 0.29 and low levels of confidence judgement (−0.16) and MI (−0.25). In contrast, Norwegian students had the lowest math performance (−0.28), the lowest MI (−0.41) and a relatively low confidence judgement (−0.14). Portuguese pupils had the highest confidence judgements (0.46) but only achieved an average level on the mathematics test (−0.06). Third graders in Serbia achieved relatively low scores in mathematics (−0.20), yet they had the highest level of MI (0.46) and a relatively high confidence judgement (0.16). Finnish children achieved at or below average levels in mathematics but, interestingly, had the highest perceived mathematics competence (0.30). Conversely, their Portuguese counterparts had the lowest PCM (−0.20), despite having a very high item-specific confidence judgement and MI (0.17).

In grade 4 (for details, see the Appendix), Estonian students achieved the highest performance in mathematics (0.37), while their confidence judgement was the lowest (−0.27) and their MI was also relatively low (−0.21). Norwegian students had the lowest math performance (−0.17), coupled with low MI scores (−0.31) and rather low confidence judgements (−0.15) and PCM (−0.12). Again, the Portuguese students had the highest confidence judgement (0.48) and above-average scores on MI (0.13). Regarding PCM, we found a pattern similar to that in grade 3: Finland held the highest value (0.40) and Portugal the lowest (−0.21). Fourth graders in Serbia achieved around the average in mathematics and maintained an average PCM. However, their confidence judgement and MI were rather high (0.40 and 0.22, respectively).

Can we observe the Dunning–Kruger effect?

A common pattern is evident in Fig. 1, which plots the quintile means of students' mathematics scores and their confidence judgements in grade 3. In all countries, low achievers tended to overestimate their item-specific competence, as reflected in their confidence judgements, compared with their high-achieving counterparts. Generally, students who achieved below the third quintile (around the 60th percentile) tended to overestimate their item-specific math competence. However, different country-specific patterns can also be observed when scrutinising the within-quintile correlation between the mathematics score and confidence judgement (see Table 4).

Fig. 1 The mean plot of the mathematics performance score and confidence judgement across each quintile of mathematical competence in grade 3

Table 4 Pearson’s correlation coefficients between mathematics performance and confidence judgement for the total sample and within each math score quintile in grade 3

The analysis revealed a significant negative correlation between students' confidence judgement of their item-specific mathematics competence and their actual mathematics performance across all countries in the total sample, with correlations ranging from −0.60 in Portugal to −0.48 in Norway. However, the strength of this negative correlation varied when examining subpopulations of students within each quintile of mathematics performance. Specifically, students in the lowest quintile of mathematics performance in Estonia, Norway, Serbia and Sweden significantly misestimated their item-specific competence, with the strongest negative correlation observed in Sweden (−0.58). This misestimation of item-specific math competence was prevalent only in the low-achieving quintile; as the mathematics performance score increased, misestimation was observed in only one or two countries. In Norway, students in the 4th quintile also showed a significant negative correlation (−0.23).

Sweden presented a unique case in which students in almost all quintiles significantly misestimated their item-specific math competence, except for those in the 4th quintile. The only significant correlation between mathematics performance and confidence judgement in the 5th quintile was observed in Sweden (−0.24). Conversely, in Finland, students' confidence judgements were independent of their item-specific mathematics competence, with no significant correlations observed in any of the quintiles. Finally, Portuguese students in the 3rd and 4th quintiles significantly underestimated their math performance (−0.28 and −0.19, respectively).

Figure 2 displays the plotted quintile means of the mathematics scores and students' confidence judgements in grade 4. The same pattern observed in grade 3 was also evident in grade 4, whereby all countries exhibited a misalignment between students' actual mathematics performance and their perceived level of item-specific math competence.

Fig. 2 The mean plot of the mathematics performance score and confidence judgement across each quintile of mathematical competence in grade 4

Specifically, students with lower mathematics performance levels tended to overestimate their competence. The opposite was also observed; that is, students with higher mathematics scores vastly underestimated their item-specific competence. This trend indicates that individuals with extremely low or high levels of competence may have limited self-awareness of their actual abilities.

As displayed in the ‘Total’ column in Table 5, a highly significant negative correlation was observed across all countries, ranging from −0.68 in Portugal to −0.52 in Sweden. However, when scrutinised within each quintile, the relationship revealed different patterns depending on students' mathematics performance. In all countries except Sweden, a significant negative correlation between students' mathematics performance and their item-specific confidence judgements was observed in the lowest quintile, with Portugal exhibiting the strongest correlation (−0.40) and Norway the weakest (−0.18). For the top-quintile achievers, significant negative relationships were observed for Estonia (−0.22), Norway (−0.21), Portugal (−0.30) and Serbia (−0.28). In Sweden, the 4th graders appeared to hold more accurate confidence judgements of their item-specific math competence independent of their mathematics performance; the only mismatch was among students in the fourth quintile, who significantly underestimated their math performance (−0.23). Conversely, in Serbia and Portugal, all or almost all 4th graders demonstrated a lack of appropriate (meta)reflection on their abilities, either underestimating or overestimating their item-specific mathematical competence. All countries in both grades exhibited the Dunning–Kruger effect concerning students' mathematical performance and item-specific confidence judgements. Nonetheless, there were variations in the patterns and severity of this effect. Specifically, fourth graders appeared to be more prone to a lack of (meta)reflection on their competence than third graders. The largest disparities between the two grades in the within-quintile relationship were found in Sweden, Serbia and Portugal.

Table 5 Pearson’s correlation coefficients between mathematics performance and confidence judgements for the total sample and within each quintile of math scores in grade 4

Who is more susceptible to the Dunning–Kruger effect?

Given that the DKE primarily affected individuals who lacked expertise in mathematics (i.e., those in the lowest quintile of math scores) but who nevertheless assessed their mathematics competence, we investigated which additional characteristics of these individuals are associated with greater susceptibility to the DKE. Table 6 presents the mean differences in the proportions of boys and native students, average SES, PCM and MI across the lowest quintile of math scores in both grades.
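As a rough sketch of this kind of group comparison, the snippet below contrasts the lowest and highest performance quintiles on a background variable. The exact test used in the study is not specified here, so a Welch t-test is shown purely as an assumption, and the variable names (boy, native, books_at_home, pcm, mi) are hypothetical.

```python
import pandas as pd
from scipy import stats

def lowest_vs_highest(df: pd.DataFrame, var: str):
    """Mean difference between the 1st and 5th math quintiles on a background variable."""
    low = df.loc[df["quintile"] == 1, var].dropna()
    high = df.loc[df["quintile"] == 5, var].dropna()
    t, p = stats.ttest_ind(low, high, equal_var=False)  # Welch's t-test (assumed)
    return low.mean() - high.mean(), t, p

# for var in ["boy", "native", "books_at_home", "pcm", "mi"]:
#     diff, t, p = lowest_vs_highest(df, var)
#     print(f"{var}: diff={diff:.3f}, t={t:.2f}, p={p:.3f}")
```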

Table 6 Mean differences in the sociodemographic variables and perceived competence in the lowest quintile of math performance in both grades

As shown in Tables 4 and 5, students in the lowest quintile of mathematics performance generally misestimated their item-specific mathematics competence. Compared with their counterparts in the 5th quintile, this subgroup comprised a significantly higher proportion of girls and of students with a migration background, and these students reported fewer books at home, the lowest PCM and low MI (see the supplementary material, Tables A and B, for detailed pairwise comparisons). However, particular patterns were also observed. For example, family migration background did not differ across quintiles for either third or fourth graders in Estonia and Serbia, the countries with the fewest non-native participants; the same was observed for third graders in Portugal. Additionally, no differences between boys and girls were observed in Norway and Serbia in grade 3 and in Finland and Sweden in grade 4.

Discussion

The present paper has shed light on the relationship between students’ mathematics performance and confidence judgements of their item-specific mathematics competence among primary school students in six European countries. We have also explored demographic factors contributing to the DKE and how the effect aligns with one’s MI and PCM.

Our results showed that overall country patterns remained similar between grades 3 and 4. For example, Norway scored the lowest consistently across math, confidence judgements and MI. Similarly, Portugal (low) and Finland (high) maintained opposite values on PCM and confidence judgements in both grades. These results were consistent with earlier studies on country-specific patterns (Morony et al., 2013; Stankov & Lee, 2014), but they also point to the importance of cultural practices and traditions. For example, student self-evaluation has formed a central part of assessment in Finnish basic education in recent decades (Pitkänen, 2023), and an earlier study shows that the effect of mathematical competence on self-efficacy is one of the largest in Finland, but not vice versa (Williams & Williams, 2010). This supports the idea that experience can improve consistency (Nagel & Lindsey, 2018) and that particular cultural practices have contributed to better calibration among students in Finland. In the case of Serbia, we could argue that older learners do not display greater consistency in their judgements, contrary to Guillory and Blankson (2017) and Nagel and Lindsey (2018), because overconfidence is visible in both grades. However, such overconfidence at the national level can also be viewed as a cushion, protecting low-achieving individuals (or groups) from experiencing too much stress (Morony et al., 2013).

The second main result of our study is that the DKE was found among pupils in grades 3 and 4 in all studied countries. Low achievers tended to overestimate their test item-specific confidence judgement compared with their high achiever counterparts (Dunning, 2011; Kruger & Dunning, 1999). Thus, the effect is more visible among students scoring below the third quintile (around the 60th percentile). Specifically, students in the lowest quintile of mathematics performance in Estonia, Norway, Serbia and Sweden significantly overestimated their item-specific mathematics competence. These results also confirmed that the effect cannot be merely argued to be a statistical artefact (McIntosh et al., 2019) and that an examination of particular student populations may provide more rigour in examining the DKE and explaining some of its underlying mechanisms.

The results regarding Sweden may be an interesting illustration of that claim. The third-grade students in Sweden both significantly overestimated and significantly underestimated their performance: students in the first and second quintiles significantly overestimated their performance, while those in the third and fifth quintiles significantly underestimated their results. One possible explanation might be that the students in grade 3 took the national mathematics tests before the survey's administration. Hence, the feedback from the national tests may have affected the students' assessments of their own competence. Even though the national tests in grade 3 are not high stakes, students perceive them as such (e.g., Bagger, 2016). It is therefore possible that the higher-achieving students perceived the test feedback as a sign of failure, lowering their performance estimations (Hosein & Harle, 2018). However, this does not seem to explain the overestimation of performance among lower-achieving students. It is possible that this group was less affected by the feedback from previous testing because they were not as engaged in metacognitive monitoring as their higher-achieving peers (Komarraju & Nadler, 2013). At the same time, Swedish students in grade 4 were much more accurate in their assessment of their own performance. A possible explanation for this grade difference might be found in the assessment environment: in grade 4, grading criteria are introduced, and a stronger focus on grading and summative assessment is embedded in teaching. The introduction of grading into the classroom might result in increased social comparison, affecting how students perceive their competence (Tashiro et al., 2021).

Finally, we examined the impact of demographic factors, perceived competence and MI in connection with the DKE, particularly among students who lack competence in mathematics, as they are more prone to the DKE. Compared with their counterparts in the 5th quintile, this subgroup was characterised by a significantly higher proportion of girls and of non-native students, fewer books at home and the lowest PCM. These results confirm earlier findings favouring boys (Boekaerts & Rozendaal, 2010; Marks et al., 2018), including across countries (Morony et al., 2013). However, the absence of gender differences in Norway and Serbia in grade 3 and in Finland and Sweden in grade 4 echoes the inconsistency of earlier findings (Hosein & Harle, 2018; Tashiro et al., 2021), again stressing the need for a more nuanced approach when examining DKE correlates and mechanisms (Gignac & Zajenkowski, 2020). Moreover, because our data do not allow for a finer differentiation of students' backgrounds and, for example, the population in Serbia is more homogeneous regarding the native/non-native divide, the contribution of demographic variables warrants further investigation.

Moreover, MI largely showed a linear increase in values from the 1st to the 5th quintile, similar to PCM. Although the increase is not perfectly linear, with 5th-quintile students being more conservative, it supports the conclusion that perceived competence is linked to students' estimations and their self-image in mathematics (Miller & Wang, 2019). Boekaerts and Rozendaal (2010) suggested earlier that student calibration matters not only for confidence judgements but also for long-term motivation, of which perceived competence is a part. In addition, when students grasp what they can and cannot do, this ultimately contributes to their self-image and to a more robust knowledge of which of their skills need additional improvement and where their strengths lie. All of these are essential for successful self-regulation and learning at school (Andrade, 2019; Hosein & Harle, 2018).

Limitations and further research

Although the data used stemmed from a longitudinal project, the study itself was based on a single wave of data, limiting the ability to observe how the effect evolves over time and to establish causal relationships. The nature of the current investigation was exploratory and descriptive, focusing on detecting the DKE in younger samples and on possible differences when observing several European countries simultaneously. Given that the effect can be influenced by various factors, such as cognitive biases, metacognitive processes and socio-emotional factors (e.g., anxiety; Morony et al., 2013; Stankov & Lee, 2017), as well as by variability in past performance (Geraci et al., 2023), their interplay and specific contributions to the effect will be explored with the second-wave data now being collected. This will allow for a more nuanced exploration of the models examined here and may aid in developing a calibration measure between students' confidence judgements and performance as an indicator of the DKE.

Conclusion

The DKE was observed in grades 3 and 4 across all countries in the present study, indicating a general susceptibility to this phenomenon. However, the miscalibration between students' confidence judgement of their performance and their actual mathematics performance appears primarily associated with students with limited mathematics competence. This group of students, which includes a large proportion of girls, can be characterised as children with disadvantaged sociodemographic backgrounds, low self-perception and low identity in mathematics. At the same time, variations in the DKE were noted across countries, suggesting that different mechanisms and factors may contribute to the miscalibration of competence and confidence judgements within different performance groups or countries. The current study contributes to research on the Dunning–Kruger effect by focusing on younger individuals and by comparing countries within Europe. The findings highlight the importance of educators supporting students, particularly those with weaker skills, in striking a balance between self-confidence and actual competence. This balanced approach can help students identify areas for improvement (Boekaerts & Rozendaal, 2010), ultimately fostering more accurate perceptions and increased confidence in their abilities over time, hence supporting learning (Andrade, 2019; Hosein & Harle, 2018).