1 Introduction and literature overview

Studies on gender differences in mathematical achievement seem to gain much more publicity than research explaining the gender gap in reading, or more broadly in language skills (e.g., Stoet and Geary 2015). This is surprising, as the gender gap in reading observed in data from the Program for International Student Assessment (PISA) is three times larger than the gender gap in mathematics (Stoet and Geary 2015). The gap, favoring girls, was present in samples from 70% of the countries participating in the PISA study in 2009, and it has grown over the past decade (Stoet and Geary 2015). Analyses conducted at the level of participating countries examining the sources of gender gaps have led to the conclusion that gender differences in academic achievement were not related to political, economic, or social equality (Stoet and Geary 2015), but rather to students’ attitudes toward different academic domains (Stoet and Geary 2018).

Stereotype threat, defined as the activation of a negative stereotype about one’s group in a testing situation (Steele and Aronson 1995), may be one of the factors contributing to the gender gap in school achievement (Steele and Aronson 1995; Spencer et al. 1999; Pansu et al. 2016). For example, the activation of a stereotype about girls being weak at mathematics is likely to lower math test performance in the female group. This stereotype activation may not affect boys since it does not concern their gender group. Analogously, the activation of a stereotype about boys being weak at language arts may lower their verbal test performance, but not girls’ performance. Stereotype threat not only affects test performance but can also evoke more negative attitudes toward stereotyped domains (Good et al. 2012). This phenomenon has been increasingly investigated in a wide range of domains such as mathematics (e.g., Aronson et al. 1999), intelligence (e.g., McKay et al. 2002), spatial orientation (e.g., Tarampi et al. 2016) and memory (e.g., Chasteen et al. 2005). The effect has been shown to occur especially when the tasks were difficult (Spencer et al. 1999) or required new solving strategies (Carr and Steele 2009). Stereotype threat research also suggests that the impact of negative stereotypes is not limited to minority groups, such as African American students in verbal skills (Steele and Aronson 1995) or women in mathematics (Spencer et al. 1999), but can also be observed in majority groups such as White men in mathematics (Aronson et al. 1999).

Explanations of test performance deficits under stereotype threat conditions refer to working memory impairments that emerge as a consequence of negative beliefs and emotions activated by stereotypes about one’s group (Schmader et al. 2008). For example, empirical studies documented that negative stereotype activation reduced working memory capacity measured by an operational span task (Schmader and Johns 2003). Similar results were obtained using measures of executive functions such as the antisaccade task (Jamieson and Harkins 2007), the Stroop color-naming task (Richeson and Shelton 2003; Hutchison et al. 2013), or the GO/NOGO task (Mrazek et al. 2011). Additional analyses showed that working memory capacity was a significant mediator of performance in mathematical tests in female samples (Schmader and Johns 2003), confirming its central role in the mechanism of stereotype threat.

Although researchers have amassed a considerable body of evidence to support the importance of studying stereotype threat, they mostly investigated minority samples (women or girls, African American or Latino students). Thus, much less is known about the experiences, correlates, and mechanisms of stereotype threat in boys’ or men’s samples, despite the worldwide evidence that boys’ achievement in reading is much lower than that of girls. Additionally, very few studies examined the effect of gender stereotypes in verbal tests (Keller 2007; Seibt and Förster 2004). To our knowledge, there is only one study examining the effects of stereotype threat in reading performance (Pansu et al. 2016). In this study, in the stereotype threat condition, the test was presented as diagnostic of language abilities, and this subtle manipulation was sufficient to evoke gender stereotypes about boys being weak at language. The present study extends this line of research by investigating the experience of stereotype threat separately in three age cohorts of boys aged from 14 to 16 years.

Since the beginning of stereotype threat studies, it has been speculated that stereotype threat experience is not an isolated incident but rather it may repeat over time (Steele and Aronson 1995). Be that as it may, little research has tested the consequences of repeated experiences of stereotype threat on school achievement and attitudes toward different subjects (e.g., Woodcock et al. 2012).

1.1 Chronic stereotype threat

A large body of research on stereotype threat has traditionally explored the impact of experimentally activated content of group stereotypes on test performance and academic achievement. As a side-effect of experimental design, stereotype threat was perceived rather as a temporary phenomenon limited to the experimental situation with its effects lasting as long as the stereotype is cognitively activated. The hidden assumption was that the effects of stereotype threat were reversible. As such, when negative stereotype activation decreases, individual’s cognitive resources should return to their initial level (see review: Schmader et al. 2008). We question this assumption and postulate that in educational settings a stereotype threat experience does not appear on a single occasion. Due to repeated instances of stereotype threat, its negative consequences may accumulate over time. Consequently, it can be posited that these repeated experiences of stereotype threat may have quantitatively different effects on cognition and motivation compared to the ones triggered by a single stereotype threat incident.

The effects of repeated experiences of stereotype threat have only recently been gaining in importance (Woodcock et al. 2012; Kalokerinos et al. 2014). So far, research examining the correlates of chronic stereotype threat in educational settings remains scarce. In one of these few studies, Woodcock et al. (2012) found motivational deficits in Latino students who experienced stereotype threat during their scientific career. More specifically, they observed a significant decline in intention to persist in science and in the level of scientific identification in this group of minority students. Domain disidentification also emerged more strongly in the following years of the study, leading to final domain abandonment. This lack of identification with the scientific domain was interpreted as the effect of repeated instances of coping with stereotype threat.

Although domain disidentification is a well-documented consequence of an acute stereotype threat (Steele and Aronson 1995; Major et al. 1998; see review: Schmader et al. 2008) and has been demonstrated to be a significant correlate of chronic stereotype threat (Woodcock et al. 2012), a potential mechanism of domain disidentification due to stereotype threat remains unclear. To fill this gap, we propose intellectual helplessness as a mediator linking chronic stereotype threat and lower domain identification.

1.2 Intellectual helplessness

Intellectual helplessness is a psychological state characterized by two types of deficits, cognitive and motivational. Cognitive deficits can be described as an impairment of more complex processing (e.g., von Hecker and Sedek 1999). Motivational deficits, on the other hand, lead to impairments in intrinsic and instrumental motivation and/or lack of interest in the domain (Rydzewska et al. 2017; Sedek and McIntosh 1998). As a result, intellectual helplessness may lead to lower school achievements and a lower interest in a particular school subject.

The mechanism explaining the emergence of intellectual helplessness is based on the cognitive exhaustion model of learned helplessness (e.g., Sedek and Kofta 1990; Kofta and Sedek 1998; von Hecker and Sedek 1999). This model assumes that in problem-solving situations, individuals generally tend to engage spontaneously in systematic cognitive processing. In a typical, controllable situation this cognitive mobilization enables the use of complex cognitive strategies such as finding important pieces of information, detecting task inconsistencies or integrating information into mental models. All these activities lead to an effective solution of the task at hand. However, in uncontrollable circumstances, when the task at hand is not solvable or a teacher cannot explain the subject matter, such a constructive approach cannot lead to real progress in cognitive processing. Consequently, after prolonged cognitive mobilization without any substantial cognitive gain, a cognitive exhaustion phase appears. In school settings, the state of cognitive exhaustion is known as intellectual helplessness. Intellectual helplessness is a domain-specific state, so the level of intellectual helplessness in one domain (e.g., mathematics) tends to be weakly related to the level of intellectual helplessness in another domain (e.g., history) (Sedek et al. 1993).

2 Aims of the present study

Following Woodcock and colleagues’ reasoning (2012), we posit that stereotype threat is a dynamic process and its effects accumulate over time. Corroborating a previous study on chronic stereotype threat in girls in mathematics (Bedyńska et al. 2018), we use a representative sample of secondary school boys to examine a path model with chronic stereotype threat as an antecedent of achievement and identification with language arts. Similarly to the model proposed by Bedyńska et al. (2018), we test two parallel indirect effects involving working memory and intellectual helplessness.

The present study advances the literature on chronic stereotype threat in two important ways. Firstly, the central aim of this study is to provide tentative evidence of the dynamics of chronic stereotype threat by analyzing the model described above in three age cohorts of 14-, 15-, and 16-year-olds. We hypothesize that as experiences of stereotype threat repeat in educational settings, the cognitive and motivational effects of chronic stereotype threat are likely to be more pronounced. This should be especially true for motivational correlates of chronic stereotype threat such as intellectual helplessness and domain identification. Therefore, borrowing the logic from intellectual helplessness studies, we assume that the indirect effect linking chronic stereotype threat to domain identification through intellectual helplessness would be stronger in older cohorts of boys. In contrast, predictions based on the intellectual helplessness studies lead to the assumption that the cognitive mechanism based on working memory would be quite stable across age cohorts. Thus, the indirect effect with working memory as a mediator should be of a relatively similar magnitude in all age cohorts of boys.

Secondly, our aim is also to test the explanation formulated by Stoet and Geary (2018) for the weak link between the gender gap in academic achievement and political, economic or social equality in the PISA data. In their results, which show the gender-equality paradox, the researchers found that interest in the domain is a factor that explains the persisting gender gap in more gender-equal countries. Thus, we decided to examine the link between domain identification and achievement in language arts. We predict that this association, being a dynamic motivational process, may also be stronger in older cohorts.

3 Materials and methods

3.1 Participants and procedure

Data from 619 boys from three levels of classes in mixed-gender secondary schools were analyzed in the study. There were 231 first age cohort pupils (Mage = 13.55, SD = 0.42), 204 second age cohort pupils (Mage = 14.5, SD = 0.37), and 184 third age cohort pupils (Mage = 15.54, SD = 0.38). The sampling procedure involved two steps. In the first step, 24 secondary schools were randomly sampled with stratification based on region (two regions of the country) and school location (village, small city, medium city). In the second step, classes were randomly selected from each school. All pupils belonging to the class were invited as participants. Around 5% of the selected pupils did not take part in the study due to their absence from school; therefore, 633 pupils took part in the study. The study was presented as aimed at testing new online educational games. None of the children withdrew from participation during the study.

The research protocol was approved by the Ethical Committee of the Educational Research Institute (Warsaw, Poland). The present study was conducted in compliance with ethical standards adopted by the American Psychological Association (APA 2010). Accordingly, prior to participation pupils were informed about the general aim of the research and the anonymity of their data. The participation was voluntary, and the pupils did not receive compensation for their participation in the study. Additionally, parents signed a written consent for their children to participate in the study.

Data were collected during regular school hours in a single session that lasted 45 min. After explaining the aim of the study, pupils took a computerized working memory test and then online questionnaires which among others consisted of scales measuring chronic stereotype threat, intellectual helplessness, domain identity, language achievement and pupils’ attitudes towards school. All survey questions were administered in the native language.

3.2 Measures

3.2.1 Chronic stereotype threat

The chronic stereotype threat scale was based on the scale proposed by Bedyńska and colleagues (2018). We rephrased the original seven items, changing the focus of the scale from mathematics to language. Exemplary items of the scale are: ‘Other pupils in my class feel that I have a lower language ability because of my gender’, ‘I worry that if I fail my teacher will attribute my poor performance to my gender’, ‘I worry that if I fail during a language test, it will prove that all boys are poor at language’. Participants provided their answers on a 6-point scale, ranging from 1 (strongly disagree) to 6 (strongly agree). To evaluate the level of chronic stereotype threat, student responses were averaged to form a chronic stereotype threat index. Similarly to Bedyńska et al. (2018), the reliability of the scale was high (α = 0.88). The construct validity tested with confirmatory factor analysis (CFA) indicated that the scale was unidimensional: χ² (12) = 60.848, p < 0.001, CFI = 0.976, TLI = 0.958, SRMR = 0.03, RMSEA = 0.081, 90% CI (0.061, 0.101), p = 0.006.
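The scoring and reliability steps described above can be sketched in a few lines. This is a minimal illustration, assuming the standard formula for Cronbach's α; the function names and the toy responses are hypothetical and are not data from the study.

```python
from statistics import mean, variance


def scale_index(responses):
    """Average one pupil's item responses into a single scale index."""
    return mean(responses)


def cronbach_alpha(data):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(data[0])                      # number of items
    items = list(zip(*data))              # transpose: one tuple per item
    item_vars = [variance(col) for col in items]
    total_var = variance([sum(row) for row in data])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses (4 pupils x 7 items, 1-6 scale); NOT study data.
toy = [
    [1, 2, 1, 2, 1, 1, 2],
    [3, 3, 4, 3, 3, 4, 3],
    [5, 6, 5, 5, 6, 5, 6],
    [2, 2, 3, 2, 2, 3, 2],
]
indices = [scale_index(row) for row in toy]
alpha = cronbach_alpha(toy)
print(indices, round(alpha, 3))
```

Because the toy items move together almost perfectly, the resulting α is very high; real questionnaire data would typically yield a lower value.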

3.2.2 Working memory

Working memory was measured with the computerized Functional Aspects of Working Memory Test (Sedek et al. 2016). The test consisted of three tasks that measured mutually related functions of working memory: simultaneous storage and processing, supervision, and coordination functions. Tasks involved visually attractive natural objects (ladybugs, balls, cartoon-styled faces). The test was used previously in educational research and proved its validity in predicting school achievement (Sedek et al. 2016). The proportion of correct answers was the measure of working memory. Due to the heterogeneous nature of this test, the working memory measure was entered into the model as a latent variable, allowing the evaluation of the input of each of the tasks in the relation between predictor and outcome variables.

3.2.3 Intellectual helplessness

Intellectual helplessness was evaluated using nine items selected from the Intellectual Helplessness Scale (IHS, Sedek and McIntosh 1998). Boys were asked to assess on a 6-point Likert-type scale (ranging from 1 = never to 6 = always) to what extent they experienced intellectual helplessness symptoms during native language class. The statements described feelings and thoughts during native language classes: ‘I feel tired’; ‘I feel helpless’. Similarly to the full version (Sedek and McIntosh 1998), the short version of the scale was highly reliable (Cronbach’s α = 0.90).

3.2.4 Language achievement

We used Grade Point Average (GPA) values in language from two semesters before the study to measure achievement in language. GPA values were averaged. The higher value reflected higher achievement, with 1 meaning not passing, and 6 excellent.

3.2.5 Domain identification

We used a single-item measure of domain identification (‘It is important for me to be good at language’) adopted from the work of Aronson et al. (1999). Boys used a 6-point Likert type scale ranging from 1 (strongly disagree) to 6 (strongly agree) to rate the importance of the native language domain.

3.3 Data preparation and analytical approach

The multi-group structural equation modeling analyses were conducted using Mplus 7.3 (Muthén and Muthén 2015) with a grouping variable defining age cohort. We used the robust maximum likelihood (MLR) estimator to deal with continuous but non-normally distributed variables. Since all data were obtained in classes, with boys being clustered by class membership, this nested structure was accounted for by using the TYPE = COMPLEX option of the ANALYSIS command in Mplus with class membership as the cluster variable (Muthén and Satorra 1995). To prepare data for analyses, all classes smaller than three pupils were excluded (four classes, seven participants) and seven pupils were excluded due to missing values on at least one of the measured variables. Then, a preliminary omnibus model with three age groups modelled simultaneously was constructed with one predictor (chronic stereotype threat), two parallel mediators (intellectual helplessness and working memory) and two dependent variables: language achievement and identification with language. All indirect effects were evaluated using the MODEL INDIRECT command in Mplus. The aim of the latter was to examine intellectual helplessness and working memory as potential mediators of the association between chronic stereotype threat and language identification and achievement. We proposed two mediational paths from chronic stereotype threat to domain identification with language through (1) intellectual helplessness and (2) working memory, and three mediational paths from stereotype threat to language achievement through (1) intellectual helplessness, (2) working memory, and (3) intellectual helplessness and domain identification. All indirect effects were examined using the 95% confidence interval (CI) method, with the indirect effect being significant if the CI does not include zero (MacKinnon et al. 2002).
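The decision rule for indirect effects can be illustrated with a short sketch. It assumes a single-mediator path (predictor to mediator with coefficient a, mediator to outcome with coefficient b), the product-of-coefficients estimate, and a first-order delta-method (Sobel) standard error with a symmetric 95% CI. The function names and the coefficient values are hypothetical; they are not estimates from the present model, and the software used in the study may compute its intervals differently.

```python
from math import sqrt


def indirect_effect(a, se_a, b, se_b, z_crit=1.96):
    """Product-of-coefficients indirect effect with a symmetric CI.

    Uses the first-order delta-method (Sobel) standard error,
    which is an assumption of this sketch.
    """
    est = a * b
    se = sqrt(a**2 * se_b**2 + b**2 * se_a**2)
    return est, (est - z_crit * se, est + z_crit * se)


def is_significant(ci):
    """An effect counts as significant when the CI excludes zero."""
    lower, upper = ci
    return not (lower <= 0.0 <= upper)


# Hypothetical path estimates, for illustration only.
est, ci = indirect_effect(a=0.30, se_a=0.05, b=-0.20, se_b=0.06)
weak_est, weak_ci = indirect_effect(a=0.05, se_a=0.05, b=-0.20, se_b=0.06)
print(est, ci, is_significant(ci))
print(weak_est, weak_ci, is_significant(weak_ci))
```

With these made-up inputs, the first interval lies entirely below zero and the second one spans zero, which mirrors the significant versus non-significant distinction used in the Results.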

Evaluation of the structural model was based on the robust χ² statistic and the Root Mean Square Error of Approximation (RMSEA), the Standardized Root Mean Square Residual (SRMR), the Comparative Fit Index (CFI) and the Tucker-Lewis Index (TLI) as recommended by Kline (2011). We used the most widely recommended cut-off values indicative of adequate model fit to the data, respectively: RMSEA and SRMR < 0.06 and < 0.08, CFI and TLI > 0.95 and > 0.90 (Lance et al. 2006).
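As a rough illustration, the cut-offs can be applied mechanically. The sketch below assumes one reading of the thresholds: the stricter value (0.06 or 0.95) marks good fit and the looser value (0.08 or 0.90) marks acceptable fit. The labels and function names are hypothetical, and the input values are those reported for the structural model in Section 4.2.1.

```python
def rate_lower_is_better(value, good=0.06, acceptable=0.08):
    """Rate RMSEA or SRMR, where smaller values mean better fit."""
    if value < good:
        return "good"
    if value < acceptable:
        return "acceptable"
    return "poor"


def rate_higher_is_better(value, good=0.95, acceptable=0.90):
    """Rate CFI or TLI, where larger values mean better fit."""
    if value > good:
        return "good"
    if value > acceptable:
        return "acceptable"
    return "poor"


def classify_fit(rmsea, srmr, cfi, tli):
    """Return a label per fit index under the assumed two-tier reading."""
    return {
        "RMSEA": rate_lower_is_better(rmsea),
        "SRMR": rate_lower_is_better(srmr),
        "CFI": rate_higher_is_better(cfi),
        "TLI": rate_higher_is_better(tli),
    }


# Fit indices reported for the multi-group path model (Section 4.2.1).
report = classify_fit(rmsea=0.013, srmr=0.133, cfi=0.997, tli=0.995)
print(report)
```

Under this reading only SRMR falls outside the acceptable range, which agrees with the evaluation given in the Results.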

4 Results

4.1 Descriptive statistics

The relation between variables included in the model and the associated descriptive statistics for three levels of classes are shown in Table 1. Generally, in all three cohorts, chronic stereotype threat was positively correlated with intellectual helplessness, and negatively correlated with working memory.

Table 1 Means, standard deviations and correlation matrices for the target variables

4.2 Path model with working memory and intellectual helplessness as mediators

4.2.1 Evaluation of the model

Results indicated that the model with intellectual helplessness and working memory as mediators (see Fig. 1) achieved a good fit to the data. The general test of fit for the model was non-significant, χ² = 36.195, df = 35, p = 0.41. This general fit was also supported by the values of specific fit indices: CFI = 0.997, TLI = 0.995, RMSEA = 0.013, 90% CI (0.001, 0.052), p = 0.935. Only SRMR exceeded the acceptable value of good fit and was 0.133. The model explained around 2% of variability in domain identification with language (R² = 0.017) in the first age cohort, 5.5% (R² = 0.055) in the second age cohort, and 2% (R² = 0.020) in the third age cohort. For language achievement, 19% of variability was explained in the first age cohort (R² = 0.194), 26% in the second age cohort (R² = 0.260), and 17% in the third age cohort (R² = 0.169). Chi-square contributions from each group were the following: 8.699 for the first, 17.560 for the second, and 9.936 for the third cohort. The path coefficients for all three age groups are displayed in Table 2.
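The reported fit statistics can be cross-checked with simple arithmetic. The sketch below sums the group contributions to the chi-square and recomputes RMSEA with a commonly used multi-group formula, sqrt(G) * sqrt(max((χ² - df) / (df * N), 0)). That this formula matches the one applied by the software is an assumption of the sketch, as is the use of N = 619 (the analyzed sample) and G = 3 groups.

```python
from math import sqrt

# Chi-square contributions reported for the three age cohorts.
contributions = [8.699, 17.560, 9.936]
chi2_total = sum(contributions)


def multigroup_rmsea(chi2, df, n, groups):
    """RMSEA with a multi-group correction (assumed formula, see lead-in)."""
    return sqrt(groups) * sqrt(max((chi2 - df) / (df * n), 0.0))


rmsea = multigroup_rmsea(chi2=36.195, df=35, n=619, groups=3)
print(round(chi2_total, 3), round(rmsea, 3))
```

The contributions add up to the overall chi-square, and the recomputed RMSEA rounds to the reported 0.013, so the figures in this paragraph are mutually consistent under the stated assumptions.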

Fig. 1 The full mediational models for three age cohorts predicting language achievement and identification by chronic stereotype threat with two mediators: working memory, and intellectual helplessness. (Non-significant paths are indicated by dashed lines, while significant paths are indicated by solid lines; all coefficients are standardized solutions; p < 0.05)

Table 2 Path coefficients in the multiple mediators model with working memory and intellectual helplessness as mediators of domain identification in language and language arts achievement for three age cohorts

4.2.2 Model for the first cohort (13.5 years)

In the first age group, stereotype threat was negatively related to working memory and to language achievement. Working memory was positively associated with language achievement. A higher level of stereotype threat was also associated with a higher level of intellectual helplessness. None of the variables was a significant predictor of domain identification with language. To analyze the mediational role of working memory and intellectual helplessness, statistics for indirect effects were calculated. The analysis of indirect effects showed that only one indirect effect was significant in this group: the indirect effect from stereotype threat via working memory to language achievement [β = − 0.049, p = 0.007, CI (− 0.085, − 0.013)].

4.2.3 Model for the second cohort (14.5 years)

A slightly different pattern of relationships between the variables was obtained for pupils in the second age cohort. Again, stereotype threat was negatively related to working memory and positively to intellectual helplessness. Both postulated mediators, intellectual helplessness and working memory, were related to language achievement: working memory was related positively and intellectual helplessness negatively. Intellectual helplessness was a stronger predictor of domain identification in language than of language achievement. Finally, two indirect effects were significant: the indirect effect via intellectual helplessness to language identification [β = − 0.051, p = 0.013, CI (− 0.091, − 0.011)] and the indirect effect via working memory to language achievement [β = − 0.152, p = 0.003, CI (− 0.253, − 0.051)]. The indirect effect of stereotype threat through intellectual helplessness to language achievement only approached significance (β = − 0.034, p = 0.062).

4.2.4 Model for the third cohort (15.5 years)

Results for the third age cohort showed that stereotype threat was positively associated with intellectual helplessness but not with working memory. The latter variable was significantly related to language achievement. Domain identification was predicted negatively by intellectual helplessness. Although weak, there was also a relation between language identification and language achievement. The only significant indirect effect explained language identification via intellectual helplessness [β = − 0.070, p = 0.040, CI (− 0.137, − 0.003)]. Although domain identification was significantly related to language achievement, this link was too weak to present as a significant indirect effect with two mediators: intellectual helplessness and domain identification linking stereotype threat and language achievement.
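The indirect effects reported for the three cohorts can be checked for internal consistency. The sketch below assumes that each 95% CI is a symmetric Wald interval (estimate ± 1.96 × SE) and that the p value comes from a two-sided z test; both are assumptions of this illustration, not statements about how the software produced the output. The helper name is hypothetical.

```python
from math import erfc, sqrt


def wald_from_ci(est, lower, upper, z_crit=1.96):
    """Recover SE, z and two-sided p from an estimate and its symmetric CI."""
    se = (upper - lower) / (2 * z_crit)
    z = est / se
    p = erfc(abs(z) / sqrt(2))  # two-sided normal tail probability
    return se, z, p


# (estimate, CI lower, CI upper, reported p) as given in Sections 4.2.2-4.2.4.
reported = [
    (-0.049, -0.085, -0.013, 0.007),  # cohort 1: via working memory
    (-0.051, -0.091, -0.011, 0.013),  # cohort 2: via intellectual helplessness
    (-0.152, -0.253, -0.051, 0.003),  # cohort 2: via working memory
    (-0.070, -0.137, -0.003, 0.040),  # cohort 3: via intellectual helplessness
]

recovered = [wald_from_ci(est, lo, hi) for est, lo, hi, _ in reported]
for (est, lo, hi, p_rep), (se, z, p) in zip(reported, recovered):
    print(f"est={est:.3f} se={se:.4f} z={z:.2f} p={p:.4f} reported p={p_rep}")
```

Under these assumptions the recomputed p values fall close to the reported ones; small differences are expected because the published estimates and interval bounds are rounded to three decimals.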

To sum up, the path model examined in three age cohorts suggested that working memory was a significant mediator of language achievement in younger groups but not in the oldest. In contrast, the link between stereotype threat and domain identification in language was stronger in older cohorts but not in the youngest. It is also important to notice that the association between stereotype threat and intellectual helplessness was the strongest in the oldest age group. In this group we can also start to observe a weak but significant relation of domain identification and achievement in language.

5 Discussion

Bearing in mind that the experiences of stereotype threat appear repeatedly in educational settings, it is highly important to understand psychological processes involved in the theoretical underpinnings of the adaptation to chronic stereotype threat. We believe that the present research substantially contributes to the debate on the dynamics of chronic stereotype threat and its mechanisms.

First, we provided an answer to the theoretical and empirical question about a mechanism that links chronic stereotype threat and domain identification with the achievement of boys in language arts. Corroborating previous findings which showed working memory to be associated with chronic stereotype threat (Bedyńska et al. 2018), this study proposes a preliminary empirical test of a new mechanism of domain identification based on intellectual helplessness. We found that chronic stereotype threat is positively linked to intellectual helplessness which in turn is negatively related to domain identification with the native language. Thus, our results support the prediction that intellectual helplessness may transmit an influence of chronic stereotype threat not only into mathematical underperformance of the minority group (Bedyńska et al. 2018) but also into disidentification with language observed in the majority group. This result enhances the understanding of the processes underlying the lack of domain identification and domain abandonment observed in longitudinal studies on chronic stereotype threat in Latino and African American students (Woodcock et al. 2012).

Second, we offer what is probably the first empirical comparison of two mediational paths presented in the literature, that is, cognitive (working memory) and motivational (intellectual helplessness) deficits, in inducing lower domain identification. The results show that although working memory capacity is related to chronic stereotype threat and language achievement, it is not a significant mediator of domain identification. This can be interpreted as preliminary evidence of two parallel and probably independent mechanisms leading to underachievement and the lack of identification with a domain. The question remains, however, whether underachievement can be observed only together with domain disidentification or whether it is possible for students to show disidentification without underachievement. We hope that this initial evidence about these two processes being separate will provide an inspiration for future research.

Third, by testing the magnitude of the two mediational paths across three age cohorts, our work contributes to the understanding of the dynamics of the reactions to chronic stereotype threat. As mentioned before, the indirect effect involving working memory linking stereotype threat and language achievement was significant in the two younger cohorts but not in the oldest one. This result suggests that the cognitive mechanism is not sufficient in explaining identification with language. A different pattern emerged for intellectual helplessness, which can be perceived as the motivational mechanism. There was no significant indirect relationship between stereotype threat and language identification via intellectual helplessness in the youngest cohort. However, this path turned out to be significant in the older cohorts. Such a pattern of results supports the hypothesis about the cumulative nature of the motivational mechanism of chronic stereotype threat based on intellectual helplessness and explains domain disengagement (Woodcock et al. 2012).

5.1 Limitations and additional directions for future research

The present findings should be interpreted with respect to some limitations. First, language identification was estimated with a single-item self-report measure. Although a single-item measure is widely used in experimental research on stereotype threat (e.g. Aronson et al. 1999; Stone 2002), different aspects of this self-schema may be evaluated, such as the importance and likeability of the subject, having high abilities and achievement in the domain or enrolment in particular classes (Smith and White 2001). Future research should use more complex measures with different dimensions of domain identification to provide a broader understanding of a student’s educational choices.

Second, the study used a correlational and cross-sectional design, thus the ability to determine the directionality of the effects as well as their temporal order is limited. Future research should examine the relation between chronic stereotype threat and domain identification using longitudinal designs to ascertain the change in magnitude of the proposed mediational paths. Third, the present study would have benefited from additional information regarding participants’ characteristics identified in previous research as important moderators of stereotype threat, such as gender identification (Schmader 2002), stereotype endorsement (Schmader et al. 2004) or test anxiety (Tempel and Neumann 2014). Future investigation should consider examining or at least controlling some of these variables.

5.2 Conclusions

Despite the aforementioned limitations, the present findings have important theoretical and practical implications. From the theoretical standpoint, they offer a meaningful insight into a novel mechanism of chronic stereotype threat through intellectual helplessness—a reliable explanation of lower identification with a domain. The inclusion of this mediational path may be also a valuable source of new hypotheses regarding the dynamics of accumulative changes in motivational and cognitive aspects of repeated experiences of stereotype threat.

The proposed mechanism is also important for teachers and policy makers, as it can shape boys’ interest in, and further motivation to pursue, education in the humanities and social sciences. Additionally, as language skills are highly correlated with mathematics and science, they are also relevant for success in STEM domains (Stoet and Geary 2015).