1 Introduction

Pink doll dresses and blue toy blocks: children are surrounded by gender-specific, often gender-stereotypical material every day of their lives (MacPhee & Prendergast, 2019; Murnen et al., 2016). Whilst a lot of research has been done on children’s preference for gender-stereotypical toys (e.g. Spinner et al., 2018), very little is known so far about the effects of gender-stereotypical task materials on academic performance. However, in recent decades there has been a growing trend, especially in the United States, to argue for gender-specific education (for an overview see Bigler & Signorella, 2011). Corresponding policies range from individual training programs for prospective teachers in gender-specific instruction, via individual gender-segregated classes, to ongoing gender-segregated schooling in all subjects (Pahlke & Hyde, 2016). Some proponents of these ideas argue that sexism in coeducation hinders girls and boys from performing up to their full potential (e.g. Salomone, 2004). Other advocates posit substantial biological differences between boys and girls, resulting in gender-specific learning styles (e.g. Gurian et al., 2001; Sax, 2006).

Similar approaches to gender-specific education have been discussed in Germany (Faulstich-Wieland, 2011), predominantly with a focus on school subjects that show gender-related differences resistant to change in large-scale assessments. As a result, gender-segregated classes for girls were most often studied with regard to mathematics (e.g. Rudolph-Albert & Keller, 2007) and physics (e.g. Hannover & Kessels, 2002; Häussler & Hoffmann, 2002). Meanwhile, separate language classes have also been suggested for boys (Budde et al., 2016), as they are considered “left behind” in the educational system (Hannover & Kessels, 2011, p.89), consistent with the “boys’ crisis” in the United States (e.g. Kleinfeld, 2009, p.113). Supporters of these positions in Germany argue, on the one hand, that instruction in the natural sciences is designed to appeal only to boys (Budde et al., 2016); on the other hand, a “feminisation” of the whole school context (Heyder & Kessels, 2013, p.605) that systematically discriminates against boys is discussed (Guggenbühl, 2008). However, in contrast to the United States, most gender-specific programs in Germany are offered outside of the school system (Budde et al., 2016). In this spirit, gender-specific task materials for home learning environments have also been developed, aimed at a growing market of popular science literature on educational questions. Such educational books have stereotypical titles such as ‘100 mathematical tasks that really engage girls’ interest’ (Speicher, 2009a) or offer essay exercises and dictations specifically designed for boys. Whereas girls are asked to calculate how many tickets are left for a ballet show, boys have to find out the number of remaining tickets for a soccer match (Speicher, 2009b).
Empirical research on the success of gender-specific educational policies has revealed mixed results in Germany (Faulstich-Wieland, 2011) and only small effect sizes in an international meta-analysis (Pahlke et al., 2014), whereas gender-specific materials have seldom been systematically explored. Furthermore, some critics even suspect that gender-segregated education fosters stereotypical thinking in children and adolescents instead of promoting academic interest or changing the performance of girls and boys (Datnow et al., 2001; Fabes et al., 2013, 2015; Hilliard & Liben, 2010). Similarly, gender-specific materials and policies like Girls’ Day (www.girls-day.de), which has been held in Germany for over twenty years, have recently been suspected of fostering gender stereotypes by implicitly reproducing them instead of reducing them (Wienkamp, 2018).

There is also a theoretical account of the detrimental effects of stereotypes on performance. Stereotype threat theory (Steele & Aronson, 1995) argues that people’s performance suffers when negative stereotypes about their own group are salient. Steele and Aronson’s original experiments showed that frequently observed SAT performance differences between African American and Caucasian American students decreased when demographics were not assessed until after the test. They suggested that reducing the relevance of ethnicity eliminates the salience of the negative stereotype about the intellectual inferiority of African Americans, which otherwise decreased their performance. Following this study, stereotype threat effects have been demonstrated in several domains and for several groups facing discrimination (for reviews see Smith & Hung, 2008; Spencer et al., 2016). Besides ethnicity, gender has been the most frequently explored group category in stereotype threat research worldwide to date.

1.1 Effects of gender stereotypes on performance

Women and girls have been shown to be prone to stereotype threat in mathematical and spatial tasks (for a meta-analysis see Doyle & Voyer, 2016), in natural sciences such as physics (Marchand & Taasoobshirazi, 2013), and in information technology (Cooper, 2006). Taken together, stereotype threat appears to be a robust phenomenon, with effect sizes ranging from small in girls (d = 0.22; Flore & Wicherts, 2015) to medium in women (d = 0.48; Walton & Spencer, 2009). Although the mediating mechanisms of stereotype threat effects are still being discussed (Pennington et al., 2016), a working memory overload caused by a complex interaction of physiological, cognitive and affective processes (Schmader et al., 2008) is the most commonly suggested and empirically supported explanation (Bedyńska et al., 2019).

While the first experiments on this phenomenon predominantly focused on adult women, usually students in laboratory contexts (for a meta-analysis see Nguyen & Ryan, 2008), stereotype threat was subsequently also shown in field research with children and adolescents (e.g. Hermann & Vollmeyer, 2017; Keller, 2007), even at primary school level (e.g. Hermann & Vollmeyer, 2016; Neuville & Croizet, 2007) and down to the age of four (Shenouda & Danovitch, 2014). Furthermore, most studies on younger school girls used implicit methods to detect stereotype threat effects (for a review see Régner et al., 2014), indicating that gender stereotypes are easily activated in learning environments. For example, simple situational cues like coloring stereotypical pictures (Neuville & Croizet, 2007), thinking about one’s own gender (Ambady et al., 2001) or being confronted with the minority status of female mathematicians (Muzzatti & Agnoli, 2007) limited young girls’ performance. Meanwhile, boys seemed to benefit from salient gender stereotypes in some stereotype threat experiments (Ambady et al., 2001; Neuburger et al., 2012) that highlighted supposed male superiority in mathematics and spatial abilities, a phenomenon labeled stereotype lift (Walton & Cohen, 2003). For this effect, a meta-analysis revealed small (d = 0.24; Walton & Cohen, 2003) but statistically significant performance increases for members of the group that is indirectly favored when a negative stereotype about another group is salient. In contrast to negatively stereotyped girls, boys are supposed to be more self-confident in corresponding test situations due to downward comparison processes, which facilitates their performance and potentially widens the gender gap even more. However, studies focusing directly on stereotype lift in young boys seem to be rare, highlighting the importance of further research in this direction.

Studies with children have most often induced stereotype threat via implicit cues, for example by activating the gender category without stating anything explicitly about girls’ or boys’ performance. Whereas some studies instructed children to work with gender-specific task materials (coloring stereotypical pictures; Ambady et al., 2001; Hermann & Vollmeyer, 2016; Neuville & Croizet, 2007), others asked them questions about their gender (Ambady et al., 2001) or presented them with a story about a stereotypically feminine girl (Tomasetto et al., 2011) before their performance was assessed. Furthermore, studies have simply varied the task description (reading task vs. a game) to induce a threat in boys, who are stereotyped as having lower reading abilities, a stereotype that should be activated implicitly when their reading skills are at stake (Pansu et al., 2016). Others confronted primary school children with predominantly male mathematicians (9 out of 10) to point to female inferiority in the domain (Muzzatti & Agnoli, 2007).

1.2 Effects of gender stereotypes on motivation

In contrast to the negative effects of stereotypes on women’s and girls’ mathematical performance, their influence on underlying motivational processes seems to be much more complex. Whereas some authors suggest that motivation increases due to stereotype threat (Jamieson & Harkins, 2007), others suspect a motivational decrease (Shapiro & Williams, 2012; Thoman et al., 2013). On the one hand, there is evidence that females under stereotype threat make a greater effort (Jamieson & Harkins, 2009; Seitchik & Harkins, 2015), as they are motivated to combat the negative expectations about their group and are worried about mistakes (Brodish & Devine, 2009; Chalabaev et al., 2012; Smith, 2006). On the other hand, negatively stereotyped females have been shown to report lower self-efficacy (Cadaret et al., 2017; Deemer et al., 2014; Spencer et al., 1999), performance expectations (Cadinu et al., 2003; Smith, 2006), interest (Smith et al., 2007) and motivation to improve their skills (Fogliati & Bussey, 2013). Motivational patterns under threat thus seem to be multifaceted and are still being discussed (for a review see Pennington et al., 2016).

In fact, some of the confusion about motivation under stereotype threat may have methodological reasons. First, studies differed in their temporal perspective, as they focused on immediate (Brodish & Devine, 2009; Cadinu et al., 2003; Chalabaev et al., 2012; Fogliati & Bussey, 2013; Jamieson & Harkins, 2007; Seitchik & Harkins, 2015; Smith et al., 2007) or long-term effects (Cadaret et al., 2017; Deemer et al., 2014; Thoman et al., 2013) of negative stereotyping. Second, some studies involved field research (Cadaret et al., 2017; Deemer et al., 2014), whereas others were run in the lab (Brodish & Devine, 2009; Chalabaev et al., 2012; Jamieson & Harkins, 2007; Seitchik & Harkins, 2015). As a result, the salience of the negative stereotypes varied across studies, and it is questionable whether implicit threat cues (e.g. activating gender by assessing demographic data) and explicit stereotype threat activation (e.g. explicit statements about female mathematical inferiority) induce comparable reactions (e.g. Nguyen & Ryan, 2008). Finally, all of these studies on motivational change under stereotype threat explored adult women, making it important to find out how younger girls’ motivation is shaped by gender stereotypes. As motivation under stereotype threat seems to be quite complex, it might not be captured by a single motivational aspect. Therefore, we decided to run exploratory analyses based on the well-established self-determination theory (Deci & Ryan, 1985), which integrates several motivational aspects to explain the development of intrinsic motivation. Furthermore, intrinsic motivation has already been defined as one central aspect of the stereotype threat process in the Motivational Experiences Model of Stereotype Threat (Thoman et al., 2013).

As stated by Deci and Ryan (1985, 1993), intrinsic motivation, defined by the experience of interest and enjoyment, is based on the fulfilment of three basic needs. First, people want to feel competent in their actions. Second, they want to have the feeling of acting autonomously instead of being under external pressure. Third, they want to feel connected with others and to belong in their social context. In a stereotype threat situation, all of these needs run the risk of being frustrated: due to stereotype activation, girls may feel less competent in mathematics, they come under pressure to refute the negative expectations about their group, and their belonging to their social context is called into question. Until now, stereotype threat research has predominantly focused on belonging uncertainty (Walton & Cohen, 2003), which refers to the basic need of social belonging, a need that is frustrated when women enter STEM careers or contexts (for an overview see Thoman et al., 2013). However, most of these studies focused on adult women rather than younger girls. Immediate negative effects of stereotypes on the basic needs of competence and autonomy (or rather perceived pressure) have not yet been tested explicitly in a stereotype threat situation with younger girls. Moreover, Deci and Ryan (2000, 2002) postulated that these two basic needs have more proximal effects on the experience of situational intrinsic motivation, whereas social belonging was seen as a more distal factor. In line with these assumptions of self-determination theory, we wanted to explore how stereotypical mathematical tasks influence perceived interest and enjoyment as an immediate indicator of situational intrinsic motivation. Furthermore, situational effects on both basic needs will be tested by assessing children’s perceived competence and perceived pressure and tension while working with stereotypical tasks.

1.3 Research question and hypotheses

To sum up, in the present study we want to explore how gender-stereotypical tasks implicitly influence primary school children’s performance and motivation in the classroom. For this purpose, we focus on potential negative effects of gender-stereotypical task materials in mathematics that were originally intended to foster girls’ motivation (e.g. tasks in which girls have to calculate ballet tickets). This focus is motivated by the fact that large-scale assessment studies like PISA still show girls underperforming compared to boys in mathematics (OECD, 2016), although a lot of energy is invested in programs to foster girls’ motivation in STEM. At the same time, detrimental effects of implicitly activated stereotypes on girls’ mathematical performance have been well documented (for a review see Régner et al., 2014), even though their effects on motivation are less well documented for young children. Based on stereotype threat theory (Steele & Aronson, 1995) and in line with former stereotype threat studies with young children (Ambady et al., 2001; Hermann & Vollmeyer, 2016; Neuville & Croizet, 2007), the following hypotheses were investigated:

H1

Compared to gender-neutral mathematical tasks, we expect gender-stereotypical mathematical tasks to…

  (a) Decrease girls’ performance (stereotype threat effect)

  (b) Increase boys’ performance (stereotype lift effect).

Furthermore, we want to explore how gender-stereotypical tasks influence girls’ and boys’ motivational aspects, defined as interest and enjoyment, perceived competence and perceived pressure and tension based on self-determination theory (Deci & Ryan, 1985).

2 Method

2.1 Sample and design

Altogether, N = 151 primary school children (47.7% female; mean age: M = 9.81, SD = 0.60) participated in the study. They were attending fourth grade in three different schools in Germany. Data were collected during a regular school lesson in the classroom, with all children whose parents had permitted them to take part in the test. The children were randomly assigned to one of two conditions, in which they worked on either gender-stereotypical or gender-neutral mathematical tasks. Hence, the study followed a 2 (sex: female vs. male) × 2 (task: gender-stereotypical vs. gender-neutral) design. Mathematical performance and motivational aspects were assessed as dependent variables.

2.2 Material and procedure

Based on two gender-specific mathematics textbooks (Speicher, 2009a, b), we composed three different tests with mathematical word problems in which we manipulated gender stereotype salience implicitly. Whereas words, numbers and calculating operations were matched and held constant across all versions, the design, pictures and thematic embedding of the tasks were varied according to their gender-stereotypical character (the themes of all tasks can be seen in Appendix A). For example, in the gender-stereotypical test versions boys had to find out how many coins the pirates had captured, while girls calculated how many pearls were needed to make jewelry (see Fig. 1). Meanwhile, in the gender-neutral version students had to calculate the number of sweets collected at a carnival parade, which is traditionally a very popular event for children in Germany, like Halloween in the United States.

Fig. 1 Implicit experimental manipulation. Note: Carnival celebrations include dressing up in costumes, dancing events and parades. Every town celebrating carnival boasts at least one parade with floats making fun of the themes of the day. Usually, sweets are thrown into the crowds lining the streets

Overall, we selected six tasks which required different mathematical skills typically taught in the third and fourth grade in Germany (addition/subtraction up to 1000, multiplication tables, multiplication and division). Before testing the children, the tasks were checked by an expert teacher with respect to their difficulty. As the six tasks had several partial solutions, every single result was rated 0 (incorrect) or 1 (correct), resulting in a score representing the percentage of correct results. The reliability of the test was very good, with Cronbach’s α = 0.85. Children had 20 min to complete the test. Children’s motivational aspects were assessed afterwards with a nine-item questionnaire taken from a German short version (Kurzskala Intrinsischer Motivation, KIM; Wilde et al., 2009) of the Intrinsic Motivation Inventory by Deci and Ryan (2003). In line with self-determination theory (Deci & Ryan, 1985, 1993), the KIM contains four subscales to assess intrinsic motivation: interest and enjoyment, perceived competence, perceived choice, and pressure and tension. Except for the perceived choice scale, which does not fit the test situation in our study, we used all subscales and adapted the item wording as well as the design of the Likert scale to the young age of our participants (see Fig. 2). Figure 2 also shows sample items for all subscales, each of which consisted of three items and showed acceptable to good reliability. All in all, the children had to answer nine items before providing their personal data (gender and age). To check our manipulation, the children finally had to indicate whether they believed boys or girls to be better at mathematics. They were also allowed to mark both sexes if they perceived no gender difference. For a successful manipulation, we expected boys to be seen as superior by all children who worked on stereotyped mathematical tasks, whereas no difference was expected in the control group.
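The scoring and reliability steps described above can be illustrated with a minimal sketch. It assumes that each child's partial solutions are stored as a row of 0/1 ratings; the function names and the ratings are hypothetical and serve only as an illustration, they are not data or code from the study.

```python
from statistics import pvariance


def percent_correct(ratings):
    """Percentage of partial solutions rated 1 (correct) for one child."""
    return 100.0 * sum(ratings) / len(ratings)


def cronbach_alpha(rows):
    """Cronbach's alpha for a children x items matrix of ratings."""
    k = len(rows[0])  # number of rated partial solutions (items)
    item_vars = [pvariance([row[j] for row in rows]) for j in range(k)]
    total_var = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 0/1 ratings for four children on five partial solutions
# (illustrative values only, not data from the study).
ratings = [
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
]

scores = [percent_correct(r) for r in ratings]
alpha = cronbach_alpha(ratings)
print(scores, alpha)
```

The same variance type (here the population variance) is used for items and totals, so the ratio inside the alpha formula does not depend on that choice.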

Fig. 2 Sample items of the motivational subscales

3 Results

Before testing our hypotheses, we checked the success of our manipulation by conducting a chi-square test on the cross-tabulation of group × assigned competence. The results in Table 1 reveal significant group differences in the children’s evaluation of girls’ and boys’ mathematical abilities, χ²(6, N = 146) = 13.58, p < 0.05, Φ = 0.30. However, contrary to our expectations, both boys and girls assigned greater mathematical ability to their own gender after working on gender-stereotypical tasks. This means that the manipulation check turned out as expected only for boys.
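As a plausibility check, the reported effect size can be recovered from the test statistic and the sample size with the common formula Φ = √(χ² / N). The sketch below assumes that this formula was used; the function name is hypothetical. For tables larger than 2 × 2, Cramér's V would additionally divide by the smaller of (rows − 1) and (columns − 1) under the root.

```python
import math


def phi_from_chi2(chi2, n):
    """Effect size phi = sqrt(chi2 / N) for a chi-square test on N cases."""
    return math.sqrt(chi2 / n)


# Values as reported in the text: chi-square = 13.58, N = 146.
phi = phi_from_chi2(13.58, 146)
print(f"{phi:.2f}")  # prints 0.30
```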

Table 1 Observed and expected frequencies (in brackets) of the gender rated superior at mathematics, separated by experimental group

As the different motivational aspects were not consistently correlated with each other and with performance, we ran separate ANOVAs with mathematical performance and motivational aspects, defined as interest and enjoyment, perceived competence as well as pressure and tension as dependent variables. Intercorrelations can be seen in Table 2.

Table 2 Intercorrelations of all dependent variables

The results regarding mathematical performance revealed a significant interaction between group and gender, F(1,150) = 7.32, p < 0.01. However, the direction of this interaction was unexpected, as can be seen in Table 3: whereas girls working on stereotypical mathematical tasks outperformed girls in the control group, boys working on the stereotypical tasks performed worse than boys in the control group. Focused contrast analysis showed that only the difference between the girls’ test scores was significant, t(147) = −2.40, p = 0.02. Therefore, Hypotheses 1a and 1b have to be rejected. Girls did not experience a stereotype threat effect on their performance, nor did boys experience a lift, due to the stereotypical character of the tasks. Instead, girls’ performance increased significantly, whereas boys’ performance did not drop significantly.

Table 3 Means and standard deviations (in brackets) for all dependent variables

Regarding the motivational aspects, we did not observe any main effects of the experimental group. In other words, children’s reported interest and enjoyment, perceived competence, and perceived pressure and tension did not vary with the stereotypical design of the mathematical tasks. There was only a significant main effect of gender, as girls felt more pressure after the mathematical test in both groups, F(1,150) = 4.56, p = 0.03. Descriptively, girls were also less interested in the tasks and felt less competent while doing the test compared to the boys, independent of the experimental group. However, these two gender effects fell slightly short of significance and were small in magnitude (interest and enjoyment, F(1,150) = 2.86, p = 0.09, d = 0.28; perceived competence, F(1,150) = 3.19, p = 0.08, d = 0.29).
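The reported effect sizes for the two non-significant gender effects are consistent with a common approximation that converts an F value with one numerator degree of freedom into Cohen's d, namely d ≈ 2·√(F / df_error). The sketch below assumes roughly equal group sizes and uses the degrees of freedom as reported; it is not necessarily the procedure used to obtain the published values, and the function name is hypothetical.

```python
import math


def d_from_f(f_value, df_error):
    """Approximate Cohen's d for a two-group effect: d = 2 * sqrt(F / df_error).

    Assumes one numerator degree of freedom and roughly equal group sizes.
    """
    return 2.0 * math.sqrt(f_value / df_error)


# F values and error degrees of freedom as reported in the text.
d_interest = d_from_f(2.86, 150)    # interest and enjoyment
d_competence = d_from_f(3.19, 150)  # perceived competence
print(f"{d_interest:.2f} {d_competence:.2f}")  # prints 0.28 0.29
```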

4 Discussion

The present study explored the effect of gender-stereotypical tasks on children’s performance and motivational aspects in mathematics in primary school. Based on the phenomena of stereotype threat (Steele & Aronson, 1995) and stereotype lift (Walton & Cohen, 2003), we assumed that gender-stereotypical tasks would result in decreased performance for girls (stereotype threat), while boys were expected to excel (stereotype lift), each compared to a control group working on gender-neutral mathematical tasks. As effects of stereotypes on motivation have yielded mixed results in the past, various motivational aspects derived from self-determination theory (Deci & Ryan, 1985, 1993, 2002), i.e. interest and enjoyment, perceived competence, and perceived pressure and tension, were assessed for exploratory analysis. Contradicting our hypotheses, girls calculating stereotypical tasks outperformed girls in the control group, while boys’ performance was slightly but not significantly lower than in the control group. Accordingly, both hypotheses had to be rejected. Regarding motivation, only one significant gender difference appeared, as girls in both groups reported more pressure and tension while calculating the tasks, irrespective of the gender-stereotypical design. In the following, theoretical and methodological explanations are discussed separately for girls’ and boys’ performance results before effects on motivation are addressed. Finally, a conclusion is derived, including implications for practice and further research.

4.1 Effects of stereotypical tasks on performance

Results show that girls calculating gender-stereotypical tasks outperformed girls in the control group. This result contradicts our hypothesis and most stereotype threat studies in primary school children (for a review see Régner et al., 2014). However, studies have repeatedly failed to detect the detrimental effects (e.g. Flore et al., 2018), including in 10-year-old girls (Ambady et al., 2001), the age group examined in our study. Therefore, critical voices questioning the phenomenon of stereotype threat (Flore & Wicherts, 2015; Ganley et al., 2013; Stoet & Geary, 2012) have not fallen silent. Nevertheless, these objections cannot explain the positive effects of stereotypical tasks on girls’ mathematical performance that we observed. Therefore, three different explanations are discussed below.

One reason for this effect could be that younger children temporarily see their own gender as superior, which is called in-group favoritism (Heyman & Legare, 2004) and also fits our manipulation check: girls working on stereotypical tasks more often assumed that girls were better at mathematics, whereas choices in the control group were equally distributed across both genders. As stereotype threat predominantly operates on an unconscious level, it seems important to look more closely at studies exploring children’s explicit and implicit maths stereotypes, which have yielded mixed results in the past. Whereas some studies found traditional gender stereotypes favoring boys in mathematics on both an explicit and an implicit level in primary school girls (Cvencek et al., 2011), others found that girls see themselves as inferior only implicitly, while they explicitly assume that they are excelling (Galdi et al., 2014; Passolunghi et al., 2014). Still others found that girls see boys at an advantage in maths on an explicit level, whereas their implicit associations revealed that they saw themselves as superior (Steffens & Jelenec, 2011). Finally, some results show girls feeling superior both implicitly and explicitly (Heyman & Legare, 2004; Nowicki & Lopata, 2017), contradicting traditional stereotypes altogether. Although explicit stereotype endorsement is not essential for stereotype threat to appear (Huguet & Régner, 2009; Spencer et al., 1999), the stereotype should at least be taken as valid (Jamieson & Harkins, 2010). In this regard, it was shown that girls indeed believed adult women to be inferior at mathematics, while they were not convinced that this stereotype held true for themselves (Martinot et al., 2012). Corresponding to our results, these authors also observed that girls perceived their mathematical performance as higher when gender was made salient (Martinot & Désert, 2007).
Therefore, we may have failed to detect stereotype threat effects on mathematical performance because girls of this age do not see themselves as negatively stereotyped, either explicitly or implicitly. Due to in-group favoritism, the stereotyped tasks may heighten girls’ self-assurance instead of harming their performance.

Another explanation could be that gender-stereotypical tasks facilitate girls’ mathematical performance because they are more familiar and therefore less distracting. In line with this, studies on adults’ reading performance and comprehension (Oakhill et al., 2005; Reynold et al., 2006) show that stereotyped content is processed more easily, whereas counter-stereotypical content tends to impede performance due to higher cognitive load, which taxes working memory capacity. Correspondingly, it was recently shown in a primary school context that gender differences in mental rotation, stereotypically a boys’ domain, disappeared when children had to work on gender-stereotyped objects (Ruthsatz et al., 2019). If children had to rotate a doll or a hair brush, girls performed as well as boys. Similar effects were also detected when the traditional cube figures were changed into pellets (Ruthsatz et al., 2014). Again, the authors explained their results by girls’ greater familiarity with handling the objects, which could be memorized more easily through increased use of holistic strategies (Ruthsatz et al., 2019), thus relieving working memory load. Paradoxically, limited working memory capacity has also been discussed as one central cognitive mediating mechanism in stereotype threat research (Schmader et al., 2008). However, which stereotypical content hampers or fosters working memory has not yet been systematically explored.

Therefore, it should also be considered that different gender-stereotypical cues could vary in their destructive potential. There are studies showing that sexualized gender stereotypes in particular have detrimental effects on girls’ academic motivation (Brown, 2019). These sexualized gender stereotypes teach girls to prioritize their physical attractiveness at the cost of other supposedly incompatible traits such as intelligence (Stone et al., 2015). Furthermore, it was recently shown that the endorsement of these sexualized stereotypes was associated with lower academic outcomes even after controlling for general ability (Nelson & Brown, 2019). Similarly, primary school girls’ mathematical performance decreased after they had been exposed to sexualized advertisements, compared to girls who saw non-sexualized materials (Pacilli et al., 2016). Therefore, it seems important to distinguish in future research between stereotyped content associated with sexualization, which potentially harms motivation and performance, and stereotypical cues which increase girls’ familiarity and self-assurance when handling a task.

Regarding boys’ mathematical performance in gender-stereotypical tasks, we did not observe a lift effect compared to the control group. Instead, boys in the control group tended to slightly outperform boys working on stereotypical tasks, although this trend did not reach significance. While contradicting stereotype lift theory (Walton & Cohen, 2003) and our hypothesis, this result is in line with other studies that failed to detect stereotype lift effects in younger boys (e.g. Neuville & Croizet, 2007). Perhaps some male students experienced another effect, called choking under pressure (Baumeister, 1984), a performance drop that people experience when they feel forced to fulfil extraordinarily high expectations of their group. In line with this, boys have been under more pressure at school in general in the last decade (Kessels & Hannover, 2011). It was also shown that boys experience stereotype threat due to their supposed academic shortfalls compared to girls (Hartley & Sutton, 2013). Boys even lost their stereotypically supposed advantage in their “favorite discipline” of mental rotation when it was stated that girls achieved similar or even better results in the task (Neuburger et al., 2012) or when stereotypically female objects had to be rotated (Ruthsatz et al., 2019). Boys did show faster rotation with stereotypically male objects (e.g. truck or gun); however, they also made more mistakes. Similar to girls, boys also suffer in their mathematical performance when exposed to gender stereotypes (Pacilli et al., 2016). However, irrespective of sexualization, most male gender stereotypes teach boys to be agentic, aggressive and dominant, all of which are characteristics associated with physical movement, which may conflict with boys’ ability to concentrate on cognitive tasks.
Therefore, future studies should explore in more detail how gender-stereotypical content and cues influence cognitive aspects, such as attention, distraction and concentration.

4.2 Effects of stereotypical tasks on motivational aspects

The results showed no significant differences in motivational aspects between the experimental conditions. Instead, the explored motivational aspects only varied by gender, and not all differences reached significance. Whereas boys reported slightly but not significantly higher enjoyment and interest in the task as well as higher perceived competence, girls felt significantly more pressure and tension, independently of the stereotypical task design. These results are in line with previous studies showing that girls experience more anxiety in mathematics (e.g. Erturan & Jansen, 2015), while boys report more self-confidence although girls achieve comparable results (e.g. Hargreaves et al., 2008). However, similar to results showing that intrinsic motivation fails to consistently predict performance throughout primary school (Garon-Carrier et al., 2016), the different aspects of intrinsic motivation explored in our study were also not consistently related to performance. That our stereotypical task design did not affect children’s motivational aspects could also have a methodological reason. Bearing in mind the young age of our participants, it is possible that the children thought that the motivational items referred to doing mathematics in general, rather than to the stereotypical design of the tasks. Therefore, in a replication study, which is needed because our results contradict our hypotheses, it should be ensured that children keep the stereotypical task design in mind while rating the motivational aspects.

4.3 Conclusion and implications

Taken together, our results revealed neither a destructive effect of stereotypical tasks on girls’ mathematical performance nor substantial advantages in performance for boys. Contradicting our hypothesis, girls calculating stereotypical tasks outperformed girls in the control group, while boys’ performance did not differ significantly between the two experimental conditions. Furthermore, for motivational aspects we found only gender differences, confirming past results showing traditional gender disparities in mathematics, rather than effects of the stereotypical tasks. Specifically, girls experienced significantly more pressure and tension while calculating, irrespective of the experimental condition. In line with the discussion about the nature of stereotype threat as a “cold” (cognitive) or “hot” (motivational) phenomenon (Schmader et al., 2008, p.348), performance-increasing effects of stereotypical tasks are more likely to stem from “cold” aspects, in the shape of familiarity-induced working memory relief, than from gains in motivation. Alternatively, it is also possible that different stereotypical cues vary in their destructive potential, depending on whether they activate sexualized gender conceptions or just increase girls’ familiarity and self-assurance in handling a task. Taking into account the numerous studies confirming implicit stereotype threat effects in the past (Régner et al., 2014), future studies should explore both the effects of stereotypes on children’s cognitive processing and differences in their potential destructiveness. More qualitative research may therefore be needed to shed light on children’s perception of and associations with different stereotypical cues. In this regard, it would also be important to explore relevant moderating aspects not yet considered in our study.
Whether children’s performance increases or decreases due to stereotypes could also depend on their parents’ stereotypes (Tomasetto et al., 2011) or on their own implicit gender–math stereotypes, which have been shown to moderate stereotype threat and lift effects in female undergraduate students (Franceschini et al., 2014). However, studies focusing on younger girls revealed that explicit counter-stereotypical beliefs do not prevent stereotype threat (Huguet & Régner, 2009). Corresponding results would be important for developing safe interventions to combat stereotype threat, for which female role models have been shown to be helpful (for a review see Lawner et al., 2019), although effect sizes vary substantially (for a meta-analysis see Liu et al., 2021). Thus, if “pink gives girls permission” to explore typical boy toys (Weisgram et al., 2014, p.401), a gender-stereotypical task design or role models may also function as an important gatekeeping step to foster girls’ mathematical performance and identification with STEM domains in the future.