Introduction

In recent years, there has been substantial growth in studies investigating the effects of personalized gamified educational systems in terms of students’ perception (e.g., students’ motivation, performance, and flow experience) (Tondello et al., 2017; Toda et al., 2019b; Hallifax et al., 2019b; Oliveira et al., 2020). The results of these studies are contradictory (Klock et al., 2020). Some studies found that personalized gamified educational systems provided positive results [e.g., improving motivation and collaboration (Vidergor, 2021), competence and satisfaction (Sailer et al., 2017), and interaction time (Lavoué et al., 2018)]. Meanwhile, other studies show that personalized gamified educational systems provided neutral or negative results [e.g., low motivation (Khoshkangini et al., 2017) or no significant difference in terms of flow experience (Oliveira et al., 2020)].

Especially in education, one of the most important and investigated phenomena is the flow experience (Oliveira et al., 2018), which is a feeling of deep engagement that a person can achieve during a given activity (Csikszentmihalyi, 2000). Flow experience is also highly related to the learning experience (Csikszentmihalyi, 2014a). This means that when a student achieves the flow experience in an educational system, they are also drawn into a deep learning experience (Csikszentmihalyi, 2014a; Erhel & Jamet, 2019; Buil et al., 2019), which is why it has been widely investigated in recent studies in the field of education (Hassan et al., 2020; Yoshida et al., 2013; Rodríguez-Ardura & Meseguer-Artola, 2019). It also indicates that when the personalization of the educational system is not done well and users need to interact with a system containing stereotype threats, the flow experience can be low, negatively affecting the learning experience (Oliveira et al., 2021). However, even with this growing interest, the effects of personalization and gender stereotype threats on students’ flow experience are still poorly understood, and the results differ between studies, with both positive and negative results reported (Oliveira et al., 2018; Hallifax et al., 2019a; Oliveira et al., 2021).

One of the hypotheses for these contradictory results is that, in general, the studies analyze only the effects of the gamification elements (in isolation or in groups) on the students’ perception according to their gamer type/user type or demographic information (Koivisto & Hamari, 2019; Rapp et al., 2019; Bai et al., 2020), thereby overlooking other highly important aspects, such as gender stereotype threats [e.g., a situation that raises concern that one will be judged in terms of group stereotypes (Steele, 2011)] (Albuquerque et al., 2017; Oyibo & Vassileva, 2020). In particular, these aspects can be important for the personalization of gamified educational systems, since from older theoretical studies (Deaux & Lewis, 1984; Greene & Gynther, 1995) to more recent empirical studies (Dorji et al., 2015; Carrasco et al., 2017; Albuquerque et al., 2017; Oyibo & Vassileva, 2020; Santos et al., 2022), results demonstrate that stereotyped features such as colors and avatars (with specific gender stereotype characteristics) can positively or negatively affect users’ experience, depending on their gender, for example, by increasing or decreasing concentration or anxiety (Albuquerque et al., 2017; Kollmayer et al., 2018; Komalawardhana & Panjaburee, 2018).

To meet this challenge, we present in this article the results of an experimental study (N = 307) analyzing the effects of gender stereotype-based interfaces on users’ flow experience and performance in a gamified educational system. In this study, we seek to answer the following research questions: “Do gender stereotype-based interfaces affect the users’ flow experience in gamified educational systems?” and “Do gender stereotype-based interfaces affect the users’ performance in gamified educational systems?” By answering these research questions, we move toward the resolution of a global challenge, identifying whether gender stereotype-based interfaces in gamified educational systems can affect the users’ flow experience and performance.

The main results indicate that the gender-stereotyped gamified educational system affected the users’ action–awareness merging (i.e., one of the flow experience dimensions) but did not affect users’ performance or their overall flow experience. Thus, our results advance the state of the art, providing a basis for new studies and motivating more thorough future research. Our study, therefore, contributes to the areas of educational technologies, gamification, and gender studies, through insights related to the design of gender-based personalized gamified educational systems.

Background

In this section, we explain the main concepts addressed in this article (i.e., gender stereotype in gamified educational systems and Flow Theory). We also present and compare the main related publications.

Gender stereotype in gamified educational systems

Gamification, i.e., the idea of “transforming systems, services, and activities to better afford similar motivational benefits as games often do” (Koivisto & Hamari, 2019; Hamari, 2019), when applied in education, often aims to improve the students’ experience (e.g., students’ engagement, motivation, and flow) in educational environments (Oliveira & Bittencourt, 2019; Toda et al., 2019a; Janelli & Lipnevich, 2021). The idea of personalized gamified educational systems arose from the fact that people have different personalities and characteristics (e.g., gender, age, gamer/user type) that may influence their preferences regarding gamification design (Tuunanen & Hamari, 2012; Belk et al., 2014; Vail et al., 2015). Thus, to provide a more suitable gamification design, it is important to personalize it to the user’s preferences (Oliveira & Bittencourt, 2019; Masthoff & Vassileva, 2015). Personalized gamified educational systems are therefore a novel research field (Hallifax et al., 2019b), with few empirical studies conducted (Klock et al., 2020) and conflicting results in terms of the student experience (Hallifax et al., 2019a), leaving room for further research into the different effects of personalized gamified educational systems on students’ experience.

In recent years, within the personalization of gamification domain, a few studies have also discussed gender-based personalization (i.e., personalization strategies based on the gender self-reported by users; see Footnote 1) and gender stereotype threats [i.e., situations in which people of a certain gender do not feel represented in a certain system (Steele, 2011; Albuquerque et al., 2017; Santos et al., 2022)] as determining factors that affect the students’ experience (Albuquerque et al., 2017; Göbl et al., 2021; Denden et al., 2021). This is especially relevant in gamified education because gamified educational systems are commonly designed and implemented as male-stereotyped systems (e.g., with only male-stereotyped avatars, male-stereotyped colors, or male-stereotyped language) (Wanner et al., 2020).

This tendency can negatively affect the perception of certain users (Oyibo et al., 2017), who may, for example, have a high level of anxiety when using an interface stereotyped for another gender. It can also lead students to give up on using a system (Albuquerque et al., 2017). Despite these theoretical assumptions, there is still a lack of empirical/experimental studies that investigate the effects of gender stereotype-based interfaces on the students’ experience (e.g., flow experience) and performance.

Flow theory

The Flow Theory was proposed in the 1970s by Csikszentmihalyi and Csikszentmihalyi (1975) and represents an experience of deep engagement in certain activities (Csikszentmihalyi, 1997a). To achieve a flow experience, one has to go through nine different dimensions which when achieved together represent the so-called flow experience: (i) challenge–skill balance; (ii) action–awareness merging; (iii) clear goals; (iv) unambiguous feedback; (v) total concentration on the task at hand; (vi) sense of control; (vii) loss of self-consciousness; (viii) transformation of time; and (ix) autotelic experience (Csikszentmihalyi, 2000).

The idea of challenge–skill balance can be defined in a personal way, separated from any structures of activity, and the perception of the defined challenge is critical to flow occurring (Jackson & Marsh, 1996; Jackson & Eklund, 2002). According to Csikszentmihalyi and Csikszentmihalyi (1975), when in flow, a dynamic balance exists between challenges and skills. Action–awareness merging, according to Jackson et al. (2011), is a sense of effortlessness and spontaneity that comes about through total absorption in what one is doing.

The clear goals dimension is a necessary part of achieving something worthwhile in any endeavor and the focus that goals provide to actions also means that they are an integral component of the flow experience (Jackson et al., 2011; Jackson & Eklund, 2002). In this way, the unambiguous feedback dimension is closely associated with clear goals and is represented as the processing of how performance is progressing in relation to these goals (Jackson et al., 2011).

The total concentration on the task at hand dimension represents a person being totally connected to the task in which they are engaged, which optimizes the flow state; this connectedness relies on a present-centered focus (flow resides in being in the present moment, rather than in the past or future) (Jackson et al., 2011; Jackson & Eklund, 2002). At the same time, having the experience of total control (i.e., sense of control dimension) is likely to move an individual away from the experience of flow and into relaxation or boredom.

The loss of self-consciousness dimension relates to the liberation of being free of the voice within our head that questions whether we are living up to self- or other-imposed standards (Jackson et al., 2011), while the transformation of time dimension reflects how the intensity of focus may contribute to perceptions of time slowing, with a feeling of having all the time in the world to execute a move that is, in reality, time limited (Csikszentmihalyi & Csikszentmihalyi, 1975; Csikszentmihalyi, 2014b; Jackson et al., 2011; Jackson & Eklund, 2002).

Finally, the autotelic experience is often recognized as the most important flow experience dimension. Csikszentmihalyi (1997b) coined the term “autotelic” experience to describe the intrinsically rewarding experience that flow brings to the individual. It is generally after completing an activity, upon reflection, that the autotelic aspect of flow is realized and provides high motivation toward further involvement (Jackson et al., 2011).

Different studies conducted over several years show that this experience is highly linked to students’ learning experience because when in a flow experience in an educational activity, the learning experience is also good (Csikszentmihalyi, 2014b; Buil et al., 2019; Rodríguez-Ardura & Meseguer-Artola, 2019). Furthermore, recent studies have drawn attention to creating educational activities that may lead to the flow experience (Gao et al., 2019; Oliveira et al., 2019, 2020). At the same time, a system with stereotype threats can negatively affect the user experience (Dorji et al., 2015; Albuquerque et al., 2017; Santos et al., 2022), and if it negatively affects the flow experience, it can also harm the learning experience.

Given the direct relationship between the flow experience and the learning experience (Csikszentmihalyi, 2014a; Heutte et al., 2016), it is important to identify which gamification aspects (e.g., colors, design of the elements, personalization) can lead students to a flow experience (Oliveira et al., 2020). Thus, one important topic within Flow Theory in educational technologies studies is to identify the design aspects of gamified educational systems (e.g., gender stereotype) that can lead to this experience (Kiili et al., 2012; Hsieh et al., 2016; Erhel & Jamet, 2019).

Related works

To identify the related works, we analyzed recent systematic literature reviews in the field of gamification (Koivisto & Hamari, 2019), personalized gamification (Hallifax et al., 2019a; Rodrigues et al., 2020), and Flow Theory (Oliveira et al., 2018, 2021). We focus on presenting studies dealing with gender stereotype threats or Flow Theory in gamified educational systems. We also present studies regarding the effects of color temperature in users’ experience.

Concerning studies analyzing the effects of gender stereotype in gamified educational systems on participants’ experience, Albuquerque et al. (2017) conducted an experimental study investigating whether gender stereotype threat in a gamified educational system affects the users’ anxiety and performance. They executed a three-stage survey in which participants were first asked about their anxiety, then used an online gamified educational system (to solve a logic quiz), and finally were asked about their anxiety during the system usage (thus comparing the students’ anxiety before and during the system usage). One of their results indicated that the male-stereotyped system increased the females’ anxiety (Albuquerque et al., 2017). This study opens space for new studies investigating different aspects of user experience in this kind of system (e.g., concentration and flow).

Using the same system, Santos et al. (2022) investigated the effects of stereotype threats using a quantitative experiment in three gamified environments (stereotypical male version, stereotypical female version, and control environment). The study was conducted with a sample of 150 high school and undergraduate students. They identified that the participants randomly assigned to the male learning environment presented an increase in aggressiveness level. Among other results, they also identified that the stereotypical male and female learning environments increased the participants’ performance level (Santos et al., 2022).

Wanner et al. (2020) investigated how male priming in STEM subjects affects female emotions during gamification tasks. Female students who were primed with Anti-STEM achieved better results in the programming gamified task. It could be observed that the girls enjoyed the gamification tasks (Wanner et al., 2020). The result confirms that gamified activities can affect participants differently based on gender.

Denden et al. (2021) examined the effect of gender and personality differences on students’ perception of gamification in education. In a study with 189 undergraduate students, they observed that gender and personality can affect students’ perception of specific game elements. In particular, females are more likely than males to find feedback useful. They also observed that gender moderates the effect of personality on students’ perception of the implemented game elements (Denden et al., 2021). These results also demonstrate the importance of analyzing different design aspects that affect the students’ experience.

Other studies have been devoted to analyzing the effects of personalizing educational games and gamified educational systems on participants’ flow experience. Erhel and Jamet (2019) conducted a study investigating the relationship between the flow state and learning within an educational game. Their results indicate a positive influence of participants’ flow experience in their memorization and comprehension. The study, however, was carried out in the context of games (and not gamification) and did not consider any type of gender stereotype (Erhel & Jamet, 2019).

Oliveira et al. (2020) investigated the effects of a player type-based personalized gamified educational system in terms of students’ flow experience. They executed an experimental study with a sample of 121 participants, comparing a tailored version with a counter-tailored version. No significant difference was found in the students’ experience when using the system (Oliveira et al., 2020). These results contradicted some similar studies and also draw attention to the importance of conducting new studies to investigate the flow experience in other types of personalization.

Studies have also investigated the effects of colors on users’ experience. Oyibo and Vassileva (2020) conducted an empirical study to investigate the effects of color temperature and layout design in terms of technology acceptance and user experience (UX). They conducted an experiment with a sample of 323 respondents. Their findings suggest that the colors blue and green are more useful than orange and red in tourism websites (Oyibo & Vassileva, 2020). The study conducted by Oyibo and Vassileva (2020) focused on mobile interfaces for tourism applications, opening space for similar investigations in the field of educational systems.

In summary, the related studies are subdivided into those (i) analyzing the effects of gender stereotypes on users’ experience, (ii) analyzing the effects of personalized gamification (e.g., user type-based personalization) on the users’ flow experience, or (iii) analyzing the effects of using colors on the participants’ experience. However, there is a lack of studies that investigate the effects of gender stereotype on users’ flow experience and performance in a gamified educational system. Thus, as far as we know, our study is one of the first to investigate whether gender stereotype affects users’ flow experience and performance in a gamified educational system. Table 1 presents a comparison between the related works.

Table 1 Related works comparison

Study design

This study aims to analyze the effects of gender stereotype-based interfaces on users’ flow experience and performance in a gamified educational system. To conduct our experiment, the Goal/Question/Metric (GQM) framework was used. The GQM was chosen because it supports (i) defining project goals; (ii) verifying the goals and defining how the study goals will be achieved; and (iii) providing an adequate interpretation of the results based on the established goals (Caldiera & Rombach, 1994). This framework is also used and recommended for experimental studies in different areas (Van Solingen et al., 2002; Aljedaibi & Khamis, 2019; Tuah & Wills, 2020).

Hypotheses definition

Over the years, several studies have shown that human beings have different perceptions/preferences regarding colors (Hallock, 2003; Karniol, 2011; Beigpour & Pedersen, 2015), as well as different perceptions/preferences regarding gender stereotypes (Basow, 1992; Deaux & Lewis, 1984). Studies conducted by Greene and Gynther (1995), Coursaris et al. (2008), and others, for example, identified that males tend to prefer and feel better in environments with blue colors, while females feel better in environments with pink colors, and that gray is a neutral color for both males and females. Other recent studies also show that stereotypes in the personalization of systems can influence (positively or negatively) the users’ perception (Albuquerque et al., 2017; Oyibo & Vassileva, 2020).

Studies also suggest that people tend to have different preferences for avatars, according to the stereotype of those avatars (e.g., avatars with characteristics related to their gender) (Albuquerque et al., 2017; Carrasco et al., 2017). Different studies have shown that when someone needs to use a system stereotyped for another gender, they tend to feel uncomfortable in the system, for example, showing a significant increase in anxiety (Albuquerque et al., 2017). Other studies highlight that these characteristics can directly influence students’ learning/performance when interacting with educational systems (Orji et al., 2013, 2014). Thus, we hypothesized that gender stereotype-based interfaces affect the students’ flow experience in a gamified educational system (H1) and that gender stereotype-based interfaces affect the students’ performance in a gamified educational system (H2).

Participants

Our participants were 307 subjects (173 self-reported as male and 134 self-reported as female) with an average age of 24 years (SD = 3.262), from 26 different countries. Participants were recruited through the Amazon Mechanical Turk (MTurk) (Footnote 2) and Prolific (Footnote 3) platforms, which are crowdsourcing marketplace services widely used to conduct human-based experiments (Paolacci et al., 2010; Palan & Schitter, 2018; Orji et al., 2014; Hallifax et al., 2019b). Each participant from MTurk received 25 cents, while each participant from Prolific received £0.63 for their participation. For MTurk participants, the value was decided based on a pilot study, conducted with 10 participants, in which they were asked about the value they considered fair for participation in the experiment. For Prolific participants, the value is automatically defined by the platform, based on the average response time of the participants. We also invited unpaid volunteers from email lists and social networks (only 5 participants were volunteers). This data collection strategy aimed to ensure maximum heterogeneity in our sample. Table 2 describes our sample.

Table 2 Participants per country

The sample satisfies several criteria for sample size validation. According to Bentler and Chou (1987), it is necessary to have at least five participants for each construct measured (in our study, we have nine constructs and 307 participants, thus \(\approx\) 34 for each construct). Hair et al. (1998) suggest the same rule for factor analyses. Loehlin (1998) suggests that at least 100 participants are required for a complete sample.
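The arithmetic behind these rules of thumb can be checked in a few lines. The sketch below only restates the figures given in the text (307 participants, nine constructs, a minimum of five participants per construct, and a minimum total of 100); the variable names are illustrative.

```python
# Sample-size check against the rules of thumb cited in the text.
N_PARTICIPANTS = 307     # final sample reported in the Participants section
N_CONSTRUCTS = 9         # nine flow experience dimensions

MIN_PER_CONSTRUCT = 5    # Bentler and Chou (1987); Hair et al. (1998)
MIN_TOTAL = 100          # Loehlin (1998)

per_construct = N_PARTICIPANTS / N_CONSTRUCTS
meets_ratio_rule = per_construct >= MIN_PER_CONSTRUCT
meets_total_rule = N_PARTICIPANTS >= MIN_TOTAL

print(f"participants per construct: {per_construct:.1f}")
print(f"ratio rule met: {meets_ratio_rule}; total rule met: {meets_total_rule}")
```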

Materials

To identify the participants’ experience, we used the open-source gamified educational system called “Removed for anonymous review,” which was implemented to help participants study different educational contents in a simple and fun way, through a simple design and the use of gamification elements (Albuquerque et al., 2017). Initially, the system allows the users to choose an avatar to represent themselves and personalize their experience in the system (see Fig. 5a in Appendix 1). Then the users can begin to study a particular subject (e.g., equations) and answer questions about that subject (see Fig. 5b in Appendix 1). The system was chosen because it uses the most common gamification elements in the field of education (i.e., points, badges, ranking, levels, progress bar, and avatar) according to different recent literature reviews (Hamari et al., 2014; Sousa Borges et al., 2014; Nah et al., 2014; Koivisto & Hamari, 2019; Oliveira et al., 2021).

The system has three versions. Figure 5a and b (in Appendix 1) shows the neutral version of the system. In the neutral version, only neutral colors (which are usually associated with no gender (Greene & Gynther, 1995; Karniol, 2011)) are used. In this version, there are avatars with male, female, and neutral stereotypes to be chosen by users. Similarly, gamification elements are presented in neutral colors. Figure 5c and d (in Appendix 1) shows the male-stereotype version of the system. In this version, the color blue [the main color associated with the male gender (Greene & Gynther, 1995; Karniol, 2011)] was used. In choosing avatars, only characters with male stereotypes were used. Finally, Fig. 5e and f (see Appendix 1) show the female-stereotype version of the system. For this version, the color lilac [the main color associated with the female gender (Greene & Gynther, 1995; Karniol, 2011)] was used. In choosing avatars, only characters with female stereotypes were used. The system personalization was based on different studies on gender color design stereotypes (Greene & Gynther, 1995; Hallock, 2003; Hurlbert & Ling, 2007; Coursaris et al., 2008; Karniol, 2011). These studies show that people prefer different colors according to their gender and indicate which colors are most associated with each gender.

To identify the flow experience of participants during the system usage, we used the short flow state scale (short FSS), which was developed and validated by Jackson and Eklund (2002). The scale was developed based on the nine original Csikszentmihalyi and Csikszentmihalyi (1975) Flow Theory dimensions. This scale was also validated for the gamification domain (i.e., to be used in gamified systems) by Hamari and Koivisto (2014). For this study, we used the short FSS following the original “Manual for the Flow Scales” (Jackson et al., 2011), presenting the scale in a 5-point Likert Scale (Likert, 1932). This scale was chosen because, according to the systematic literature review conducted by Oliveira et al. (2018), it uses the nine original flow experience dimensions proposed by Csikszentmihalyi (1997a) and is the most used scale in the field of educational technologies. We also included an “attention-check question” (i.e., if you are filling out the form carefully, answer 4) to remove participants who were not paying attention when answering the scale. Appendix 2 presents the FSS, including information regarding how each flow experience dimension is measured in the scale. The SPSS 27 software program was used to conduct the analysis.

Procedure

The study procedure was organized in three different steps (pilot studies, demographic survey, and system usage and flow experience report):

  • Pilot studies: In this step, we conducted an initial study in which we evaluated the system used in this experiment with six Master of Business Administration (MBA) students (with voluntary participation in the study). In this pilot study, we applied the think-aloud protocol (Alhadreti & Mayhew, 2016) to make an initial qualitative analysis of the students’ perception of the system personalization and to evaluate possible changes in the conduct of the main experiment. The think-aloud protocol was chosen for this step because it is a widely used approach to user interface analysis, capable of capturing details even with small samples (Charters, 2003; Alhadreti & Mayhew, 2016). Then, we conducted a second pilot with 10 participants on MTurk. The sole purpose of this second pilot study was to assess the fairest amount to pay those participating in the main study.

  • Demographic survey: In this step, participants answered a demographic survey asking about their gender, age, academic/educational level, and birthplace (country). In particular, self-reported gender information was important to analyze the flow experience level according to the user’s gender and the system version used. These data were chosen because they are widely used in demographic studies in the field of educational technologies [e.g., (Orji, 2014; Hallifax et al., 2019b; Oliveira et al., 2020)].

  • System usage and flow experience report: In this step, participants used logic quizzes (within the gamified educational system) for about 25 min. Upon entering the system, a randomization algorithm automatically drew one of the system versions (male stereotyped, female stereotyped, or neutral) for each participant. Then, immediately after finishing the system usage, participants answered the FSS according to their experience during the system usage.
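The random assignment described in the last step can be sketched as follows. The text does not say whether the system used simple or blocked randomization, so this minimal example assumes a simple uniform draw per participant; the function name, the version labels, and the seed are hypothetical.

```python
import random
from collections import Counter

# The three system versions described in the Materials section.
VERSIONS = ("male-stereotyped", "female-stereotyped", "neutral")

def assign_version(rng: random.Random) -> str:
    """Draw one system version uniformly at random for a new participant."""
    return rng.choice(VERSIONS)

# A fixed seed is used only so that this illustration is reproducible.
rng = random.Random(42)
assignments = [assign_version(rng) for _ in range(307)]
counts = Counter(assignments)
print(counts)
```

With a simple draw of this kind the group sizes are only approximately equal, so a design that needs balanced groups would use blocked randomization instead.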

We initially collected and grouped the data from all responses. We removed responses from participants who missed the attention-check question and from participants who chose an option other than male or female in the demographic survey. We originally received 330 responses, of which 17 were removed because the participant answered the “validation question” incorrectly. No participant selected an option other than male or female or preferred not to report their gender; such responses would have been removed from the analysis. In our analysis, we considered three personalization settings (male stereotyped, female stereotyped, and neutral) as our independent variable, and the participants’ performance and all flow experience dimensions as our dependent variables. We compared all possible combinations of participant gender and system version (i.e., males using the neutral version (MD), females using the neutral version (FD), males using the male-stereotyped version (MM), males using the female-stereotyped version (MF), females using the female-stereotyped version (FF), and females using the male-stereotyped version (FM)).
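The exclusion rules and the six group codes can be illustrated with a short sketch. The record layout and field names below are hypothetical and are not the study’s actual data schema; only the expected attention-check answer (4) and the group codes (MD, FD, MM, MF, FF, FM) come from the text.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Response:
    """Hypothetical response record used only for this illustration."""
    gender: str            # "male", "female", or another self-reported option
    version: str           # "neutral", "male", or "female"
    attention_answer: int  # answer given to the attention-check item

ATTENTION_EXPECTED = 4  # attentive participants were asked to answer 4
GENDER_CODE = {"male": "M", "female": "F"}
VERSION_CODE = {"neutral": "D", "male": "M", "female": "F"}

def group_label(r: Response) -> Optional[str]:
    """Return MD, FD, MM, MF, FF, or FM, or None if the response is excluded."""
    if r.attention_answer != ATTENTION_EXPECTED:
        return None  # failed the attention check
    if r.gender not in GENDER_CODE:
        return None  # gender option other than male or female
    return GENDER_CODE[r.gender] + VERSION_CODE[r.version]

sample = [
    Response("female", "male", 4),
    Response("male", "neutral", 4),
    Response("male", "female", 2),  # failed the attention check
]
labels = [group_label(r) for r in sample]
print(labels)  # ['FM', 'MD', None]
```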

Data analysis

To analyze the data, we first compared the different flow experience dimensions [i.e., challenge–skill balance; action–awareness merging; clear goals; unambiguous feedback; total concentration on the task at hand; sense of control; loss of self-consciousness; transformation of time; and autotelic experience (Csikszentmihalyi, 2000)] in each group of participants, together with the overall flow experience [obtained by the FSS (Jackson & Eklund, 2002; Hamari & Koivisto, 2014)]. To compare the participants’ flow experience across versions, we calculated the median, standard deviation (SD), and normality of the data for all flow experience dimensions.
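As an illustration of these descriptive statistics, the sketch below computes a per-dimension median and SD and a per-participant overall flow score for one group. It assumes one 5-point Likert item per dimension and takes the overall score as the mean of the nine items; both are assumptions made here for illustration, and the three rows of answers are invented, not study data.

```python
from statistics import mean, median, stdev

DIMENSIONS = [
    "challenge-skill balance", "action-awareness merging", "clear goals",
    "unambiguous feedback", "total concentration", "sense of control",
    "loss of self-consciousness", "transformation of time",
    "autotelic experience",
]

# Invented answers: one row per participant, one column per dimension.
group_scores = [
    [4, 3, 5, 4, 4, 3, 2, 4, 5],
    [3, 2, 4, 4, 5, 3, 3, 3, 4],
    [5, 4, 4, 3, 4, 4, 2, 5, 5],
]

# Overall flow per participant (assumed to be the mean of the nine items).
overall = [mean(row) for row in group_scores]

# Median and sample SD per dimension across the participants of the group.
summary = {
    name: (median(column), stdev(column))
    for name, column in zip(DIMENSIONS, zip(*group_scores))
}

for name, (med, sd) in summary.items():
    print(f"{name}: median={med}, SD={sd:.2f}")
```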

As the population of each group was less than 50, following the recommendations of Wohlin et al. (2012), we decided to use the Shapiro–Wilk test to analyze the data normality (Shapiro & Wilk, 1965). The Shapiro–Wilk test results indicated that the data do not follow a normal distribution, and we then analyzed the variance between each flow experience dimension for each group previously defined. To verify our hypotheses, considering that we cannot affirm that the data follow a normal distribution, and following the recommendations of Wohlin et al. (2012), we performed the Kruskal–Wallis test.
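For readers unfamiliar with the Kruskal–Wallis test, the following is a minimal pure-Python sketch of the H statistic with the usual correction for ties, which matters for Likert data. The study itself used SPSS 27, so this is an illustration of the computation rather than the authors’ implementation; the Shapiro–Wilk step is not reproduced here, and the function name and example values are hypothetical.

```python
from collections import Counter
from typing import Sequence

def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Return the tie-corrected Kruskal-Wallis H statistic for independent groups."""
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)

    # Mid-ranks: tied values share the average of the ranks they occupy.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        i = j

    # Uncorrected H from the rank sum of each group.
    h = 12 / (n * (n + 1)) * sum(
        sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)

    # Tie correction factor; it is zero only when every value is identical.
    tie_counts = Counter(pooled).values()
    correction = 1 - sum(t**3 - t for t in tie_counts) / (n**3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction

h_example = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"H = {h_example:.1f}")  # H = 7.2
```

Under the null hypothesis, H is compared against a chi-square distribution with k − 1 degrees of freedom, where k is the number of groups.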

Results

In Table 3, we present the descriptive statistics of the study, showing the mean and SD for all participants’ flow experience dimensions in the different versions of the gamified educational system (including overall flow experience). Figure 1 presents a graphical comparison of the users’ experience and Fig. 2 presents a graphical comparison of the users’ performance during the system usage.

Table 3 Participants’ flow experience analysis

Fig. 1 Users’ flow experience analysis

Fig. 2 Users’ performance analysis

Table 4 presents the Kruskal–Wallis test results, while Fig. 3 presents the pairwise comparisons for the Kruskal–Wallis test. The Kruskal–Wallis test showed that gender stereotypes in gamified educational systems do not affect students’ performance or overall flow experience but do affect students’ action–awareness merging (k = 16.393; p = 0.006), which is one of the flow experience dimensions. We therefore carried out post hoc tests for action–awareness merging. Table 5 presents the post hoc results and Fig. 4 presents the independent-samples Kruskal–Wallis test for action–awareness merging.
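As a sanity check on the reported values, the p-value can be recomputed from the test statistic. This sketch assumes that the reported statistic is 16.393 (reading the comma as a decimal separator) and that the six gender-by-version groups give 6 − 1 = 5 degrees of freedom; the closed-form chi-square tail below is specific to 5 degrees of freedom, and the function name is illustrative.

```python
import math

def chi2_sf_df5(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 5 degrees of freedom."""
    z = x / 2
    return (
        math.erfc(math.sqrt(z))
        + math.sqrt(2 * x / math.pi) * math.exp(-z) * (1 + x / 3)
    )

h_reported = 16.393  # assumption: decimal-comma reading of the reported value
p_value = chi2_sf_df5(h_reported)
print(f"p = {p_value:.3f}")  # p = 0.006
```

Under these assumptions the recomputed value agrees with the reported p = 0.006.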

Table 4 Kruskal–Wallis test

Fig. 3 Pairwise comparison

Table 5 Post hoc tests

Fig. 4 Independent samples Kruskal–Wallis test

Discussion

In this article, we present the results of an experiment analyzing the effects of gender stereotype-based interfaces on users’ flow experience and performance in a gamified educational system. The main results indicate that gender-stereotyped interfaces do not affect users’ performance or overall flow experience but do affect users’ action–awareness merging.

When analyzing the trend presented in Fig. 1 and Table 3, it is possible to notice a similarity in terms of the effects of stereotypes in the different flow experience dimensions. That is, when an experience was high or low in one of the system’s settings, it was also similar in the other settings. Our result is similar to the results of the study conducted by Oliveira et al. (2020) (also analyzing the participants’ flow experience in a gamified system). Both studies corroborate that different designs (personalization settings) can affect the flow experience dimensions differently; however, this difference is not always significant and it is not yet possible to identify exactly what can cause the effects.

One of the most visible differences (see Fig. 1) was in the transformation of time and autotelic experience dimensions, where male participants using the default version reported a higher experience than the other participants. However, this difference was also not significant; that is, despite the visual difference, transformation of time and autotelic experience were not affected by any of the personalization settings (see Table 4). Similarly, in other studies, such as those conducted by Erhel and Jamet (2019) and Oliveira et al. (2020), most of the flow experience dimensions were not affected by different gamification designs.

The only dimension where results presented a significant difference was the action–awareness merging dimension, where females using the male-stereotyped version had their action–awareness merging negatively affected (see Tables 4 and 5 and Figs. 3 and 4). This result can be related to the study conducted by Albuquerque et al. (2017), which found that male-stereotyped environments increased females’ anxiety. According to Jackson et al. (2011), the action–awareness merging dimension represents a state of complete awareness of the task being performed. Thus, it can be related to anxiety, which, depending on its level, can result in a loss of awareness of the task being performed.

In both studies [the study by Albuquerque et al. (2017) and our study], female participants using the male-stereotyped version had what we might call a negative experience (increased anxiety in the case of Albuquerque et al. (2017), and negatively affected action–awareness merging in our case). This result draws attention to the fact that different experiences of females (e.g., anxiety and action–awareness merging) can be negatively affected when using male-stereotyped systems. Such a result may have implications because, in general, systems that are not personalized tend to have characteristics more aligned with the male audience (Wanner et al., 2020; Albuquerque et al., 2017). Our result therefore also draws attention to the importance of improving the personalization of systems for females.

Concerning users’ performance, the results indicate that performance was not affected by the different stereotyped versions (see Table 4). This result may have occurred due to different factors. One of these factors is that users’ performance can be linked to their intrinsic motivation (Bodkyn & Stevens, 2015; Chen & Law, 2016) to carry out the activities; that is, the users worked on the activities regardless of the system design. Another factor may be related to the type of activities performed by users in the study (quizzes on logical reasoning). That is, users may have been motivated only by the type of question being answered.

Other researchers have recently found similar results. For example, in both the study of Oliveira et al. (2020) and the study of Albuquerque et al. (2017), the different designs of the system did not affect the participants’ performance. This result may indicate that participants’ performance is not directly related to other experiences and that further studies are needed to analyze how different types of personalization can affect users’ performance in educational systems.

Of the nine flow experience dimensions, only one was affected by the stereotypes in the gamified educational system. Some general factors may be related to this result. One of them is that, in this study, we chose to stereotype the system only in terms of colors and avatars, without personalizing, for example, the gamification elements presented in the system. Recent studies have hypothesized that for system personalization to positively affect different types of users’ experiences, personalization needs to be based on different factors [e.g., gender, user types, and type of activities, among others (Hallifax et al., 2019a, b; Rodrigues et al., 2020)].

In summary, the results of our study indicate that stereotyped gamified educational systems affect users’ action–awareness merging but do not affect users’ overall flow experience or performance. These results highlight the importance of conducting new studies that carry out similar experiments considering different types of personalization and analyzing different experiences.

Threats to validity and limitations

Because we conducted a study with humans, some threats inherent to this type of study arose. To mitigate these threats, as well as to facilitate the reproduction of the study in different situations, we describe below the threats to validity identified in this study. First, our study evaluated a subjective human experience (i.e., flow experience) that can be difficult to measure. To mitigate this threat, we used only validated instruments widely used in scientific studies [i.e., the FSS proposed by Jackson and Eklund (2002) and validated by Hamari and Koivisto (2014) for the gamification domain].

In this study, we asked participants to state their gender. Several genders could have been offered as options, yet we used only two gender options (male and female), which may not have represented some participants. To mitigate this limitation, we included the options “other” and “I prefer not to inform” for participants who did not feel represented by the male or female options. No participant marked any option other than “male” or “female.” We intend to include other options in future studies, to open the possibility of new insights.

At the same time, we understand that gender in this context is superficially linked to sex and that numerous studies address the issue in a much deeper way, such as the work of De Lauretis (1987). However, as the first study in this line in the area of gamification applied to education, we believe that we are setting a precedent for future work to investigate the subject, which is so complex, in greater depth.

People from different countries answered the survey. These people may have different preferences about “what is a stereotyped system for males or females?” To mitigate this threat, we stereotyped the system according to more general aspects (e.g., colors and avatars) based on international studies, considering data from people from different countries. However, even so, we suggest comparative studies between people from different countries (cross-cultural studies) be done. Studies replicating the experiment conducted by Oyibo et al. (2017) in the field of education can help advance the state-of-the-art and understand the influence of culture on the effect of age and gender on educational technologies design.

Finally, the system gamification design was not planned based on systematic studies (e.g., gamification design frameworks) which can negatively influence the quality of the system and how the participants perceived the system. Likewise, the system’s personalization may have been limited concerning the real needs of the participants (in terms of stereotypes). To mitigate this threat, we used a system already used in other experimental studies that based most choices on other studies in the area.

Concerning the study limitations, the study was conducted with people of different ages and cultures, which does not allow us to identify whether the results apply to people of specific ages or cultures. This limitation opens space for conducting new studies that replicate this experiment in controlled environments (e.g., with participants of the same age group in a specific school). Another limitation of the study lies in the type of system and the duration of the study: the system is designed for quick use (casual system) and does not, for example, allow users to log in and out several times and use the system at different moments. This limitation opens space for the replication of the study in larger systems, which allow, for example, longitudinal studies in which the participants’ flow experience can be measured at various times throughout the use of the system.

Building the future

Based on our study results and limitations, we identified different insights to drive future studies in this field. Thus, in this section, we present a research agenda to advance the research field.

  • Multiple factor-based personalization: In recent studies (Albuquerque et al., 2017; Oliveira et al., 2020; Erhel & Jamet, 2019), the instruments used in the experiments were personalized/stereotyped based on specific factors (e.g., gamer types, gender, age). However, few studies have analyzed how different factors applied together (e.g., gender-based and age-based personalization) affect the users’ experience. Thus, we suggest that future studies analyze how personalization based on multiple factors affects users’ experience.

  • Analysis of new experiences: As this is a relatively new area with potential for growth, few experiences have been analyzed in the studies carried out so far (e.g., motivation, anxiety, and flow). However, several other experiences can be related to the learning experience and can be evaluated in future studies. For this reason, we recommend that future studies also focus on analyzing the effects of customizing gamified educational systems on other user experiences.

  • Cross-cultural studies: Recently, studies have shown that personalization/stereotypes affect users’ experience according to cultural variables (Oyibo et al., 2017; Toda et al., 2020; Oyibo & Vassileva, 2020). Concerning the effects of stereotyped gamified educational systems on users’ experience according to cultural factors, results are still incipient. Thus, we suggest that future studies analyze how personalized gamified educational systems affect users’ experience according to cultural factors.

  • Longitudinal studies: Most of the studies conducted so far were carried out over a short period. With the advances in the area, we consider it important that future studies analyze the effects of gender-stereotyped educational systems within long-term interventions. Therefore, we suggest that longitudinal studies be conducted in the future.

Concluding remarks

In recent years, different studies have been conducted to analyze whether gender stereotype-based interfaces affect users’ experience in educational systems. However, there is still no concrete answer to this question and many studies have highlighted the challenge of understanding when the stereotyped gamified educational systems affect users’ flow experience and performance. Thus, in this study, we analyzed the effects of gender stereotype-based interfaces on users’ flow experience and performance in a gamified educational system.

Answering our first research question (Do gender stereotype-based interfaces affect the users’ flow experience in a gamified educational system?), we identified that gender-stereotyped personalization in gamified educational systems affects users’ action–awareness merging; however, it does not affect users’ overall flow experience. Answering our second research question (Do gender stereotype-based interfaces affect the users’ performance in gamified educational systems?), the results indicate that gender-stereotyped gamified educational systems do not affect users’ performance.

Based on the results obtained, we intend, in future work, to perform more in-depth studies involving the design of interfaces based on the most current theories of gender, which go beyond sex. This vision can advance the literature on gamification applied to education and move it toward an equitable design. We also aim to replicate the study by considering other aspects of personalization (e.g., considering other gamification elements and content-based gamification), as well as evaluating other aspects of user experience. Finally, we will replicate the experiment through a longitudinal study, measure the users’ flow experience at different times, and analyze it using different machine learning techniques.