Introduction

In the last ten years, the use of game-based learning has increased significantly (for a review, see Hwang & Chen, 2022). One line of research indicates that game-based learning (i.e., using game elements in learning materials) seems to encourage learners to invest more effort in an instructional task (e.g., Plass et al., 2015; for meta-analyses see Clark et al., 2016; Sailer & Homner, 2020), an effect often referred to as engagement (Fredricks et al., 2004; Schwartz & Plass, 2020). This increased engagement might also improve learning outcomes, as demonstrated in meta-analyses (Sailer & Homner, 2020; Wouters et al., 2013). The use of game elements, however, might also be considered at odds with common multimedia learning principles, which typically suggest keeping learning material as minimalistic as possible to reduce cognitive load and working memory demands (e.g., Mayer & Fiorella, 2014). That is, additional (game) elements embedded in the learning material might distract learners and/or divert attention away from the essential content of the learning material and thus impair learning and/or performance; such distractors are often referred to as seductive details (e.g., Rey, 2012). Although recent studies have shown important advances in this research field, a more detailed understanding of the use of game elements in learning material is still needed (e.g., Zainuddin et al., 2020).

As such, the current study set out to evaluate whether the integration of game elements into an already established math learning task (i.e., number line estimation) moderates learning and performance. Accordingly, in the following, we first provide a brief overview of different theoretical rationales arguing for and against the inclusion of game elements in learning tasks. Subsequently, we summarize and discuss different research design approaches in the field of game-based learning and briefly describe the theoretical basis for using the number line estimation task as a training tool within the context of basic research on numerical cognition.

Engagement, cognitive load, and emotional design in game-based learning

The use of game elements or game-based learning is often justified by increasing the engagement of learners (e.g., Garris et al., 2002; Greipl et al., 2021; Mekler et al., 2017; Ninaus et al., 2015; Plass et al., 2015). Typically, engagement refers to actively participating in a task or learning activity, as opposed to being disinterested, apathetic, or only superficially involved (Newmann, 1992). In other words, engagement is considered to be the active and focused investment of effort in a task or learning environment (Schwartz & Plass, 2020). While there may be some variation in how engagement is defined, the most common view is that it involves three dimensions: behavioral, emotional, and cognitive (Fredricks et al., 2004). Although these dimensions are separate, they are also interconnected. Here, we will focus on cognitive engagement, often defined as the degree of mental investment that learners put into their learning activities, including thoughtful and strategic thinking to overcome problems, master difficult skills, or comprehend complex ideas (Fredricks et al., 2004; Meece et al., 1998). Accordingly, cognitive engagement can be assessed by, for instance, the number of solutions generated or the amount of time spent on a task (e.g., Schwartz & Plass, 2020). In the context of game-based learning, game elements might moderate users’ willingness to invest mental effort during task execution, which leads to generative processing, i.e., cognitive processing aimed at making sense of the presented (learning) material (Mayer, 2019), and better performance. In game-based learning, different game elements or, following Bedwell’s taxonomy (e.g., Bedwell et al., 2012), game attributes such as immersion, game fiction, and challenge can be used. Whether a learning task with game elements becomes a learning game or remains “only” a gamified learning task is still a matter of debate (Plass et al., 2020; Tokac et al., 2019) and is beyond the scope of the current article.

Previous research on emotional design demonstrated that adding game elements to learning and cognitive tasks can increase cognitive engagement, thereby modulating learning and performance (Bernecker & Ninaus, 2021; Ninaus et al., 2015; Plass et al., 2015). However, using game elements may also have detrimental effects (for a review see Toda et al., 2018), such as lower exam scores (de-Marcos et al., 2014), reduced motivation (Hanus & Fox, 2015), or even distraction from the actual learning task (Kocadere & Çağlar, 2015).

According to the seminal cognitive load theory (Sweller, 1988) and the cognitive theory of multimedia learning (Mayer, 2005), any information entering our cognitive system is processed in working memory before it is transferred to long-term memory. As working memory resources are limited, learners’ information processing capacity is limited and must be allocated among extraneous, essential, and generative processing (for a cognitive perspective on game-based learning, see Mayer, 2020). In this context, extraneous processing refers to cognitive processing that does not serve the instructional goal of the learning task or the game. Essential processing, on the other hand, is needed to hold and manipulate incoming information in working memory, while generative processing reflects making sense of this incoming information (learning material). Accordingly, instructional designers usually aim to optimize learning by decreasing extraneous processing, managing essential processing, and increasing generative processing. Importantly, this also means, at least according to the cognitive theory of multimedia learning (Mayer, 2005), that although game elements increase extraneous processing, they may nevertheless foster generative processing by increasing engagement and the willingness to exert mental effort (Mayer, 2020). As such, a careful balance of game elements and instructional features is necessary so as not to hinder learning when using games.

Intrinsic integration is a game design approach that is suggested to enhance learning (Habgood & Ainsworth, 2011; Kafai, 1996). It refers to the general idea that there must be an intrinsic association between the game’s core mechanics and the learning content or learning mechanic (Habgood & Ainsworth, 2011). Further, according to the RETAIN model (Gunter et al., 2008), which was developed to design and evaluate educational games, the tight coupling of learning content and the game’s fantasy and/or story (i.e., embedding content into the fantasy) is essential. In other words, learning effects might be enhanced if subject matter and learning content are reflected in the respective game actions in an engaging and pedagogically meaningful way. In line with this, Kiili et al. (2019b) suggested that learning domain-specific instructional knowledge is another relevant factor to be utilized in intrinsic integration. All in all, the aim is that intrinsic integration leads to an interaction that facilitates engagement and immersion in academic learning content (Gunter et al., 2008). However, there is emerging evidence that many learning games suffer from a lack of intrinsic integration between the learning content and the core game mechanics (Kiili et al., 2019b). It seems that either instructional or game design aspects dominate the design of learning games too strongly. Overall, the concept of intrinsic integration has not been studied extensively yet.

Nevertheless, Habgood and Ainsworth (2011) found that an intrinsically integrated version of a math learning game improved learning significantly more than an extrinsically integrated version. The extrinsic game version was based on quiz mechanics, a convenient and thus popular way to integrate instructional material into a game. In this game version, participating children had to answer a multiple-choice quiz on division problems at the end of each game level in which zombie skeletons had to be defeated. In contrast, in the intrinsically integrated version of the game, math learning content was directly integrated into the game mechanics: players had to choose different types of attacks, each representing a divisor (e.g., a single swipe of a sword to divide by 2, a shove with a shield to divide by 3), to defeat zombie skeletons with numbers printed on their chests by dividing them into equally sized portions. Interestingly, learning performance and long-term outcomes were better, and children spent more time using the intrinsic version of the game.

In line with the cognitive theory of multimedia learning (Mayer, 2005), extrinsically integrated game elements might increase extraneous processing, as these game elements do not directly serve the instructional goal. In contrast, intrinsically integrated game elements or games might reduce extraneous processing and ensure that players process essential learning content, a prerequisite for learning. The desired motivational effects of intrinsically integrated game elements are assumed to facilitate generative processing. Further, one might speculate that intrinsic integration should also support managing essential processing if relevant instructional approaches are successfully utilized in the integration. However, research on this conceptual issue is sparse; in particular, the extent to which intrinsic integration interacts with essential processing goes beyond the current study and needs to be investigated with well-controlled experimental manipulations in future studies. For instance, studies might want to systematically manipulate the degree of intrinsic integration and assess the different types of cognitive processing.

Another issue in the field of game-based learning research seems to be the lack of systematic investigations of the specific benefits of game elements (cf. Boyle et al., 2016). In particular, a meaningful comparison between game-based and “conventional” learning is not trivial to achieve. Often, a game-based learning approach is compared to regular classroom activities, such as listening to lectures or working with books and PowerPoint slides (for meta-analyses see Sailer & Homner, 2020; Wouters et al., 2013). Importantly, however, this ignores many factors such as the underlying instructional approach, feedback opportunities, interpersonal interaction, use of technology, or instructor bias (Chen et al., 2021; Liao et al., 2010; Mayer, 2014). Consequently, even when positive/negative effects are found for game-based learning, it is often not clear whether they originated from the use of a game or specific game elements or might have other reasons (e.g., the presence of real-time feedback in game-based learning vs. delayed or even no feedback in regular classroom teaching). Accordingly, a well-matched and thus well-comparable control condition needs to be implemented to investigate the effects of game-based learning more specifically.

Number line estimation as a basic learning mechanic

Fractions are considered a particularly challenging topic in mathematics instruction in schools (Benbow & Faulkner, 2008; Booth & Newton, 2012), but even adults frequently fail to process them correctly (for a review, see Siegler et al., 2013). One of the major difficulties when dealing with fractions is the actual understanding of fraction magnitude. Importantly, fraction understanding was observed to be predictive of later, more complex mathematical skills such as algebra (e.g., Booth & Newton, 2012). A lack of proper fraction understanding often persists into adulthood (Stigler et al., 2010).

A (mental) number line is a frequently employed metaphor for the mental representation of number magnitude. Consequently, the so-called number line estimation task is an often-used approach to measure and train the understanding of (fraction) number magnitude in children (e.g., Fazio et al., 2016) as well as adults (e.g., Sidney et al., 2019a). While one might assume that the majority of research was done on children (cf. Schneider et al., 2018 for a review), age ranges vary considerably depending on the research question at hand, from children in kindergarten/preschool (e.g., Praet & Desoete, 2014) through primary (e.g., Link et al., 2013) and secondary school (Gross et al., 2018; see also Nuraydin et al., 2023 for evidence from large-scale data) to adults (e.g., Gallagher-Mitchell et al., 2018; Sullivan et al., 2011), the elderly (e.g., Greipl et al., 2020; Matthews et al., 2022), and even brain-damaged patients (e.g., Mihulowicz et al., 2015).

In this well-evaluated and established task (Link et al., 2013; Siegler & Opfer, 2003), the spatial position of a target number (e.g., 3/4) on a number line (e.g., ranging from 0 to 1) needs to be estimated (e.g., where does 3/4 go on a number line from 0 to 1?). It was repeatedly found that, with training, learners become more accurate (e.g., Fazio et al., 2016) and/or faster in their responses (e.g., Obersteiner et al., 2013; Wilson et al., 2009) when estimating or comparing the magnitude of numbers (for a review see Moeller et al., 2015). Importantly, number line-based training has also been shown to be effective in improving (conceptual) magnitude understanding of fractions and rational numbers (Fazio et al., 2016; Gunderson et al., 2019; Opfer et al., 2016; Siegler et al., 2013; van’t Noordende et al., 2016). Interventions using number line estimation as a basic learning mechanic have yielded better outcomes than other instructional methods, such as area models (Hamdan & Gunderson, 2017). Moreover, compared to interventions working with part-whole and area models, only number line-based training approaches also led to transfer effects onto untrained tasks such as fraction magnitude comparisons and fraction arithmetic (Gersten et al., 2017; Hamdan & Gunderson, 2017; Sidney et al., 2019b). Consequently, number line estimation constitutes a well-established and validated learning task with simple learning mechanics.

Present study

The use of game elements or game-based learning in education needs to be investigated carefully (Plass et al., 2020). On the one hand, some theoretical approaches argue that the use of game elements might increase extraneous processing (e.g., Mayer, 2005; Sweller, 1988). On the other hand, approaches motivated by the emotional design perspective (e.g., Plass et al., 2015; for a meta-analysis see Wong & Adesope, 2021) argue for the use of game elements to increase different aspects of engagement. Furthermore, the way game elements are integrated, or how learning content is reflected by the game’s core mechanic (i.e., intrinsic integration), seems to be another important aspect to consider in game-based learning research (e.g., Habgood & Ainsworth, 2011; Gunter et al., 2008; for a systematic review see Kiili et al., 2019b). Finally, a lack of appropriate control conditions in game-based learning research has so far impeded clear recommendations on the use of game-based learning (for meta-analyses see Sailer & Homner, 2020; Wouters et al., 2013).

Therefore, the present study employed a simple yet effective math learning task (i.e., number line estimation) as the basic mechanic when designing a learning game to allow for intrinsic integration. That is, we utilized number line estimation as the game’s core mechanic and compared it to a conventional number line estimation task without game elements included. In particular, we compared differences in learning effects from a pretest to a posttest and performance during the training between a game-based version of the number line estimation task and its non-game-based equivalent.

Overall, we expected that both the game-based and the non-game-based version of the number line estimation training would yield significant learning effects. That is, participants in both training conditions should increase their fraction estimation accuracy from a pretest to a posttest using a paper–pencil version of the number line estimation task (Hypothesis 1; Fazio et al., 2016).

Moreover, as suggested by an emotional design and intrinsic integration perspective, we expected that learning, and thus accuracy gains from pretest to posttest, should be more pronounced for the game-based as compared to the non-game-based intervention (Hypothesis 2; Habgood & Ainsworth, 2011; Wong & Adesope, 2021).

Finally, in line with the previous hypothesis and the assumption that the use of game elements leads to increased (cognitive) engagement (e.g., Bernecker & Ninaus, 2021; Plass et al., 2015), participants might invest more mental effort in solving the tasks in the game-based training. Accordingly, we expected that participants would perform better during the actual training in the game-based version of the number line estimation task as compared to its non-game-based version (Hypothesis 3), as reflected by higher accuracy and shorter response durations.

Methods

Participants

For this study, 90 students from a German university were recruited and randomly assigned to the game-based (n = 41) or the non-game-based training (n = 49). We excluded 5 participants from further analyses. Two participants (non-game-based training) were excluded because of extremely poor performance (more than 4 SD below the sample mean). Three participants (all in the non-game-based condition) were excluded because of problems with data acquisition (i.e., pretest items not recorded for one participant; training sessions missing for two participants).

Consequently, 85 adult students were considered in the analyses (game: n = 41; non-game: n = 44). All of these participants were German native speakers with normal or corrected-to-normal vision and reported no history of psychiatric or neurological disorders or drug abuse. Due to technical problems, records of the age of 37 participants were lost. The remaining participants had a mean age of 23.60 years (SD = 3.56, range 18 to 33 years). As all participants were recruited from a student population, it may reasonably be assumed that age was distributed comparably among those with missing data. The local Ethics Committee approved the study. All participants provided written informed consent before the study and received monetary compensation.

Study design

The experiment was conducted as a pre-post training study with five consecutive days of training between the pretest and posttest. Participants came to the lab at the university for each training session, which lasted approximately 15–20 min depending on their performance. They were randomly assigned to either the game-based (n = 41) or the non-game-based (n = 44) training condition. A paper–pencil version of the number line estimation task was used as pretest and posttest (see below).

Training procedure

We used two different versions of a fraction number line estimation training (i.e., a game-based vs. a non-game-based training version). In both versions, participants had to indicate the correct spatial position of a target fraction (e.g., 9/20) on a number line ranging from 0 to 1. That is, through the training, participants should improve their understanding of fraction magnitude. Both versions of the training employed the same task and numerical content. However, the game-based training utilized typical game elements (cf. Bedwell et al., 2012), such as visual aesthetics (i.e., game-like visual design), game fiction (i.e., a game narrative), and a virtual incentive system (i.e., coins and life points). The non-game-based version of the training did not include any of these elements. The items in both versions were identical and used fractions with a numerator ranging from 1 to 25 and a denominator ranging from 2 to 30. Each training session employed the same 48 items, each completed twice (i.e., 96 trials) in randomized order. In total, one training session consisted of 12 levels, each involving eight items. Participants had to come to the lab for each training session and work on the items using a computer. They were instructed to answer as quickly and accurately as possible.
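To illustrate the trial structure, the following minimal sketch assembles one session’s trial list in R (the language used for the analyses reported below). The sampled items are placeholders generated from the stated numerator/denominator ranges, not the original item set; all object names are hypothetical.

```r
# Hypothetical reconstruction of one session's trial list: 48 unique
# fractions below 1 (numerators 1-25, denominators 2-30), each presented
# twice (96 trials) in randomized order, grouped into 12 levels of 8 trials.
set.seed(42)
pool   <- expand.grid(numerator = 1:25, denominator = 2:30)
pool   <- subset(pool, numerator < denominator)         # fractions between 0 and 1
items  <- pool[sample(nrow(pool), 48), ]                # 48 unique items
trials <- items[sample(rep(seq_len(48), times = 2)), ]  # each item twice, shuffled
trials$level <- rep(1:12, each = 8)                     # 12 levels of 8 trials each
```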

To evaluate potential performance improvements across all five training sessions, we calculated the mean percentage absolute error (PAE; [|estimate − actual location| / numerical range] × 100; cf. Booth & Siegler, 2006) and the response duration for the first answer (time from item onset to response) on each trial for each session (see below). In the following, we describe the game-based and non-game-based training tasks in more detail.
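For illustration, PAE for the present 0-to-1 number line can be computed as in the following minimal sketch (the function name is hypothetical):

```r
# Percentage absolute error (PAE) following Booth & Siegler (2006);
# for the present task, the numerical range of the number line is 1 (0 to 1).
pae <- function(estimate, actual, range = 1) {
  abs(estimate - actual) / range * 100
}
pae(estimate = 0.70, actual = 9/20)  # placing 9/20 at 0.70 yields a PAE of 25
```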

Game-based number line estimation training

For the game-based training version, we used the Semideus rational number environment, which was already successfully utilized in previous studies for evaluating fraction knowledge in Finnish and German schoolchildren (Kiili et al., 2018a, 2018b; Ninaus et al., 2021).

In the game-based version, the number line from 0 to 1 was implemented and visualized as a walkable platform in the game (see Fig. 1). Regarding the game’s fantasy context, participants controlled the character Semideus, who tries to recover gold coins that a goblin has stolen from Zeus and hidden along the trails of Mount Olympus. Semideus has discovered notes that mark (in the form of fractions) where the gold coins are hidden along the path. Regarding the core mechanic of the game, participants could move the avatar along the walkable platform, i.e., the number line, to the correct position using the arrow keys of a computer keyboard. By walking on the path, participants could experience fraction magnitudes in a meaningful context (walking distance on the path). After reaching the estimated position, participants pressed the space bar to lock in that position and dig for coins. Participants had 10 s to estimate the correct position. In case they failed to answer or answered incorrectly (i.e., 94% accuracy or less), the avatar was struck by lightning and lost virtual life points, which were represented by an orange bar on the right of the screen. For correct answers, participants found coins, i.e., earned points; the more precise their answer, the more points they earned (i.e., 100–99% accuracy earned 500 coins, 98–97% accuracy earned 300 coins, and 96–95% accuracy earned 100 coins; see the sketch below). Moreover, as feedback, the correct position (100% accuracy) was shown with a vertical, green line (see also Fig. 1, Panel C). Participants had to estimate the location of each fraction until it was estimated correctly before they could proceed to the next trial. On the left of the screen, a green bar showed participants how far they had progressed in the current level. At the end of each level, participants also received delayed feedback about their performance (3-star rating): for completing the level, they earned one star; for collecting at least 2000 points, they earned a second star; and for maintaining at least 80% of the life points, they earned a third star.
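The reward rules can be summarized in a few lines (hypothetical code; we assume here that accuracy is computed as 100 minus the estimation error as a percentage of the number line, so that, e.g., a PAE of 2 corresponds to 98% accuracy):

```r
# Hypothetical sketch of the in-game reward rules described above:
# 95% accuracy or better earns coins (more coins for more precise
# estimates); 94% or less counts as incorrect and costs life points.
coins_for <- function(accuracy) {
  if (accuracy >= 99) 500        # 100-99% accuracy
  else if (accuracy >= 97) 300   # 98-97% accuracy
  else if (accuracy >= 95) 100   # 96-95% accuracy
  else 0                         # <= 94%: no coins, avatar loses life points
}
coins_for(98.2)  # returns 300
```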

Fig. 1

Examples of the game-based version. The onset of the number line estimation task (A); negative feedback (B); positive feedback (C); overall feedback shown at the end of each level (D)

Non-game-based number line estimation training

For the non-game-based version, we again used the Semideus rational number environment but stripped it of all game elements described above. That is, we created a conventional, minimalistic digital version of the number line estimation task. In particular, we used a simple grey background with a black number line from 0 to 1 (see Fig. 2). The non-game-based training utilized neither the Semideus narrative (fantasy context) nor any virtual incentives (i.e., coins, life points). Participants had to move a white slider along the number line to the position of the respective target fraction using the arrow keys of the computer keyboard, identical to the game-based training. Again, participants had 10 s to estimate the correct position, confirming their estimate by pressing the space bar at the selected position. In case they failed to answer, or their estimation was not accurate enough (i.e., 94% accuracy or less), a red cross was shown, and, identical to the game-based training, they had to repeat the trial until it was solved correctly. For correct estimations (i.e., 100–95% accuracy), participants were rewarded simply with a green checkmark, and the correct position (100% accuracy) was shown with a vertical green line as feedback (see also Fig. 2).

Fig. 2

Examples of the non-game-based version. The onset of the number line estimation task (A); negative feedback (B); positive feedback (C); overall feedback shown at the end of each level (D)

Pre-post measures

Participants performed a paper–pencil version of the number line estimation task before (pretest) and after (posttest) the training. The number line ranged from 0 to 1, with the start and endpoint specified, and included 96 items/fractions. Participants had to indicate the correct spatial position on the number line for all items. The items comprised fractions with a numerator ranging from 1 to 25 and a denominator ranging from 2 to 30. That is, the items included one-digit (e.g., 3/7) and two-digit fractions (e.g., 7/27). To investigate any potential improvements, we calculated the mean percentage absolute estimation error (PAE; Booth & Siegler, 2006).

Analyses

Data preprocessing and analyses were performed using R (Version 4.0.5; R Core Team, 2021). Statistical requirements (i.e., linearity, homogeneity of variance, and normality of residuals) were checked and confirmed before running the planned analyses using the R package “performance” (Lüdecke et al., 2021). Prior to evaluating any potential training effects, possible performance differences between the two conditions (game-based vs. non-game-based training) at the pretest were examined using a Welch two-sample t-test. A linear mixed-effects model was fitted to examine potential performance changes in PAE from pretest to posttest. Moreover, changes in PAE and response duration across the five training sessions were examined by fitting separate linear mixed-effects models.
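For illustration, the pretest comparison could be carried out as in the following minimal sketch; the data frame pretest_df and its columns pae and condition are hypothetical names, not the authors’ code:

```r
# Welch two-sample t-test on pretest PAE between the two training
# conditions (the Welch correction, var.equal = FALSE, is R's default)
t.test(pae ~ condition, data = pretest_df)
```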

Linear mixed-effects models were fitted using “lmer” from the “lme4” R package (Bates et al., 2015). Linear mixed models provide various methodological and statistical advantages over mixed ANOVAs (Hilbert et al., 2019). Importantly, simulation studies show that linear mixed-effects models are very robust even if the distributional assumptions are violated (Schielzeth et al., 2020). The p-values for linear mixed-effects models were calculated using Satterthwaite’s degrees of freedom method available with the R package “lmerTest” (Kuznetsova et al., 2017). Marginal means were extracted using the “ggeffects” R package (Lüdecke, 2018). To support reporting, we used the “report” R package (Makowski et al., 2021). The R package ggplot2 (Wickham, 2016) was used for visualizing results.

Results

Pretest differences

The Welch two-sample t-test evaluating the difference in PAE at pretest between the game-based (mean = 4.33, SD = 1.56) and the non-game-based training condition (mean = 4.68, SD = 1.20) did not show a significant difference (difference = −0.35%, 95% CI [−0.95, 0.26], t(74.84) = −1.15, p = 0.255; Cohen’s d = −0.25, 95% CI [−0.68, 0.18]).

Pre-post differences—learning outcome

Differences in PAE from pretest to posttest were analyzed by fitting a linear mixed model (see also Fig. 3) using restricted maximum likelihood (REML) to predict PAE with condition (i.e., game-based vs. non-game-based training), time (pre vs. post), and their interaction as fixed effects. The model also included a random intercept to account for participants’ individual differences.
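A minimal sketch of this model, assuming long-format data with one PAE value per participant and time point (variable names are hypothetical):

```r
library(lmerTest)     # loads lme4 and adds Satterthwaite p-values
library(performance)  # check_model() for assumption checks

# Fixed effects: condition, time, and their interaction;
# random intercept per participant; fitted with REML (the lmer default)
m_prepost <- lmer(pae ~ condition * time + (1 | participant),
                  data = prepost_df, REML = TRUE)
summary(m_prepost)      # fixed-effect t-tests use Satterthwaite's df method
check_model(m_prepost)  # linearity, homogeneity of variance, normality
```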

Fig. 3

PAE changes from pretest to posttest for the game-based (petrol) and non-game-based condition (grey). Points show the PAE of each individual. Squares represent the estimated marginal means. Error bars denote ± 1 standard error

The model’s total explanatory power was substantial (conditional R2 = 0.50), and the part related to the fixed effects alone (marginal R2) was 0.08. The effect of time was statistically significant (beta = −0.54, 95% CI [−0.91, −0.16], t(164) = −2.81, p = 0.005; Std. beta = −0.22, 95% CI [−0.38, −0.07]) and indicated that PAE decreased significantly from pretest to posttest due to the training. Neither the effect of condition (beta = 0.35, 95% CI [−0.15, 0.85], t(164) = 1.36, p = 0.173; Std. beta = 0.20, 95% CI [−0.15, 0.55]) nor the interaction between time and condition was statistically significant (beta = −0.20, 95% CI [−0.72, 0.32], t(164) = −0.76, p = 0.449; Std. beta = −0.08, 95% CI [−0.30, 0.13]).

Training session performance

Descriptive data on PAE and duration across all five training sessions are reported in Table 1.

Table 1 Overview of mean PAE and response duration for each training session in both conditions

PAE: Differences in PAE across the five training sessions were analyzed using a linear mixed-effects model (see Fig. 4). In particular, we fitted a linear mixed model using REML to predict PAE by condition (i.e., game-based vs. non-game-based training), training session (i.e., five sessions), and their interaction as fixed effects. The model included a random intercept to account for participants’ individual differences in baseline performance and a random slope for session to account for individual differences in learning rate.
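A sketch of this model, analogous to the pre-post model above but with a by-participant random slope for session (names hypothetical); the response-duration model reported below follows the same form with duration as the outcome:

```r
library(lmerTest)

# Random intercept plus random slope for session per participant
m_pae <- lmer(pae ~ condition * session + (1 + session | participant),
              data = training_df, REML = TRUE)
summary(m_pae)
```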

Fig. 4

PAE change across training sessions for the game-based (petrol) and non-game-based condition (grey). Points show the PAE of each individual. Squares represent the estimated marginal means. Error bars denote ± 1 standard error

The model’s total explanatory power was substantial (conditional R2 = 0.76), and the part related to the fixed effects alone (marginal R2) was 0.17. The effect of training session was statistically significant, indicating that PAE decreased, and thus participants improved, across training sessions (beta = −0.24, 95% CI [−0.31, −0.17], t(417) = −6.82, p < 0.001; Std. beta = −0.30, 95% CI [−0.39, −0.21]). Furthermore, the effect of condition was statistically significant (beta = 0.66, 95% CI [0.18, 1.14], t(417) = 2.71, p = 0.007; Std. beta = 0.59, 95% CI [0.26, 0.92]), reflecting that participants in the game-based condition committed smaller estimation errors during the training. However, the interaction effect of session and condition was not statistically significant (beta = 0.00742, 95% CI [−0.09, 0.10], t(417) = 0.15, p = 0.881; Std. beta = 0.00915, 95% CI [−0.11, 0.13]).

Duration: Differences in response duration across the five training sessions were analyzed using a linear mixed-effects model (see Fig. 5). In particular, we fitted a linear mixed model using REML to predict response duration by condition (i.e., game-based vs. non-game-based training), training session (i.e., five sessions), and their interaction as fixed effects. The model included a random intercept to account for participants’ individual differences in baseline response duration and a random slope for session to account for individual differences in learning rate.

Fig. 5

Change of response duration across training sessions for the game-based (petrol) and non-game-based condition (grey). Points show the response duration of each individual. Squares represent the estimated marginal means. Error bars denote ± 1 standard error

The model’s total explanatory power was substantial (conditional R2 = 0.89), and the part related to the fixed effects alone (marginal R2) was 0.13. The effect of training session was statistically significant, indicating that participants became significantly faster across training sessions (beta = −186.55, 95% CI [−237.79, −135.32], t(417) = −7.14, p < 0.001; Std. beta = −0.28, 95% CI [−0.36, −0.21]). Furthermore, the effect of condition was statistically significant (beta = −421.25, 95% CI [−822.39, −20.10], t(417) = −2.06, p = 0.040; Std. beta = −0.46, 95% CI [−0.83, −0.09]), indicating that participants in the game-based condition took more time to respond. Again, however, the interaction effect of session and condition was not statistically significant (beta = −3.70, 95% CI [−74.92, 67.51], t(417) = −0.10, p = 0.919; Std. beta = −0.00565, 95% CI [−0.11, 0.10]).

Discussion

The current training study compared learning effects for a game-based and an equivalent non-game-based version of a number line estimation task. The results indicated that participants successfully improved their fraction magnitude estimation from pretest to posttest, but no significant differences were identified between training conditions. Furthermore, looking specifically at the training sessions, results revealed that participants estimated fraction magnitudes more accurately in the game-based condition as compared to the non-game-based condition. However, estimation also took more time in the game-based condition compared to the non-game-based condition. In the following, these results will be discussed in more detail.

Pre-posttest comparisons

In line with our first hypothesis, participants in both conditions improved their fraction estimation accuracy from a paper-based pretest to a posttest. This demonstrates that a digital number line estimation task is an effective instructional approach to foster fraction magnitude estimation skills—even in educated university students who already showed a rather high performance at the pretest. This result is in line with previous studies examining the overall effectiveness of number line-based fraction instruction in general (e.g., Gersten et al., 2017; Gunderson et al., 2019; Opfer et al., 2016; Siegler et al., 2013) and game-based instruction in particular (e.g., Fazio et al., 2016; Kiili et al., 2018a, 2018b).

However, in contrast to our second hypothesis, improvements in estimation accuracy from the pretest to the posttest were not significantly more pronounced for the game-based as compared to the non-game-based training. The lack of a statistically significant learning difference between task versions seems to suggest that the game elements, even though not beneficial, did not significantly hamper learning either. Therefore, it seems that game elements cannot always be considered seductive details, or at least their potential negative effects were smaller than could be expected in light of the cognitive theory of multimedia learning (Mayer, 2005). The current game design was guided by an intrinsic integration approach. That is, the game events were tied to Greek mythology: the number line was implemented as a walkable platform (trails) on Mount Olympus, and the avatar was seeking hidden coins on the trails. In other words, the learner could experience fraction magnitudes as specific distances on the trails. In the non-game-based version, the learner could also experience these magnitudes, but in a more abstract way, as the mountain scenery and the walking character were not available. This might be one reason why the game elements did not disturb learners, which would substantiate the results of Habgood and Ainsworth (2011), who demonstrated the beneficial effects of intrinsic integration. However, in the current study, we did not compare an intrinsic and an extrinsic version of the game. Therefore, future studies are necessary to better understand the effects of intrinsic integration.

In a similar vein, Schneider et al. (2016) investigated the effect of decorative pictures and found that not all types of decorative pictures are seductive. Given the right design choices, that is, pictures that evoke motivational and/or (positive) emotional states, as used in emotional design (for a meta-analysis see Wong & Adesope, 2021), they can also be conducive to learning. In their systematic review on the use of game elements in cognitive assessment, Lumsden and colleagues (2016a) also highlighted that the integration of game elements into tasks needs to be done very carefully. The concept of intrinsic integration (Habgood & Ainsworth, 2011; Kafai, 1996) might be one way of achieving a balance between the potential positive (i.e., generative processing) and negative (i.e., extraneous processing) effects of game elements (cf. Mayer, 2020).

Nevertheless, the current implementation of game elements in a number line estimation task did not yield a significantly larger training effect than a conventional, non-game-based version of the same task. We need to note, though, that the overall performance of participants at the beginning of the training can be considered rather high, and thus room for improvement over the course of the training was limited. Consequently, differential effects were rather unlikely to emerge, as the skill level was already high. That is, a study using more difficult fractions or less skilled participants might reveal different results; the study should be replicated with pupils who are less experienced with fractions. Moreover, as the pre-post measure was paper-based, no response durations were recorded, which could have acted as another indicator of learning (e.g., faster response times in the posttest).

The fact that the pre-post measure was paper-based and did not include any game elements might also explain the lack of differential learning effects (i.e., higher estimation accuracy gains in the game-based condition). Participants in the game-based condition estimated more accurately and took more time per estimation during the training than participants in the non-game-based condition, which might be indicative of increased cognitive engagement (for a more detailed discussion see below). However, as the pre-post measure was administered on paper (i.e., in a different modality) and without game elements, a potential positive effect of game elements on cognitive engagement and, thus, participants’ willingness to invest mental effort might not have carried over to the posttest. At the same time, the potential positive effect of game elements on cognitive engagement during the training might not have been strong enough to be reflected in the learning outcome as assessed with the pre-post measures. Future studies should also consider using delayed posttests to study potential (differential) long-term effects and consolidation processes, which are hardly studied in this field of research. That is, one might find (more) positive or negative effects of game-based learning not immediately but only after longer retention intervals.

Performance in training sessions

Even though we were not able to identify any differential effects on the learning outcome between the two training conditions, our third hypothesis was partly confirmed. Participants were indeed significantly more accurate across training sessions in the game-based training as compared to participants in the non-game-based training. Sailer and Sailer (2021) reported similar results when comparing flipped classroom learning groups with and without game elements. They also did not find any differences between the groups in a posttest but identified beneficial effects of using game elements on performance during the learning/training phase. This underscores the importance of studying the actual learning process when investigating the effects of game elements rather than analyzing learning outcomes alone (i.e., pretest–posttest differences).

In the current study, increased performance was only reflected in higher estimation accuracy but not, for instance, in faster response durations. In fact, participants in the game-based training provided more accurate responses but also took longer to respond across training sessions. Consequently, we can only partly confirm our third hypothesis. Overall, in both conditions, accuracy increased and response duration decreased significantly across training sessions. While this general improvement in accuracy and speed is in line with previous studies on number line estimation training (e.g., Fazio et al., 2016; Kiili et al., 2018a, 2018b; Obersteiner et al., 2013; Wilson et al., 2009; Wortha et al., 2020), the higher accuracy but slower responses in the game-based vs. non-game-based training paint a more complex picture.

In hindsight, the resulting pattern of increased accuracy and response duration makes sense from a cognitive engagement perspective. That is, participants were willing to invest more effort into accurate estimations, which also required more time. In other words, cognitively engaged participants might have been more thoughtful about their estimation strategies to optimize their accuracy, leading to longer response durations. Schwartz and Plass (2020), for instance, consider the amount of time spent on a task an important indicator of cognitive engagement in games. Consequently, the game narrative/fantasy, visual appearance, and virtual incentives may have increased participants’ willingness to invest mental effort as compared to participants in the non-game-based training. This is in line with the RETAIN model (Gunter et al., 2008), which emphasizes the importance of properly integrating the educational content into the game story/fantasy; such integration can influence the perceived relevance of a learning task and, consequently, learners’ level of engagement and their behavior during the learning process.

However, the pattern might also indicate that participants in the game-based training prioritized accuracy over response duration across training sessions. The opposite appears to be true for participants in the non-game-based training, at least compared to participants in the game-based condition. It seems unlikely that participants in the game-based condition had an overall better understanding of fraction magnitude, as they were more accurate than participants in the non-game-based condition neither in the pretest nor in the posttest (see also Fig. 3). That is, only during the training were participants in the game-based condition more accurate than participants in the non-game-based condition.

In general, the higher accuracy in the game-based training might be a result of increased cognitive engagement, which is defined as the level of mental investment of a learner in a learning activity (Fredricks et al., 2004; Schwartz & Plass, 2020). While many studies measure cognitive engagement using questionnaires (Greene, 2015), the current study had to rely on behavioral indicators, i.e., performance in the training sessions. Accordingly, the interpretation that the higher accuracy in the game-based training is the result of higher cognitive engagement needs to be treated with caution. Nevertheless, previous studies have demonstrated that game elements can increase participants’ level of cognitive engagement so that they invest more mental effort, leading to deeper essential and generative processing (e.g., Bernecker & Ninaus, 2021; Chang et al., 2016; Ninaus et al., 2015; Plass et al., 2015). Bernecker and Ninaus (2021), for instance, showed that using game elements in a working memory task reduced task disengagement, which indirectly affected task accuracy via subjective effort. Similarly, Lumsden et al. (2016a, 2016b) showed that participants reported investing less effort in a cognitive task when no game elements were present compared to when game elements were used. Mekler et al. (2017) demonstrated that users invested more effort in a task (i.e., quantity and quality of tags in an image annotation task) when game elements were present.

At the same time, increased cognitive engagement might have also led to the observed increase in response durations in the game-based training. Wiley et al. (2020) showed that the use of points, a popular and often extrinsically integrated game element, in a cognitive task increased response durations but also increased error rates as compared to a basic version without game elements. In a different condition, the authors added a narrative to the cognitive task, which did not affect performance metrics. The current study used multiple game elements (i.e., visual aesthetics, a game narrative, and a virtual incentive system) and realized principles of intrinsic integration, which might explain the different findings. As such, specific constellations of game elements might shape participants’ behavior in diverse ways. This interpretation, however, goes beyond the current study and needs to be examined more comprehensively and systematically in the future. That is, future studies might systematically modulate the presence or absence of single vs. multiple game elements as well as their degree of intrinsic integration.

The fact that participants spent more time on each task/fraction in the game-based training, however, stands in sharp contrast to the extensive literature on the so-called time-on-task hypothesis, i.e., a positive association between learning outcomes and time-on-task (Carroll, 1963). Recent evidence suggests, though, that the correlation between time-on-task and learning is weak and that time-on-task is necessary for learning to occur but not sufficient (Godwin et al., 2021). In fact, these authors argued that the measurements used for time-on-task and learning vary considerably across studies, and so do the associations found. In the current study, we only measured the response duration for each item/fraction, which is only one part of the overall interaction time with the training. Accordingly, it might not be surprising that our results do not align with the time-on-task hypothesis (Carroll, 1963). An alternative explanation for the missing relation between time-on-task and learning outcome might lie in the pre/posttest used: paper–pencil versions of the number line estimation task with no game elements present. If game elements lead to increased cognitive engagement, participants’ willingness to invest mental effort in the pre/posttest might not have been the same as during the game-based training.

The longer response durations in the game-based learning condition might, in fact, also indicate that participants had to process the provided narrative (i.e., novelty, extraneous processing of visuals), which was not necessary in the non-game-based condition. However, response duration only reflected the time from item onset to the participants’ first answer/response. The actual narrative used in the game-based condition was already provided before the actual training levels started. Furthermore, the more extensive delayed feedback (i.e., the 3-star rating) was shown only after each level, and the coins for correct responses were awarded directly after participants’ responses; therefore, the processing of these features should not have affected response duration directly. Only the visual appearance of the task (game-based vs. non-game-based), which remained constant across all training sessions, differed during task execution. Thus, the potential additional processing necessary for the visually more extensive game-based version, if it occurred at all, should be limited to the first training session or even just the first few trials. The response duration differences between the game-based and non-game-based conditions, however, remained rather stable across training sessions (see Table 1). Accordingly, the difference in response duration should mainly indicate that participants invested more time estimating the fractions in the game-based compared to the non-game-based training. Nevertheless, future studies will need to systematically investigate the interaction between intrinsic/extrinsic integration of game elements, time-on-task, and different kinds of cognitive processing (i.e., extraneous, essential, and generative processing). The use of eye-tracking measures, for instance, might be particularly helpful to determine which game elements are processed and for how long. In the current study, we could only rely on the performance metrics (i.e., accuracy and response duration) acquired during the training. Thus, our interpretation as to why the game elements used increased accuracy and response duration remains speculative.

A speculative but more likely explanation of the observed pattern of results might be that the game elements changed participants’ priorities or even their strategies in performing the game-based or non-game-based tasks, which might explain the observed pattern in accuracy and response duration. In particular, the feedback provided during the training might have been perceived as more salient and/or relevant in the game-based than in the non-game-based condition. Specifically, in the game-based condition, the avatar communicated success with jubilant gestures and participants were awarded virtual coins, while in the non-game-based condition, only a green check mark was shown. As such, the fantasy in the game-based condition may have made the feedback more meaningful to the participants and thereby affected their way of interacting with the learning content, which is in line with the RETAIN model proposed by Gunter et al. (2008).

This effect might even have been magnified by the overall high performance of the participants or the seemingly simple task. That is, participants in the non-game-based condition might not have seen practical value in optimizing their already high estimation accuracy. In contrast, the game elements in the game-based condition might have provided additional value to participants to further optimize their accuracy. Accordingly, choosing a more challenging task might reveal a different pattern. Furthermore, as correct/incorrect feedback in our study was determined only by estimation accuracy and not by response duration, participants in the game-based condition might have been more focused on receiving positive feedback than participants in the non-game-based condition. However, future studies are needed to substantiate this interpretation empirically. For instance, future studies might consider utilizing think-aloud protocols or self-explanations while participants solve tasks to better understand the strategies employed (e.g., Kiili et al., 2019a) and whether they differ between the game-based and non-game-based conditions.

Conclusions

The current study showed that both a game-based and a non-game-based version of the number line estimation training improved participants’ fraction magnitude understanding, which confirmed our first hypothesis. The results substantiate and extend earlier findings that games relying on intrinsic integration can be successfully used in fraction instruction. However, contrary to our second hypothesis, participants did not learn significantly more from completing the game-based as compared to the non-game-based training. Nevertheless, participants performed better in the game-based than in the non-game-based training, as reflected by higher accuracy. At the same time, participants in the game-based training took longer to respond than participants in the non-game-based training. It seems that the processing of the task was altered by the use of intrinsically integrated game elements, which might have increased both essential and generative processing. Accordingly, participants in the game-based training might have invested more cognitive effort in their estimations and, hence, were more cognitively engaged. Further, in the current task, game elements might have altered participants’ priorities. That is, participants might have prioritized accuracy over speed in the game-based condition, leading to more careful selection and use of estimation strategies. Finally, the current study highlights the relevance of studying the actual learning/training process rather than relying on learning outcomes alone (i.e., pretest–posttest differences).