Gameful Learning for a More Sustainable World

Municipal waste sorting is an important but neglected topic within sustainability-oriented Information Systems research. Most waste management systems depend on the quality of citizens' pre-sorting but lack teaching resources. Thus, it is important to raise awareness and knowledge on correct waste sorting to strengthen current efforts. Having shown promising results in raising learning outcomes and motivation in domains like health and economics, gamification is an auspicious approach to address this problem. The paper explores the effectiveness of gameful design on learning outcomes of waste sorting knowledge with a mobile game app that implements two different learning strategies: repetition and elaboration. In a laboratory experiment, the overall learning outcome of participants who trained with the game was compared to that of participants who trained with standard analogue non-game materials. Furthermore, the effects of two additional, learning-enhancing design elements – repetition and look-up – were analyzed. Learning outcomes in terms of long-term retention and knowledge transfer were evaluated through three different testing measures two weeks after the training: in-game, through a multiple-choice test and through real-life sorting. The results show that the game significantly enhanced the learning outcome of waste sorting knowledge for all measures, which is particularly remarkable for the real-life measure, as similar studies were not successful with regard to knowledge transfer to real life. Furthermore, look-up is found to be a promising game design element that is not yet established in IS literature and therefore should be considered more thoroughly in future research and practical implementations alike.


Introduction
In their set of goals for sustainable development, the UN listed targets for different areas of human and environmental wellbeing, one of which concerns waste, sustainable consumption and production (United Nations 2020). Acknowledging the insufficiency of the status quo in terms of waste management, the EU created a plan to raise EU-wide recycling to 55% and decrease landfill use to 10% by 2025 (European Parliament 2018). However, recent studies have shown that global progress is slow, partly due to a lack of appropriate legislation, insufficient financial resources, poor infrastructure, poor environmental attitudes and social norms and a lack of knowledge about what goes into which bin (Schultz et al. 1995; Thomas and Sharp 2013; Filho et al. 2016; Luo et al. 2018). A contributing factor is that many recycling and waste sorting facilities are as yet unable to reach maximum efficiency without pre-sorting measures (Bucciol et al. 2015; Hawlitschek 2020). Countries like Germany, Austria and Switzerland have tackled this issue by making domestic pre-sorting a citizen responsibility. However, getting citizens to correctly and consistently dispose of their household waste continues to be a challenge for society, as it is a task that individuals must perform for the benefit of society, often without rewards for compliance (Abdel-Shafy and Mansour 2018). Furthermore, successful compliance requires citizens to first gain the fundamental knowledge to fulfil the required task. Yet, municipal waste sorting authorities often fail in their education attempts, partly because of outdated measures of communication and information like analogue, paper-based flyers (Luo et al. 2018). Such materials are insufficient for knowledge transmission as they lack incentives to engage mentally, particularly given the amount and depth of information that people need to retain.
Of the hundreds of potential waste items, more than 200 are listed on many websites for German waste management organizations as being fundamental to sufficient municipal waste sorting (e.g., Berlin and Hamburg). While citizens do not have to know each item by heart, they need to understand the underlying principles that link different types of waste to their respective bin. To engrain the knowledge in the long term, such extensive amounts of information require adequate training measures.
As stated in an interview on ''The Future of Waste Management,'' Information Systems (IS) can teach citizens where exactly to dispose of different types of waste (Hawlitschek 2020). Multi-disciplinary research has shown that games in particular are successful educational tools and supplements (Fileni 1988;Van Eck et al. 2017). By applying gameful design to a real-life context, education can be effectively manipulated, whether as fully conceptualized games or as strategically implemented gamification affordances (Barata et al. 2013;Landers and Landers 2014). In their meta-analysis on digital games and learning, Clark et al. (2016) found significant correlations between quality of design and learning outcomes, highlighting the value of deliberation on specific design decisions.
We created a waste sorting training game based on best practices of game design as well as learning theories with the goal of addressing the prevalent lack of waste sorting knowledge. The game was released in 2015 and, as of April 2021, had been downloaded more than 31,684 times on the Apple, Microsoft and Windows mobile app stores. As stated by Bellotti et al. (2013), a serious game's purpose is twofold: to be fun and entertaining as well as educational. Thus, we must assess both aspects. While the field data allowed us to ascertain the game's success in matters of game fun and engagement with a certain degree of external validity, we could not reliably infer the game's efficacy in terms of the intended learning outcome. As this is the game's primary aim, we prepared a lab experiment to measure the game's learning outcome under the following research question: Does gameful design afford learning about correct sorting of waste items into their target bin?
Gamification (or gameful design) is a praxis that consists of designing suitable ''service bundles'' (Blohm and Leimeister 2013) by adding game design elements to the respective core offer, or to the core gameplay when the gamified product is a game in itself. The core interaction (core gameplay) of our game is based on a combination of sorting and feedback, the latter being particularly beneficial for knowledge transfer and player engagement (Sicart 2008; Bellotti et al. 2013). However, during the development of the first prototype, our user tests found that the core gameplay by itself did not engage players long enough for them to benefit from a long-term learning effect. We decided to add optional design elements that would offer players more choices on how to engage with the learning content. We based this decision on motivational theories that highlight autonomy as a fundamental factor of intrinsic motivation (Ryan and Deci 2000). We first chose a repetition element that would allow players to repeat a level (or wave) without penalty. The overall learning benefits of repetition are well-documented across different learning domains (Bygate 1996; Ahmadian 2012). However, as its inherent repetitiveness could interfere with the game fun, we wanted to gain insights into the potential detriments and benefits of including such a design element. We then added an index element, where waste items can be looked up penalty-free during the core gameplay (look-up element). This was inspired by testers frequently asking why certain items were assigned to bins other than expected. We conducted a literature review to find theoretical leads on the expected learning outcome of such an index-based design element. Lacking a related foundational theory, we analyzed research on related contexts, namely instructional explanations, dictionaries and help tools (Miller and Gildea 1987; Ryan and Shin 2011), finding mixed expectable outcomes.
Thus, we designed our experiment in a way to further answer the second research question: What effect does a repetition-based and a look-up-based game design element have on the learning outcomes of correct sorting of waste items into their target bin?
The experiment consisted of five treatment groups to reflect both research questions. The first four learned to correctly sort waste items by playing the game. They differed in whether participants played only the core gameplay or the core gameplay with one or both design elements (repetition, look-up or combined). The fifth group completed the training with common, paper-based teaching materials on waste sorting. As we wanted to ensure long-term retention (long-term memory storage of the learned content), we measured learning outcomes 10 to 12 days after the participants had been trained. Also, while the training was conducted with a game, the learning outcome was supposed to be translated into real life. To test if participants successfully managed this knowledge transfer, we measured the learning outcome in three different ways. First, we tested knowledge retention within the training medium itself in a slightly altered version of the core gameplay designed to test each training item exactly once. Second, we measured knowledge transfer in an abstracted setting via a multiple-choice test featuring only the names (written words) of the trained items. Third, we measured knowledge transfer to real life through a sorting task with real-world waste items.
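The group structure described above can be read as a 2x2 arrangement of the two optional design elements within the game condition, plus one paper-based comparison group. The following Python sketch only enumerates that structure and the three outcome measures. All identifiers (Treatment, build_treatments, MEASURES and the group labels) are hypothetical names chosen for illustration and are not taken from the study materials.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Treatment:
    """One treatment group (illustrative structure, not the authors' code)."""
    name: str
    medium: str       # "game" or "paper"
    repetition: bool  # repetition element available during training
    look_up: bool     # look-up element available during training


def build_treatments() -> list[Treatment]:
    """Enumerate four game groups (2x2 over both elements) plus the paper group."""
    groups = []
    for repetition, look_up in product([False, True], repeat=2):
        if repetition and look_up:
            name = "combined"
        elif repetition:
            name = "repetition"
        elif look_up:
            name = "look-up"
        else:
            name = "core"  # core gameplay only
        groups.append(Treatment(name, "game", repetition, look_up))
    # Fifth group: paper-based teaching materials, no game elements.
    groups.append(Treatment("paper", "paper", False, False))
    return groups


# The three outcome measures named in the text (labels are assumptions).
MEASURES = ("in-game test", "multiple-choice test", "real-life sorting")

treatments = build_treatments()
for t in treatments:
    print(f"{t.name:<10} medium={t.medium:<5} "
          f"repetition={t.repetition} look_up={t.look_up}")
```

Read this way, the first research question contrasts the four game groups with the paper group, while the second compares the game groups with one another.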
Our results showed that the treatment groups trained with the game significantly benefited with regard to learning outcomes of waste sorting knowledge compared to the treatment group given the non-game materials. This is especially remarkable, as, in contrast with other studies (Größler et al. 2000; Luo et al. 2018), we demonstrated that knowledge transfer to real life can be successfully achieved with a gameful application. We further found that the combination of the repetition and look-up game design elements showed significantly higher learning outcomes within the original content domain as well as the reduced setting. Interestingly, this combinatory effect was lost in transfer to real life.
We contribute to the literature and practice in several ways. While growing in numbers, assessments of the effects of specific game design elements are still rare (Bellotti et al. 2013; Kim and Shute 2015). This makes it difficult for both researchers and practitioners to derive informed design decisions from research. We identified a research gap with regard to expectable effects and outcomes of an optional look-up element and found that its implementation contributed to the learning outcome, especially when combined with a repetition-based design element. We also tested learning outcome in terms of long-term retention as well as knowledge transfer to ensure that an actual learning outcome was achieved. While our game successfully achieved the transfer of content to long-term memory, the differences in outcomes between the three measures highlighted the importance of testing and ensuring successful knowledge transfer in multimedia contexts such as ours.
By constructing and testing a serious game on teaching correct waste sorting and detailing design rationales for future reproduction, we contributed to the ongoing effort of enhancing sustainable IS (see e.g., Elliot and Webster (2017) and Stanitsas et al. (2019)). Our results showed that successful learning outcomes can be achieved through meticulous gameful design even in less intrinsically motivated and attractive domains such as waste management and even outside a socially mediated learning context.

Empirical Findings on Gameful Design in IS Research
As early as 1991, Duffield showed that computer games provide great learning opportunities for students, as games motivate them to learn, provided that they are adequately designed and that the content of the software, problems presented and instructional methods are carefully aligned (Duffield 1991;Clark et al. 2016). Later, Rieber (2005) and Gee (2003) recommended games as potential learning tools, reasoning that gaming is a complex social practice in which players engage in high-order thinking and where they need to make a complex cognitive effort. Studies have shown that games entertain, instruct, change attitudes and enable skills development. Studies successfully correlating participants' previous gameplay experiences to related real-life skills, e.g., reaction games and driving skills (Vichitvanichphong et al. 2016) and strategy games with management skills (Simons et al. 2020), supported this finding.
In terms of application domains, we found certain topics especially prevalent, such as education (Fotaris et al. 2016;Sanmugam et al. 2016), fitness (Jang et al. 2018;Kappen et al. 2018), health (Allam et al. 2015;El-Hilly et al. 2016;Kurtzman et al. 2018) and the economy (Rodrigues et al. 2016;Hamari 2017). Most sustainability-related studies are connected to broader economic domains like sustainable transport education (Putz et al. 2018) or domestic energy engagement (Gustafsson et al. 2009). Studies solely focused on topics of sustainability are scarce, particularly with regard to sustainable waste management (Elliot and Webster 2017).

Gameful Design and Waste Sorting
In a literature review of serious games in the general domain of sustainability, Stanitsas et al. (2019) found that there has been a radical increase in the development of sustainability-related games since 2010. However, of the 77 listed games (starting in 1990), only two are related to waste management: a board game that teaches about industrial waste management (Jürgen Strohm 2001) and a role-playing game that educates on irrigation management (Burton 1993). Both games provide a broad perspective on the topic but do not specifically teach about municipal waste sorting. As an extension to Stanitsas et al. (2019), we conducted an additional literature review looking for research studies with a focus on gameful design and waste sorting. In total, we found nine more research studies somewhat related to our topic (Table 1).
Eight of the nine studies did not prove entirely useful to our research efforts, as they either presented their gameful approach without actually evaluating the effectiveness of their design (Bifulco et al. 2011;Berengueres et al. 2013;González-Briones et al. 2018), evaluated their design qualitatively as a whole with a small number of users (Lotfi and Mohammed 2014;Sreelakshmi et al. 2015;Menon et al. 2017;Idrobo et al. 2018) or touched on a relevant but adjacent topic (Whalen et al. 2018). Except for one (Luo et al. 2018), these studies did not provide insights into the effect of single design choices on learning outcomes but instead looked at their gameful implementations as a whole.
The study closest to our setup was Luo et al. (2018). In their series of experiments, the authors assessed the effect of immediate feedback as a game design element on the learning of accurate recycling and composting. In their lab experiment, they asked 100 students to sort 80 pictures of different waste items into one of the four bins shown on screen. The learning condition received feedback on the correctness of their sorting, while the control condition did not. The learning outcome was tested after one week in the same game to assess long-term retention. Results showed a positive influence of immediate feedback on the learning outcome. Yet, they could not replicate this result in a real-life follow-up experiment. The authors hypothesized that the reason for this null effect could be related to the logistics of accurately measuring changes in real-life waste containers. Evaluating their design artifact (as presented in the manuscript) from a game design perspective, we believe that the lack of game design elements like worldbuilding or colorful aesthetics could also have contributed to this outcome. We came to this conclusion because user tests of the early iterations of our game indicated that feedback alone lacked incentives to continually engage with the content. We conclude our literature review with the insight that the literature on gameful design in the domain of waste management and sustainability, particularly with regard to the analysis of game-specific design elements, would benefit from further research on expectable learning outcomes. We will discuss this in the following section.

Hypothesis Development
We based the theoretical foundations of this work on learning theories, particularly on models introduced by instructional design and the didactic method (Wittwer and Renkl 2008;Nitsch et al. 2016). Most learning theories have been developed in and for contexts where social interaction is interwoven into the learning process (e.g., Chi et al. 2001). However, multimedia learning is often designed to work outside of social interaction contexts. We thus built on learning theories and strategies that have proven effective outside of a socially embedded learning context, namely elaborative encoding (for Hypothesis 1) and repetition (for Hypothesis 2). While these strategies afford an empirical learning process, rationalization (the process of sense-making, or understanding ''why'') is equally essential for providing meaning and context to learning matter (Wittwer and Renkl 2008). To offer explanations without overwhelming players, we implemented an optional, dictionary-inspired look-up element as a complementary game design element and elaborated on its supposed learning outcome in the development of Hypothesis 3.
Depending on context, the learning outcome can be measured with regard to different facets. In terms of memorization, the learning outcome often differs when tested immediately after the training phase rather than during a post-training transfer phase (Keith and Frese 2008). By measuring the learning outcome in terms of long-term retention, we wanted to ensure that the content was memorized in the long term to achieve successful change in real-life waste sorting behavior. Furthermore, in contexts like ours, where the training medium differed from the application context, it was particularly important to assess the learning outcome with regard to knowledge transfer (Barnett and Ceci 2002). In language transfer theory, knowledge transfer happens within the context of translation, where words and meanings are connected between two languages, typically native and second (Mahmood and Murad 2018). In the didactics of mathematics, this transfer is referred to as ''conversion,'' or the mental merging of different representations (such as graphs and functions) of the same mathematical concept (Dreher et al. 2014). Expanding on the concept of conversion, Nitsch et al. (2016) differentiate two phases of the transfer process: identification (comparison of the incorporated information and identification of similarity with existing schema) and construction (transferability of the incorporated information into new situations). Differentiating the two is important because construction measures whether the content has been understood deeply enough for reapplication in new contexts. Therefore, as later explained in the section on the operationalization of our outcome variable, we measured knowledge transfer with different measurements that capture both identification and construction.

Elaborative Encoding
The overall story and game design rationale of the game is based on elaboration strategies. Elaborative encoding belongs to the category of learning techniques known as mnemonics. In this learning strategy, loosely adjacent content items are added to the learning matter. Offering more associations that may connect to the learners' existing knowledge facilitates embedding new information into prevalent mental structures (Bradshaw and Anderson 1982). Examples of mnemonics involve meaning-enhancing additions: constructions or creations that improve one's memory of what is learnt (Levin 1988). Mnemonics can range from acronyms and rhymes to complex strategies for remembering numbers (Putnam 2015) and character designs (such as the mascot designs commonly found in Japan). Elaborative encoding encompasses the purposeful addition of information, whether visual, semantic, spatial or acoustic, to create more retrieval paths in the mind of the learner from existing knowledge structures to the learning matter (Bradshaw and Anderson 1982). In his meta-analysis of elaboration studies, Mayer (1980) concluded that associative elaboration ''increases retention performance as compared with control or simple repetition procedures'' (p. 771). This technique is particularly valuable in the context of gameful design, as elaboration can occur on several layers at once: the game's mechanics (rules and systems), aesthetics (visual/auditive representation) and narrative form a multi-sensory context for knowledge transmission (Hunicke et al. 2004; Westera et al. 2008). This is particularly important in serious games like Re-Mission (Hopelab 2006) that translate a specific problem context into gameplay: adolescents are incentivized to take their cancer medication on a regular basis by transferring the game setting to their own bodies and providing them the medication as ammunition against destructive cancer cells. Re-Mission has proven highly successful in improving the health outcomes of its players (Kato et al. 2008). In sum, due to their multimodal elaborative encoding of real-life activities, principles and systems, games have been found to be an effective medium for teaching. Thus, we hypothesize that: H1: Learning waste sorting through a game rather than with state-of-the-art paper-based information on the correct sorting of items increases learning outcome.

Repetition Strategies
However, in terms of the core teaching effort (correctly sorting waste items), studies have shown that a single exposure to new content is not enough for learners to effectively encode that content into memory (Bygate 1996). Repetition is a learning activity in which students repeat individual facts to create firmly anchored connections in their long-term memory. Repetition has long been acknowledged as a powerful learning mechanism: as Horace stated more than two millennia ago, ''repetitio est mater studiorum,'' or repetition is the mother of learning. As a universal principle, it is part of all prevalent learning theories: behaviorism (e.g., in Pavlov (Dunsmoor et al. 2007) and Skinner (1936)), cognitivism (e.g., in schema theory), constructivism (e.g., in Piaget (Greenfield and Savage-Rumbaugh 1993)) and social learning (e.g., Vygotsky (1967) on child development). The underlying theme relates to the formation of memory in the brain. In their research on working memory, Baddeley and Hitch (1974) stated that by repeatedly forming mental connections through reflection and deliberate recall, the stored information gets retrieved more easily and quickly. However, different studies have shown that repeated exposure to the same content does not necessarily lead to improved learning (Crowder and Melton 1965; Nickerson and Adams 1979). Memories are formed more precisely and hold for longer in long-term memory if learners are interested in the content and pay attention (Nickerson and Adams 1979). This is where games might have an additional advantage compared to learning content presented in a classroom setting.
In their study, Bygate (1996) found that repeating the trained content three days after the initial task led to improvement in fluency and accuracy as well as a marked improvement in repertoire due to growing familiarity with the content. The given reason is that on first contact with the material, learners are primarily concerned with the heuristic planning and understanding of the content matter (Bygate 1996). Ahmadian (2012) corroborated these findings, arguing that it is difficult for learners to focus on form and meaning at the same time. Thus, repetition allows them to gain understanding in both facets. Overall, studies on task-based language learning have reported repetition-based improvements for output factors of accuracy, complexity, repertoire and task success (Lynch and Maclean 2000; Pinter 2005). According to Driskell et al. (1992), there are even benefits to repeating the content beyond perfect retention. In their study on overlearning, they found a significant overall effect: the greater the degree of overlearning, the greater the resulting long-term retention. By raising the number of occurrences in the brain, the significance of the information is reinforced and so the content is retained longer. Repetition has been proven to be an effective learning strategy in learning tasks across domains (e.g., education (Johnson 2004), civic knowledge (Ivancic and Hesketh 2000) and games (Clark and Sefton 2001)). Building on the theoretical foundations of repetition, we hypothesize that: H2: Repetition as a game design element increases the learning outcome.

Instructional Design, On-Demand Help and Lookup Strategies
During the teaching process, one of the central functions of the teacher or tutor is to provide context and explanations to learners (Leinhardt and Steele 2005). In non-social contexts, this function must be substituted within the training medium. While there is no dedicated educational or psychological theory for this construct (providing relevant information/answering the ''why'' question), research on instructional explanations provides foundational insights. Empirical studies show that instructional explanations have often not been successful in terms of raising the learning outcome (Jonassen and Rohrer-Murphy 1999;Leinhardt and Steele 2005). One explanation was that learners merely engage in superficial processing of instructional explanations (Berthold and Renkl 2010) and do not attend to the content of the explanations in a meaningful manner (Roelle et al. 2014). However, the learning outcome was positively influenced through instructional explanations if learners rationally engaged with the content of the explanations (Wittwer and Renkl 2008) and if there was a meaningful follow-up activity after receiving the explanations (Webb and Farivar 1999). The way the game is designed, each interaction with the look-up element is followed by the sorting of a waste item. Thus, in our case, the instructions can be processed meaningfully.
In self-regulated learning contexts, studies have found help-seeking to be a successful strategy for learning (Ryan and Shin 2011; Webb et al. 2013) if help-seekers were oriented on independent problem-solving (Nelson-Le Gall 1981) and if the process included asking for explanations and hints (Mäkitalo-Siegl et al. 2011). In summary, if learners are invested in the learning process, giving explanations when needed raises the learning outcome. This connection produces positive indications for the success of our look-up element. However, most studies on help-seeking are embedded in a social context: the help is provided by another person. Thus, the expected effects might be weaker outside of a social context. On the other hand, the same studies found that the social context of help-seeking produced a different problem: those learners needing help the most (students with low self-efficacy) were less likely to seek it out, as they feared being perceived as lacking in ability and thus losing social standing (Ryan and Shin 2011). This negative effect could be neutralized in our case, as the game provides social anonymity within the look-up process, potentially resulting in lower inhibitions to use the look-up element and benefit from its content.
Interestingly, the IS literature on help tools (Größler et al. 2000; Clarebout and Elen 2009; Mäkitalo-Siegl et al. 2011) did not confirm these positive expectations of optional help-seeking tools on learning outcomes. The most common reason provided was that tools were barely used (Liu and Reed 1994; Größler et al. 2000; Aleven et al. 2003). The general unwillingness of the participants to accept help partly explained these findings, as has been found in various educational settings (Newman 2000; Ryan et al. 2001; Aleven et al. 2003). One explanation for such usage inhibitions was that the help function was sometimes perceived as cheating (Clarebout and Elen 2009). The factors found to influence how students behaved in open learning environments were the students' self-efficacy, motivation and perception of the task. If they felt the task was performance-oriented, they were less likely to use the help tools than when they perceived it as learning-oriented (Clarebout and Elen 2009). As our game is not only learning-oriented but related to a serious and meaningful task, we believe that such inhibitions regarding the look-up element might be alleviated.
Finally, while looking at the literature on cognitive psychology, we found a dichotomy of two error-related learning strategies: errorful and errorless learning. The former, also referred to as trial-and-error learning, is ''the process of making repeated trials or tests, improving the methods used in the light of errors made, until the right result is found'' (Webster's 2005). It builds on the repetition-based learning strategy that the repeat element of our artifact is based on. Interestingly, we found that this strategy was juxtaposed with an entirely opposite strategy, errorless learning, which is defined as ''an approach whereby the task is manipulated to eliminate/reduce errors. Tasks are executed in such a way that the subject is unlikely to make errors'' (Fillingham et al. 2003, p. 339). This was partially fitting for us, as the look-up element would allow players to play the game without error if they chose to use it before every decision. However, when comparing studies that used errorful vs. errorless teaching strategies, neither one was found to be more effectual (e.g., Clare et al. 1999) or the results were inconclusive (e.g., Johnson 2004) (see Table 8 in the appendix; available online via http://link.springer.com).
Looking at the volatile nature of instructional explanations and help/look-up tools, we believe that, in particular, the optional and anonymous nature of the look-up element as well as the fact that the game affords a meaningful follow-up activity (sorting the waste after the explanation has been provided) can alleviate some of the negative effects listed in the mentioned theories and studies. Furthermore, given citizens' almost daily interactions with waste, the look-up design element can add meaningful context to already existing knowledge structures. Finally, as our learning element only offers information when the learners reach an impasse and are actively inquiring for context and a solution (as recommended by instructional design theory; Wittwer and Renkl 2008), we hypothesize that: H3: A look-up game design element increases the learning outcome.

General Design Decisions
The downloadable app is a complete and complex game that was released in 2015 for the three mobile platforms Android, iOS and Windows (''Die Müll AG''/ ''Trashmonsters''). While playable on a PC, we designed the game with touch interaction in mind, focusing on mobile devices. We embedded the core gameplay into a small and interconnected world that represents the broader cosmos of waste sorting. The full game features an overarching story narrated through a consecutive quest structure. We aimed to motivate prolonged play through an interplay of unlockable minigames, collectible accessories and an underlying discoverable mystery (see Fig. 1). We added these elements for players to alternate the core gameplay with additional activities connected to the general theme of waste sorting. We made each design decision with metaphorical mapping in mind. The rationale for these design choices (and the decision against other popular game design elements like badges and leaderboards) can be found in the appendix titled ''Exclusion of Game Design Elements.''

Player Character
According to Gee (2003), effective learning involves ''playing a character.'' For example, learning in a science class works best if students ''think, act and value like scientists.'' This assumption is supported by the findings of a psychological study where participants who were given a virtual body (avatar) communicated as Einstein (signifying super-intelligence), performed significantly better than participants of the control group, considering prior cognitive ability (Banakou et al. 2018). Such studies highlight the weight of design choices concerned with the role players take within the game. For our game, we chose a first-person perspective (the players act in the game as themselves) to keep the attribution of all in-game actions and successes as close to the players as possible to facilitate and suggest reproduction of their in-game actions in real life. Research suggests that players learn best when they are engaged in meaningful, goal-directed activities within the identities of experts (Gee 2003;Shaffer et al. 2005). As such, the role of players is to serve as new and essential members of the workforce, helping the monsters in their struggle to deal with the overwhelming amounts of waste they receive for sorting.

Depiction of Knowledge Items
When deciding on the presentation of knowledge items for the game, we consulted literature on the mental representation of knowledge. During the learning process, different types of memory connections are formed (e.g., typical connections in mathematical didactics are numeric, graphic, situational and algebraic (Nitsch et al. 2016)). Two of the most common items are words (designated representation) and pictures (iconic representation) (Kolers and Brison 1984). According to Mayer's theory of multimedia learning (2002), active learning entails the coordinated stimulation of both channels of the human information processing system (visual/pictorial and auditory/verbal processing). For our game, we chose to depict our knowledge items (waste items) with a combination of iconic and designated memory connection items through sticker-like pictures and by displaying the name of the waste item when picked up (see Fig. 2). We selected the waste items used for the experiment from a list of the Karlsruhe waste sorting facilities based on the following criteria: (1) relevance (loss of precious resources if sorted incorrectly), (2) frequency of appearance in common households and (3) difficulty (frequency of missorting in real life).

Core Gameplay
Establishing a fun core gameplay is of great importance before proceeding to the design and implementation of any other design elements (Järvinen 2007;Sicart 2008). We tested the core gameplay extensively with over 20 playtesters. The game went through several iterations before the parameters were finally set. The tests were conducted in the manner of the quiet observer, as is common in user experience testing, with a follow-up session to discuss the highlights and flaws and make suggestions for the gameplay mechanism.

Setting
We set the core gameplay within a waste sorting facility. Four waste bins (paper, recycling, bio and residual waste), reflecting the system in Karlsruhe, are placed next to each other behind a conveyor belt. A monster that serves as a visual and charismatic representation of the subsequent process of received waste inhabits each bin, as shown in Fig. 2. For instance, residual waste is represented as a fire-breathing dragon, indicating the subsequent burning of residual waste. We chose a friendly and cartoon-like visual style with a bright color scheme to overcome potential negative associations with the topic of handling waste.

Core Mechanics
As soon as the wave (consisting of 15 waste items) starts, waste items drop onto a conveyor belt that moves them from the right to the left side of the screen, where they then drop off. During this time, players need to pick up each item and sort it into the right bin. If an item drops off, it counts as unsorted and raises the counter of the waste pollution bar, which can lead to a littering-based Game Over. If it is sorted incorrectly, it counts towards an air-pollution-based Game Over. The game flow is supposed to represent the ongoing succession of choices we must make with each waste object we encounter in our daily lives as well as the consequences that come with the wrong or negligent handling of waste.

Feedback System
Feedback should be immediate and comprehensible in terms of the failure or success of the given task (Sicart 2008), with rewards and advancement in the game carefully bound to it (Bellotti et al. 2013), which is an established rule in games. Thus, we implemented a positive/negative reinforcement system: points (+10/-3) for right vs wrong sorting of an item, visual/audio feedback of the monsters (joy/anger), combos (+50 points for a correct three-item streak) and combo-breakers (disruptions of the combo counter upon missorting within a streak). A numeric score and a pollution-counter (top left-hand corner in Fig. 2) provide feedback on the overall performance, warning players of an impending Game Over. This counter fills up each time an item is placed in an incorrect bin or drops off the lane and is reduced when an item is placed in the correct waste bin. An appropriate chunking of tasks helps provide a flow experience (Csikszentmihalyi et al. 2005). Inspired by the successful two-minute format of game applications like Angry Birds (Rovio Entertainment 2009), we chunked the learning content into waves that do not exceed playtimes of two minutes so as to encourage shorter but more frequent playtimes. Following advice by Wolfe et al. (1998), we implemented a structure blending the previously learnt items with newly introduced ones.
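The point rules of this reinforcement system can be illustrated with a minimal Python sketch. The point values (+10, -3, +50 for a three-item streak) are taken from the description above; the function name `score_wave` and the rule that the bonus is paid every time the streak reaches a multiple of three are assumptions made for illustration, not the game's actual implementation. The pollution counter is left out because its step sizes are not reported.

```python
CORRECT_POINTS = 10   # from the text: +10 for a correct sort
WRONG_POINTS = -3     # from the text: -3 for a wrong sort
COMBO_BONUS = 50      # from the text: +50 for a correct three-item streak
COMBO_LENGTH = 3


def score_wave(sorts):
    """Return the total score for a sequence of sorts (True = correct).

    Hypothetical helper: the exact combo bookkeeping of the game is not
    documented, so this is only one plausible reading of the rules.
    """
    score = 0
    streak = 0
    for correct in sorts:
        if correct:
            score += CORRECT_POINTS
            streak += 1
            # Assumption: the bonus is paid each time the streak
            # reaches a multiple of COMBO_LENGTH.
            if streak % COMBO_LENGTH == 0:
                score += COMBO_BONUS
        else:
            score += WRONG_POINTS
            streak = 0  # combo-breaker: a missort resets the streak
    return score
```

Under these assumptions, three correct sorts in a row yield 3 x 10 + 50 = 80 points, whereas a missort on the third item yields 10 + 10 - 3 = 17 points and resets the streak.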

Tutorial
As is common practice within games (Gee 2003), the first three waves serve as tutorials and differ from regular gameplay. In the first wave, we present the main types of waste (recyclable, bio-degradable, paper and refuse) with an explanation of the underlying attributes with which players can infer the correct bin for each waste item (e.g., inextricably compounded materials go to residual waste). In the second wave, players are supposed to familiarize themselves with the core gameplay through representative waste items for each type. In the third wave, we introduced additional design elements that accompany the core gameplay: the look-up element and the pollution counter, which indicates how close players are to a potential Game Over. In the experiment, we introduced only the pollution counter but not the look-up element to the groups without the look-up element.

Experiment Version
For the purposes of the experiment, we compiled an abridged version of the game that only included design elements specifically designed to teach correct waste sorting: the core gameplay (including the tutorial) as well as the two additional learning-enhancing design elements, the repeat option and the look-up feature. We shortened the core gameplay from 34 levels to 10 and from 201 waste items to 108 (eight were used as exemplary items in the tutorial and the remaining 100 were distributed over the 10 waves, introducing 10 new items and reusing five previously seen ones per wave). To avoid confounding influences, we stripped the experimental version of all design elements that related to motivation enhancement (narrative elements and unlockable features). We wanted to ensure an isolated observation of the effectiveness of the core gameplay in producing a learning outcome. We kept the underlying worldbuilding and setting (monster design and waste sorting plant; see Fig. 1) as they are integral to the game feel.
Fig. 2 Metaphorical representation (mapping) of the waste sorting process in the core gameplay of the artifact
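The item arithmetic of the experiment version (8 tutorial items plus 10 waves of 10 new items, each wave padded with 5 previously seen items to 15 items) can be sketched as follows. This is a minimal illustration only: the function `build_waves`, the random choice of reused items and the shuffling are assumptions, since the actual selection rule for reused items is not described.

```python
import random


def build_waves(items, n_tutorial=8, n_waves=10,
                new_per_wave=10, reused_per_wave=5, seed=0):
    """Split a list of item names into waves of new plus reused items.

    Hypothetical helper: reused items are drawn at random from everything
    seen so far (tutorial items included), which is an assumption.
    """
    rng = random.Random(seed)
    seen = list(items[:n_tutorial])          # tutorial items count as seen
    remaining = items[n_tutorial:]
    waves = []
    for w in range(n_waves):
        new = remaining[w * new_per_wave:(w + 1) * new_per_wave]
        reused = rng.sample(seen, reused_per_wave)
        wave = new + reused
        rng.shuffle(wave)                    # mix old and new items
        waves.append(wave)
        seen.extend(new)                     # new items become reusable
    return waves


# Placeholder names standing in for the 108 trained waste items.
items = [f"item_{i:03d}" for i in range(108)]
waves = build_waves(items)
```

With 108 placeholder items, this produces 10 waves of 15 items each, and every one of the 108 items appears either in the tutorial set or in at least one wave.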

Repetition-based Design Element
If a level is not completed perfectly, the game shows players how many items they sorted incorrectly and offers them the chance to repeat the level without penalty (see Fig. 3). We strategically placed and colored the ''yes'' and ''no'' buttons to favor repetition. If players choose to repeat, their level of pollution is reset to the level when that wave was played for the first time. We were inspired by the quick trial, immediate performance feedback and low inhibition retrial-loop pattern of games like Cut the Rope (ZeptoLab 2010) and Angry Birds (Rovio Entertainment 2009).

Look-Up-based Design Element
In his article, Gee (2003) elaborated on the placement of information: that it should be given ''on demand'' and applied soon after having read it. He based this on people's poor understanding and retention of information received out of context (Brown et al. 1989; Barsalou 1999; Glenberg and Robertson 1999). The look-up element (see Fig. 4) is an index that can be used to find all previously encountered waste items. For each item, it shows the correct target bin, as well as additional information on why the item belongs there and not in another bin. It can be accessed at any point throughout the game by simply opening it or by pulling an item on top of it (it then scrolls directly to that item). It is introduced in the tutorial and its usage is penalty-free. The mechanics of this look-up design element were inspired by two game design elements that serve to offer additional information to the players. First, we drew insights from the interactive ''hint'' functions found in puzzle games and point-and-click adventures like Machinarium (Amanita Design 2009). These hints are designed to reduce frustration by guiding the players with incremental tips. They are optional, so players decide for themselves when and if they want to use them. The second inspirational game design element is the Pokédex used in the Pokémon (Game Freak 1996) game series: a lexicon-based design element that gradually lists all monsters and their related meta-data that players encounter during the game.

Experimental Design and Independent Variables
We designed the laboratory experiment to test the effect of the game in general as well as two independent variables (look-up and repeat) on the learning outcome. It was a between-subject experiment in three stages, where the 10-12-day duration between Phases 2 and 3 served as the retention period. The experiment comprised four treatments in a full-factorial design with an additional fifth control group (from now on referred to as non-game material) that was given exemplary teaching material as used by waste management institutions. The non-game teaching material consisted of the three informative flyers conventionally provided by the city of Karlsruhe to teach citizens correct waste sorting. The first flyer informed on the general categories of waste that go into each of the four bins, the second served to differentiate the general waste categories in combination with the underlying rules of what waste belongs where (see Figs. 11 and 12 in the appendix ''Non-Game Materials'') and the third listed exemplary waste items for each bin (see Fig. 5 (excerpt) and Fig. 10 in the appendix ''Non-Game Materials''). An overview of the treatments' structure is provided in Table 2.

Experimental procedure
We recruited participants from a large German university using the organizing and recruiting software hroot (Bock, Nicklisch, Baetge 2012). Potential participants in the experiment had to meet three requirements to participate: they needed to own a smartphone with an Android-based operating system running on a version higher than 2.3.1 (Gingerbread), be willing to download and install the application on their phone and be fluent in German.
We conducted the experiment in three stages (see Fig. 6): the preparation phase (P1), the training phase including a subsequent passive retention phase (P2) and the testing phase (P3). Participants completed the first two phases remotely. In phase P2, we instructed the participants in the four game-based treatments to play the game through to the end and then complete the survey. In contrast, we told the control group with the non-interactive materials to attentively read through the teaching materials provided through the link for 25 min (this time was derived from the average playtime of the experimental version of the game during the pre-tests) and to then complete the survey. We conducted the testing phase (P3) in the laboratory to ensure proper supervision of the tests (a detailed description of each phase can be found in the appendix). Participants received a flat payment of €15 for their time.

Operationalization of the Dependent Variable: the Learning Outcome
We measured the learning outcome with special regard to two factors: long-term retention and knowledge transfer. According to cognitive theory, long-term retention can be tested as soon as two or three days past the training period (Schmidt and Bjork 1992). For our study, we chose an extended retention phase of 10-12 days to ensure success of the transfer to long-term memory (see also Luo et al. 2018; Parkin and Streete 1988). In their work on training evaluation, Kraiger et al. (1993) highlighted the importance of conceptually sound measures of learning that ensure training effectiveness with regard to knowledge transfer. We tested knowledge transfer in three ways. First, we tested identification of knowledge (Nitsch et al. 2016) by evaluating whether players could reproduce the learned content within the training medium. For this, we used a special version of the game (game measure) featuring one wave where all 108 trained items appear one-by-one from the right side of the screen and then have to be sorted into the correct bin before they drop off on the left side (see Table 3). We then tested knowledge transfer via a multiple-choice-based test measure as a power test (number of items answered correctly in an unlimited amount of time) (Kraiger et al. 1993). We chose this testing measure because multiple-choice tests are considered best suited for measuring the retention of declarative knowledge (Gagne 1984; Bellotti et al. 2013). In this measure, participants were given the names of the waste items but not images like in the game measure. By offering only one of the two memory connection items, we could differentiate the effectiveness of the representational elements (pictures vs text) (Mayer 2002). Participants were asked to assign the right bin for each of the 108 trained items (the options were residual, recycle, biodegradable and paper waste, and separate recycling) (see Table 3).
Finally, we measured knowledge transfer to the final application domain: real-life waste sorting. This measure relates to the construction item introduced by Nitsch et al. (2016), where knowledge is retained and understood in such a way that it can be reapplied to a different context. In this third measure, participants had to sort a selection of real-life waste items. Seven representative waste items were chosen for the real-life sorting according to the participants' performances measured in Phase 2 of the experiment: one from the top five items of best average sorting performance (aluminum), two from the average of their sorting performance (adhesive tape and milk cartons), and four that belonged to the group of the 20 worst-performing items (CDs, thermal paper, empty ring binder and wood shavings).
To increase the comparability of the three measures in consideration of the different number of items, we decided to use percentages of correctly sorted items. Thus, for each person and measure, we divided the number of correctly sorted items by the number of items sorted. For example, a measure of 85.71% for the real-life sorting performance meant that the participant sorted six out of the seven items correctly (see Table 3).

Control group: Non-game materials. This group received non-interactive learning materials as currently provided by the municipal waste department of Karlsruhe, which consisted of two flyers introducing the general waste assignments and an exemplary list of the items with their correct bins (see Fig. 5).
Game group: Core gameplay. This group was given an instantiation where only the core gameplay was implemented (see Fig. 2).
Game group: Repeat element. On top of the core gameplay, at the end of each wave, the players of this group were given the option to repeat the wave without penalty (see Fig. 3).
Game group: Look-up element. On top of the core gameplay, the players of this group were introduced to and had permanent access to the look-up element, giving them the option to look up the correct bin for any waste item they encountered (see Fig. 4).
Game group: Combined repeat and look-up element. On top of the core gameplay, the players of this group could access the look-up element at any time and, after each wave, they were given the option to repeat without penalty.
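The normalization used to make the three measures comparable is a simple share of correct sorts. A minimal Python sketch (the function name `percent_correct` is hypothetical) makes the arithmetic explicit for the seven-item real-life measure and the 108-item measures:

```python
def percent_correct(n_correct, n_items):
    """Share of correctly sorted items, expressed in percent."""
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    return 100.0 * n_correct / n_items


# Real-life measure: seven items per participant.
print(round(percent_correct(6, 7), 2))   # 85.71
print(round(percent_correct(5, 7), 2))   # 71.43
```

Because the real-life measure has only seven items, each item shifts a participant's score by roughly 14.3 percentage points, whereas a single item in the 108-item measures shifts it by less than one point.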

Control Variables
Apart from controlling for demographic factors (age, gender, how long the participants had been living in Germany and the city in which the experiment was conducted), we controlled for the following: Gaming motivation.
Since gamified systems were previously perceived as less serious than traditional teaching content (Brigham 2015;Hanus and Fox 2015), the acceptance of the medium might influence the willingness to learn. We thus measured user attitude towards the medium in general through self-reporting (the full implementations can be found in Table 9 in the appendix). General waste sorting motivation. Since the personal attitude to the topic plays a role in the learning outcome (Garris et al. 2002), we also measured the general attitude towards waste sorting at home through two questions. System usability. The usability of the respective information system plays an equally important role as poor user experience can lead to frustration and thus have a negative impact on user interaction (Bangor et al. 2008). We decided to assess user satisfaction with Brooke's (1996) system usability scale (SUS). This decision was based on its widespread usage in IS for such purposes and to allow for comparability between our artifact and similar studies (Bangor et al. 2008).

Results
The first stage of the experiment was completed by 266 participants. Thirty-one participants did not complete all three stages of the experiment (17 participants did not start or finish Phase 2 and 14 more did not show up to Phase 3 in the lab). Of the 235 remaining participants, we had to exclude 14 further datasets because of transmission errors (e.g., the game data of the second or third stage of the experiment was missing) and one for failing a crucial control question. Finally, of the remaining 220 data sets, there was a minor data transmission error for 23 participants: not all single item sorts for the in-game performance had been transmitted completely. We decided to exclude the datasets where more than 30% of the item sorts were missing (five out of these 23). This decision was backed by a Kruskal-Wallis test that indicated that the performance of the 18 participants with more than 70% but less than 100% correctly transferred item sorts did not differ significantly from the participants with complete sets of item sorts. We thus decided to include them, leaving us with a total of 215 complete datasets. The average age of the participants was 22.72 years (one person reported the age of 3, which we set as a missing value because this was either a typo or intentionally misreported), and the gender distribution was 66.05% of participants identifying as male vs 33.49% as female vs one person (0.47%) indicating ''other.'' Table 3 shows the descriptive statistics of the dependent measures for all treatments. For example, in the treatment with non-game materials, participants correctly sorted on average 70.8% of the items in the in-game performance measure, 59% in the multiple-choice test and 70.3% in the real-life sorting task.
The pattern of having the lowest performance when measuring with the multiple-choice test compared to the other two learning outcome
For all statistical tests, we computed ordinary least squares (OLS) regressions with the three continuous performance measures ranging between 0 (0% correctly sorted) and 1 (100% sorted correctly) as dependent variables. All our hypotheses were directional; a test was therefore considered significant if the p-value of the two-tailed tests reported in the tables was below 10%. Robust standard errors were used in all regressions to account for heteroscedasticity, based on the Breusch-Pagan test (Cohen et al. 2014). Furthermore, we bootstrapped the results with a sample size of 5,000 to account for non-normality of residuals (Tibshirani and Efron 1993; Pek et al. 2018).
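To make the bootstrap step more concrete, the following Python sketch resamples observations with replacement and recomputes an OLS slope each time. It is only an illustration under stated assumptions: the percentile interval, the default of 5,000 resamples, the 90% level and the function names `ols_slope` and `bootstrap_ci` are choices made here, and the sketch does not reproduce the robust standard errors or the exact bootstrap variant used in the study.

```python
import random
from statistics import mean


def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def bootstrap_ci(x, y, n_boot=5000, alpha=0.10, seed=0):
    """Percentile bootstrap interval for the OLS slope (assumed variant)."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]   # resample cases
        xs = [x[i] for i in idx]
        if len(set(xs)) < 2:
            continue  # slope undefined if the regressor does not vary
        ys = [y[i] for i in idx]
        slopes.append(ols_slope(xs, ys))
    slopes.sort()
    lower = slopes[int((alpha / 2) * n_boot)]
    upper = slopes[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

For a 0/1 regressor, `ols_slope` equals the difference between the two group means, so the same helper covers a binary treatment indicator.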
To compute Hypothesis 1, we had to pool the treatments core gameplay, repeat element, look-up element and combined group into one group because Hypothesis 1 compared the games' performance with the non-game materials. In contrast to the other two hypotheses, it did not focus on the effect of specific game design elements and their related individual treatments. We thus computed a binary variable ''Game'' that took the value 1 for all observations trained through the game (the pooled group) and the value 0 for the observations in the non-game material treatment. This binary variable was our only independent variable in this main analysis for Hypothesis 1. Table 4 shows the results of the three regressions of this binary variable on each of the three learning outcome measures. We found significant effects on all measures, which supported Hypothesis 1. When tested with the ingame performance measure, the game treatments were estimated to correctly sort 4.1% more items than non-game treatments. For the multiple-choice test, the effects were even larger: the game treatments were estimated to correctly sort 8.4% more items than non-game treatments. Finally, for the real-life sorting measure, the estimate was 6.9%. To sum up, we could fully support Hypothesis 1 for all three performance measures. The effect for in-game performance was surprisingly the weakest, although this was the measurement for which the medium (the digital game) of training and testing was the same.
In contrast to the analysis of Hypothesis 1, Hypotheses 2 and 3 focused on the effect of the examined design elements. Thus, we did not pool all game treatments but rather compared all five treatments with each other. We coded each treatment with a binary variable that took the value 1 if the observation belonged to the respective treatment. The reference category was the non-game material treatment which meant that all coefficients must be compared to the performance in the non-game material treatment. Table 5 illustrates the results for Hypotheses 2 and 3. When comparing the in-game performance of the treatments with the non-game material treatment, we found a significantly increased learning outcome for the look-up element treatment (estimated increase of 4% of correct item sorts) and the combined one (8%). An additional Wald-test showed that the effect of the combined treatment was larger than that of the look-up treatment (p = 0.04). However, the effect for the look-up element was not significantly larger than for the repeat element treatment (again tested with Wald test, p = 0.56). Thus, the look-up treatment performed better than the non-game material treatment but not better than the treatment with only repetition. In sum, for the in-game performance measure, Hypothesis 3 was fully supported: we found better performance for both the groups that only had the look-up element by itself or the look-up element combined with the repetition element. Hypothesis 2 was only partially supported: we did not find a stronger performance when only playing with the repetition element. Hypothesis 2 was only supported if repetition was combined with the look-up element. For the multiple-choice test, we found even stronger results and could fully support Hypotheses 2 and 3: all four treatments trained through the game performed better in the multiple-choice test than the treatment trained with the non-game materials. 
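The treatment coding described above can be sketched in Python. With one 0/1 indicator per game treatment and the non-game material treatment as reference category, the OLS intercept equals the reference group's mean and each coefficient equals that treatment's mean minus the reference mean. The treatment labels, the function names and the toy scores below are hypothetical, and the Wald tests are not reproduced.

```python
from statistics import mean

TREATMENTS = ["non_game", "core", "repeat", "look_up", "combined"]
REFERENCE = "non_game"   # reference category, as in the text


def dummy_code(treatment):
    """One 0/1 indicator per non-reference treatment."""
    return {t: int(t == treatment) for t in TREATMENTS if t != REFERENCE}


def treatment_coefficients(observations):
    """Intercept and coefficients of the saturated dummy regression.

    observations: list of (treatment, score) pairs with scores in [0, 1].
    In this saturated model the OLS solution reduces to group means.
    """
    by_group = {t: [] for t in TREATMENTS}
    for treatment, score in observations:
        by_group[treatment].append(score)
    intercept = mean(by_group[REFERENCE])
    coefs = {t: mean(by_group[t]) - intercept
             for t in TREATMENTS if t != REFERENCE}
    return intercept, coefs


# Toy data for illustration only (not the study's data).
toy = [("non_game", 0.70), ("non_game", 0.72), ("core", 0.75),
       ("repeat", 0.74), ("look_up", 0.76), ("combined", 0.80)]
intercept, coefs = treatment_coefficients(toy)
```

Reading the output mirrors reading the regression tables: a positive coefficient means that the treatment sorted a larger share of items correctly than the non-game material group.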
The largest effect was measured for the combined treatment, where on average, participants sorted 11.9% more items correctly than the participants in the control treatment without game materials. For the real-life sorting task, we interestingly found weaker effects for the combined treatment. In detail, we found that only those treatments that had either one design element or neither of those two elements (the core game) performed significantly better than the treatment that did not play the game. Yet, the coefficients also showed that the effects for all four game treatments were rather similar, ranging from 6.3% for the combined treatment to 7.3% for the repeat element treatment. Thus, when conducting further Wald tests comparing the coefficients of the game treatments with one another, one cannot claim that one game group performed better than another (all p > 0.8).
Thus, all in all, we could support Hypotheses 2 and 3 and found that all game treatments did comparably well. For game or gamification designers, it is interesting to compare the effects of game design elements not only to the non-game material group, but also to the core gameplay group to gain a better understanding of which design elements to include in their gameful applications. Therefore, we want to further focus in detail on the comparison of the different game treatments to the core gameplay group in Table 6.
We found that with the in-game performance measure and the multiple-choice test, the combined treatment achieved a significantly higher learning outcome than the treatment with only the core gameplay available during training, with an increase of 6.1% correctly sorted items with the in-game performance measure and 6.4% with the multiple-choice test. When comparing the groups within the real-life sorting measure, a significantly different learning outcome cannot be discerned. This is a result already highlighted in the analysis above: the game-treatments performed comparably well in the real-life sorting task. Thus, for real-life performance, the overall effect of the game itself was much stronger than that of adding the single design elements to the core gameplay.
To further assess the robustness of our results, we also computed robust OLS regressions with these control variables: age, gender, how long they lived in Germany (''Living in Germany''), how long they lived in the city the experiment was conducted in (''Living in XX city''), their gaming motivation, their general waste sorting motivation and the SUS (for details, see Tables 12, 13, 14 in the appendix). The results were robust regarding the inclusion of these control variables. However, there was one slight change: for Hypothesis 2, the effect of the repeat treatment became significant. Thus, for the statistics with control variables, we could now fully support Hypothesis 2. Regarding the significance of the control variables, we found that the longer participants lived in Germany, the better they performed in-game and in the multiple-choice test. This control variable can be seen as a proxy for the participants' prior knowledge about waste sorting. Furthermore, the general waste sorting motivation value showed a tendency to positively affect the performance measures for all three measures (p ranges between 0.01 and 0.11). The SUS value of the game also had the tendency to positively influence the game performance (p = 0.064 for all three hypotheses).

Discussion and Conclusions
In terms of our first and overarching research question, we found that the learning outcome for the groups given the game for training was significantly stronger than for the group given state-of-the-art paper-based information during the training phase. This held true across all three measures. Interestingly, the effect was weakest within the in-game performance measure (with 4.1% more items correctly sorted than by the non-game material group) and strongest in the multiple-choice test (with 8.4% more items correctly sorted). This outcome contrasts with the literature on context reinstatement, which suggests that information encoded in one mindset is more successfully retrieved in the same mindset (Fisher and Kraig 1988). This interesting finding was also apparent in the non-game material treatment, where performance in the in-game measure was significantly higher than in the multiple-choice test (Wilcoxon signed-rank test with z = 5.16; p < 0.01) although the game's aesthetics and interaction were new to the non-game material treatment.
To gain further insight into this matter, we were interested in whether all participants generally performed better in one measure or the other. We found that performances, when measured in the game (Wilcoxon signed-rank test with z = 12.00; p < 0.01) and in real life (z = 7.85; p < 0.01), were significantly higher than when measured with the multiple-choice test. We also found that the game-trained group performed comparably well in the game and in real life (z = 1.95; p = 0.06) (for the descriptives, see Table 7). A potential explanation for this finding can be linked to the forming of memory connections: the multiple-choice test offered fewer memory connection items (offering only designated connections: words) than the game measure, which presented both iconic and designated connections (words and sticker-like icons), and the real-life measure, which provided real objects. Both the game and real-life objects offered more information items that could connect to existing schemata. This might have helped stimulate memories not activated by the fewer connections offered in the multiple-choice test. This finding is congruent with studies on word and picture learning (Kolers and Brison 1984; Mayer 2002) that found that learners performed better through a combination of words and pictures/objects than with words alone. Similarly, in the domain of mathematical didactics, studies have found that using more mathematical representations (like graphs, numbers, formulas) leads to an increased learning outcome (Ainsworth 2006). Our results showed that this effect works in both directions: learners retrieved formed memories more successfully if we offered more memory connections with their mental schemata.
In terms of Hypothesis 2 (adding repetition as a game design element increases the learning outcome), our results confirmed our conjectures. The group given the additional option to repeat waves showed a significantly higher learning outcome than the non-game material group in two of the three measures (multiple-choice and real-life). This also held true for the in-game measure when inserting control variables. However, when compared with the core gameplay group, the implementation of a repeat option by itself did not increase the learning outcome significantly. The game design elements enhanced learning potential; however, this manifested within the success of the combined design elements. This suggests that the repeat element's inherent lack of fun can be compensated for, leading to better results. This is underpinned by a study by Kim and Shute (2015), who found that changes in just one design element ''significantly impacted players' interactions with the game by changing players' mental 'operational rules' during play'' (p. 351). While the use of the repeat element was optional, it was generally well-received, as 63.95% of players who had the repeat element available used it at least once (mean: 3.88, min: 0, max: 24).
For Hypothesis 3 (the increase of the learning outcome through a look-up design element), our results showed that the group given this design element performed significantly better than the non-game material group in all three measures. In terms of usage, the players received it even better than the repeat element, as 67.86% of players who had the look-up element available used it at least once (mean: 14.42, min: 0, max: 85). These are relevant findings given that we found contradictory indicators on the potential outcome in our analysis of related literature (e.g., studies on help tools reported low usage as well as low effects; Liu and Reed 1994; Größler et al. 2000; Aleven et al. 2003). When compared to the core gameplay group, the prevalence of the look-up element by itself did not significantly enhance the learning outcome of the game. However, as mentioned above, in combination with the repeat element, this design element created a significantly stronger effect in the in-game and multiple-choice measures. This showed that look-up features should be considered as important design elements in learning-oriented gameful applications.
Literature on error management training (EMT; Chillarege et al. 2003; Keith and Frese 2008) provides a potential explanation for the success of the combination of these two design elements. In contrast to errorful and errorless learning, EMT consists of helping trainees understand why errors occur, indicating how they can be avoided (as afforded by the look-up element) and then applying that knowledge to solve the problem (as afforded by the repeat element). This indicates that implementing both affordances at the same time could lead to an especially successful learning outcome. This explanation is further supported by the theory of learning styles. In a study conducted by Liu and Reed (1994), which considered affordance combinations in a hypermedia environment, learning was accomplished by offering a diverse set of tools and aids to groups of students with different learning styles. This suggests that offering different optional affordances benefits a diverse group of learners and leads to a stronger overall learning outcome. The combined effect could further help prevent the perception of cheating that could come with a help- or hint-related design element (Clarebout and Elen 2009), as it allows players to test their own abilities in the first iteration of a wave before resorting to looking up the correct solution in the repeated wave.
In summary, the results showed that the core gameplay by itself already performed very well in comparison with the non-game materials. However, the overall game can be made more effective by adding the two design elements that we suggested. In particular, their combination showed potential as a building block for successful learning strategies by combining the mnemonic effect of repetition with easily accessible means for understanding.
When analyzing the control variables, we found that the number of years our participants had been living in Germany positively influenced their learning outcome in terms of the in-game and multiple-choice measures. This connection was expected since this particular control variable was implemented to passively enquire about prior waste sorting knowledge (to prevent priming, we decided against a full pre-measure of waste sorting knowledge; see the Limitations section). General waste sorting motivation also proved to have a significant influence on the learning outcome, but only for the in-game measure. It is difficult to make sense of the fact that this effect was not replicated in the other measures, especially the real-life waste sorting measure. There could be influences in terms of cognitive dissonance of self-belief and self-actualization, but because of the setup of the experiment, we could not derive any personality-based indicators.

Contribution
The central goal of our research is to contribute to the rise of sustainable behavior through gameful design, specifically with regard to waste management. This goal is in line with target 12.8 of the UN Sustainable Development Goals: "Ensure that people everywhere have the relevant information and awareness for sustainable development and lifestyles in harmony with nature" (United Nations 2020). Our study showed that gameful design can successfully contribute to better municipal waste sorting, even with regard to a transfer of knowledge to real life. To the best of our knowledge, this is the first study to do so.
We further contributed to the ongoing efforts of investigating the potential of serious games and the implementation of gameful design as powerful teaching devices. In particular, the study showed significant positive learning outcomes within a domain that generally lacks incentives relating to direct personal interest (such as those that health- or fitness-oriented games would offer) and that is hampered by the target group's disinterest in, or even disgust at, the general topic. By successfully translating this into more desirable content matter, our research highlighted the benefits of gameful design for teaching under adverse conditions. In terms of theoretical contribution, by conducting a full assessment of design choices with regard to their different learning outcomes, our research added to the ongoing general efforts of methodically assessing learning through gameplay. In this, our study aligned with a growing body of research dispelling remaining doubts about the usefulness of game-based learning (Shute et al. 2009).
A factor that contributes to such doubts is that not all studies in gameful learning test the success of their artifact in connection with its transition to real-life knowledge and applicability (e.g., Kim and Shute 2015). This measure, however, is very important, as seen in, for instance, Größler et al. (2000), who found in their study on gamification of business simulators that "participants were not capable of accessing the knowledge gained outside the gaming context" (p. 271). Another example is Ball et al. (2002), who concluded that cognitive training may only improve skills that are specific to the trained cognitive domain. Also, Luo et al. (2018) conducted a study with a similar premise and goal to ours but did not manage to reproduce the learning outcome when it was measured in real life. In contrast, in our study, we found that our game did overall manage to overcome this hurdle. Despite this success, the difficulty of constructing knowledge could be seen in the differences in learning outcomes between the different testing media. Our study highlighted the importance of measuring in the training medium as well as in the true context medium (real life), and showed that the transfer is achievable given good design choices (Van Eck et al. 2017).
We also identified a gap in the IS literature on the effectiveness of look-up/help-based design elements and added to the ongoing discussion with an experimental setup that tested this element in an isolated and a combined treatment. Our results showed that an optional, learner-moderated look-up element can be a very promising learning-enhancing design element, especially if added to a repetition-based teaching setup. By testing these specific game mechanics in detail, we contributed to understanding how they function to produce meaningful learning experiences, which is a paradigm suggested by the Games, Learning, and Society initiative (Squire 2007). With regard to the general topic of sustainability in IS, our study was one of few to focus on challenges surrounding the domain of waste management. We hope to inspire further studies in this seminal area of research.
In terms of practical contribution, we believe that if implemented into the teaching curriculum of sustainability classes, our artifact can have a beneficial impact on the topic of correct waste sorting. Our research aims to support the process of research informing practice and to aid designers in optimizing their design decisions, as they have to make efficient decisions under time pressure (Stacey and Nandhakumar 2009). Furthermore, as stated by Clow (2013), educators need to be given insights about additional tools, as well as their strengths and limitations, which we provide in this manuscript. By giving detailed insights into the rationales behind the design decisions that went into the creation of our game and the design elements used, we provided an easy means of reproduction for practitioners and researchers. While the mechanisms we looked at are embedded in the framework of a game, any learning or training context can serve as the foundation for the design mechanisms we analyzed in our study (Deterding 2016). Thus, we argue that in a playful setting that allows a certain degree of make-believe, a broad variety of teaching tasks (e.g., vocabulary, geography training, digital management training and onboarding) can benefit from applying the findings of our study.

Limitations and Future Work
One potential limitation concerns the fact that we omitted an assessment of prior knowledge on waste sorting. Due to the three-phase setup of the experiment, we consciously decided against this assessment because of concerns about priming the participants and thus skewing the results. While it is common in the assessment of serious games to use pre- and post-testing, "the main problem with the pre- and post-test experimental design is that it is impossible to determine whether the act of pre-testing has influenced any of the results" (Bellotti et al. 2013, p. 3). We were concerned that a prior assessment, such as a survey-based multiple-choice test, would lead participants to look up items they were unsure of before the first task and thereby influence the actual results. Instead, we measured "living in Germany" as a proxy indicator for prior knowledge, which turned out to have a significant influence on the learning outcome. Because we conducted an experiment by randomly assigning participants to treatments, we trust the internal validity of our results. Thus, the effect should be independent of confounders such as prior knowledge.
We believe the exclusion of prior knowledge as a predictor in our models, as well as the omission of other variables that might influence waste sorting knowledge (e.g., participants' exposure to the topic in school or other contexts, or their families' attitudes towards sustainability and eco-friendliness), are the main reasons for the rather low R² of our main models that included only the treatment variables. However, a low R² is not unusual for experimental research and does not harm the interpretation of the effect of the treatment variables. Our further analyses in the appendix also show that the inclusion of the control variables (e.g., the number of years that the participants have lived in Germany and general waste sorting motivation) helps reduce the unexplained variance substantially, yielding R² values around 0.2.
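The argument about low R² under random assignment can be illustrated with a minimal simulation sketch. It uses purely synthetic data and is not the study's data or model. All names (`simulate`, `fit_one`, `fit_two`), the effect size of 0.5, and the noise levels are assumptions chosen for illustration only. The sketch fits a treatment-only regression and a regression that also includes a prior-knowledge covariate, then compares the estimated treatment effect and R² of both.

```python
import random


def _centered(values):
    """Subtract the mean so that the intercept drops out of the normal equations."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def fit_one(y, x):
    """OLS of y on an intercept and x; returns (slope, R^2)."""
    yc, xc = _centered(y), _centered(x)
    slope = sum(a * b for a, b in zip(xc, yc)) / sum(a * a for a in xc)
    ss_res = sum((b - slope * a) ** 2 for a, b in zip(xc, yc))
    ss_tot = sum(b * b for b in yc)
    return slope, 1 - ss_res / ss_tot


def fit_two(y, x1, x2):
    """OLS of y on an intercept, x1 and x2; returns (b1, b2, R^2)."""
    yc, a, b = _centered(y), _centered(x1), _centered(x2)
    s11 = sum(u * u for u in a)
    s22 = sum(v * v for v in b)
    s12 = sum(u * v for u, v in zip(a, b))
    s1y = sum(u * w for u, w in zip(a, yc))
    s2y = sum(v * w for v, w in zip(b, yc))
    det = s11 * s22 - s12 * s12  # determinant of the 2x2 normal equations
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    ss_res = sum((w - b1 * u - b2 * v) ** 2 for u, v, w in zip(a, b, yc))
    ss_tot = sum(w * w for w in yc)
    return b1, b2, 1 - ss_res / ss_tot


def simulate(n=5000, seed=0, true_effect=0.5):
    """Synthetic experiment (hypothetical numbers, not the paper's data)."""
    rng = random.Random(seed)
    # Random assignment makes treatment independent of prior knowledge.
    treat = [float(rng.randint(0, 1)) for _ in range(n)]
    prior = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [true_effect * t + 1.0 * p + rng.gauss(0.0, 2.0)
         for t, p in zip(treat, prior)]
    effect_only, r2_only = fit_one(y, treat)
    effect_full, _, r2_full = fit_two(y, treat, prior)
    return {
        "effect_treatment_only": effect_only,
        "r2_treatment_only": r2_only,
        "effect_full": effect_full,
        "r2_full": r2_full,
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

In this sketch, the treatment-only model should leave most of the variance unexplained while still recovering roughly the same treatment effect as the model with the covariate. This is the pattern the paragraph above describes for randomized designs.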
We further see that there could be an underlying cultural bias given the generally high range of results. Frese et al. (1991, p. 90) noted that errors may be perceived as especially stressful in German culture, "where perfectionism is highly valued." For transferability to other cultures with different prior mentalities regarding correct waste sorting, future studies will be necessary to assess mentality as a moderating factor. Another important point is that we assessed the real-life measure with only seven items. While this arguably weakens comparability with the other two measures, practical concerns in terms of implementing a much larger number of items (limited setup and timeframe, participants' resistance to interacting with certain items) limited our options for this measure. It should also be noted that within the non-game materials used during the training phase, one of the three flyers (the one showing examples of waste items for each bin in Fig. 10 in the appendix) featured more items than were presented to the game groups during the experimental task. When we designed the experiment, we wanted to approximate a real-life scenario and thus chose to use an unabridged set of standard materials provided by the local waste management (see "Non-Game Materials" in the appendix). In hindsight, the overall experimental design would have been cleaner if we had reworked the flyer to feature the exact number of items that were trained in the game. However, it is important to note that the goal of the game was not to teach the specific relationship of each featured waste item to its bin but to help players understand the rules of the underlying waste systems and to train them on these rules. As all objects will eventually turn into waste items, citizens need to learn how to correctly sort any object they encounter into the respective system by understanding and internalizing the underlying principles.
We wanted to test the learning outcome in a rigorous and controlled manner to obtain clear and interpretable results, so we decided to conduct a laboratory experiment to provide high internal validity. However, as our findings are based on an experimental setting including mostly students, any insights gained are only applicable to the tested age group (17-41). A future step would be to test whether the effects found are replicable in the field. Another facet of this relates to knowledge transfer. Even though we found that knowledge transfer to real life (construction) was successfully achieved with the game, we believe that this effect might be enhanced by transferring the game to a virtual/augmented reality environment, which would bring the medium of training closer to the actual application context.
Finally, while we chose to separate learning from motivation to isolate our findings, this approach might have omitted an important influence on learning. On this basis and because of our overall goal to teach correct waste sorting and to boost the motivation to act upon that knowledge, we want to design and conduct another motivation-focused experiment to build on our findings and enhance gameful design-based learning even further.
Acknowledgements Many thanks to Raphael Martin, Brice Clocher and Anke Greif-Winzrieth for their practical and invaluable contributions and the reviewer team for their insightful feedback and constructive criticism.
Funding Open Access funding enabled and organized by Projekt DEAL.

Conflict of interest The authors declared that there is no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.