Introduction

Over the past three decades, video games have gained increasing attention in the field of education, despite being viewed as a major distraction from more traditional educational activities such as book-reading. Educationalists began to question why video games are so engaging and how their allure can be harnessed to support modern teaching (Ibrahim & Jaafar, 2011; Kirriemuir & McFarlane, 2004). Consequently, the concept of gamification gained widespread traction. Gamification does not mean using specific video games in teaching; instead, it refers to using game design elements in non-game contexts, where fully-developed games are usually not present, to enhance user experience and engagement (Çeker & Özdaml, 2017; Deterding et al., 2011; Domínguez et al., 2013; Kapp, 2012). Naturally, educational gamification, or gamified learning, refers to the use of gamification techniques in educational contexts to improve engagement, learner experience, or academic outcomes.

To distinguish them from video games, gamified learning tools have been defined as educational websites/systems/software/applications that use game design elements to improve engagement, learner experience, or academic outcomes (Luo, 2021, 2022; Luo et al., 2021). Accordingly, gamified EFL tools refer to websites/systems/software/applications that use game design elements to facilitate the teaching and learning of English as a foreign language (see Table 1). Since literature focusing specifically on the EFL context is scarce, we use the two terms (gamified learning tools and gamified EFL tools) interchangeably.

Table 1 The definition of related terms

Despite growing academic interest, the actual implementation of gamification in school contexts remains limited. Researchers have highlighted the lack of focus on teachers’ acceptance of pedagogical innovations, despite teachers’ critical role in selecting, implementing, and evaluating innovations (Huizenga et al., 2017; Martí-Parreño et al., 2016; McFarland, 2017; Sugar et al., 2004). Cuban (1986) emphasized the importance of considering teachers’ role in accepting new technologies, as top-down attempts to introduce innovative technologies in education have often failed to achieve long-term effects because they neglected teachers’ involvement in the process. Besides being limited in number, the extant research on teachers’ acceptance intention towards gamified EFL tools is still at the exploratory stage, involving studies with small sample sizes or un-validated surveys (Baydas & Cicek, 2019).

The current study is significant because it addresses the aforementioned gaps and double-validates the scale through interviews with experts and with the intended sample of secondary school teachers. Additionally, the survey was evaluated under the guidance of the book Scale Development: Theory and Applications by DeVellis (2003) to ensure its validity and reliability.

The purpose of this study was threefold: first, to explore a conceptual framework of factors contributing to teachers’ acceptance intention towards gamified learning tools in secondary school contexts; second, to develop a scale to measure secondary school teachers’ acceptance intention towards gamified learning tools; and third, to conduct psychometric assessments to establish the scale’s validity and reliability. Both the framework and the scale can be used or adapted for future studies in the field of gamified learning.

Literature review

Previous game-related and gamification-related scales

Our search yielded few gamification-related scales, let alone scales addressing gamified learning in EFL contexts. As a compromise, we broadened the search to include related scales from the field of video games.

It is common for video-game-related scales to contain obviously game-like elements, as these scales were originally developed to measure engagement and satisfaction with video games. For example, the User Engagement Scale (UES), which aims to measure engagement during video game-play, contains “aesthetics” (beautiful interface designs) (Wiebe et al., 2014); the Video Game Pursuit Scale (VGPu) by Sanchez and Langer (2020), which lists the pursuits that attract users to play video games, contains the factor “gaming behaviour”; similarly, the Game User Experience Satisfaction Scale (GUESS) by Phan et al. (2016) contains the game-like elements “audio aesthetics”, “narratives” and “play engrossment”.

Among the limited number of gamification-related scales, few focused on technology acceptance. Tondello et al. (2016) proposed six types of gamification users based on the characteristics and preferences of varied users (Philanthropists being motivated by purpose, Socialisers by relatedness, Free Spirits by autonomy, Achievers by competence, Players by extrinsic rewards, and Disruptors by the triggering of change). However, this user-type categorization focused on users rather than gamified learning tools. Liu et al. (2019) developed the Festival Gamification Scale (FGS) based on self-determination theory, which contains five factors: relatedness, mastery, competence, fun, and narratives; nevertheless, the FGS has not been validated in educational contexts.

Eppmann et al. (2018) developed and validated a scale called the gameful experience scale (GAMEX), which measures gameful experience in gamification. The 27-item GAMEX scale consists of six factors, including enjoyment, absorption, creative thinking, activation, absence of negative affect, and dominance. Though containing the word “game”, gameful experience does not indicate the use of video games; instead, it refers to “the positive emotional and involving qualities of using a gamified application” “in a non-game context” (Eppmann et al., 2018, p. 100). From this perspective, the GAMEX scale is one in the field of gamification.

Unfortunately, the GAMEX scale was developed based on three game-related scales rather than gamification-related ones: the immersion questionnaire (IQ) by Jennett et al. (2008), the game engagement questionnaire (GEQ) by Brockmyer et al. (2009), and the game experience questionnaire (GExpQ) by IJsselsteijn et al. (2013), which was not published in a peer-reviewed academic journal. The aforementioned scales contain items that relate to video games, such as “I was in suspense about whether I would win or lose the game” (IQ, item 5), “I sometimes found myself to become so involved with the game that I wanted to speak to the game directly” (IQ, item 7), “the game feels real” (GEQ, item 5), “I play without thinking about how to play” (GEQ, item 15), and “I really get into the game” (GEQ, item 18).

What’s more, the items in both the IQ and the GEQ are strongly connected with flow theory, a popular concept in video game studies. Proposed by Csikszentmihalyi (1990), flow refers to the mental state in which a person is fully immersed in a sense of deep enjoyment, characterized by complete concentration on the task and a transformation of time (speeding up or slowing down). For example: “I was unaware of what was happening around me” (IQ, item 21), and “if someone talks to me, I don’t hear them” (GEQ, item 6). However, flow theory emphasizes the optimal experience that originates from the balance between challenges and one’s competence, which is comparatively difficult to achieve in educational contexts: most current educational settings can neither automatically and dynamically detect learners’ ability nor provide correspondingly customised, difficulty-adaptive tasks. Therefore, the flow-based survey items need further validation in gamified learning contexts.

Generally speaking, there is a lack of scales measuring gamification-related issues, let alone gamification for the teaching of EFL.

While developing a scale based on gamification-related theories is a promising approach, the current study has taken a different path due to the limited number of supportive frameworks. In this study, gamification is regarded as a general educational technology. As a result, the focus of the current study is mainly on the acceptance intention of a general technology, which will be elaborated upon in the next section.

Previous frameworks and scales about technology acceptance intention

Numerous frameworks have been established and validated to explain an individual’s acceptance intention towards new technologies. These include the Technology Acceptance Model (TAM), TAM2, TAM3, and the Unified Theory of Acceptance and Use of Technology (UTAUT).

Davis (1989) proposed the Technology Acceptance Model (TAM), which explains that an individual’s actual technology use is influenced by behavioral intention and attitude toward using it (see Fig. 1). According to TAM, an individual’s attitude towards technology use is determined by two main factors: perceived usefulness (PU) and perceived ease of use (PEoU). PU measures “the degree to which a person believes that using a particular system would enhance his or her job performance” (Davis, 1989, p. 320), while PEoU is defined as “the degree to which a person believes that using a particular system would be free of effort” (Davis, 1989, p. 320). In TAM, the contributing factors of perceived usefulness and perceived ease of use were subsumed under one term: external variables.

Fig. 1 The technology acceptance model (TAM) (Davis et al., 1989)
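
For readers who prefer an algebraic view, the path structure in Fig. 1 can also be sketched as a set of linear structural relations. This is a simplified illustration of the model’s causal chain, not Davis’s original notation; the β terms are hypothetical path coefficients:

```latex
\begin{aligned}
\mathrm{PU}        &= \beta_{1}\,\mathrm{PEoU} + \beta_{2}\,\mathrm{ExternalVariables} \\
\mathrm{Attitude}  &= \beta_{3}\,\mathrm{PU} + \beta_{4}\,\mathrm{PEoU} \\
\mathrm{BI}        &= \beta_{5}\,\mathrm{Attitude} + \beta_{6}\,\mathrm{PU} \\
\mathrm{ActualUse} &= \beta_{7}\,\mathrm{BI}
\end{aligned}
```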

TAM has been criticized for being too “parsimonious” to “give practical advice on how to improve the perceived usefulness and perceived ease of use of technology” (Wong, 2016, p. 316). Therefore, researchers have attempted to extend the TAM model into more specific ones by adding factors based on the core of the TAM model (PU, PEoU, BI, and attitude), as summarized in Table 2.

Table 2 Summary of previous TAM-related frameworks

Table 2 serves as a basis for the development of the theoretical framework in the current study, as the factors in the table form the building blocks of the framework (see Fig. 2). The process of establishing the framework is discussed in detail in the subsequent section.

Fig. 2 The proposed theoretical framework

Framework conceptualisation

Framework establishment is a core component of the current study; it was achieved by regrouping the TAM-expanding factors detailed in Table 2 using grounded theory analysis techniques.

Grounded theory is a systematic method for developing theories from qualitative data that helps to establish a cohesive theoretical explanation (Lyons & Coyle, 2007). Grounded theory is suitable when little is known about what is being studied, when the relationships among the concepts are not elaborated enough, or when the relevance of the concepts and their relationships has not been corroborated for the population of the context (Birks & Mills, 2015; Vollstedt & Rezat, 2019). Given that the current study meets these criteria, grounded theory was deemed the most appropriate approach for developing the theoretical framework.

Using grounded theory techniques, we treated each TAM-expanding factor as a code and systematically grouped the factors into constructs and sub-constructs. The framework establishment was an iterative process that involved multiple rounds of factor inclusion and exclusion (Flick, 2018). The detailed steps of this process are outlined below.

Firstly, we emphasized the importance of taking a broader perspective in reviewing the topic to avoid the potential pitfalls of a narrow literature review. This approach aimed to connect theories that may not seem directly related but substantively contribute to the current topic (Younas & Porr, 2018).

During this process, we identified the construct perceived risk from a model proposed by Deng et al. (2018), which predicts patients’ acceptance intention toward mobile health services. Although the actual connotation of perceived risk may vary depending on the technology being adopted, this construct provided meaningful insight into the potential factors that could hinder technology adoption. A similar concept “side effects” can be found in the APEASE framework by Michie et al. (2011). Additionally, we added compatibility and complexity under the construct PEoU, which originated from the diffusion of innovation theory proposed by Rogers (2010).

The next step was to merge multiple TAM-related frameworks into one while eliminating factors that were not significantly relevant. A challenge arose with respect to terminology, as different frameworks used different terms to indicate the same or similar objectives. For example, while most TAM-extension frameworks use the term “attitude”, Deng et al. (2018) used the term “trust” and Cigdem and Ozturk (2016) used the term “perceived satisfaction”. To ensure consistency in terminology, we consulted with two experts, as described in the Methodology section. The resulting framework, which integrated and streamlined the relevant factors from the various TAM-related frameworks, is presented in Fig. 2.

A knotty issue is that, in assessing whether gamified learning tools are perceived as useful, varied criteria can be applied from different perspectives. Therefore, it may not be reasonable to subsume the benefits of gamified EFL tools under one term alone, such as perceived usefulness. To be specific, a gamified learning tool can be useful in increasing learning inputs, boosting learning motivation, promoting academic achievements, etc., and these functions are not necessarily interdependent. Thus, summarizing them into one term may result in survey items referring to multiple objects, which can pose challenges to variance and reliability. Previous surveys assessing perceived usefulness have covered varied benefits, such as advantages over traditional teaching (Beggs, 2000), increased student interest (Beggs, 2000), improved performance expectancy (Venkatesh et al., 2003), enhanced student learning (Beggs, 2000), increased learning opportunity (De Grove et al., 2012; Ibrahim & Jaafar, 2011), and knowledge learning (Lin & Chen, 2013).

Although educational gamification has been associated with various benefits, two primary goals are often highlighted: improving students’ learning engagement (Landers & Armstrong, 2017) and enhancing students’ academic performance (Lin & Chen, 2013). However, Bourgonjon et al. (2009) argued that focusing solely on academic outcomes is too narrow a view of education, as it encompasses much more than just results. Therefore, they proposed two categories of perceived usefulness: “perceived usefulness-process” (U-process) and “perceived usefulness-product” (U-product). To improve readability, these were later reworded as effectiveness on engagement and effectiveness on academic outcomes, respectively (see Fig. 2, Table 3).

Table 3 Constructs and sub-constructs of the theoretical framework

Factors that did not fit into any relevant category were excluded from the analysis, such as the time needed to learn (Beggs, 2000). In addition, factors proposed for specific contexts were also excluded, such as expertise (Sun & Jeyaraj, 2013), self-taught computer literacy (Cheng et al., 2013), and control variables (education and chronic diseases) (Deng et al., 2018). The factor “interaction” was excluded as well, as it is more commonly associated with student-dominated gamification tools than with teacher-led formal education.

One potentially controversial decision was to exclude self-efficacy from the theoretical framework. Self-efficacy, “a learner’s belief that he or she is capable of performing a task and reaching a goal” (Huang & Liaw, 2018, p. 24), is a common determinant in TAM-extension frameworks (Cigdem & Ozturk, 2016; Huang & Liaw, 2018; Ibrahim & Jaafar, 2011; Lin & Chen, 2013; Park et al., 2012; Zhang et al., 2017). However, as other researchers have suggested, contributing factors to technology acceptance can be categorized into individual, technological, organizational, and environmental constructs (Park, 2009; Park et al., 2012; Wong, 2016; Wu et al., 2008). The proposed framework mainly comprises technology-related factors, while self-efficacy is an individual-related construct. Therefore, self-efficacy was excluded from the framework.

By following the aforementioned procedures, the current study established the theoretical framework shown in Fig. 2. The framework illustrates that attitude affects acceptance intention and actual acceptance, while attitude consists of four constructs: perceived enjoyment, perceived usefulness (PU), perceived ease of use (PEoU), and perceived risks. PU contains four sub-constructs: relatedness, PU on engagement, PU on academic outcomes, and social influence. Similarly, PEoU contains four sub-constructs: accessibility, compatibility, complexity, and user control. Facilitating conditions were also considered, including resources, knowledge, and technical support. Control factors, which moderate users’ acceptance intention across groups, include gender, age, subject, experience, personal innovativeness, and voluntariness of use (see Fig. 2).

Methodology

Research design

This research aimed to develop and validate a scale measuring the factors contributing to teachers’ acceptance intention towards gamified learning tools in secondary schools.

To achieve this goal, the study utilized the exploratory sequential mixed method, which involves the collection and analysis of both qualitative and quantitative data in a sequence of phases (Creswell & Clark, 2017). The results of the qualitative data analysis served as the basis to construct the quantitative phase, namely the item construction, which is particularly significant for scale development studies (Mihas, 2019; Moral-Bofill et al., 2020).

Research procedures

Phase 1-a: framework establishment

Phase 1-a aimed to synthesize a theoretical framework that could be applied to the current research context (the use of gamified EFL tools in secondary schools in China). We used grounded theory techniques to group the TAM-expanding factors into constructs and sub-constructs, with each factor considered as a code (as shown in Table 2). The results of this process are presented in Table 3, and the detailed process is explained in the previous section.

Phase 1-b: framework confirmation

Phase 1-b involved an expert interview with two experts who were chosen based on their publications in the gamification area. The two experts were provided with a briefing on the definition of related terms, the current research context, the research scope, the proposed theoretical framework, and the explanation of each construct in the framework. The two experts were expected to comment on the framework structure, as well as the drafted survey items.

Phase 2-a: survey development-item generation

Item generation in the current study was mainly based on interview responses from secondary-school teachers in China. We conducted a large-scale email interview involving 347 participants, followed by 14 face-to-face semi-structured interviews. As in a previous study by Luo et al. (2021), the email interview responses helped “gather a large amount of key concepts”, and the face-to-face interviews helped “reveal in-depth opinions and comments” (p. 6344). Participants in the two rounds of interviews were asked the same questions regarding their experience, perceived benefits, perceived disadvantages, and general comments on the use of gamified learning tools in secondary schools. We then paraphrased the interview responses to elaborate the framework. For instance, the framework’s factor “relatedness” was expanded into full sentences such as “the content provided by the gamified tool is relevant in my EFL teaching” and “the gamified tool would fit the current EFL curriculum”.

Phase 2-b: survey development-item modification and confirmation

After drafting the survey based on the framework and the collected interview responses, we modified the survey items by customizing them to the current research context (gamified EFL learning in secondary schools). The revised survey items were confirmed with the two experts from Phase 1-b. The experts checked whether the items were vague, whether the wording was adequate, whether any item posed two questions at the same time, whether the items used adverbs excessively, whether the items were easy for the audience to read, and whether the items were conceptually relevant to gamified learning (da Silva Brito et al., 2018; Jackson & Marsh, 1996; Younas & Porr, 2018).

Phase 3: scale evaluation

Phase 3 was a large-scale survey study involving 512 secondary school teachers in China. The sample size was determined following the suggestion of Comrey (1973) (a sample size of 100 is poor, 200 fair, 300 good, and 500 very good). We followed the scale validation process proposed by DeVellis (2003), as detailed in the Findings section.

Participants, data collection, and ethics considerations

During the framework establishment (Phase 1), the participants were two experts in the field, who were expected to provide professional comments on the framework.

To generate survey items (Phase 2-a), 347 secondary-school teachers were interviewed via email and 14 via face-to-face meetings. Two experts were invited to comment on the survey items (Phase 2-b). After that, an anonymous survey study was conducted, which involved 512 secondary-school teachers in China. All the teacher-related data were collected using the convenience sampling method (Table 4).

Table 4 Research design and participants

Data collection and data analysis

Participants for the interview and the survey were recruited with the convenience sampling method. The interview responses were coded and analysed following the six-phase thematic analysis procedure proposed by Braun and Clarke (2006), which includes getting familiar with the data, generating the initial codes, searching for themes, reviewing themes, defining and naming themes, and producing the report.

The survey responses, also collected via convenience sampling, were analysed with Exploratory Factor Analysis (EFA) in IBM SPSS, following the typical scale validation process proposed by DeVellis (2003).

When reporting the qualitative data, “Expert 1” and “Expert 2” were used to refer to the two experts in the expert interview; “P” was used to refer to “participant”, and Arabic numerals were used to indicate participant numbers in the face-to-face interview (e.g., P4 refers to the fourth participant); since the email interview involved a large number of participants (n = 347), this study did not attribute email interview responses to specific contributors.

Reliability and validity

This study places a strong emphasis on reliability and validity, which involved varied validation techniques, such as triangulation, expert checking, and negative-case analysis.

One highlight of the current study is the employment of triangulation, namely the use of multiple sources of data and multiple evaluations in analysis (Salkind, 2010; Whittemore et al., 2001). To be specific, in scale item construction, we reviewed previous literature, interviewed the participants, and had the results checked by experts; in scale evaluation, we consulted experts on logical and wording issues and used statistical analysis techniques to evaluate the scale. The results of each stage can challenge or explain the results of other stages, which enhances the overall research reliability.

In collecting interview responses, we avoided leading questions by asking about participants’ general experience with gamified EFL tools rather than their perceptions of the PU and PEoU of such tools. Additionally, member checking was utilized in each face-to-face interview to ensure that the information received aligned with the information participants intended to deliver.

In analyzing interview responses, a second coder was recruited. The two coders coded the interview responses independently and resolved any inconsistencies through discussion. We also searched for negative cases for enhanced reliability. When adding interview responses to the theoretical framework, there were negative cases that did not fit the preliminary patterns. We discussed with the two experts whether those negative cases could be used to shape the framework or “cast doubt” on it (Patton, 1999, p. 1192).

The drafted scale was confirmed with the two experts regarding the propriety and clarity of its language use. To ensure accuracy in translation, the back translation method was employed before administering the survey. A translator was recruited to translate the Chinese version of the survey into English, and the translation was compared with the original text. Any meaningful differences between the two versions were then reconciled.

Findings

Part 1: expert interview for framework validation

Both experts confirmed the significance of dividing the effectiveness of gamified learning tools into two aspects: engagement and academic outcomes. This is “especially important in the test-oriented educational system”, where effectiveness is widely regarded as an indicator of academic performance improvement (Expert 2). According to Expert 1, academic performance improvement is the result of the long-term contribution of many facilitating factors and their dynamic interactions with other factors, both facilitating and debilitating. Therefore, if researchers only change one variable for a short period of time without strictly controlling other variables, “it may be improper to assert whether the selected variable contributes to students’ academic performance”. Alternative measures should be considered in assessing the impact of gamified learning tools.

The two experts also suggested that the current scale might not be limited to the test-oriented educational contexts. Expert 1 emphasized that while it is assumed that teachers in test-oriented countries prioritize academic achievements, the reality may be different. According to Expert 1, “a considerable number of teachers care about students’ learning experience”. Expert 2 added that if provided with facilitating conditions, such as encouraging policies and essential resources, students’ willingness “is highly possible to transfer to actual behavior”.

PEoU is a widespread concept in technology-acceptance theories. However, according to Expert 1, the connotation of PEoU may vary in the current context: PEoU can indicate the ease of using the technology or the device, or the ease of the implementation process. Previous scales assessing PEoU mainly address the former concept, “with the latter one neglected” (Expert 2).

In other words, the perceived ease of use of gamified learning tools in secondary school contexts is not only related to the cumbersomeness of using digital devices but also the effort required to address other issues, such as classroom discipline management. This broader perspective on PEoU is important in developing a more comprehensive understanding of the factors influencing technology acceptance in educational settings.

Perceived enjoyment and effectiveness on engagement are both important concepts in gamification research. However, Expert 2 criticized the theoretical framework, arguing that the concepts of perceived enjoyment and effectiveness on engagement may overlap in the current context. Specifically, perceived enjoyment may lead to improved engagement, suggesting a possible causal relationship between the two concepts. Therefore, it is important to confirm through statistical analysis whether they measure the same construct.

Expert 1 recommended including subject as a control variable because the effectiveness and user experience of gamified learning tools depend heavily on the learning content, which can vary significantly across different subjects.

The two experts noted that while gender and age are commonly used as mediating variables in evaluating educational interventions, they are non-grouped factors, similar to height and weight, and therefore not amenable to assessment using Likert scales. Unlike ordinal factors such as satisfaction level and level of agreement, non-grouped factors lack a natural ordering, making their evaluation using Likert scales inappropriate. Likewise, teaching subject is also a nominal factor.

Regarding facilitating conditions, prior research has identified specific measuring items, including necessary resources, technique support, necessary knowledge, and fitness to the workflow. However, to provide a more comprehensive understanding of facilitating conditions, future studies should explore additional factors such as policy support and peer encouragement (Expert 1 and Expert 2).

To sum up, the two experts criticized and commented on the proposed framework, as summarized in Table 5.

Table 5 Summaries of the expert comments

Part 2: teacher interview for survey item development

To create the survey, items were selected from previously validated scales that aligned with the constructs outlined in the framework. However, due to a lack of existing survey items related to the sub-construct of “effectiveness on engagement” and the construct of “perceived risks,” new items were drafted based on the insights gained from the teacher interviews. Details about the specific items are provided below.

Effectiveness on engagement

To measure engagement, one commonly used theory is the three-component engagement model, which includes behavioural engagement, affective engagement, and cognitive engagement (Ibanez et al., 2014). In this study, a significant number of interview responses (n = 39) focused on affective engagement, with participants frequently using keywords such as “interests”, “happy”, “enjoy”, and “fun”. As a result, the scale items primarily pertained to affective engagement.

A lack of support

The interview responses indicated a lack of support from principals, parents, or students (n = 27). P6 suggested that school leaders might be hesitant to invest in new technologies, preferring to allocate funds to areas that provide tangible feedback, such as psychological counseling or fitness training. Inadequate investment in equipment and technical support also limits teachers’ ability to implement gamification in classrooms. Regarding parents’ support, interviewees indicated that parents are “highly possible” to limit students’ access to digital devices or the internet because they worry about digital distraction or addiction. To force their children to focus on study, some parents, even those from wealthy families, refuse to buy smartphones or tablets (P1).

Surprisingly, the survey and interview responses revealed that students are also likely to refuse using gamified tools for learning, with P14 stating “we assumed that gamification would increase student engagement. The reality says no”. P11 added that “play is for fun and education is for transformation. Students are willing to sacrifice entertainment for their goals”.

Side effects

Side effects refer to the unintended consequences of the use of a tool or an intervention (Michie et al., 2011). Thirty-six email-interview responses mentioned the side effects of using gamified learning tools, including “digital addiction” (n = 7), “distraction” (n = 9), “privacy risk” (n = 1), “eyesight damage” (n = 8), “the involvement of undesirable or vulgar information” (n = 2), “harmful radiation” (n = 1), and a general negative influence on learning (n = 7).

The interview responses revealed additional side effects, such as learners being distracted from learning (P9), a negative impact on students’ endurance for independent learning (P1), an emphasis on earning points rather than on learning itself (P2), eye strain (P9), and a potential for addiction (P3).

Teacher control

The interview responses revealed that teachers expressed concerns regarding their control over the device, the learning process, and discipline management when using gamified EFL tools. Teachers were sceptical about students’ self-control, so they worried whether the students would use the digital devices for other purposes. Additionally, teachers were uncomfortable with using gamified learning tools that had a fixed knowledge system in the classroom, which marginalized their role. Participants expressed their preference to “design(ing) the pedagogical content myself” (P2) because “I cannot accept being dominated by a tool” (P7). However, “some gamified learning tools require users to unlock the content one by one”, which “makes me feel lost control” (P14). Teachers also complained that the use of gamified EFL tools in the classroom brings challenges to discipline management. P6 mentioned having large classes with around 60 students, so “we even need to keep them unexcited: once they get excited, the classroom management will be a disaster”.

The survey items for other constructs were obtained by borrowing or adapting them from previous scales, and the sources are listed in Table 6.

Table 6 Scale items and the reference

Part 3: survey study for scale validation

Instrument validation

To assess the internal consistency of the scale, a reliability analysis (Cronbach’s alpha) was performed on the complete dataset (N = 512). The results showed that the items measuring most of the constructs demonstrated a high level of internal reliability, with Cronbach’s alpha values exceeding the common threshold of 0.70 (Hair et al., 1998; Taber, 2018). Specifically, perceived enjoyment (α = .870), perceived usefulness (α = .916), perceived ease of use (α = .916), perceived risk (α = .825), attitude (α = .836), behavioural intention (α = .787), and facilitating conditions (α = .747) showed high internal consistency.

However, the control variables had a relatively low internal reliability (α = .586). Therefore, the three items corresponding to the control variables were removed.
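
For transparency, the alpha computation itself is straightforward to reproduce outside SPSS. The sketch below is a minimal Python illustration, assuming item responses are stored in a pandas DataFrame with one column per item (the file name and column names are hypothetical):

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Hypothetical usage: alpha for the perceived-enjoyment items
# df = pd.read_csv("survey_responses.csv")
# print(cronbach_alpha(df[["PE1", "PE2", "PE3", "PE4"]]))  # ~.870 per the reported results
```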

Exploratory factor analysis (EFA) results

Bartlett’s test was significant (χ2(276) = 8233.012, p < .001), and the overall Kaiser–Meyer–Olkin (KMO) Measure of Sampling Adequacy (MSA) was .940, well above the cut-off criterion of .50 (KMO values higher than .80 are considered excellent). These results confirmed that the collected data were adequate for factor analysis.
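
As an aside for readers working outside SPSS, both diagnostics can be reproduced with the open-source factor_analyzer package (not the software used in this study); a minimal sketch, assuming the item responses sit in a DataFrame (file name hypothetical):

```python
import pandas as pd
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

df = pd.read_csv("survey_responses.csv")  # hypothetical file: one column per item

# Bartlett's test of sphericity: rejects H0 that the correlation matrix is an identity matrix
chi_square, p_value = calculate_bartlett_sphericity(df)

# KMO measure of sampling adequacy: overall values above .80 are considered excellent
kmo_per_item, kmo_total = calculate_kmo(df)

print(f"Bartlett: chi2 = {chi_square:.3f}, p = {p_value:.3g}; overall KMO = {kmo_total:.3f}")
```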

The correlations among the 28 items were explored. The results show that facilitating conditions, perceived enjoyment, and perceived usefulness were strongly and positively correlated as a group (correlations greater than .30, significant at the 0.01 level). Similarly, perceived ease of use and perceived risks were correlated as a group (greater than .30, significant at the 0.01 level). However, there were no significant correlations between the two groups (PU and PEoU). These findings suggest that the 28 items can be grouped into at least two distinct factors.
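
The grouping pattern described above can be checked by flagging inter-item correlations against the .30 threshold; a brief sketch under the same DataFrame assumption:

```python
import pandas as pd

df = pd.read_csv("survey_responses.csv")  # hypothetical file: one column per item
corr = df.corr()

# For each item, count how many other items it correlates with at |r| > .30;
# items correlating strongly within one block but not the other suggest two factors.
strong = (corr.abs() > 0.30) & (corr.abs() < 1.0)
print(strong.sum().sort_values(ascending=False))
```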

After confirming the dataset’s suitability for factor analysis, an EFA was conducted using Principal Components Analysis (PCA) and Varimax rotation (see Table 7). Following the recommendations of Costello and Osborne (2005), five items were removed due to low communalities (lower than .50): the four items for facilitating conditions (.401, .393, .289, and .407, respectively) and item PU1 (.153). The EFA extracted two factors with eigenvalues greater than one, which together explained 60.11% of the total variance, with the first factor accounting for 34.07% and the second factor 26.04%. It is worth noting that the factor extraction results indicated that PU and perceived enjoyment were statistically regarded as one factor, as were the constructs PEoU and perceived risks.
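
The extraction step can likewise be reproduced with factor_analyzer, which supports principal-components extraction with Varimax rotation; a sketch mirroring the settings reported above (file and column names hypothetical):

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer

df = pd.read_csv("survey_responses.csv")  # hypothetical file: one column per item

fa = FactorAnalyzer(n_factors=2, method="principal", rotation="varimax")
fa.fit(df)

# Communalities below .50 flag items as removal candidates (Costello & Osborne, 2005)
communalities = pd.Series(fa.get_communalities(), index=df.columns)
print(communalities[communalities < 0.50])

# Eigenvalues > 1 justify the two-factor solution; the variance tuple gives % explained
eigenvalues, _ = fa.get_eigenvalues()
variance, prop_var, cum_var = fa.get_factor_variance()
print(pd.DataFrame(fa.loadings_, index=df.columns, columns=["Factor1", "Factor2"]))
```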

Table 7 Descriptive data and EFA results for the scale

The Average Variance Extracted (AVE) values for each factor were checked to assess the construct validity of the scale (see Table 7). The results indicated that the AVE values for PU and PEoU were 0.598 and 0.600, both above the cut-off criterion of 0.5. Both factors demonstrated good internal consistency, as indicated by their Cronbach’s alpha (α) and composite reliability (CR). As shown in Table 7, the Cronbach’s alpha values for the two constructs were 0.943 and 0.926, and the composite reliability values were 0.951 and 0.937, higher than the cut-off criterion of 0.70.
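
Both statistics follow directly from the standardized factor loadings via the Fornell–Larcker formulas; a minimal sketch with hypothetical loadings for illustration (not the study’s actual values):

```python
import numpy as np

def ave_and_cr(loadings) -> tuple[float, float]:
    """AVE = mean of squared standardized loadings;
    CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    lam = np.asarray(loadings, dtype=float)
    ave = np.mean(lam ** 2)
    cr = lam.sum() ** 2 / (lam.sum() ** 2 + (1 - lam ** 2).sum())
    return ave, cr

# Hypothetical loadings for one factor's items:
ave, cr = ave_and_cr([0.78, 0.81, 0.74, 0.79, 0.76])
print(f"AVE = {ave:.3f}, CR = {cr:.3f}")
```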

Besides the aforementioned four main steps, this study also involved other steps such as determining the format of the measurement as a 7-point Likert survey, including validation items, conducting a pilot study, and removing invalid surveys, as suggested by DeVellis (2003) and Robertson (2017).

Shortened scale version one

However, it is notable that a high value of alpha (> 0.90) “may suggest redundancies” because the items may be testing the same question in different forms (Tavakol & Dennick, 2011, p. 54). Though the acceptable values of alpha vary from 0.70 to 0.95 across previous studies, the recommended maximum alpha value is 0.90 (Tavakol & Dennick, 2011).

According to Worthington and Whittaker (2006), when a survey contains more than the desired number of items, designers can choose to delete items that have the lowest factor loadings, the highest cross-loadings, or low conceptual consistency with other items in the factor, as well as the items that contribute the least to the internal consistency of the scale scores.

Table 8 in Appendix 2 presents the process of reducing the items measuring teachers’ perceived usefulness (PU) of gamified learning tools in secondary schools (PU-GLT) from 13 to five. By removing the items with low factor loadings, this study first reduced the number of items from 13 to eight, with Cronbach’s alpha dropping from 0.943 to 0.923. Further reduction was achieved by referring both to the factor loadings and to the “Cronbach’s alpha if item deleted” statistic generated by IBM SPSS. Eventually, this study obtained five survey items with an internal consistency of 0.899.
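
The “Cronbach’s alpha if item deleted” statistic is simple to emulate: recompute alpha with each item held out in turn. A minimal sketch, restating the alpha computation shown earlier for self-containment (column selection hypothetical):

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    k = items.shape[1]
    return (k / (k - 1)) * (1 - items.var(ddof=1).sum() / items.sum(axis=1).var(ddof=1))

def alpha_if_item_deleted(items: pd.DataFrame) -> pd.Series:
    """Alpha recomputed with each item removed; items whose deletion raises alpha
    are candidates for removal alongside low-loading items."""
    return pd.Series({col: cronbach_alpha(items.drop(columns=col)) for col in items.columns})

# Hypothetical usage on the retained PU items:
# df = pd.read_csv("survey_responses.csv")
# print(alpha_if_item_deleted(df.filter(like="PU")).sort_values())
```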

Following the same procedure, six of the 10 items measuring teachers’ perceived ease of use (PEoU) of gamified learning tools in secondary schools (PEoU-GLT) were retained, as shown in Table 9 (in Appendix 2). The specific survey items for the PU and PEoU scales are detailed in Appendix 1.

Discussion

Structure of the framework

The data from the current study revealed surprising results: the structure of the proposed framework in Fig. 2 is not as promising as expected from a statistical perspective. Firstly, the face validity (whether the items measure what they claim to measure) is not satisfactory, as the two constructs perceived risk and PEoU were statistically regarded as one factor rather than two distinct ones. The same was observed for the constructs perceived enjoyment and PU. Secondly, the data demonstrated that two presumably important constructs, facilitating conditions and control variables, were not as significant as expected.

The face validity problems confirmed the experts’ comment that “perceived enjoyment” and “effectiveness on engagement” are overlapping concepts. Previous TAM-related frameworks have included enjoyment as a factor, such as the Technology Acceptance Model 3 (TAM3) by Venkatesh and Bala (2008), the educational games acceptance model by Ibrahim and Jaafar (2011), and the gamification effects on user acceptance model by Herzig et al. (2012). Findings of the current study indicated that the measurement of perceived enjoyment can be replaced by the measurement of one type of learning engagement, such as emotional engagement in the classroom (also known as affective engagement).

A similar interpretation applies to the face validity problems with the PEoU construct. In the context of using gamified learning tools in formal education, PEoU is not just about the ease of using digital devices but also about the ease of implementing gamification as a whole. When the implementation of gamification is time-consuming and effort-demanding, the process becomes “not easy”, which thereby decreases the perceived “ease” of use of the gamification approach or the gamified learning tools.

Facilitating conditions emphasize necessary resources, necessary knowledge, technical support, and compatibility with the workflow, as proposed by Wong (2016). The four items measuring facilitating conditions were excluded mainly because they were separated into two factors in the EFA, with each factor containing only two items, whereas each factor should contain at least three items for further analyses such as structural equation modelling (SEM) (MacCallum et al., 1999; Raubenheimer, 2004). Therefore, it is important to note that the removal of the four items in the current study does not indicate the impropriety of including the construct “facilitating conditions” in a TAM-extension model. Future studies should construct and validate a scale measuring facilitating conditions with more initial items. Aspects beyond the aforementioned four can also be considered, such as easy access to the right tool, policy support, and peer influence.

Items measuring control variables presented high variance, making them unsuitable for inclusion in a single scale. However, these variables can be used to assess how people of different groups perceive gamified learning tools. In previous studies, the most commonly involved variables are age and gender, as in Deng et al. (2018) and Venkatesh et al. (2003). More variables can be considered, such as experience, voluntariness, and personal innovativeness. As suggested by Expert 1 in the expert interview, subject can also be considered, because gamifying an educational activity is highly relevant to the content, and the content depends heavily on the nature of the teaching subject.

Excluded items from the scale

The first item removed from the PU scale was PU1, adapted from the original PU scale proposed by Davis (1989): “my job would be difficult to perform without the tool”. On the contrary, interview responses revealed that the participants did not regard gamification as a tool that facilitates their teaching efficiency, as gamified learning tools demand both effort and time in preparation from teachers (P1 and P14) while not guaranteeing quick gains in academic performance for students (P1, P3, P5, P7, and P14). At the same time, digital gamified learning tools, as mediums of new technology, bring challenges to acceptance intention and device operation; one participant aged over 46 reported that, as an “elder man working in a rural school”, he neither knows about nor is interested in new technologies. He further suggested that we “interview young teachers working in urban schools: they might know more about educational innovations” (P13). However, almost all of the participants admitted that gamified learning tools are of great help, or of great potential, in making students engaged. In short, at least in the current study, gamified learning and the existing learning tools aim to boost students’ learning engagement at the sacrifice of teachers’ convenience. Therefore, from secondary teachers’ perspectives, gamified learning tools are icing on the cake (improving students’ learning experience) rather than timely assistance (enhancing academic performance).

Consequently, perceived usefulness in the current research context was interpreted as usefulness for students from the learning experience perspective rather than usefulness for teachers from the implementation perspective. On the same principle, the factor analysis results indicated that several items should be removed, including the one related to teachers’ performance (PU2 “Using the tool improves my performance”), the one related to teachers’ teaching experience (PU9 “Teachers enjoys using gamified learning tools in schools”), and the two related to subjective norms (PU10 “People who influence my behaviour think that I should use gamified learning tools” and PU11 “People who are important to me think that I should use gamified learning tools”).

PU4 and PU5 (“In my teaching, usage of the tool is relevant” and “The tool would fit the current curriculum”), which were designed to test the sub-construct curriculum relatedness (Adukaite et al., 2017; Venkatesh & Davis, 2000), were included in the 8-item scale but excluded from the 5-item scale due to redundancy concerns. Curriculum relatedness is comparatively less straightforward than effectiveness when connected with the keyword usefulness; however, “useful” can be interpreted in two ways: as “competent” or “effective” in achieving goals, which relates strongly to “effectiveness”, or as “serviceable” or “functional” for a practical purpose, which relates to “being helpful”. Previous literature indicates that integrating game features with pedagogical needs is a major problem for educational gamification, while the interview responses show that secondary school teachers are highly likely to stop using gamified learning tools if the tools are not directly related to the current curriculum. Therefore, though PU4 and PU5 were removed from the current study, the rationale for excluding the sub-construct “relatedness” needs further validation.

All of the proposed items measuring PEoU were validated, except the four excluded due to redundancy concerns. PEoU3 (“I find it cumbersome to use the gamified learning tool”) is similar to PEoU1 and PEoU2 (“The gamified learning tool is rigid and inflexible to interact with” and “The gamified learning tool often behaves in unexpected ways”); the three items related to perceived risks (a lack of support, a lack of classroom control, and risks of side effects) can be generalised as the risks of losing control and the challenges of complexity, which are measured by PEoU7 and PEoU6 (“I feel lost control when I’m using the gamified learning tools in teaching” and “The gamified learning tools are complex to use”). Again, the concept “use” as interpreted by the participants concerns not only the interaction with the digital devices but also the implementation of the gamification approach as a complete activity.

To sum up, the proposed framework was found not to be fully valid in the current research context, as only two of the proposed six constructs were validated from a statistical perspective. The constructs “perceived enjoyment” and “perceived risks” should be removed; the construct “facilitating conditions” should be extended and revalidated in future studies; and the control variables (gender, age, subject, experience, personal innovativeness, and voluntariness) should be excluded from the scale.

Conclusion

Based on the technology acceptance model (TAM), this study constructed and validated a two-dimensional scale measuring teachers’ perceived usefulness (PU) and perceived ease of use (PEoU) of gamified learning tools for the teaching of English as a foreign language in secondary schools, named PU-gamification-EFL and PEoU-gamification-EFL respectively. One highlight of the current study is the involvement of multiple rounds of data collection for cross-validation.

The findings, namely the two developed scales (PU-gamification-EFL and PEoU-gamification-EFL), can be directly used for data collection. They can also be modified to fit varied contexts, which helps provide a wider range of results.

Further studies can use the developed scales to evaluate gamified EFL tools across a variety of courses and target groups, expanding the range of results. Moreover, future studies can increase the sample size and diversify the educational backgrounds of the participants. Additionally, new scales can be established based on other technology-acceptance-related frameworks or theories, such as the Diffusion of Innovation Theory (DIT) (Rogers, 2003).

Despite confirming the validity of the PU-gamification-EFL and PEoU-gamification-EFL scales, this study’s findings are limited by the small number of items measuring the sub-constructs. As suggested by MacCallum et al. (1999) and Raubenheimer (2004), there should be at least three items per construct for further analyses. This is also why, in shortening the scale, some presumably important items were excluded from the statistical analysis, such as the items measuring the relatedness of gamified EFL tools. Therefore, researchers should consider designing at least three items for each sub-construct in future studies to ensure a more comprehensive understanding of gamification’s effects on EFL learning.