1 Introduction

Artificial intelligence (AI) is currently changing the world through enhancing automation with intelligent actions in many sectors (Jordan & Mitchell, 2015). In education, AI could also change people’s fundamental understanding and practices of teaching and learning. Nonetheless, research about AI in education has been focusing on the technical aspects of system development (Divekar et al., 2021), with little attention to factors that shape the use of AI in the K-12 education settings (Zawacki-Richter et al., 2019). Research on teachers’ perceptions of the use of AI has only just emerged (Chiu & Chai, 2020). In China, the Development Plan of New Generation Artificial Intelligence (The State Council of China, 2017) proposed that AI would be used to accelerate the reform of teaching methods. Several AI education demonstration districts were designated to explore the use of AI in subject matter teaching (Ministry of Education of China, 2018). To facilitate the design of future AI-assisted pedagogical models, it is obvious that more efforts should be devoted to understanding teachers’ views (Chai et al., 2021).

In English as a foreign language (EFL) classes, AI provides new opportunities for EFL teachers to improve their teaching efficiency and quality (Chun, 2020). For example, based on data mining technology, AI can foster personalized learning and provide immediate feedback that may enhance learning satisfaction (Pokrivcakova, 2019). Emerging literature indicates that some tasks that EFL teachers perform are being enhanced or replaced by AI. These include analyzing learners (Tlili et al., 2021), checking homework, marking tests, and correcting pronunciation and writing (Florea & Radu, 2019).

Although AI has great potential to facilitate EFL learning, the integration of new technologies into classroom teaching generally requires teachers to overcome multilayer barriers (Tsai & Chai, 2012). EFL teachers may face many challenges in adapting to AI and making full use of AI to improve teaching. For AI-supported language learning to emerge, teachers need to accept the technology and design lessons creatively (Geng et al., 2021). While foreign language teachers generally support the use of modern technology in the classroom, they are also concerned about external factors (lack of material equipment, insufficient technical support, inflexible curriculum, or limited time) and internal factors (lack of knowledge and skills, contradictory teachers’ beliefs, or fear of losing pedagogical roles) (Pokrivcakova, 2019). Hence, when integrating AI as a new technology in EFL lessons, it is necessary to study teachers’ acceptance of AI, and the related external and internal factors that teachers consider.

To address the emerging trend of integrating AI into teaching in K-12, this study investigated EFL teachers’ behavioral intention to implement AI-supported language learning in middle school and the related external and internal factors. This study adopted the Unified Theory of Acceptance and Use of Technology (UTAUT) as the theoretical basis for external factors influencing teachers’ behavioral intention to use AI, and Technological Pedagogical and Content Knowledge (TPACK) as the theoretical basis for teachers’ internal factors related to AI use. Understanding the interplay between the external usability factors and internal pedagogical factors can contribute to the co-development of pedagogy and technology.

The UTAUT has been widely used to study the effects of people’s perceptions of technology on their behavioral intention to use a technology (Venkatesh et al., 2003). On the other hand, TPACK studies assess teachers’ knowledge of technology integration with teaching (Zhou et al., 2017), which has been regarded as an important supplement to the UTAUT (Lim & Harwati, 2021). The UTAUT accounts for usability and usefulness factors pertaining to technology from users’ perspectives, while the TPACK framework accounts for teachers’ knowledge and design expertise needed for technology integration. Previous studies have pointed out that integrating technology for teaching and learning is a complex problem with multiple forms of barriers; and teachers’ acceptance and design efforts are interrelated dimensions that need to be addressed to facilitate their development of expertise for the pedagogical use of emerging technologies (Ertmer, 1999; Geng et al., 2021). In particular, while usability and usefulness are the outcomes of technological design, they are also dependent on the users’ perceptions of the affordances of the existing technological design. For instance, many teachers perceive PowerPoint as a useful information delivery tool and use it for teacher-centered teaching. For teachers with a strong inclination for student-centric pedagogies, PowerPoint can be used as a multimedia knowledge construction tool for students to construct understanding of a topic based on multiple information resources (Teo et al., 2008). While both uses indicate teachers’ acceptance of the technology, teachers who are well versed in both traditional and student-centric pedagogies (in other words, teachers who have stronger TPACK) could have stronger acceptance as they are more able to find pedagogical applications for the technology. 
Put differently, teachers’ technological and/or pedagogical knowledge could shape their assessment of the usefulness and usability of a technology, which could in turn shape their intention to use the technology. Hence, technology acceptance and TPACK are interrelated, although the interrelationships could depend on context (see Section 2.3). Teo et al. (2019) combined these theories to predict preservice teachers’ intentions to use Web 2.0 technologies for teaching. The combination of the two models should predict teachers’ intention to use new technology to a large extent (Bardakci & Alkan, 2019). Based on UTAUT and TPACK, this research studied the predicting factors of teachers’ behavioral intention to use AI in teaching.

2 Literature review and hypotheses

2.1 UTAUT and behavioral intention

The theory of reasoned action (Fishbein & Ajzen, 1975) proposes that human behaviors are reasoned actions that follow from Behavioral Intention. Behavioral Intention is formed by the information or beliefs people possess about the behavior under consideration (Fishbein & Ajzen, 1975). On this basis, the Technology Acceptance Model (TAM) was proposed to predict users’ Behavioral Intention and behavior. In TAM, Perceived Usefulness and Perceived Ease of Use predict users’ attitudes towards technology; Perceived Ease of Use predicts users’ Behavioral Intention through Perceived Usefulness, and Behavioral Intention predicts actual use (Davis, 1989). The UTAUT was proposed based on TAM and seven other models (Venkatesh et al., 2003). The model comprises four key factors to predict users’ acceptance of technology: Performance Expectancy (corresponding to Perceived Usefulness in TAM), Effort Expectancy (corresponding to Perceived Ease of Use in TAM) (Zhou et al., 2022), Social Influence, and Facilitating Conditions. Because the UTAUT integrates the strengths of the eight existing models, its predictive power was reported to reach 70%, surpassing that of any previous technology acceptance model (Venkatesh et al., 2003). Performance Expectancy refers to an individual’s belief about how much a system will enhance his/her job performance. Effort Expectancy denotes the degree of ease associated with the use of the system. Social Influence refers to the degree to which an individual perceives that important others believe he or she should use the new system, and Facilitating Conditions means the degree to which an individual believes that an organizational and technical infrastructure exists to support the use of the system (Venkatesh et al., 2003).

As a new technology, AI has begun to be integrated into language learning. AI systems in language learning mainly involve natural language processing, expert systems, speech recognition, robotics, intelligent agents, and others (Liang et al., 2021). Different kinds of AI tools could support language learning, including chatbots, machine translation tools, text-to-speech and speech-to-text tools, and writing assistants (Jiang et al., 2021; Pokrivcakova, 2019). However, most studies to date have focused on system development and students in higher education, while few have paid attention to K-12 and teachers (Liang et al., 2021). The only relevant research explored Spanish and British teachers’ perceptions of using mobile assisted language learning and natural language processing technologies as open educational resources (Pérez-Paredes et al., 2018). The UTAUT is usually used to study users’ Behavioral Intention to use new technology and the related factors. Previous empirical studies have shown that the UTAUT can predict the intention to use AI among stakeholders in higher education (Chatterjee & Bhattacharjee, 2020), and Kazoun et al. (2022) proposed that the UTAUT could be used to explore the factors affecting the usage of AI agent and chatbot applications. Therefore, it is reasonable to conjecture that the UTAUT is also applicable to predict the intentions of middle school EFL teachers to use AI technology for teaching and learning. In addition, studies associated with technology acceptance could focus on measuring behavioral intention as the dependent variable instead of actual use of AI. Behavioral intention has demonstrated a strong connection with actual technology use in many empirical studies, and can be adopted as a reliable predictor of actual behavior (Davis, 1989; Teo et al., 2019; Venkatesh et al., 2003). 
Besides, measuring users’ behavior through a questionnaire survey may be less reliable, so many studies regard Behavioral Intention as the result variable (Bardakci & Alkan, 2019; Lim & Harwati, 2021).

According to UTAUT, Performance Expectancy, Effort Expectancy, and Social Influence predict users’ Behavioral Intention to use a technology or system, and Facilitating Conditions directly predict users’ behavior, but not their Behavioral Intention (Venkatesh et al., 2003). Based on Venkatesh et al.’s (2003) empirical findings, H1-H3 were formulated. In a follow-up study, Effort Expectancy was found to have a predictive effect on Performance Expectancy (Abbad et al., 2009). Hence, H4 was formulated for testing. Chatterjee & Bhattacharjee (2020) found that in higher education, Facilitating Conditions could positively predict Behavioral Intention. Hence, H5 was formulated for testing. Nonetheless, Chatterjee & Bhattacharjee (2020) reported that Performance Expectancy could not significantly predict Behavioral Intention. To clarify the factors that are positively associated with Behavioral Intention, it is necessary to carry out empirical research. To summarize, past empirical studies of UTAUT support the following hypotheses:

H1: EFL teachers’ Performance Expectancy of AI teaching systems could predict their Behavioral Intention to use AI.

H2: EFL teachers’ Effort Expectancy of AI teaching systems could predict their Behavioral Intention to use AI.

H3: EFL teachers’ Social Influence of AI teaching systems could predict their Behavioral Intention to use AI.

H4: EFL teachers’ Effort Expectancy of AI teaching systems could predict their Performance Expectancy of using AI.

H5: EFL teachers’ Facilitating Conditions of AI teaching systems could predict their Behavioral Intention of using AI.

2.2 TPACK and behavioral intention

In previous studies about Behavioral Intention, TPACK was usually regarded as an important external factor in the Technology Acceptance Model (Hsu, 2017; Yang et al., 2021), and a significant supplement to the UTAUT (Bardakci & Alkan, 2019; Lim & Harwati, 2021). TPACK is widely used to describe teachers’ knowledge in integrating technologies into teaching (Koh et al., 2013). In essence, teachers’ TPACK is a form of designed knowledge that is context sensitive (Angeli & Valanides, 2009) and as such, it is a form of dynamic knowledge constructed for specific topics and students. Teachers who possess strong TPACK are able to make sense of emerging technologies and create new lessons and practices that enhance students’ learning (Geng et al., 2021).

Teachers’ TPACK is built based on three basic types of knowledge: technological knowledge (TK), pedagogical knowledge (PK), and content knowledge (CK) (Mishra & Koehler, 2006). These three kinds of knowledge interrelate to form technological content knowledge (TCK), pedagogical content knowledge (PCK), technological pedagogical knowledge (TPK), and technological pedagogical content knowledge (TPACK). Teachers with different backgrounds are likely to construct TPACK in a different way (Hsu, 2017; Koh et al., 2013). Investigating the interrelations of teachers’ AI-TPACK knowledge, and how they are associated with teachers’ intention to use AI could provide valuable information about how to develop teachers’ ability to design pedagogical use of AI.

Previous empirical research found that teachers’ TPACK would significantly and positively influence their behavioral intention to use other technology (Bardakci & Alkan, 2019; Lim & Harwati, 2021). It is reasonable to speculate that teachers’ TPACK will have an impact on their behavioral intention to use AI in education (i.e., H8). In-service teachers usually possess good PCK from teaching experiences, and need to transform PCK to TPACK (Hsu, 2017). In the case of AI, teachers need to understand basic concepts of AI technologies and be familiar with current AI-supported language technologies including applications such as automated speech and text recognition, grammar checking, translation machines, and so on (Jiang et al., 2021). These language processing AI technologies are referred to as AI language technological knowledge (AIL-TK). How AIL-TK can be employed to support language learning tasks may draw ideas from the AI-TPK (e.g., using AI as an intelligent tutor), which contribute to the formation of AI-TPACK (e.g., using speech recognition to practice oral presentations), which forms H9-H11, respectively. Acquiring and developing these three forms of knowledge is foundational for EFL teachers to integrate AI technology, and possessing these different types of knowledge implies that the teachers are more able to teach with AI, and thus their willingness to use AI should be enhanced (i.e., H6-H8). These hypotheses were supported theoretically and empirically by previous research that did not specify the technology (Koh et al., 2013). Therefore, we conjecture that these three kinds of knowledge may have an impact on teachers’ intention to use AI. The following hypotheses were proposed:

H6: EFL teachers’ AIL-TK could predict their Behavioral Intention to use AI.

H7: EFL teachers’ AI-TPK could predict their Behavioral Intention to use AI.

H8: EFL teachers’ AI-TPACK could predict their Behavioral Intention to use AI.

H9: EFL teachers’ AIL-TK could predict their AI-TPK.

H10: EFL teachers’ AIL-TK could predict their AI-TPACK.

H11: EFL teachers’ AI-TPK could predict their AI-TPACK.

2.3 UTAUT and TPACK

Researchers have reported that teachers’ TPACK could be an external factor in the Technology Acceptance Model (TAM). TPACK had a significant impact on users’ Behavioral Intention through the mediation of Perceived Usefulness (corresponding to Performance Expectancy in UTAUT) and Perceived Ease of Use (corresponding to Effort Expectancy in UTAUT) (Hsu, 2017; Yang et al., 2021). These studies provide support for H12-H17. In recent years, as UTAUT was proposed based on TAM, researchers have found that TPACK could be an important supplement to UTAUT (Bardakci & Alkan, 2019; Lim & Harwati, 2021). A study of pre-service teachers using an interactive whiteboard found that only Performance Expectancy strongly explained Behavioral Intention (0.91), while Effort Expectancy, TK, TPK, PK and other factors were not significant predictors of intention (Bardakci & Alkan, 2019). However, a study of pre-service teachers using general technology found that when combining the two models, only TPACK significantly predicted Behavioral Intention, while Performance Expectancy, Effort Expectancy, and Facilitating Conditions did not predict intention (Lim & Harwati, 2021). This shows that in different contexts, the factors predicting teachers’ intention to use technology are not always the same; the aim of this study was therefore to clarify the predictive role of these factors. The hypotheses were formed as follows:

H12: EFL Teachers’ AIL-TK could predict their Performance Expectancy;

H13: EFL Teachers’ AI-TPK could predict their Performance Expectancy;

H14: EFL Teachers’ AI-TPACK could predict their Performance Expectancy;

H15: EFL Teachers’ AIL-TK could predict their Effort Expectancy;

H16: EFL Teachers’ AI-TPK could predict their Effort Expectancy;

H17: EFL Teachers’ AI-TPACK could predict their Effort Expectancy.

As for the relationship between Facilitating Conditions and TPACK, Cheung et al. (2016) found that Facilitating Conditions, Performance Expectancy, and Effort Expectancy were positively correlated with TPACK. Furthermore, Lachner et al. (2021) found that teachers’ perceived support for technology integration, which is conceptually akin to Facilitating Conditions in UTAUT, would predict their TPACK. Therefore, it was hypothesized that Facilitating Conditions would predict teachers’ AIL-TK, AI-TPK, and AI-TPACK in this research. The hypotheses were as follows:

H18: EFL teachers’ Facilitating Conditions could predict their AIL-TK;

H19: EFL teachers’ Facilitating Conditions could predict their AI-TPK;

H20: EFL teachers’ Facilitating Conditions could predict their AI-TPACK.

The research hypotheses are shown in Fig. 1.

Fig. 1 The hypothesized relations among the eight research constructs

Note: AIL-TK (AI Language Technological Knowledge), AI-TPK (AI Technological Pedagogical Knowledge), AI-TPACK (AI Technological Pedagogical Content Knowledge)
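For readers who want to work with the hypothesized model programmatically, the twenty paths in H1 to H20 can be written down as a directed edge list. The following is a minimal Python sketch; the abbreviations PE, EE, SI, FC, and BI and all variable names are introduced here for illustration and are not part of the original instrument.

```python
from collections import defaultdict

# Hypothesized paths H1-H20 as (predictor, outcome) pairs.
# Illustrative abbreviations: PE = Performance Expectancy, EE = Effort Expectancy,
# SI = Social Influence, FC = Facilitating Conditions, BI = Behavioral Intention.
HYPOTHESES = {
    "H1": ("PE", "BI"),
    "H2": ("EE", "BI"),
    "H3": ("SI", "BI"),
    "H4": ("EE", "PE"),
    "H5": ("FC", "BI"),
    "H6": ("AIL-TK", "BI"),
    "H7": ("AI-TPK", "BI"),
    "H8": ("AI-TPACK", "BI"),
    "H9": ("AIL-TK", "AI-TPK"),
    "H10": ("AIL-TK", "AI-TPACK"),
    "H11": ("AI-TPK", "AI-TPACK"),
    "H12": ("AIL-TK", "PE"),
    "H13": ("AI-TPK", "PE"),
    "H14": ("AI-TPACK", "PE"),
    "H15": ("AIL-TK", "EE"),
    "H16": ("AI-TPK", "EE"),
    "H17": ("AI-TPACK", "EE"),
    "H18": ("FC", "AIL-TK"),
    "H19": ("FC", "AI-TPK"),
    "H20": ("FC", "AI-TPACK"),
}


def predictors_by_outcome(hypotheses):
    """Group the hypothesized predictors under each outcome construct."""
    grouped = defaultdict(list)
    for predictor, outcome in hypotheses.values():
        grouped[outcome].append(predictor)
    return dict(grouped)


constructs = {name for path in HYPOTHESES.values() for name in path}
grouped = predictors_by_outcome(HYPOTHESES)

print(len(HYPOTHESES), "paths among", len(constructs), "constructs")
for outcome, predictors in grouped.items():
    print(f"{outcome} <- {', '.join(predictors)}")
```

Grouping the paths by outcome makes it easy to see which constructs are endogenous in the model, that is, which ones have at least one hypothesized predictor.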

3 Methodology

3.1 Participants

A survey was adapted to test the research model. The survey was conducted in an AI education demonstration district in China. This district has been promoting the use of AI applications for English teaching and learning since December 2020. With the joint support of the local government, teacher training centers, teacher research centers, and universities, all teachers participated in research projects, focusing on one or more aspects of AI-supported language learning, such as listening, speaking, writing, and reading. There were three experts for every five projects, including an educational technology expert, an EFL expert, and a researcher. Expert guidance meetings about how to design and implement lessons using AI in EFL were held on a weekly basis. The teachers implemented the newly constructed AI-supported language learning lesson activities, which were equivalent to newly constructed TPACK. Hence, they were purposively selected to participate in this study.

There were 40 middle schools in this district. From November 2 to 4, 2021, questionnaires were sent to 20 randomly chosen schools with 254 EFL teachers, and 219 valid responses were received. These responses constituted subsample 1 (n = 219) that was used for Exploratory Factor Analysis. After obtaining the findings of the EFA which provided support for the validity and reliability of the scale, another independent set of data was collected for Confirmatory Factor Analysis (CFA). From November 5 to 7, 2021, questionnaires were sent to the remaining schools with another 303 EFL teachers, and 251 valid responses were received. These responses constituted subsample 2 (n = 251), which was used for CFA. A total of 470 valid responses were collected in this survey. The respondents included 6.6% males and 93.4% females, which reflects the current gender distribution of English teachers in China. The descriptive statistics of the participants’ demographic data are shown in Table 1.
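The valid response rates implied by these figures can be checked with a few lines of arithmetic. The sketch below assumes that the teacher counts reported for the sampled schools (254 and 303) equal the number of questionnaires distributed, which the text suggests but does not state explicitly; the dictionary keys and function name are illustrative.

```python
# Counts taken from the sampling description; "sent" is assumed to equal
# the number of EFL teachers in the sampled schools.
SUBSAMPLES = {
    "subsample 1 (EFA)": {"sent": 254, "valid": 219},
    "subsample 2 (CFA)": {"sent": 303, "valid": 251},
}


def valid_response_rate(sent, valid):
    """Share of distributed questionnaires that came back as valid responses."""
    return valid / sent


total_sent = sum(group["sent"] for group in SUBSAMPLES.values())
total_valid = sum(group["valid"] for group in SUBSAMPLES.values())

for name, group in SUBSAMPLES.items():
    print(f"{name}: {valid_response_rate(**group):.1%}")
print(f"overall: {valid_response_rate(total_sent, total_valid):.1%}")
```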

Table 1 Demographic Information of Participants

3.2 Instrument

The questionnaire consisted of two parts: the first part collected demographic information, and the second part measured teachers’ Performance Expectancy, Effort Expectancy, Social Influence, Facilitating Conditions, AIL-TK, AI-TPK, AI-TPACK, and Behavioral Intention. The survey employed a 5-point Likert scale that ranged from 1 (strongly disagree) to 5 (strongly agree). The eight constructs in the scale were developed from previous research and from interview records of EFL teachers. Fifteen EFL teachers were interviewed, and their experiences of AI-supported language learning informed the preparation of the questionnaire. Table 2 shows the definition of each construct and a sample item.

Table 2 The definitions and sample items of each construct

As some items were substantially changed from the previous scales, we invited three educational technology professors to check the content validity of the scale, and then asked six English teachers in middle school to complete the questionnaire and advise us on any necessary revisions. The revised scale was used to survey the teachers.

3.3 Data analysis

Before data analyses were performed, normality was tested. All the measured items had acceptable skewness (ranging from −0.639 to 0.209) and kurtosis (ranging from −0.409 to 1.507), with absolute values below the requisite maxima of 1 and 2, respectively, indicating that the data of all items were close to the normal distribution (Noar, 2003).
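The screening rule described here (absolute skewness below 1 and absolute kurtosis below 2) can be sketched in Python as follows. This is an illustration only: it uses the simple moment-based estimators, whereas statistical packages such as SPSS apply small-sample corrections, so values may differ slightly. The function names and the sample responses are hypothetical.

```python
from statistics import fmean


def central_moment(values, order):
    """Population central moment of the given order."""
    mean = fmean(values)
    return fmean([(v - mean) ** order for v in values])


def skewness(values):
    """Moment-based skewness (no small-sample correction)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based excess kurtosis (0 for a normal distribution)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


def is_approximately_normal(values, max_abs_skew=1.0, max_abs_kurtosis=2.0):
    """Apply the |skewness| < 1 and |kurtosis| < 2 screening rule."""
    return (abs(skewness(values)) < max_abs_skew
            and abs(excess_kurtosis(values)) < max_abs_kurtosis)


# Hypothetical 5-point Likert responses for a single item (not study data).
item_responses = [3, 4, 4, 5, 3, 4, 2, 4, 5, 4, 3, 4]
print(round(skewness(item_responses), 3), round(excess_kurtosis(item_responses), 3))
print(is_approximately_normal(item_responses))
```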

Data analysis consisted of four stages: Exploratory Factor Analysis (EFA), Confirmatory Factor Analysis (CFA), reliability analysis, and Structural Equation Modeling (SEM). EFA and reliability analysis of the scale were conducted with SPSS 20.0, and CFA and SEM were conducted with Mplus 8.3. Subsample 1 was used for EFA, and subsample 2 was used for CFA. In the reliability analysis and SEM, all 470 responses were used.

In EFA, principal axis factoring (PAF) with Direct Oblimin rotation was used to extract the factors, and factors with eigenvalues greater than 1 were retained. Items with cross-loadings or low loadings (< 0.5) were deleted (Deng et al., 2017).
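The retention and deletion rules can be illustrated with a short Python sketch. The eigenvalue cutoff of 1 and the minimum loading of 0.5 come from the text; the cross-loading cutoff of 0.4 is an assumption made here, since no value is stated, and the eigenvalues, item names, and loadings are hypothetical.

```python
def retained_factor_count(eigenvalues, cutoff=1.0):
    """Count factors whose eigenvalue exceeds the cutoff (Kaiser criterion)."""
    return sum(1 for value in eigenvalues if value > cutoff)


def screen_items(loadings, min_primary=0.5, cross_cutoff=0.4):
    """Split items into (kept, dropped) lists.

    An item is dropped when its largest absolute loading is below
    min_primary, or when its second-largest absolute loading reaches
    cross_cutoff (an assumed threshold, not taken from the study).
    """
    kept, dropped = [], []
    for item, row in loadings.items():
        ranked = sorted((abs(value) for value in row), reverse=True)
        primary = ranked[0]
        secondary = ranked[1] if len(ranked) > 1 else 0.0
        if primary < min_primary or secondary >= cross_cutoff:
            dropped.append(item)
        else:
            kept.append(item)
    return kept, dropped


# Hypothetical pattern-matrix rows: item -> loadings on two factors.
example_loadings = {
    "PE1": [0.82, 0.10],
    "EE1": [0.45, 0.12],  # low primary loading
    "SI1": [0.61, 0.52],  # cross-loading
    "FC1": [0.05, 0.77],
}
print(retained_factor_count([3.2, 1.4, 0.9, 0.5]))
print(screen_items(example_loadings))
```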

In CFA and SEM, the standards recommended by Hair et al. (2014) were adopted. Accordingly, χ²/df (< 5), Root Mean Square Error of Approximation (RMSEA) (< 0.10), Comparative Fit Index (CFI) (> 0.90), and the Tucker-Lewis Index (TLI) (> 0.90) were used to check the model fit. Then Average Variance Extracted (AVE) (> 0.5) and Construct Reliability (CR) (> 0.7) were calculated from the factor loadings (λ) to check the convergent validity of the scale. The square roots of the AVEs of the components were compared with the correlations between components to check the discriminant validity of the scale. The correlations between all factors were tested for significance before SEM.
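The text states that AVE and CR were calculated from the factor loadings but does not give the formulas. A minimal Python sketch using the commonly cited definitions for standardized loadings is shown below, on the assumption that these are the formulas intended; the function names and example loadings are hypothetical.

```python
def average_variance_extracted(loadings):
    """AVE = mean of the squared standardized loadings."""
    return sum(value ** 2 for value in loadings) / len(loadings)


def construct_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances).

    Each item's error variance is taken as 1 - loading^2, which holds
    for standardized loadings.
    """
    squared_sum = sum(loadings) ** 2
    error_variance = sum(1 - value ** 2 for value in loadings)
    return squared_sum / (squared_sum + error_variance)


# Hypothetical standardized loadings for one construct (not study data).
loadings = [0.72, 0.85, 0.90]
ave = average_variance_extracted(loadings)
cr = construct_reliability(loadings)
print(f"AVE = {ave:.3f} (criterion > 0.5), CR = {cr:.3f} (criterion > 0.7)")
```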

In the reliability analysis, Cronbach’s α was calculated as the internal consistency coefficient; the values for the whole scale and for each construct needed to be higher than 0.7 (Fornell & Larcker, 1981).
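For illustration, Cronbach’s α can be computed from item-level scores with the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The Python sketch below uses hypothetical responses and a hypothetical function name; it is not the SPSS procedure used in the study.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of items.

    item_scores is a list with one inner list per item; each inner list
    holds the respondents' scores in the same respondent order.
    """
    k = len(item_scores)
    item_variance_sum = sum(variance(item) for item in item_scores)
    totals = [sum(scores) for scores in zip(*item_scores)]
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical scores: three items answered by six respondents.
scores = [
    [4, 5, 3, 4, 2, 5],
    [4, 4, 3, 5, 2, 5],
    [5, 5, 2, 4, 3, 4],
]
alpha = cronbach_alpha(scores)
print(f"alpha = {alpha:.3f}, meets 0.7 criterion: {alpha > 0.7}")
```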

4 Results

4.1 Validity and reliability

The EFA results showed good validity of the scale. The Kaiser-Meyer-Olkin measure value was 0.94 (p < 0.001), indicating that it was suitable for factor analysis. A total of eight factors were obtained, and the total explained variance was 77.45%. The loadings of each item on the factor were between 0.53 and 0.94 (see Table 3).

To further verify the structural validity of the scale, CFA was carried out. The model fit indices were as follows: χ²/df was 2.88 (< 5.0), RMSEA was 0.09 (< 0.10), CFI was 0.91 (> 0.90), and TLI was 0.91 (> 0.90), indicating that the fit for the items of the scale was acceptable. As shown in Table 3, all standardized factor loadings were in a good range of 0.72 to 0.96. The values of AVE were higher than 0.5, and CR was higher than 0.7, indicating good convergent validity. The square root of the AVE of each component was higher than the correlations between that component and the other components (see Table 4), indicating good discriminant validity.
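The discriminant validity comparison described above (often called the Fornell-Larcker criterion) can be sketched in Python as follows. The AVE values and correlations are hypothetical placeholders rather than the figures in Tables 3 and 4, and the function name is illustrative.

```python
from math import sqrt


def fornell_larcker_violations(ave, correlations):
    """Return construct pairs whose correlation is not below both sqrt(AVE)s.

    ave maps construct -> AVE; correlations maps (construct_a, construct_b)
    -> correlation between the two constructs.
    """
    violations = []
    for (first, second), r in correlations.items():
        if abs(r) >= min(sqrt(ave[first]), sqrt(ave[second])):
            violations.append((first, second))
    return violations


# Hypothetical values for three constructs (not study data).
ave = {"PE": 0.70, "EE": 0.65, "BI": 0.72}
correlations = {("PE", "EE"): 0.60, ("PE", "BI"): 0.68, ("EE", "BI"): 0.55}
print(fornell_larcker_violations(ave, correlations))
```

An empty list means every construct shares more variance with its own items than with any other construct.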

The internal consistency coefficient test showed good reliability of the scale. The Cronbach’s α of the whole scale was 0.98, and those of the components are shown in Table 3. All the construct reliabilities were higher than 0.7, indicating good reliability. Means of components were all above the midpoint 3, and standard deviations of all components were between 0.59 and 0.74, as shown in Table 3.

Table 3 Means, standard deviations, factor loadings (λ), AVEs, and construct reliability
Table 4 Correlations between components and AVE of the components

4.2 SEM for hypothesis testing

The structural equation model (SEM) fit the data well: χ²/df was 3.57 (< 5.0), RMSEA was 0.07 (< 0.10), CFI was 0.92 (> 0.90), and TLI was 0.92 (> 0.90). The tested research model is shown in Fig. 2.
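The reported CFA and SEM fit indices can be checked against the adopted cutoffs with a small Python sketch. The numbers are those given in the text; the dictionary layout and function name are illustrative.

```python
import operator

# Cutoffs adopted in the data analysis section.
THRESHOLDS = {
    "chi2/df": (operator.lt, 5.0),
    "RMSEA": (operator.lt, 0.10),
    "CFI": (operator.gt, 0.90),
    "TLI": (operator.gt, 0.90),
}

# Fit indices as reported for the measurement (CFA) and structural (SEM) models.
REPORTED = {
    "CFA": {"chi2/df": 2.88, "RMSEA": 0.09, "CFI": 0.91, "TLI": 0.91},
    "SEM": {"chi2/df": 3.57, "RMSEA": 0.07, "CFI": 0.92, "TLI": 0.92},
}


def failed_indices(fit):
    """Return the names of the indices that do not meet their cutoff."""
    return [name for name, (compare, cutoff) in THRESHOLDS.items()
            if not compare(fit[name], cutoff)]


for model, fit in REPORTED.items():
    failures = failed_indices(fit)
    print(model, "meets all cutoffs" if not failures else f"fails: {failures}")
```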

The hypothesized paths of Performance Expectancy, Social Influence, AIL-TK, and AI-TPACK to Behavioral Intention were all significant. The hypotheses H1, H3, H6, and H8 were supported in this research. The hypothesized paths of Effort Expectancy and AI-TPK to Behavioral Intention were not significant. This means that Effort Expectancy and AI-TPK did not have significant direct predictive power on Behavioral Intention. Hypotheses H2 and H7 were therefore not supported.

The hypothesized path of Effort Expectancy to Performance Expectancy was significant, but the path of Facilitating Conditions to Behavioral Intention was not significant. Hypothesis H4 was supported, but H5 was not. The paths of AIL-TK to AI-TPK, AIL-TK to AI-TPACK, and AI-TPK to AI-TPACK were all significant, indicating that AIL-TK and AI-TPK had significant positive predictive power on AI-TPACK, and AIL-TK had significant positive predictive power on AI-TPK. Hypotheses H9, H10, and H11 were supported.

As for the relationship between AI-TPACK and UTAUT for AI, the hypothesized paths of AIL-TK and AI-TPK to Performance Expectancy were not significant, so H12 and H13 were not supported. The path of AI-TPACK to Performance Expectancy was significant, so H14 was supported. The paths of AIL-TK, AI-TPK, and AI-TPACK to Effort Expectancy were significant, so H15, H16, and H17 were supported. The paths of Facilitating Conditions to AIL-TK, AI-TPK, and AI-TPACK were significant, so H18, H19, and H20 were supported.

The explanatory power (R²) is the ratio of explained variation to total variation and indicates how well the model explains each endogenous variable. In this study, the R² values of Performance Expectancy, Effort Expectancy, AIL-TK, AI-TPK, AI-TPACK, and Behavioral Intention were 0.41, 0.37, 0.25, 0.50, 0.74, and 0.54, respectively. All of these values except that of AIL-TK (0.25) were above the threshold of 0.3 (Cohen, 1977), which shows that the model had explanatory power for these variables.
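As a reminder of what these values represent, one common way of writing the explained variance of an endogenous variable in a structural model is given below. The notation (η for the endogenous variable and ζ for its disturbance term) is assumed here for illustration and does not appear in the original analysis.

```latex
R^2_{\eta}
  = \frac{\text{explained variance of } \eta}{\text{total variance of } \eta}
  = 1 - \frac{\operatorname{Var}(\zeta)}{\operatorname{Var}(\eta)}
```

For example, the reported value of 0.54 for Behavioral Intention means that its predictors jointly account for 54% of its variance.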

Fig. 2 The path coefficients of the structural model

Note: ***p < 0.001, **p < 0.01. AIL-TK (AI Language Technological Knowledge), AI-TPK (AI Technological Pedagogical Knowledge), AI-TPACK (AI Technological Pedagogical Content Knowledge)

5 Discussion

To examine EFL teachers’ intention to implement AI-supported teaching, we investigated teachers’ TPACK associated with language-based AI tools and factors associated with UTAUT. A total of 15 hypotheses were supported, while five were not. The hypothesized theoretical model is hence generally supported. The findings of this study are further discussed below.

This study used EFA and CFA to establish a valid and reliable scale to measure EFL teachers’ perceptions of TPACK, UTAUT factors, and behavioral intention to use AI in teaching (see Appendix). For this scale we developed items about EFL teachers’ AI-TPACK, which are innovative in the TPACK field and can be used in future studies. This questionnaire can also be used to study how to improve teachers’ behavioral intention in future research.

5.1 EFL teachers’ perceptions of and behavioral intention to use AI

In this study, the means of all factors are above the midpoint 3, indicating that EFL teachers have positive perceptions of external factors about AI, adequate AI teaching knowledge, and strong behavioral intention to use AI in education. The findings match the expectations of the researchers as the teachers had received professional development guidance about AI-supported teaching. Theoretically, the model we hypothesized indicates that Facilitating Conditions support teachers’ development of three types of TPACK, which in turn promoted the teachers’ acceptance of AI for EFL, and hence teachers’ intention to use AI for EFL.

Specifically, the EFL teachers perceived that AI technology was very helpful to their teaching and was easy to use. They thought that schools and colleagues have supportive attitudes towards AI in education, and organizations and technicians provide adequate support. They also believed that they had developed an understanding of AIL-TK, AI-TPK, and AI-TPACK. They were inclined to continue to learn and use AI in the future. The survey results indicated that these teachers in the AI education demonstration district in China are well-positioned to experiment with the emerging language-based AI applications for the teaching of EFL.

5.2 Factors that are positively associated with behavioral intention

As the results showed, the most influential factor on teachers’ Behavioral Intention to use AI is Performance Expectancy. When teachers have high Performance Expectancy, they believe that AI can help them teach well, for example by improving their teaching efficiency and quality. This shows that if the government or schools want to promote the integration of AI and EFL teaching in middle school, they need to help the teachers to understand the usefulness of AI for their teaching. Diverse AI applications enable EFL learners to practice meaningful interaction without limitation of time and place (Bibauw et al., 2019). They can also reduce learners’ anxiety (El Shazly, 2021), which is the main barrier for EFL in China (Jiang et al., 2021; Zheng et al., 2021). When teachers realize the usefulness of AI technology for their teaching, their Behavioral Intention to use AI is likely to improve.

Effort Expectancy cannot directly predict teachers’ Behavioral Intention to use AI. This indicates that EFL teachers would not want to use AI just because AI products are easy to use. This is similar to the results of previous studies about other technologies (Abbad et al., 2009; Venkatesh et al., 2003). As indicated by the results, Effort Expectancy can predict teachers’ Behavioral Intention through Performance Expectancy. Post hoc analysis indicated that the indirect effect (0.11, p < 0.001) was significant, which is consistent with previous research (Abbad et al., 2009). Previous studies have shown that teachers sometimes think new technology may increase their burden and reduce their efficiency (Ozgur, 2020), because of the time and energy needed to adapt to it. In this research, the ease of use of AI technology indirectly predicted the teachers’ Behavioral Intention.
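An indirect effect through a single mediator is the product of the two path coefficients involved. The text does not say how the post hoc significance test was carried out, so the Python sketch below shows the Sobel (delta-method) test as one common option, with bootstrapping being another. The coefficients and standard errors are hypothetical and are not the values in Fig. 2.

```python
from math import sqrt


def indirect_effect(a, b):
    """Indirect effect of X on Y through M: a (X -> M) times b (M -> Y)."""
    return a * b


def sobel_z(a, se_a, b, se_b):
    """Sobel z statistic for the indirect effect a * b."""
    return (a * b) / sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)


# Hypothetical paths: Effort Expectancy -> Performance Expectancy (a)
# and Performance Expectancy -> Behavioral Intention (b).
a, se_a = 0.30, 0.05
b, se_b = 0.40, 0.06
print(f"indirect effect = {indirect_effect(a, b):.3f}")
print(f"Sobel z = {sobel_z(a, se_a, b, se_b):.2f}")
```

A z value beyond roughly ±1.96 would indicate significance at the 0.05 level under the normality assumption of the Sobel test.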

Teachers’ AI-TPACK is another important factor in predicting teachers’ Behavioral Intention to use AI directly. When EFL teachers think they have the knowledge to integrate AI with pedagogical knowledge and content knowledge, they are more likely to want to use AI in their teaching. This is similar to Lim and Harwati’s (2021) study about pre-service teachers’ integration of technology for English learning in Malaysia. It can be seen that for both pre-service teachers and in-service teachers, TPACK has a significant positive association with their intention to use technology. In addition, AIL-TK can also predict teachers’ Behavioral Intention directly, indicating that when EFL teachers have more Technological Knowledge about AI-based language applications, they are more likely to continue to use AI to support their teaching. Hence, language teacher educators may need to begin introducing these useful technologies to teachers. AI-TPK did not have a significant direct effect on Behavioral Intention, but post hoc analysis shows that it had an indirect effect on Behavioral Intention through AI-TPACK, with a total indirect value of 0.18 (p < 0.01). This indicates that teachers who know how AI can facilitate teaching and learning in general may not always want to use AI unless they know how AI can be used to support specific EFL teaching.
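A total indirect effect sums the products of the coefficients along every route from predictor to outcome that passes through at least one mediator. The Python sketch below illustrates this on a small sub-model; the coefficients are hypothetical and are not those in Fig. 2, the abbreviations PE and BI stand for Performance Expectancy and Behavioral Intention, and the function names are illustrative.

```python
def path_products(graph, source, target, product=1.0, length=0):
    """Yield (length, product of coefficients) for each directed path.

    Assumes the graph has no cycles, as in a recursive structural model.
    """
    if source == target and length > 0:
        yield length, product
        return
    for successor, coefficient in graph.get(source, {}).items():
        yield from path_products(graph, successor, target,
                                 product * coefficient, length + 1)


def total_indirect_effect(graph, source, target):
    """Sum the products over all paths with at least one mediator."""
    return sum(value for length, value in path_products(graph, source, target)
               if length >= 2)


# Hypothetical standardized coefficients (NOT the values reported in Fig. 2).
paths = {
    "AI-TPK": {"AI-TPACK": 0.5, "BI": 0.1},
    "AI-TPACK": {"BI": 0.3, "PE": 0.4},
    "PE": {"BI": 0.5},
}
print(round(total_indirect_effect(paths, "AI-TPK", "BI"), 3))
```

In this toy example the direct path from AI-TPK to BI is excluded, and the two mediated routes (via AI-TPACK alone, and via AI-TPACK and then PE) are added together.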

5.3 Relationship among AIL-TK, AI-TPK, and AI-TPACK

As the results show, AIL-TK predicts AI-TPK, while AIL-TK and AI-TPK predict AI-TPACK. This is consistent with a previous study about ICT (Koh et al., 2013). The findings contribute an initial set of items to measure the effectiveness of teacher professional development activities for AI-supported teaching and learning in the field of language education. In addition, the SEM revealed that AIL-TK is foundational to teachers’ development of AI-TPACK, likely mediated by AI-TPK. We suggest that teacher professional development activities should introduce relevant technologies and guide teachers in creating self-directed learning activities for students as AI technology apparently supports such activities well (Bibauw et al., 2019). AI technologies can provide immediate feedback to students (Jiang et al., 2021), and well-designed platforms can provide suggestions for students to remediate unsuccessful learning or move on to more advanced learning (Pokrivcakova, 2019). In other words, AI-TPK could be focused on supporting students’ self-directed learning. More attention can be devoted to guiding teachers as to how to skillfully integrate AI technical knowledge with existing PCK, especially in areas where students need more practice and feedback to improve their language skills.

5.4 Relationship between UTAUT and TPACK

As the results show, AIL-TK, AI-TPK, and AI-TPACK all directly predicted Effort Expectancy, but only AI-TPACK had a direct effect on Performance Expectancy. This finding extends past findings from studies that employed TPACK and the Technology Acceptance Model as theoretical models (Yang et al., 2021). Previous research found that teachers’ TPACK influenced their Effort Expectancy and Performance Expectancy. In this research, teachers with better AIL-TK, AI-TPK, and AI-TPACK perceived the AI teaching systems as being easy to use, which is similar to previous studies (Hsu, 2017; Sug & Ko, 2020; Yang et al., 2021). However, teachers with good AIL-TK and AI-TPK did not think that AI is useful unless they believed they could integrate AI technology, pedagogy, and EFL content well, which differs from the findings of previous research. In other words, teachers’ pedagogical competence as represented by TPACK can be an independent source of teachers’ Behavioral Intention to use AI for EFL while also enhancing teachers’ acceptance.

In creating new AI-TPACK for students’ learning of EFL with AI, teachers need support when they encounter technical hiccups. Facilitating Conditions was positively associated with the TPACK factors, which underscores the importance of technical and expert support in the early stage of technology adoption (Geng et al., 2021). The model in this research clarified how UTAUT and TPACK could be synthesized in a new way.

6 Implications

Drawing on research on UTAUT and TPACK, this study investigated 470 middle school EFL teachers’ perceptions of using AI for language teaching in an AI education demonstration district in China. It shows that EFL teachers are positive with regard to the measured factors. This provides evidence that using AI in EFL is supported and welcomed by teachers when they are supported in developing the necessary knowledge associated with TPACK.

In terms of the factors influencing teachers’ Behavioral Intention to use artificial intelligence in teaching, the complex interrelations (see Fig. 2) have been mapped out to provide teacher educators and policymakers with a theoretically grounded and empirically tested scheme to foster teachers’ Behavioral Intention to use AI in English teaching.

7 Limitations and future research

This study explored the structural relationships between factors that predict teachers’ Behavioral Intention to use AI technology. Limited by the method, this research could not explain causality. To verify the causes of teachers’ intention, an experimental design is needed in the future. It is also suggested that teachers’ actual behaviors be included in addition to their intentions. For this purpose, computer logs and classroom teaching videos could provide data for future research.

For the measurement of teachers’ AI-TPACK, the results may reflect teachers’ TPACK self-efficacy rather than their actual technological pedagogical content knowledge (Schmid et al., 2020). Other, more objective methods of measuring teachers’ AI-TPACK can be developed. When we compiled the questionnaire, the AIL-TK factor contained seven items, but only the three that related to specific AI technology were finally retained, while the other items, which were adapted from the wording of previous questionnaires, were deleted. In addition, the correlation between AI-TPK and AI-TPACK was high (0.84), which suggests that the two scales need further development to distinguish the constructs more clearly. Meanwhile, not all seven knowledge areas of TPACK were studied in this research, which limits the exploration of the relationship between UTAUT and AI-TPACK. Future research can study this relationship more comprehensively.

This study investigated seven factors that were positively associated with teachers’ Behavioral Intention, but it did not consider possible negative factors, such as AI anxiety (Wang & Wang, 2019) and teachers’ concerns (Geng et al., 2021). Considering that some teachers worry that AI may weaken the communicative characteristics of foreign language learning (such as body language and facial expressions) (Amaral & Meurers, 2011), more negative factors can be taken into account in future research. In addition, the sample comprised only teachers in an AI education demonstration district in China, so the findings need further verification in other regions and countries.