1 Introduction

Technological Pedagogical Content Knowledge (TPACK), as essential teacher knowledge, has been considered the key to effective teaching in this era of information explosion (Koehler & Mishra, 2008; Mishra & Koehler, 2006, 2007; Niess, 2008). With education worldwide emphasizing 21st-century competencies (e.g., EU, 2018; MOE, 2014; OECD, 2018; UNESCO, 2015; US Department of Education, 2015), teachers nowadays need to develop students’ thinking skills such as complex problem solving and critical and creative thinking (Mishra et al., 2011; Voogt & Roblin, 2012). Therefore, using technology to support such pedagogical improvements has become essential for all teachers, especially in the post-COVID-19 era. Different scholars (e.g., Koehler et al., 2014; Mishra & Kereluik, 2011; Tsai & Chai, 2012) have suggested that TPACK in the twenty-first century should emphasize learning thinking skills.

Research on TPACK exploring how to support teachers in teaching 21st-century competencies or thinking skills has boomed. For example, Mishra et al. (2011) delineated a list of cognitive tools for integrating technology to develop higher-order thinking. Beriswill et al. (2017) explored specifically the contents, pedagogies, and technologies that promote 21st-century skills. Valtonen et al. (2017) further devised a TPACK-21 questionnaire for the 21st-century educational context. In addition, Shafie et al. (2019) proposed establishing the relationship between TPACK and 21st-century skills. However, none of the above studies dealt with domain-specific TPACK.

Developing TPACK needs to be situated in a specific domain and context (Chai et al., 2011; Rosenberg & Koehler, 2015; Voogt et al., 2012). In English as a foreign language (EFL), there has been a need to understand TPACK in order to prepare EFL teachers for computer-assisted language learning (CALL) environments (Tseng et al., 2020). On the other hand, thinking skills are also necessary for CALL teachers (Egbert & Shahrokni, 2019; MOE, 2018). However, technology integration that encourages higher-order thinking skills has seldom been observed in many EFL settings (Tseng, 2019; Tseng et al., 2020; Wu & Wang, 2015). According to Tseng et al. (2020), technology in EFL classrooms primarily motivated students or provided language input, evoking mainly lower-order thinking skills. Even though some teachers had a high level of confidence in their TPACK, their teaching practices incorporated mostly lower-order thinking and reflected a minimal level of technology integration (Tseng, 2019; Tseng et al., 2020; Wu & Wang, 2015). Therefore, as Tseng (2019) concluded, a questionnaire instrument that directs attention to the level of technology integration in EFL settings is necessary.

Recently, there have been many TPACK assessments in EFL settings (Arslan, 2020; Tseng et al., 2020). Whether revised from domain-general TPACK instruments (e.g., Bagheri, 2020; Nordin & Ariffin, 2016; Prasojo et al., 2020; Solak & Çakır, 2014) or developed and validated independently (e.g., Başer et al., 2016), these EFL-TPACK instruments shared the same basic TPACK structure without including 21st-century learning or thinking skills. Thorough reviews of these EFL-TPACK measurement tools (Arslan, 2020; Tseng et al., 2020) found no assessment that directs attention to the level of technology integration or to teaching thinking skills.

When using the aforementioned EFL-TPACK instruments, teachers rated their TPACK quite high (Sarıçoban et al., 2019; Wu & Wang, 2015). Without considering the level of technology integration in teaching thinking skills, the existing EFL-TPACK instruments might give teachers the impression that they are effective in integrating technology (Tseng, 2019; Wu & Wang, 2015). Directing teachers’ attention to whether they incorporate thinking skills in their TPACK might help them rethink their level of technology integration in actual practice. Therefore, to help EFL preservice teachers understand their TPACK in teaching thinking skills, this study created a two-dimensional (2D) EFL-TPACK diagnostic tool. This paper reports the development and validation of the 2D EFL-TPACK instrument, which assesses TPACK in teaching thinking skills in the EFL domain.

2 Theoretical frameworks

To create a 2D TPACK instrument means to add one dimension to the existing TPACK framework. Two frameworks that support the development of teaching and learning serve as the foundation of this instrument: (1) the TPACK framework (Mishra & Koehler, 2006); and (2) the revised Bloom’s Taxonomy (Krathwohl, 2002).

2.1 TPACK

TPACK has been extensively studied as a body of knowledge to guide teaching and learning in the digital age. Mishra and Koehler’s (2006) TPACK framework, built on Shulman’s (1987) framework of pedagogical content knowledge (PCK), includes seven knowledge bases: (1) technological knowledge (TK), (2) pedagogical knowledge (PK), (3) content knowledge (CK), (4) PCK, (5) technological pedagogical knowledge (TPK), (6) technological content knowledge (TCK), and (7) technological pedagogical content knowledge (TPCK). Mishra and Koehler (2006) described TPCK as “knowledge that goes beyond” the three primary knowledge bases (TK, PK, CK) (p. 1028). The terms TPCK and TPACK (Thompson & Mishra, 2007–2008) mean the same in the TPACK literature. However, this study uses the two acronyms differently: TPCK refers to the synthesized knowledge base, and TPACK to overall teacher knowledge (including the component knowledge bases) (Fig. 1).

Fig. 1
figure 1

Content-specific Synthesized Knowledge Bases in the TPACK Framework

The relationships among the seven knowledge bases in the TPACK framework are ambiguous (Angeli et al., 2016; Archambault & Barnett, 2010; Cox & Graham, 2009). Evidence shows that the component knowledge base does not directly lead to the growth of PCK (Lee et al., 2007) or TPCK (Angeli & Valanides, 2009). Some (e.g., Bostancioğlu & Handley, 2018) argued that PK and PCK should merge as PCK. Some (e.g., Archambault & Barnett, 2010; Pamuk et al., 2015) suggested that the essence of the framework lies in the synthesized knowledge bases (PCK, TPK, TCK, TPCK). Still, TCK, TPK, and TPCK are difficult to distinguish (Angeli & Valanides, 2009; Archambault & Barnett, 2010; Başer et al., 2016; Schmidt et al., 2009). On the other hand, TCK could predict TPCK but not PCK (Cheng, 2017), and PCK had a low effect on TPCK (Pamuk et al., 2015). These findings imply that among the synthesized knowledge bases, PCK is very different from TPK, TCK, and TPCK.

According to Rahimi and Pourshahaz’s (2019) review of studies validating TPACK constructs, TPACK requires a specific context to develop. Creating an assessment instrument specific to the EFL environment helps teachers observe their TPACK in their particular content area (English language arts). PCK and TPCK, as content-specific synthesized knowledge bases, are the constructs that involve subject-specific representations (Cox & Graham, 2009). These two content-specific synthesized knowledge bases have long been considered essential: TPCK as unique (Angeli & Valanides, 2009) and PCK as critical (Kleickmann et al., 2012). Therefore, the TPACK dimension of this instrument was structured around PCK and TPCK (the gray area in Fig. 1).

2.2 Revised Bloom’s Taxonomy

The revised Bloom’s Taxonomy has been broadly accepted as a set of cognitive skills for facilitating and analyzing teaching and learning. Anderson et al. (2001) revised Bloom’s 1956 Taxonomy. The revised Bloom’s Taxonomy (Krathwohl, 2002) includes lower-order thinking skills (remembering, understanding, applying) and higher-order thinking skills (analyzing, evaluating, creating). With technological advances in education, scholars have applied the revised Bloom’s Taxonomy to digital learning (e.g., Churches, 2008; Mishra et al., 2011) or used it to observe the development of TPACK (e.g., Tseng, 2008). When analyzing EFL teachers’ TPACK in CALL classrooms, Paneru (2018) categorized TPACK into two types, corresponding to lower-order thinking skills (Mechanistic) and higher-order thinking skills (Transformative). Also, Tseng (2019) described the levels of technology integration among EFL teachers in CALL lessons with the SAMR (Substitution – Augmentation – Modification – Redefinition) model and noted that the SAMR levels likewise parallel lower-order and higher-order thinking skills. Therefore, the cognitive process dimension of the instrument was designed with the revised Bloom’s Taxonomy to capture the levels of technology integration in CALL lessons.

2.3 Concept structure

According to literature reviews in the EFL field (Arslan, 2020; Wang et al., 2018), most TPACK assessment tools adopt self-reported measures. This instrument aims to help student teachers observe their own TPACK development, and a self-report measure serves well for directing their attention to how thinking skills and technology are integrated in teaching. Thus, this instrument adopted a self-reported quantitative measure.

According to the literature (Sarıçoban et al., 2019; Wu & Wang, 2015), most teachers showed a high confidence level of TPACK when not considering thinking skills. This present scale added a cognitive process dimension to guide thinking skills when student teachers reflect on their TPACK. In this 2D EFL-TPACK scale, the TPACK dimension (X-axis) consists of PCK and TPCK, and the cognitive process dimension (Y-axis) includes six levels of thinking skills. Figure 2 presents the concept structure of this instrument.

Fig. 2
figure 2

2D EFL-TPACK Concept Structure

Note. Adapted from Rex Heer, Center for Excellence in Learning and Teaching, Iowa State University (https://meestervormgever.wordpress.com/2015/02/05/revised-blooms-taxonomy-center-for-excellence-in-learning-and-teaching)

3 Method

The purpose of this study is to establish the reliability and validity of this scale. According to the literature review of Tseng et al. (2020), developing and validating an EFL-TPACK instrument generally includes three steps: (1) pooling items; (2) content review; and (3) factor structure exploration. This study followed these three steps: in Pooling Items, items were collected from existing validated instruments; in Content Review, content validity was established for the English and Chinese versions; and in Factor Structure Exploration, statistical data were collected to test the instrument’s validity. The following sections provide a detailed explanation.

3.1 Pooling items

The primary survey items in this study were selected from some TPACK assessment tools with high validity and reliability (e.g., Başer et al., 2016; Koh et al., 2010; König et al., 2016; Sahin, 2011; Wu & Wang, 2015). The items in the PCK and TPCK sections were cross-examined with standards of EFL teacher professional knowledge (Kuhlman & Knežević, 2013; Tai, 2018). In these validated instruments, three themes in PCK and four themes in TPCK subscales were identified: (1) instructional design (PCK1); (2) teaching strategies (PCK2); (3) learning assessment (PCK3); (4) technology function (TPCK1); (5) technological instructional design (TPCK2); (6) technological teaching strategies (TPCK3), and (7) technological learning assessment (TPCK4).

Items related to these themes in existing TPACK assessments were selected and revised for teaching children English in EFL settings. These items served as the preliminary statements in the present scale and constitute the seven Primary survey items in the TPACK dimension (Fig. 3). Next, six cognitive-level items for each Primary item were generated according to the verb lists in Krathwohl (2002) and Churches (2008). In this survey, the PCK subscale includes 21 items: instructional design (7 items), teaching strategies (7 items), and learning assessment (7 items); the TPCK subscale has 28 items: technology function (7 items), technological instructional design (7 items), technological teaching strategies (7 items), and technological learning assessment (7 items). Figure 3 presents the scale structure. The seven Primary items and the verbs at each cognitive level were in English, so the English version of the 2D EFL-TPACK survey was created first and then translated into Chinese.

Fig. 3
figure 3

2D EFL-TPACK Scale Structure

3.2 Content review

Ten experts in English teaching, CALL, or TPACK were invited to judge the appropriateness of the survey content. Every expert provided written suggestions on both the English and Chinese versions of the survey, and the survey items were revised according to their comments. After revision, a paper-and-pencil pilot test was administered in a class of 57 EFL student teachers majoring in elementary education. The student teachers spent around 30 min discussing any confusing items and interpreting the statements in the survey. The expert review and the pilot study tested the comprehensibility, relevance, and feasibility of the items to establish content validity. Table 1 summarizes the participants in this stage.

Table 1 Participants in the Content Review Stage

The completed survey includes 49 statements rated on a five-point Likert scale (1 = strongly disagree; 2 = disagree; 3 = neutral; 4 = agree; 5 = strongly agree). The survey result is presented as a radar chart that includes the total TPACK score (maximum = 245) and the scores of the eight sub-categories (two subscales and six levels of thinking skills). Figure 4 demonstrates a sample TPACK result of a student teacher before and after a teacher education course.
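For implementers, the scoring just described can be sketched in a few lines of Python. The 7 × 7 layout below (seven themes, each holding one Primary item followed by its six cognitive-level items, with the first three themes forming PCK and the last four forming TPCK) is our reading of the scale structure, not code released with the study:

```python
import numpy as np

def score_survey(responses):
    """Score a 49-item 2D EFL-TPACK response (Likert ratings 1-5).

    Assumed layout: 7 themes (rows) x 7 items (columns); column 0 is
    the Primary item, columns 1-6 the six cognitive levels (Remember
    ... Create). Themes 0-2 are PCK, themes 3-6 are TPCK.
    """
    r = np.asarray(responses, dtype=float).reshape(7, 7)
    total = r.sum()                      # maximum = 49 * 5 = 245
    pck_mean = r[:3].mean()              # 21 PCK items
    tpck_mean = r[3:].mean()             # 28 TPCK items
    level_means = r[:, 1:].mean(axis=0)  # six thinking-skill levels
    return total, pck_mean, tpck_mean, level_means

# A respondent answering "agree" (4) throughout scores 196 of 245:
total, pck, tpck, levels = score_survey([4] * 49)
```

The eight radar-chart values would then be the two subscale means and the six cognitive-level means.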

Fig. 4
figure 4

Sample Result of 2D EFL-TPACK Survey

3.3 Factor structure exploration

The completed 49-item survey was put online and tested for its validity. The online 49-item survey began with three background information questions: gender, location, and years of teaching English. This study adopted stratified sampling and invited teacher educators to distribute the online survey to EFL student teachers at the elementary level. The respondents followed a link to read a research description and consent information before taking the online survey. The respondents in Taiwan could leave their email addresses to collect a gift coupon and their TPACK results, while those in other countries were encouraged to leave their email to receive their TPACK results.

The teacher educators (N = 13) were from education universities in different EFL settings, including Taiwan, China, Japan, and France. The teacher educators recruited student teachers through their classes, student groups, or department websites. Respondents in Taiwan mainly took the Chinese version of the 2D EFL-TPACK survey, while those in other EFL areas used the English version. After deleting incomplete responses, this study collected 525 valid survey results, including 440 from Taiwan and 85 from China, Japan, and France. The sample size met the recommendation of Hair et al. (2019) that the ideal item-to-participant ratio is 1:10. Table 2 summarizes the participants in this stage. This study used SPSS and Mplus to analyze the data, examining reliability with Cronbach’s alpha and factor structure with Confirmatory Factor Analysis (CFA).

Table 2 Participants in the Validation Stage

4 Results and discussion

This 2D EFL-TPACK survey contains two subscales representing two content-specific synthesized knowledge bases (PCK and TPCK) and six levels of thinking skills. This section reports the results of Cronbach’s alpha reliability, CFA, and TPACK Perceptions of the EFL teachers.

4.1 Alpha reliability

Cronbach’s alpha is a standard measure for estimating internal consistency (Bonett & Wright, 2015). As shown in Table 3, the reliability coefficient of the TPACK dimension (49 items; α = 0.98) was close to 1.00; according to Gay et al. (2012), a coefficient close to 1.00 indicates high reliability. Cronbach’s alphas of the PCK subscale (21 items; α = 0.97) and the TPCK subscale (28 items; α = 0.98) in the TPACK dimension were also very high.

Table 3 Alpha Reliability of the 2D EFL-TPACK Scale

The results for the Cognitive Process Dimension also demonstrate satisfactory reliability. As Table 3 indicates, the Cronbach’s alpha of the Cognitive Process Dimension (49 items; α = 0.98) was likewise close to 1.00, and the coefficients for the sub-categories (7 items each) were between 0.89 and 0.92. All the items in this survey are interrelated, establishing internal consistency reliability. High reliability means that the effect of measurement error is small and the survey produces consistent scores (Gay et al., 2012). The statistical evidence shows that this 2D EFL-TPACK survey is highly reliable.
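For readers replicating the reliability analysis outside SPSS, Cronbach’s alpha is a short computation on the respondent-by-item score matrix; a minimal numpy sketch with toy data (not the study’s responses):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a (respondents x items) score matrix."""
    x = np.asarray(items, dtype=float)
    k = x.shape[1]                          # number of items
    item_var_sum = x.var(axis=0, ddof=1).sum()
    total_var = x.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Three respondents answering three perfectly consistent items
# yield the maximum alpha of 1.0:
alpha = cronbach_alpha([[1, 1, 1], [3, 3, 3], [5, 5, 5]])
```

With real survey data, the same function would be applied to each subscale’s columns to obtain the per-subscale coefficients reported in Table 3.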

4.2 Confirmatory factor analysis

CFA is used to examine the latent structure of a measurement instrument for construct validation (Brown, 2015). Following Hair et al. (2019), this study adopted the most common techniques for testing dimensionality: absolute fit measures, incremental fit measures, and the standardized root mean square residual (SRMR). Absolute fit measures use the normed chi-square (X2/df) and the root-mean-square error of approximation (RMSEA) to estimate the overall fit of the model. Incremental fit measures use the Comparative Fit Index (CFI) and the Tucker-Lewis Index (TLI) to assess how much the present model improves on a null model. Finally, the SRMR summarizes the standardized differences between the observed and model-implied correlations.
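The absolute and incremental indices named above are simple functions of the model and null-model chi-square statistics. The sketch below uses the standard formulas together with the cutoffs cited in this section; the input values are illustrative, not taken from Table 4 (SRMR is omitted because it requires the residual correlation matrix):

```python
import math

def fit_indices(chi2_m, df_m, chi2_b, df_b, n):
    """Fit indices from model (m) and baseline/null (b) chi-squares.

    n is the sample size. Cutoffs used in this study: X2/df <= 5,
    RMSEA <= .08 (fair fit), CFI >= .90, TLI > .90.
    """
    normed = chi2_m / df_m
    rmsea = math.sqrt(max(chi2_m - df_m, 0.0) / (df_m * (n - 1)))
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, 0.0)
    cfi = 1.0 - d_m / max(d_m, d_b)
    tli = ((chi2_b / df_b) - (chi2_m / df_m)) / ((chi2_b / df_b) - 1.0)
    return normed, rmsea, cfi, tli

# Illustrative inputs (chi2_m, df_m, chi2_b, df_b, n):
normed, rmsea, cfi, tli = fit_indices(300.0, 100, 5000.0, 120, 525)
```

Software such as Mplus reports these indices directly; the formulas are shown only to make the thresholds discussed below concrete.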

According to the CFA results (Table 4), the goodness-of-fit indices revealed a good model fit. In Table 4, the chi-square tests of the models did not show exact fit (p < 0.01); however, the X2 statistic is so sensitive to sample size that this result can be discounted (Schermelleh-Engel et al., 2003; Vandenberg, 2006). The chi-square values divided by df in the PCK model (X2/df = 2.98), the TPCK model (X2/df = 2.86), and each model in the cognitive process dimension (ranging from 2.00 to 2.68) were all within the acceptable limit of 5 or less. Also, RMSEA (≤ 0.08) indicated a fair fit (Kline, 2013; McDonald & Ho, 2002). As for the other indices, according to the cutoff values suggested by Hu and Bentler (1999), CFI (≥ 0.90), TLI (> 0.90), and SRMR (≤ 0.08) all signaled a good fit. The CFA results confirmed the factor structure of the survey.

Table 4 2D EFL-TPACK CFA Model Fit Statistics and Thresholds Indices

As for the standardized model results (STDYX standardization), the model diagrams of PCK and TPCK are depicted in Figs. 5 and 6. Both the PCK and TPCK models showed a satisfactory fit. As Figs. 5 and 6 indicate, each factor represents one sub-category as a latent trait within PCK and TPCK. The empirical data statistically fit the theoretical model: every item loads on its anticipated category. The models in this scale were thus supported by the empirical data, showing that the scale has good construct validity.

Fig. 5
figure 5

PCK model diagram

Fig. 6
figure 6

TPCK model diagram

4.3 TPACK perceptions of the EFL teachers

Overall

The participants in this study revealed moderate confidence in their TPACK (Total M = 174.49, SD = 32.04). The participants scored highest on the Primary items (Primary M = 3.68, SD = 0.69), the original TPACK items drawn from currently available TPACK assessments. These results were consistent with observations in earlier studies (Sarıçoban et al., 2019; Wu & Wang, 2015): teachers tended to report relatively high TPACK when not considering thinking skills. Also, as expected, the teachers were more satisfied with their TPACK for teaching lower-order thinking skills than higher-order thinking skills: scores in the Remember, Understand, and Apply categories were higher than those in Analyze, Evaluate, and Create. Generally speaking, the participants were relatively less confident (means below 3.5) in their TPACK involving higher-order thinking skills, especially Analyze (M = 3.45, SD = 0.70) and Create (M = 3.46, SD = 0.74).

Gender

Different from the results in Koh et al.’s (2010, 2014) studies, male and female teachers did not show different confidence in their TPACK; gender differences were not observed in this group of participants. The female teachers (N = 439, 83.6%) scored slightly higher on PCK (M = 3.69, SD = 0.83) than the male teachers (N = 86, 16.4%) (M = 3.51, SD = 0.91), and the male teachers scored marginally higher on TPCK (M = 3.67, SD = 0.75) than the female teachers (M = 3.62, SD = 0.75). However, there were no statistically significant gender differences on any aspect of TPACK (p > 0.05). The finding is as Koh et al. (2010) anticipated: the gender gap closes as technology becomes prevalent. According to the Pearson correlation test, there was no correlation between gender and TPACK (r = 0.007, p = 0.875). The EFL teachers’ TPACK was not related to gender, and there were no gender differences in their TPACK perceptions.

Areas

When analyzing the sample in Taiwan (N = 440), the results of an ANOVA showed no statistically significant differences among areas, F(3, 436) = 2.28, p = 0.79 > 0.05. However, when analyzing the total sample (N = 525) across all EFL settings, significant differences were found, F(6, 518) = 7.83, p < 0.001. EFL teachers in Tokyo, Japan (M = 147.02, SD = 30.87) reported significantly less satisfaction with their TPACK than did those in Taiwan (M = 176.57, SD = 30.27), Xuzhou, China (M = 181.31, SD = 39.56), and Bordeaux, France (M = 185.36, SD = 33.73). The EFL teachers in Bordeaux reported higher TPACK satisfaction than those in the Asian settings. Castéra et al. (2020) reported a similar finding: teachers in France had higher scores than those in Asia. The significantly lower self-reported TPACK of the EFL teachers in Japan may relate to a cultural tendency toward modesty; however, that question is beyond the scope of this study.
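The area comparisons above rest on a one-way ANOVA; its F statistic can be recomputed from per-area score lists with the textbook sums of squares. The groups below are hypothetical placeholders, not the study’s data:

```python
import numpy as np

def one_way_anova_f(groups):
    """F statistic and degrees of freedom for a one-way ANOVA.

    groups: list of 1-D arrays of scores, one array per group
    (e.g., one per EFL area).
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand = np.concatenate(groups).mean()
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_b, df_w = k - 1, n - k
    f = (ss_between / df_b) / (ss_within / df_w)
    return f, df_b, df_w

# Three hypothetical groups with identical means give F = 0:
f, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 1], [3, 1, 2]])
```

In practice the p-value for the resulting F is read from the F distribution with (df_b, df_w) degrees of freedom, as in the F(6, 518) test reported above.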

Years of English teaching experience

According to the Pearson correlation test, the EFL teachers’ TPACK scores were significantly and positively correlated with their years of English teaching experience (r = 0.234, p < 0.001). When comparing TPACK among groups with different years of English teaching experience, significant differences were also found, F(3, 521) = 11.76, p < 0.001. Understandably, EFL teachers with English teaching experience reported significantly higher TPACK perceptions than those without: the EFL student teachers (N = 267, 50.86%) who had no English teaching experience self-reported significantly lower TPACK (M = 166.76, SD = 31.55) than the other three groups.

Course grades

Chai et al. (2011) suggested collecting assignment grades to examine whether they correlate with the scores of a TPACK survey. Thus, this study also explored the correlation between course grades and TPACK scores. Some participants (N = 109) provided consent for the use of their course grades, which were compared with their TPACK scores. According to the Pearson correlation test, the TPACK scores of these 109 EFL student teachers significantly and positively correlated with their course grades (r = 0.210, p < 0.05). This evidence supports the usefulness of the 2D EFL-TPACK survey: high-achieving student teachers tended to report high TPACK self-efficacy.
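The Pearson tests reported in this section are standard; the sketch below simulates 109 grade/TPACK pairs with a built-in positive association purely for illustration (it does not reproduce the study’s r = 0.210):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 109  # matches the size of the consenting subsample

# Hypothetical data with a positive grade-to-TPACK association:
grades = rng.normal(80, 8, n)
tpack = 100 + 2.0 * grades + rng.normal(0, 30, n)

# Pearson r, and the t statistic for testing H0: rho = 0
r = np.corrcoef(grades, tpack)[0, 1]
t = r * np.sqrt((n - 2) / (1 - r ** 2))
```

With n − 2 = 107 degrees of freedom, |t| greater than about 1.98 corresponds to p < 0.05 (two-tailed).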

5 Conclusion

With the twenty-first century’s demands for integrating knowledge of subject matter, technology, and student thinking, this study started from EFL teachers’ need to observe their TPACK involving thinking skills. This paper proposes a 2D EFL-TPACK scale that integrates two essential frameworks supporting teaching and learning: TPACK and the revised Bloom’s Taxonomy. This quantitative instrument measures teachers’ confidence in PCK and TPCK across six cognitive levels. The empirical data statistically fit the theoretical model, and the evidence shows that the instrument has high reliability and validity and is helpful for understanding learning needs. In addition to exploring their CALL instruction, this scale further helps EFL teachers reflect on how teaching English with technology facilitates the learning of thinking skills.

The preliminary survey results of this scale confirmed findings in the literature and provided new insights. Those consistent with the literature include: (1) the EFL teachers reported relatively higher TPACK when not considering thinking skills; (2) the EFL teachers in different cultures reported quite different levels of confidence in TPACK; and (3) the TPACK scores were significantly and positively correlated with years of English teaching experience. Still, this scale revealed information that the existing TPACK assessments could not. First, the EFL teachers were less confident in their TPACK for teaching higher-order thinking skills, especially preparing students to analyze. Second, the EFL teachers in Bordeaux and Tokyo were most satisfied with their TPACK for teaching students to apply, while those in Xuzhou and various cities in Taiwan were most satisfied with their TPACK for preparing students to remember. Also, the high-achieving EFL student teachers reported high TPACK self-efficacy, which confirmed the usefulness of this scale. This scale could serve as a good reference for diagnosing TPACK development in EFL teaching in the twenty-first century.

Future research may further explore the use of this scale in different EFL settings or observe how student teachers develop their TPACK. As a self-reported measure, this scale reflects only the confidence level of the teacher, not necessarily their TPACK in a real classroom; some student teachers might be overconfident in their TPACK before entering a real English classroom, so their scale scores may drop after the reality shock of actual teaching. Finally, the correlation analysis in this study used only 109 participants from the Taiwan sample, so further exploration is necessary to understand the relationship between student teachers’ TPACK and their academic performance.