Introduction

In Mainland China (hereafter China), kindergarten refers to preschools offering full-time Early Childhood Education (ECE) programs for children aged 3 to 6. According to China’s Ministry of Education (MoE), the proportion of children enrolled in kindergartens has expanded rapidly in the past decade, rising from approximately 62.3% in 2011 to 88.1% in 2021 (MoE, 2022). Kindergarten influences children’s developmental outcomes because children spend much of each day there (e.g., up to 8 h) (Hu et al., 2020a). In ECE, process quality is defined as the level of richness and complexity children experience in their interpersonal activities in the classroom, including teacher–child and peer interactions (Burchinal et al., 2002). Researchers have found that process quality has a closer and more proximal association with children’s development than structural features, such as teacher–child ratio or teachers’ qualifications (Wang et al., 2020).

Policymakers, ECE professionals, and researchers in China are committed to improving the quality of children’s kindergarten classroom experience, specifically the quality of teacher–child interactions (MoE, 2012). Teacher–child interaction refers to the “daily back-and-forth exchanges that teachers and children have with one another throughout the day” (Hamre et al., 2012b, p. 89). International studies have found that high-quality teacher–child interactions benefit children in various domains, including cognitive, social-emotional, and behavioral development and academic learning (Hamre et al., 2014; Leyva et al., 2015).

In the past 15 years, teacher–child interaction has been commonly measured using the Classroom Assessment Scoring System (CLASS; Pianta et al., 2008). There are six versions of CLASS, specifically designed to analyze the quality of interactions between teachers and infants, toddlers, pre-K children, K-3 children, as well as upper elementary and secondary students. This study focuses on the pre-K and K-3 versions, which are suitable for children aged 3 to 6 years. Building on the teaching-through-interaction framework (Hamre et al., 2014), CLASS captures the multifaceted nature of interactions between teachers and young children, focusing mainly on instructional, socioemotional, and organizational domains.

CLASS was developed by researchers in the United States (U.S.); however, the importance of its application has been recognized in numerous countries. For example, European studies, such as in the Netherlands (Slot et al., 2017), have highlighted how differences in CLASS scores may be attributed to cultural differences in teachers’ classroom management styles. Similarly, research in African settings such as Tanzania (Shavega et al., 2014) has demonstrated the adaptability of CLASS in contexts with varying resource levels and educational challenges. This tool has also been used in countries across the Americas and Europe, including Chile (Leyva et al., 2015), Finland (Pakarinen et al., 2010), Germany (von Suchodoletz et al., 2014), Portugal (Cadima et al., 2010), and Poland (Cadima et al., 2023). In the East, CLASS has been validated in China (Hu et al., 2016b), showing that the psychometric characteristics of this measure are similar to those reported in Western countries, despite important sociocultural differences. The authors claim that CLASS is applicable in Chinese kindergartens, providing meaningful insights into the quality of teacher–child interactions. CLASS has also been recently used in other Asian countries, such as Singapore (Ng et al., 2021).

The widespread use of CLASS globally and the unique insights it has yielded in different countries highlight the importance of examining its use in Chinese societies, particularly in China, given its size. For several reasons, understanding how CLASS has been used in China is relevant to international scholars. First, while researchers have increasingly used CLASS in China, much of this work has been published in Chinese and, thus, is inaccessible to Western scholars. Reviewing the findings of this work may provide useful information about the unique cultural and social context in which the tool is applied, facilitating cross-cultural studies and enabling the comparison of teacher–child interaction quality across different societies. Second, examining how CLASS is used in China will add to our knowledge of the tool’s strengths and weaknesses in various cultural settings. This information will provide valuable feedback to CLASS developers, contributing to the tool’s refinement to make it more suitable for international applications. Finally, the ongoing debate about whether CLASS effectively predicts child outcomes (Burchinal, 2018; Guerrero-Rosada et al., 2021) has been based largely on evidence from countries in the Americas and Europe. Thus, reviewing evidence from China will provide an additional perspective to this discussion.

A comprehensive review is needed to show how CLASS has been used at the kindergarten level in China. This scoping review analyzes existing empirical research on CLASS in Chinese kindergartens, identifying research trends and gaps. Based on the findings, we provide suggestions for policymakers, ECE researchers, and stakeholders to enhance the quality of teacher–child interactions in China and countries with similar sociocultural characteristics.

Literature review

Classroom assessment scoring system (CLASS): characteristics and study types

CLASS measures three domains of teacher–child interaction: emotional support, classroom organization, and instructional support (Pianta et al., 2008). Emotional support concerns the positive and warm relationship between teachers and children. Classroom organization refers to how teachers use proactive behavior management strategies to keep students engaged in learning and play activities. Instructional support measures teachers’ effectiveness in promoting children’s higher-order thinking skills and language abilities (Hamre et al., 2014). Each domain comprises several dimensions, with ten dimensions collectively assessing how teachers support children’s development and learning (Pianta et al., 2008).

Before using CLASS, all observers must undergo training and be certified to use CLASS coding. This certification is valid for only one year; thus, observers must retest annually to renew their qualification as reliable observers/coders. Interrater reliability must also be established before classroom coding. Reliability is enhanced when two raters are involved in the scoring process and when they write detailed justifications for their respective scores. According to Pianta et al. (2008), coders must independently code at least four 20-min video cycles drawn from a typical day, covering activities such as whole-group teaching, free play, mealtime, or routine care. For each cycle, observers provide a final rating on a seven-point Likert scale for each of the ten dimensions, with a score of 1–2 considered low quality, 3–5 medium quality, and 6–7 high quality. The average scores of all cycles are calculated for each of the three domains in each classroom.
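As a minimal sketch of the aggregation described above, the following Python snippet averages dimension ratings across observation cycles to obtain the three domain scores for one classroom. The grouping of the ten dimensions into domains is an assumption based on the commonly described Pre-K layout and is not spelled out in the text; the function names are illustrative; and ratings are assumed to be already oriented so that higher values indicate better quality (any reverse-scored dimension recoded beforehand).

```python
from statistics import mean

# Assumed grouping of the ten Pre-K dimensions into the three domains
# (not stated in the text above; adjust to the manual version in use).
DOMAINS = {
    "emotional_support": [
        "positive_climate", "negative_climate",
        "teacher_sensitivity", "regard_for_student_perspectives",
    ],
    "classroom_organization": [
        "behavior_management", "productivity",
        "instructional_learning_formats",
    ],
    "instructional_support": [
        "concept_development", "quality_of_feedback", "language_modeling",
    ],
}


def quality_band(rating: int) -> str:
    """Map a single 1-7 dimension rating to the band named in the text."""
    if not 1 <= rating <= 7:
        raise ValueError("CLASS ratings range from 1 to 7")
    if rating <= 2:
        return "low"
    if rating <= 5:
        return "medium"
    return "high"


def domain_scores(cycles: list[dict[str, int]]) -> dict[str, float]:
    """Average each domain's dimension ratings over all observation cycles."""
    if len(cycles) < 4:
        raise ValueError("at least four observation cycles are expected")
    scores = {}
    for domain, dimensions in DOMAINS.items():
        # Pool every rating for this domain across all cycles, then average.
        ratings = [cycle[dim] for cycle in cycles for dim in dimensions]
        scores[domain] = float(mean(ratings))
    return scores
```

In this sketch, a classroom observed for fewer than four cycles raises an error rather than returning a score, which mirrors the minimum stated in the manual; a study that deliberately uses fewer cycles would need to relax that check and report it.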

In the U.S., CLASS has been used in large-scale studies that examine teacher–child interactions and their influence on children’s development and learning (Burchinal et al., 2011; Keys et al., 2013). For example, the National Center for Early Development and Learning (NCEDL) followed 2,995 children enrolled in 721 kindergarten classrooms randomly selected from 11 states. The 11 states served approximately 80% of the children in the U.S. who attended the pre-K program in 2001–2003. Teacher–child interactions were observed and rated. Many studies have been produced based on the NCEDL dataset, focusing on several outcomes. Meta-analyses of NCEDL studies have shown that instructional support is significantly associated with academic outcomes, whereas emotional support is associated with emotional outcomes (Burchinal et al., 2011). However, inconsistent findings have been reported. In their meta-analytical review, Perlman et al. (2016) found that classroom organization was the only domain related to cognitive development, whereas instructional support was the only domain related to socioemotional development. Furthermore, a recent study by Guerrero-Rosada et al. (2021) found no significant relationship between the three domains and children’s academic and cognitive outcomes. Despite these inconsistent findings, researchers continue to rely on CLASS to investigate teacher–child interaction quality.

CLASS has also been used to assess and appraise teachers’ performance. In the U.S., CLASS has been recognized as an evaluation measure of whether classroom quality meets the statutory requirement of the Head Start Act. CLASS has also been tied to the Head Start Designation Renewal System, with delegated classrooms sampled representatively. Minimum scores for each domain were established to reflect a guarantee-level score (Delaney & Krepps, 2021). Programs that fail to meet the minimum scores must retake the assessment and obtain grants only when they reach the established minimum threshold. Similarly, in Australia, the Effective Early Education for Children (E4Kids) project used CLASS to assess the effectiveness of licensed ECE services. This large-scale study recruited approximately 2,500 children in Victoria and Queensland, providing evidence that the quality of teacher–child interactions greatly influences child outcomes (Tayler et al., 2016).

Finally, CLASS has also been used as a framework for teachers’ professional training to improve the quality of their interaction with children. CLASS developers conceptualized two professional development models: My Teaching Partner (MTP) (Pianta et al., 2008) and Making the Most of Classroom Interactions (MMCI) (Hamre et al., 2012a). Previous research has shown that both models improve the quality of teacher–child interactions. For example, in Early et al. (2017), 486 teachers from 336 kindergarten centers in Georgia’s pre-K program were randomly assigned to one of three groups (MTP, MMCI, and control groups). Their behaviors were observed using CLASS pre- and post-training, with the findings indicating that both models improved teachers’ CLASS scores compared with the control group.

In summary, CLASS has been widely used in the United States and other Western countries for various research purposes and in the context of teaching performance evaluations and teacher professional development programs. In the following section, we describe some critical characteristics of kindergartens in China and introduce CLASS-related research conducted in kindergarten settings.

Context: kindergarten education in China

In China, a long-lasting divide exists between kindergartens in developed versus developing regions and urban versus rural areas (The State Council of China, 2005). According to the MoE (2022), “the number of children attending kindergartens was 48.052 million in 2021, an increase of 13.808 million compared to 2011” (para. 2). The gross enrollment rate has seen the fastest growth in China’s Central and Western regions (80% of the new kindergartens established in the last 10 years are located in these regions), and in rural areas (with an increase of 60%). The gaps in access between developed and developing regions and urban and rural areas have narrowed; however, there remains a lack of qualified kindergarten teachers in developing and rural areas (Xiao & Liu, 2022), where the overall teaching quality remains comparatively lower than that in developed and urban areas (Hu et al., 2014).

Chinese kindergartens are categorized as public and private (MoE, 2012). Both types charge affordable and government-regulated tuition fees. However, private kindergartens operate primarily for profit, while public kindergartens receive public funding (e.g., the government pays for operating costs) and are nonprofit. Unlike in Western countries, most private kindergartens in China serve children from socioeconomically disadvantaged families who live in rural areas (Hu et al., 2014). Most rural and private kindergartens are not part of the publicly funded sector, meaning that teacher salaries and benefits are lower than those in urban and public kindergartens. Consequently, public kindergartens generally attract better qualified and more stable teachers than private institutions (Hu & Li, 2012; Hu et al., 2016d).

The cultural value of collectivism is such that kindergarten teachers in China emphasize rules, order, and discipline in the classroom (Ning, 2019). Most activities are teacher-directed whole-group activities with very structured and rigid daily routines. Kindergarten classes are usually large (averaging 30–35 students) with a low adult-to-child ratio (1/15 to 1/20) (Hu & Li, 2012). Because class size may affect the quality of teacher–child interactions, it has been recommended that class sizes should be 20–25 for 3- to 4-year-olds, 25–30 for 4- to 5-year-olds, and 30–35 for 5- to 6-year-olds (MoE, 2012). Given the above unique features of kindergartens in China, it is necessary to understand how CLASS, a Western-developed tool, has been employed in the Chinese context and the results it has obtained.

Goals

This scoping review analyzed empirical research that has used CLASS in Chinese kindergartens (children aged 3 to 6). The study had two goals. Goal 1 was to provide an overview of the literature using CLASS in Chinese kindergartens. We described the characteristics of the CLASS-based studies carried out thus far using a series of analytic categories (e.g., region, setting, research design, sample size). Our review considered all journal articles in the main academic databases, with no preselection based on quality or rigor, as our goal was to describe the characteristics of the prior works available. Goal 2 was to identify the various purposes for which CLASS has been used in prior research studies. We illustrated these purposes with specific examples, which may help researchers and policymakers translate research findings into educational policy reforms.

Method

Literature search and criteria for inclusion and exclusion

We identified eligible studies published from 2008 (when CLASS was released) to March 2023 (when the search was completed). Only peer-reviewed empirical articles were considered to focus exclusively on the most rigorous publications. Other sources, such as book chapters, conference proceedings, and master’s and doctoral dissertations, were excluded. Articles were included if they met the following criteria: (1) the study was conducted in Mainland China; (2) the participants were kindergarten teachers in charge of children aged 3 to 6; and (3) at least some CLASS rating scores were provided. Articles that did not meet these basic criteria were excluded. As explained above, according to the CLASS manual (Pianta et al., 2008), obtaining reliable scores requires observers with adequate certification. Furthermore, reliability between two coders must be established and scoring justifications provided. While these factors are essential for appropriately using CLASS, we did not consider them as criteria for inclusion of articles in our review, as our goal was to describe the characteristics of all works available, even those with methodological shortcomings (e.g., lack of certification training, limited scoring reliability).

We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines for determining study inclusion. Publications in the Chinese language were searched in the China National Knowledge Infrastructure database, the main and most inclusive academic database in China. Using the keywords “師幼互動” (teacher–child interaction) OR “CLASS課堂互動評估系統” (CLASS or Classroom Assessment Scoring System), the search resulted in an initial 2,206 hits. After removing non-journal articles and duplicates and assessing the studies’ content and eligibility, 30 Chinese language articles were shortlisted. Figure 1 schematically depicts the selection process for Chinese language publications.

Fig. 1 PRISMA flowchart diagram depicting the screening and inclusion procedures for Chinese articles

Publications in English were searched in five databases: Education Resources Information Center, PsycINFO, MEDLINE, Scopus, and the Web of Science. We used the keywords “China or Chinese” AND “Classroom Assessment Scoring System” AND “kindergarten* or preschool*” OR “teacher–child interaction.” The search resulted in an initial 2,574 hits. After removing non-journal articles and duplicates and assessing the content and eligibility of the studies, 25 English language articles were shortlisted. The selection process for English language articles is represented in Fig. 2. In total, 55 articles were included in this scoping review (30 in Chinese and 25 in English).

Fig. 2 PRISMA flowchart diagram depicting the screening and inclusion procedures for English articles

Data analysis

To address Goal 1, we coded each study based on the seven categories presented below. We calculated the frequency and percentage of studies in each category. To ensure the reliability of the review, two coders coded all the shortlisted articles based on the seven categories. Discrepancies were discussed until 100% agreement was reached.

  1. Language: This category captured whether the article was published in English or Chinese.

  2. Region: This category described whether the participating kindergartens were located in China’s developed and/or developing regions.

  3. Kindergarten type: This category registered whether the participating kindergartens were public or private.

  4. Setting: This category described whether the participating kindergartens were in urban and/or rural areas.

  5. Unit of analysis: This category captured whether the paper provided the CLASS scores of schools, classes, and/or teachers.

  6. Teacher sample size: This category focused on the sample size of participating kindergarten teachers, with three levels: small (fewer than 15 teachers), medium (16–50 teachers), and large (over 50 teachers).

  7. Research design: This category described whether the study was based on a qualitative, mixed-method, or quantitative research design.
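The coding and tallying step can be sketched as follows. This is an illustrative example, not the procedure used in the review: the function names and the teacher counts are hypothetical, and because the stated bands leave a sample of exactly 15 teachers unassigned, the sketch places it in the medium band as an explicit assumption.

```python
from collections import Counter


def sample_size_band(n_teachers: int) -> str:
    """Code a teacher sample as small, medium, or large."""
    if n_teachers < 15:
        return "small"
    if n_teachers <= 50:  # assumption: exactly 15 teachers counts as medium
        return "medium"
    return "large"


def frequency_table(codes: list[str]) -> dict[str, tuple[int, float]]:
    """Return {code: (frequency, percentage)}, most frequent code first."""
    total = len(codes)
    counts = Counter(codes)
    return {
        code: (count, round(100 * count / total, 1))
        for code, count in counts.most_common()
    }


# Hypothetical teacher counts for five studies (not taken from the review).
bands = [sample_size_band(n) for n in [8, 24, 161, 59, 42]]
print(frequency_table(bands))

# Language counts as reported in this review: 30 Chinese, 25 English.
print(frequency_table(["Chinese"] * 30 + ["English"] * 25))
```

Rounding to one decimal place is a choice made for this sketch; percentages reported to the nearest whole number would simply use a different rounding argument.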

In addition, we present the mean scores for each CLASS domain based on the reviewed literature to analyze trends in CLASS scores in Chinese kindergartens. We found that 50 studies reported the scores for emotional support, 50 reported the scores for classroom organization, and 54 reported the scores for instructional support.
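A minimal sketch of how study-level domain means could be pooled into an overall mean, standard deviation, and range is given below. The input values are hypothetical placeholders rather than data from the reviewed studies, the function name is illustrative, and the sketch weights every study equally regardless of its sample size, which is an assumption.

```python
from statistics import mean, stdev


def summarize_domain(study_means: list[float]) -> dict[str, float]:
    """Describe a set of study-level mean scores for one CLASS domain."""
    if len(study_means) < 2:
        raise ValueError("need at least two study means to compute an SD")
    return {
        "n_studies": len(study_means),
        "mean": round(mean(study_means), 2),
        "sd": round(stdev(study_means), 3),  # sample SD (n - 1 denominator)
        "min": min(study_means),
        "max": max(study_means),
    }


# Hypothetical instructional support means from four studies.
print(summarize_domain([2.0, 3.0, 4.0, 3.0]))
```

An unweighted summary of this kind treats a study of 10 teachers the same as a study of 160; weighting by sample size would be a reasonable alternative where sample sizes are reported.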

To address Goal 2, we used an open coding approach to identify different research purposes for which CLASS was used. Six categories were established. We summarized the prototypical studies in each of the six categories to better understand how researchers in China have used CLASS in their work.

Results

Eighty-one authors wrote the 55 shortlisted journal articles. Nineteen had authored more than one article: seven authors appeared in two articles; four authors appeared in three articles; four authors appeared in four articles; and one author each appeared in five, six, seven, eight, nine, and 12 articles. One author appeared in 26 articles (25 in English and one in Chinese), with studies conducted mainly in the provinces of Guangdong and Zhejiang.

Goal 1: characteristics of CLASS-based studies in Chinese kindergartens

Table 1 presents the results of the coding process for the 55 studies selected for this review. Publications (row) included appear in chronological order. Each category (column) is coded from the highest to the lowest frequency. Percentages and frequencies are displayed at the bottom of the table.

Table 1 Fifty-five shortlisted articles and resulting codifications in the various analytical categories

More articles were written in Chinese (30, 55%) than in English (25, 45%). Forty studies (73%) were conducted in developed areas with high-quality educational resources, while 15 (27%) were conducted in developing areas with inadequate educational resources. Regarding type(s) of kindergarten, 30 studies (54.5%) included both public and private kindergartens, 18 (32.7%) were conducted in public kindergartens only, two were conducted in private kindergartens (Guo & Zhang, 2019; Li, 2021b), and five did not specify the type of kindergarten (e.g., Liu et al., 2020).

Thirty-five studies (63.6%) included only kindergartens located in urban areas, whereas 16 studies (29%) included kindergartens from urban and rural areas and only four studies (7.3%) were conducted exclusively in rural settings (e.g., Hu et al., 2022b). More than half the studies (51%) were conducted with a large sample of teachers (N > 50), 17 studies (31%) were coded with a small sample size (N < 15), and 10 studies (18%) with a medium sample size (16–50 teachers). Regarding research design, most studies applied purely quantitative research designs (50 studies, 91%), while five studies (9%) used mixed-method designs involving both quantitative and qualitative data. In terms of the unit of analysis, 26 studies (47%) reported CLASS scores of individual teachers without specifying if they were from different classes, while 21 articles (38%) presented CLASS scores at the class level. Eight articles (15%) presented CLASS scores of schools (e.g., Li & Yang, 2017). Finally, on the seven-point Likert scale used for each domain (1 = low, 7 = high), emotional support had the highest scores, with an overall mean of 4.89 (SD = 0.562, range = 2.91–6.11). For classroom organization, the overall mean was 4.71 (SD = 0.444, range = 3.41–6.11). Instructional support had the lowest mean, at 3.06, and the widest spread (SD = 1.203, range = 1.22–6.17).

Goal 2: purposes of using CLASS in Chinese kindergartens

We identified six types of research purpose. We illustrate each with representative examples, including information about the research design, sampling, tools, findings, and conclusions. By describing how existing studies were conducted, we aim to inform policymakers and researchers about the types of future studies to be undertaken. Because the research designs of the studies included under each type varied widely, it is not always possible to summarize the trends identified in this body of work. Below, we present the six types of studies organized from the most to least prevalent.

Using CLASS to measure interactional quality in specific classroom activities and settings

This study type refers to investigations of teacher–child interaction quality in specific classroom situations (e.g., free play, learning corner activity, whole-group teaching). Twenty-two of the 55 articles (40%) were classified as this type. For example, using a sample of 161 teachers from 41 public kindergartens in Shanghai, a developed area of China, Tian and Huang (2014) observed teacher–child interactions in 644 activities in whole-class, small-group, and individual settings. Instructional support received the lowest scores. Additionally, teachers received a higher score in whole-class teaching than in the other two formats in all three domains. According to Pianta et al. (2008), CLASS scores must be obtained based on several classroom episodes, including teaching, free play, or mealtime. However, in many studies of this type, researchers did not specify the number of observation cycles used to calculate CLASS scores (e.g., Cao & Wang, 2014). Only three studies (e.g., Ma & Yan, 2020) explicitly reported that their CLASS scores were based on four to five observation cycles. Future studies should overcome this limitation to obtain more accurate CLASS scores.

Using CLASS to examine the relationship between teacher-related characteristics and interactional quality

This study type explored the relationships between teachers’ characteristics (e.g., teaching experience, salary, educational level) and interactional quality. Sixteen studies (29%) of this type were identified. Their findings were mixed regarding whether teachers’ characteristics influenced interactional quality. For example, only eight out of 16 studies found a positive relationship between years of teaching experience and interactional quality. An example of this type is Yang and Hu (2019), who used a sample of 164 teachers from 60 public and private kindergartens in Guangdong province, a developed region of China. Using a mixed-method design, they found a gap between holding child-centered beliefs and transforming them into practical teaching. Most teachers adopted a balanced approach that integrated teacher-directed and child-centered activities. The authors argued that although professional training often includes field experience, feedback is necessary for teachers to improve their teacher–child interactions. Although both public and private kindergartens were included, the study was conducted in only one developed province, which restricts the generalizability of the findings. In addition, most studies of this type focused on teacher demographics. Other characteristics, such as personality, beliefs, motivations for teaching, and well-being, must be considered in future research.

Using CLASS to investigate how interactional quality predicts child outcomes

These studies (27% of the total) analyzed the relationship between teacher–child interactions and child outcomes. Although the focus was on different developmental domains, higher teacher–child interactional quality was generally associated with enhanced child outcomes. Specifically, 11 of the 15 studies identified a positive association. For example, Li’s (2020) study examined 59 teachers and 570 children from 59 public and private kindergartens in Guangdong province. Parents also reported on their children’s social skills. The study found that teachers received the highest mean score in emotional support (mean = 5.08) and the lowest in instructional support (mean = 2.38). It also found that all three domains significantly correlated with children’s social competence, among which emotional support was the strongest explanatory factor. The author argued that in-service training, practical workshops, and one-on-one consultations are needed to improve the quality of teacher–child interaction in these Chinese kindergartens. Similar findings were observed for academic achievement. For example, based on a sample of 29 teachers and 567 children, Hu et al. (2018b) found that high-quality teacher–child interaction predicted children’s literacy performance. Most studies adopted cross-sectional designs. Because teacher–child interaction involves dynamic reciprocal relationships, with one influencing the other, it is possible that children’s performance could affect teacher–child interaction. Future research should investigate these dynamic reciprocal relationships using longitudinal research designs. In addition, given that children’s development is influenced by multiple environmental systems (Bronfenbrenner & Morris, 2006), future studies should extend beyond the direct influence of teacher–child interactions. For example, examining how teacher–child interactions interact with family-related and child-related factors would be important. Furthermore, as previous research has indicated a nonlinear relationship between classroom quality and children’s development (Hu et al., 2020a), future research should use more complex statistical methods to model the relationship between teacher–child interaction and children’s development.

Using CLASS as a tool to evaluate teacher–child interaction quality

We identified 12 studies (22%) that evaluated teacher–child interactional quality in kindergartens. For example, Li and Yang (2017) randomly selected a sample of 42 teachers from 42 public and private kindergartens in Guangdong province (an urban area). Observations were conducted during school activities in half-day programs. They found that Chinese teachers obtained score patterns across the three domains similar to those in Western studies, with relatively higher scores in emotional support and lower scores in instructional support. One reason given for the low scores in instructional support was a lack of teacher training, which points to the need to improve teacher quality. Notably, their scoring did not follow standardized CLASS procedures, which require observations to be conducted across different observation cycles. In many studies of this type, researchers did not specify the number of cycles used to obtain CLASS scores. Future research may expand the sampling frame to include at least four 20-min video cycles drawn across a typical preschool day.

Examining the psychometric properties of CLASS in China

Approximately 7% (n = 4) of the studies included in this review examined the psychometric properties of CLASS. Based on a sample of 60 classrooms from public and private kindergartens in urban and rural areas of southeast China, Wang et al. (2019) compared the psychometric properties of four factorial structures of CLASS (three-factor, two-factor, one-factor, and bifactor). Their confirmatory factor analysis showed that the bifactor model best fit the data, comprising one general factor (e.g., responsive teaching) and three domain-specific factors. The authors also found that variables such as teachers’ teaching experience, salary, teacher–child ratio, and class size were significantly related to the general factor and that the general factor significantly predicted children’s social and cognitive performance, indicating good validity and reliability of CLASS in Chinese kindergartens. Although good psychometric properties have been established, there is a lack of research on the detailed aspects of measurement invariance. Only once measurement invariance has been established can CLASS be considered equally valid across cultures and its scores compared meaningfully.

Using CLASS as a training tool to improve teacher–child interaction quality

Only one of the 55 studies (1.82%) used CLASS as a tool to enhance teacher–child interaction. The purpose of Song and Gai’s (2019) mixed-method study was to examine the effectiveness of an intervention program. The participants were 24 teachers from six public or private kindergartens, who were divided into an experimental group and a control group. The program focused on enhancing teachers’ child-centered beliefs, knowledge, and strategies for effective teacher–child interaction. CLASS was used to measure the effectiveness of this 2-month program at two time points (postintervention and 4-month follow-up assessment). The results showed a significant intervention effect on all three domains. Although the 4-month follow-up assessment showed a slight decrease in the emotional support and classroom organization scores, they were still considerably higher than those at the pretest. The authors concluded that a successful intervention program requires training videos, case studies, and experience sharing. These components helped participants transform their theoretical knowledge into practical skills. However, the study was limited because the reliability and validity of the CLASS scores were not tested.

Discussion

Goal 1 was to provide an overview of CLASS-based research conducted in kindergartens in China. We aimed to facilitate international scholars’ access to this research, given that many existing studies were published in Chinese. We found that studies have primarily been conducted in developed regions of China, with more attention paid to kindergartens in urban than rural areas. The fact that research has been conducted in only some provinces may limit the generalizability of the findings, given the diversity of educational policies and resources across China. A large portion of the studies were conducted by the same research group, with data mainly collected in the provinces of Guangdong and Zhejiang. Participating kindergartens were often publicly funded, with children from higher socioeconomic backgrounds. Notably, nearly half of the studies (49%) were based on small or medium samples of teachers (50 or fewer). Thus, to provide a more accurate picture of teacher–child interaction, future studies should be conducted with larger samples of teachers from public and private kindergartens. Kindergartens should also be recruited across the 34 provinces.

Studies have shown that scores in emotional support and classroom organization tend to be higher than those in instructional support, similar to findings in Western countries such as Australia (Thorpe et al., 2022), the U.S. (Mashburn, 2017), and Finland (Salminen et al., 2012). However, the scores provided in much of this literature should be interpreted with caution, as the administration of CLASS did not follow the standard protocol in many cases (Pianta et al., 2008). Of the 55 studies reviewed, only 23 were rated by certified coders trained by Teachstone (US). Furthermore, many studies provided CLASS scores based on observing a single activity, without adequate observation cycles, instead of using the recommended minimum of four activities. In addition, only 14 studies reported interrater reliability.

Goal 2 was to identify the purposes of the various studies. Most studies used CLASS to measure interactional quality in classroom activities and settings (e.g., Tian & Huang, 2014), examine the impact of teacher-related characteristics on interactional quality (e.g., Yang & Hu, 2019), or explore how teacher–child interactions influenced child outcomes (e.g., Li, 2020). Fewer studies have investigated the psychometric properties of CLASS (e.g., Wang et al., 2019). Finally, few studies have used CLASS to evaluate the quality of teacher–child interaction (e.g., Li & Yang, 2017) or to improve the quality of teachers’ interaction with children (e.g., Song & Gai, 2019). Although the existing research is meaningful and informative, it is insufficient to inform policymaking and educational reform in kindergartens.

Two types of research studies are still lacking in China. First, CLASS has not been used in large-scale evaluation studies. In Western countries, CLASS has been used as a proxy for process quality (Delaney & Krepps, 2021; Mashburn, 2017; Thorpe et al., 2020; Torres et al., 2021). It has also been adopted as a standardized measure to inform funding decisions (Delaney & Krepps, 2021). Although Hu and her collaborators conducted a series of relatively large-scale studies to examine the influence of interactions on child outcomes (e.g., Hu et al., 2020a), the sample sizes were still relatively small. Moreover, the available evidence is restricted to a few provinces (e.g., Guangdong, Zhejiang). Given the wide variation in kindergarten quality between rural and urban areas and between developed and developing regions in China, researchers need to enlarge the scale of studies and conduct more systematic evaluations across the whole country.

Second, researchers have not yet conducted large-scale teacher training projects or interventions based on CLASS. Continuous professional development enhances teachers’ knowledge and teaching practices (Sheridan et al., 2009). Recently, Hu et al. (2022a) took a promising first step in this direction in a study with 112 preservice teachers using the MMCI model (note that this study was not included in the review, as the publication date was beyond our review period). The participants’ professional knowledge and skills improved in the intervention group, but their CLASS scores were not reported. Because this body of work is so limited, we recommend that the government invest more funding and resources to design and implement nationwide and district-wide training interventions with in-service teachers to enhance teacher–child interactional quality.

Limitations

The current study has several limitations. First, we only focused on peer-reviewed empirical articles published in two languages (Chinese and English); hence, we may have missed other types of publications (e.g., books, book chapters, conference proceedings, reports) and publications written in other languages. Future reviews should broaden the scope in terms of publication types and languages. Second, our review did not include methodological rigor as a criterion for inclusion, so studies with obvious methodological issues were considered (e.g., studies in which observers were not CLASS certified or in which interrater reliability had not been calculated). This resulted in considerable variation in the quality of the studies included, which, along with the diverse nature of the studies identified, makes synthesizing the literature and comparing the findings challenging. Future studies may consider conducting meta-analytical reviews to synthesize the effect sizes of the relationship between teacher–child interaction and child outcomes across various domains. Finally, this review focused on empirical studies that reported quantitative CLASS scores. As practitioners may have different understandings of what “high quality” means in different contexts, future systematic reviews may investigate teachers’ perspectives on the conception of interactional quality on which CLASS is based and examine whether teachers’ perspectives moderate the relationship between CLASS scores and child outcomes.

Significance and implications

The current study is the first scoping review that thoroughly investigates the literature on CLASS in China. Using both planned coding and open descriptions of representative studies, we provided an overview of how researchers have used this measure in Chinese kindergartens. Our review shows that the available evidence is too thin to support new regulations or stimulate policy reforms. Nationwide studies are needed to provide a comprehensive evidence base for policy considerations (Mashburn, 2017; Thorpe et al., 2020). We strongly recommend that researchers administer CLASS correctly, following the stipulated guidelines (Pianta et al., 2008). For evaluations of teacher–child interactions to be reliable and valid, observers need to be trained and certified before any formal coding. In addition, researchers need to provide evidence of interrater reliability and justify their coding decisions. Reliable and valid nationwide evaluative information would allow policymakers to stimulate educational policy reform that better promotes process quality in kindergartens (Burchinal et al., 2002).

To move the field forward, we encourage the government to allocate more funding to support professional development initiatives on teacher–child interaction, drawing on CLASS or other similar frameworks, especially for teachers in developing regions and rural areas. Although China launched the National Teacher Training Program in 2017 (MoE, 2017), the program mainly focuses on educational theories and teaching pedagogy (Zhao & Liu, 2022). Greater focus on teacher–child interaction would be more efficient in improving process quality.

Given the evidence of the applicability of CLASS (Hu et al., 2016b) and the effectiveness of MMCI model-based teacher training in China (Hu et al., 2022a), teacher–child interaction training constitutes a promising path for the professional development of teachers. Considering the cost and challenges of delivering one-on-one teacher–child interaction training at scale, group coaching or peer coaching may be effective alternatives (Hu et al., 2022a). As found in previous research (Hu & Roberts, 2013), one key obstacle rural kindergartens face is the shortage of well-qualified teachers. Teachers increasingly pursue employment in urban areas because of poor pay and limited career prospects in rural areas. Thus, teacher–child interaction training may serve as an incentive to recruit and retain quality teachers in rural areas (Xia et al., 2023). Kindergarten principals may also consider using teachers’ CLASS scores when refining teacher evaluation systems, which could help teachers become more aware of their interactional quality in the classroom. These initiatives could reduce the socioeconomic gaps in kindergarten program quality across China (Hu et al., 2014).

Although some studies suggest that CLASS predicts child outcomes, findings are inconsistent, and there are ongoing debates in the literature regarding whether process quality, as measured by CLASS, robustly predicts child outcomes (Burchinal, 2018; Guerrero-Rosada et al., 2021; Mendive et al., 2022; Perlman et al., 2016). Much of the debate has been based on evidence collected in Western contexts. While our scoping review indicates that teacher–child interactional quality in China is also associated with child outcomes, this finding should be interpreted with caution because many of the studies analyzed were based on the same dataset, while others did not administer CLASS properly (e.g., no certification, limited reliability). Thus, the identified positive association may be biased. Although a systematic examination of whether teacher–child interaction predicts child outcomes in China goes beyond the scope of our review, the review highlights problematic issues in the current literature and offers recommendations for both researchers and policymakers. This contribution adds a crucial perspective to existing CLASS debates by providing information from a large Asian country.