Do the numbers add up? Questioning measurement that places Australian ECEC teaching as ‘low quality’

Internationally, standard observational measures of Early Childhood Education and Care (ECEC) are used to assess the quality of provision. They are applied as research tools but, significantly, also guide policy decisions, distribution of resources and public opinion. Considerable faith is placed in such measures, yet their validity, reliability and functioning within context should all be considered in interpreting the findings they generate. We examine the case of the Classroom Assessment Scoring System (CLASS) in the Australian study, Effective Early Education for Children (E4Kids). Using this measure, Australian educators were identified as “low quality” in provision of instruction (average 2.1 on a scale of 1–7). When these results became public, they attracted negative press coverage and carried the potential for harm. We interrogate these findings by asking three questions relating to sampling, contextual and empirical evidence that define quality, and measurement strategies. We conclude that measurement problems, most notably a floor effect, are the most likely explanation for uniformly low CLASS-Instructional scores among Australian ECEC educators, and indeed across international studies. Using a theoretically and empirically informed rescaling strategy we show that there is a diversity of instructional quality across Australian ECEC, and that rescaling might more effectively guide improvement strategies to target those of lowest quality. More broadly, our findings call for a more critical approach to the interpretation of standard measures of ECEC quality and their applications in policy and practice, internationally.


Complex statistical analyses of large-scale data collections are one of the many strategies we apply in our research team as we endeavour to inform policy and practice in early childhood education and care (ECEC). Such work can provide the big picture; an overview of ECEC functioning and effectiveness at national or jurisdiction level. However, like any other research strategy, this one has limitations and should be open to question. Importantly, the contribution of this form of research is inextricably linked to the faith we can place in the measures applied to collect data at-scale.
Large scale statistical work is dependent on measurement. When applied to understand the impact of ECEC on children's development and learning, this means that qualities of the ECEC environment and their effects (e.g., change in children's knowledge and skills) are quantified. Some measurements of quality are straightforward. Structural qualities of the ECEC environment, such as group size or educator to child ratios, are simple counts and typically not subject to dispute. Other measurement is less certain, being subject both to judgements on the part of those who develop the measurement tools and to interpretation by those who observe ECEC environments and apply these tools of quantification (Gordon et al., 2021; Mantzicopoulos et al., 2018; Styck et al., 2020).
While there is agreement that interactions between children and adults in the ECEC context are central to children's experiences (Being and Belonging) and learning outcomes (Becoming) (Australian Government Department of Education, 2019; Mashburn et al., 2008; Stronge, 2018; Thorpe et al., 2020a), quantifying interactional qualities and naming these as representations of quality becomes contentious (Mantzicopoulos et al., 2018; Mashburn, 2017; Thorpe et al., 2020a, 2021). There is ongoing scholarly debate about whether ECEC quality should be measured relative to context or as a standard that transcends context and is applicable to all (Campbell-Barr & Bogatić, 2017; Hunkin, 2018; Jackson, 2015; Rentzou, 2017). Contextual factors that might influence how quality is defined, or how its elements are weighted, include two key sources of variation. First, variations in the cultural and community characteristics in which a service is sited may influence quality. These may determine the resources available, influence educational priorities and determine an appropriate educator response (Jackson, 2015; Rentzou, 2017). Interactions require child inputs and 'quality' is seen in responsiveness to the child (Justice et al., 2013). Children's language, cognitive abilities, and behaviours influence interactional possibilities (Coley et al., 2019; Houts et al., 2010). Second, the pervading pedagogical philosophy (Campbell-Barr & Bogatić, 2017; Hunkin, 2018) and specific pedagogical intent (Justice et al., 2013) within a teaching moment may direct interactional strategy and inform understandings of quality. Children are active agents and highly sensitive to educators' cues (Bonawitz, 2011; Donaldson, 1978; Siegal, 2013). For example, experimental evidence shows that when presented with a novel object, direct instruction will focus a child's attention solely on the demonstrated function and thereby limit exploration of other possibilities. In contrast, when presented with the same novel object without direct instruction a child exhibits greater exploration to identify multiple functions of the novel object (Bonawitz, 2011). Thus, if a pedagogical goal is to inculcate specific knowledge, direct instruction may well define quality. In contrast, if the intention is to support hypothesis testing, generate motivation for learning, or encourage creativity, then a problem-based learning approach may well define quality (Kuhn, 2007).
Despite ongoing debate, across the last two decades, commercialised, standard observation measures of ECEC environments have come to dominate assessment of ECEC and definitions of ECEC quality. Measures such as the Early Childhood Environment Rating Scale (ECERS) (Harms et al., 1998; Sylva et al., 2003) and the Classroom Assessment Scoring System (CLASS) have been increasingly reified as synonymous with ECEC quality. They have been applied not only as research tools but have been trusted as accurate representations of ECEC quality and applied in critical policy and practice judgements that direct funding actions (Mashburn, 2017; Thorpe et al., 2020a) and influence public opinion (Marriner, 2012, 2016). CLASS has emerged as the most influential measure of ECEC quality in the last decade (Mashburn, 2017). CLASS measures three domains of quality: instructional support (CLASS-IS), classroom organisation (CLASS-CO) and emotional support (CLASS-ES). The content of each is outlined in Fig. 1. In the Australian context CLASS was the measure adopted by our research team to quantify educator-child interaction in the largest published national study of ECEC quality, E4Kids (Tayler et al., 2016a, 2016b). E4Kids was designed to assess the effectiveness of Australia's licensed ECEC services by tracking ongoing child development outcomes (2010-2015). In the first year of the study, a representative sample of 257 preschool, long day care and family day care rooms were observed. Of these, 225 rooms were observed for a minimum period of 80 min (4 cycles × 20 min). The findings, presented in Fig. 2, showed that while assessed emotional (CLASS-ES) and organisational (CLASS-CO) qualities of the ECEC rooms were, on average, in the satisfactory range (average scores of 5.13 and 4.6, respectively, on a 1-7 scale), instructional qualities (CLASS-IS) were in the low range (average 2.37) (Tayler et al., 2013; Tayler et al., 2016a, 2016b). When these results were first made public, newspaper headlines suggested that all Australia's early childhood educators were "flunking the test" (Marriner, 2012).
In this paper, we revisit the findings of low-quality instruction in Australian ECEC services. We commence with two stories: one of the fieldworkers undertaking CLASS observations in the E4Kids study and the other that of a researcher taking an alternative approach to understanding educator-child interactions in ECEC, applying conversation analysis (CA) methodology (Heritage, 2016; Sacks, 1995). We bring these two together to present a third story that calls into question the numbers deriving from CLASS-IS that asserted that Australian early childhood educators, on average, deliver poor quality teaching.

Story 1: what the observers saw
The assessment of interactional quality in E4Kids was an enormous task with more than 90 researchers collecting data across urban (Melbourne, Brisbane), regional (Shepparton, Victoria), and remote (Mt Isa, Queensland) Australian communities. The field work required direct observation of ECEC quality in each classroom using the CLASS measures, in cycles of 20 min of observation each followed by a 10-min coding period. Alongside, researchers spent many more hours present in the ECEC services as they undertook standard testing of each participating child's cognition and learning. All field-researchers were trained in CLASS observation against master codes using proprietary, standard video-recorded classrooms (Teachstone Corporation, USA). All were certified as reliable by a certified trainer, and independently verified by Teachstone, USA. Across the data collections, field-researchers' work was meticulously scrutinised to ensure fidelity, with double scoring against a gold standard rater in the field. After their visits the researchers provided feedback both formally through standard reports and informally in team meetings. A few times in their informal accounts the researchers reported observations of concern for the health or safety of children that required immediate follow-up by the study director. Sometimes they would comment on opportunities not taken (e.g., lack of discussion at mealtimes). Most times they would comment on educators doing their best under conditions of high demand. They also reported awe-inspiring moments of interactions between young children and their educators. When the summary statistics from analyses of thousands of observation cycles in early education settings were finalised, however, this variation was not captured. The results showed teaching interactions (CLASS-IS) did not reach the moderate range but, rather, were uniformly rated low. Those "awesome" moments reported by the researchers were not evident (Fig. 3). The scores suggested that even those awe-inspiring educators were, at best, rated as mediocre (low-mid range).

Story 2: what the PhD student discovered
Far from the application of standard assessment of ECEC quality, is the method of conversation analysis (CA) (Heritage, 2016;Sacks, 1995). In the ECEC context this method has been applied to undertake detailed analysis of interactions, examining unfolding moment-by-moment exchanges between educator and child. Conversation analysis is not a measure of quality but rather provides deep insight into how interactions are enacted and, in the context of ECEC, can identify strategies that engage children in ongoing conversation and opportunities for learning. In contrast to standard quantitative measurement that captures counts of pre-determined actions as indices of quality (e.g. "Teacher uses How and Why questions"- Pianta et al., 2008, p. 62), conversation analysis serves to uncover the qualities of interactions that serve to promote, or limit, learning.
In a Ph.D. study by one of the authors (Houen, 2017; Houen et al., 2016) conversation analysis was applied to understand how educators request factual information from children and how they invite children's contributions to classroom discussions. Such dialogic interactions between educator and child have been identified in an extensive education literature as a marker of ECEC quality (e.g., Mashburn et al., 2008; OECD, 2019; Siraj-Blatchford & Manni, 2008) and are included within the behavioural markers of quality in CLASS-IS. The Ph.D. study undertook a fine-grained analysis of educator-child interactions captured within a corpus of 80 hours of teacher-child video recordings in ECEC settings. The findings challenged a long-held assertion, embedded within measures such as CLASS-IS, that "open" questions (e.g. 'what', 'where', 'why') necessarily result in longer and deeper exchanges between educator and child (Siraj-Blatchford & Manni, 2008), a challenge also noted in school contexts (Dillon, 2006). The analysis showed that these methods of questioning often positioned the educator as 'knowing' and the child as 'tested' with consequent closing down of conversation. In contrast, the use of phrases such as "I wonder…" positioned the educator as 'unknowing' and an equal partner in learning, and was more likely to achieve extended interactions. However, the educator's follow-up responses were pivotal in providing children with opportunities for sustained discussions and co-constructed learning. Thus, the educators' action of questioning (Pianta et al., 2008, p. 62) was not of itself a marker of quality but rather the ongoing contextualised response. Further, an important meta-finding was that these effective verbal moments of 'wondering' were only of a few minutes' duration. Beyond these moments, there were pauses for children to think and spaces for children to experiment, explore, discover, and act. Teacher talk, a defining element of CLASS-IS, was not of itself the essence of high-quality interaction, but only part of the picture. Important also was what happened in the spaces between.

Story 3: awesome or awful? Applying conversation-analysis to question assumptions underpinning standard measurement of instructional quality
Sometimes in quantitative research the numbers do not add up. The analyses deliver a score or a statistical finding that does not tally with expectation or logic. In these circumstances, while the finding may indeed be correct and new knowledge created, the data must be scrutinised to ensure that errors have not been made or that alternative explanations cannot be found. Given the power of numbers to influence policy and practice, such scrutiny is essential to avoid the potential of harm (Mashburn, 2017; Thorpe et al., 2020a, 2020b). The finding of uniformly low 'instructional quality' across the diversity of Australian ECEC provision was potentially harmful and did not add up in light of reports from the field-researchers. A focus on dialogic exchanges between child and educator, however, did fit with educational theory and empirical evidence that shows that such exchanges predict positive child outcomes (Mashburn et al., 2008; Siraj-Blatchford & Manni, 2008). So, what was wrong? To investigate we asked three fundamental questions that interrogated the functioning of the CLASS-IS measure in context:

Question 1: were low CLASS-IS scores related to the Australian ECEC settings or the E4Kids sampling?
E4Kids was the first large-scale study of ECEC quality in Australia and the first international study to observe all forms of licensed ECEC provision, sampling across family day care, long day care and kindergarten programs. The low average CLASS-IS scores may reflect the greater diversity of settings observed in this study compared with others internationally. Yet, evidence suggests this was not the case. Looking within the Australian sample, we focussed on stand-alone Kindergarten programs as these have the most favourable conditions and should perform highest. Kindergarten programs are distinguished by having the oldest cohort of children, stable class groups attending shorter sessions, and generally more qualified staff. In these more optimal settings, as seen in Fig. 4, we still found that CLASS-IS was almost entirely in the low range.
Looking beyond the Australian context to compare our findings with those of several other nations (Fig. 5) we found low CLASS-IS scores are typical, regardless of where the data were collected and of the diversity of service types represented. In fact, an average in the upper range of scoring that denotes high quality, that is a score of 5-7, is not evidenced in any international context. Most notable are observations of Finnish Kindergarten settings, revered as representations of excellence in ECEC (Sahlberg, 2012, 2021; Sahlberg & Doyle, 2020; Taguma et al., 2012). In these settings, high-quality curriculum, exceptional levels of educator qualification (most with a master's degree) and older age of those attending (7 years) with commensurate higher verbal ability, all raise expectation of high CLASS-IS scores. Yet, average scores only enter the low-moderate range (mean = 3.7). The specific context of Australian ECEC, and the diversity of service types, therefore, did not explain the low scores.

Question 2: were low CLASS-IS scores related to a discrepancy in understanding of ECEC quality?
A discrepancy between the philosophical understanding of quality enacted in pedagogical practice in the Australian context and that underpinning the USA-developed CLASS-IS measure was a potential explanation, as educators may be working to different goals. Yet this also seems unlikely. The three dimensions of CLASS-IS and their observational indicators (Fig. 1) have strong face validity. That is, they align with a view of instructional quality that places high levels of interaction between educator and child as central (Edwards, 2017; Mashburn et al., 2008; OECD, 2015, 2019). The content of CLASS-IS also aligns with specification of instructional quality within Australia's National Quality Standards (Quality Area 1) (Australian Children's Education & Care Quality Authority, 2020) that asserts that a child should have agency in their learning.
However, pedagogical intent (high levels of educational interaction) and the functioning of the CLASS-IS measure in practice might not align. Available evidence on how CLASS-IS functions in practice shows that high scores are generated in more formal instruction seen in whole group activities, and in literacy and numeracy content (Thorpe et al., 2020a, 2020b). A measure that favours whole group activity may not align with the predominating play-based pedagogical philosophy underpinning practice in Australian ECEC settings, which could explain the low scores seen in Australian ECEC. Yet this explanation does not adequately account for the findings for two reasons. First, in the school environment we still saw low scores. Over 4 years of tracking, E4Kids conducted observations in the children's school aged classrooms (N = 2187), across Preparatory (n = 1500), Year 1 (n = 497), and Year 2 (n = 190). Comparison of the distribution of CLASS-IS observed across each grade, presented in Fig. 6, showed that while there was a slight increase in average CLASS-IS in the formal school years, where learning is typically more structured and programs are led by degree qualified teachers, the average observation score remained in the low range. Second, in Finnish classrooms where a strong play-based pedagogical approach predominates, we saw the highest scores across international comparisons. This finding suggests play-based approaches, of themselves, do not explain low scores.

Question 3: is there a measurement problem?
We asked whether the scaling of the measure was producing a floor effect, in which the possibility of obtaining a moderate CLASS-IS score was unlikely; and that of obtaining a high score infeasible. Our next step was to investigate that possibility.
The detail of conversation analysis within the ECEC setting provided a clear direction to understand why a floor effect may emerge in measurement of CLASS-IS. Like the other two domains of CLASS (CLASS-CO, CLASS-ES), scoring of CLASS-IS is based on observation cycles of 20-min following which educator behaviours across the entire observation period are scored. To obtain a high score on CLASS-IS for each cycle, therefore, requires instructional language exchanges between child and educator for the majority of each 20-min observation. Furthermore, to achieve high average CLASS-IS scores across the total observation of 4-6 cycles would require a continually high level of verbal instruction across a 2-3-h period. Using this scoring procedure there are two distinct reasons that explain the low likelihood of scores in the higher range on CLASS-IS. First, the expectations are unrealistic. Across a 2-3-h period in ECEC settings the imperative for care activities (meals, toileting) and transitions between activities reduces the possibility of continual instructional exchange. Second, the expectations may be suboptimal. Houen's detailed analysis (Houen, 2017;Houen et al., 2016) suggests long verbal exchanges that comprise most of a 20-min observation cycle are inconsistent with a play-based pedagogical approach. Yet play-based pedagogy is recommended, both as an age-appropriate approach (Australian Children's Education and Care Quality Authority (ACECQA), 2020; Edwards, 2017;Flückiger et al., 2018;Fluckiger et al., 2017) and as a marker of high quality in early education (Sahlberg, 2021;Sahlberg & Doyle, 2020;Taguma et al., 2012). Within a play-based approach, reciprocal verbal exchanges interspersed with spaces for thinking and acting would be expected. For example, the Learning Language and Loving It program (Weitzman & Greenberg, 2002) recommends teachers engage in a minimum of four turns when interacting with children and also encourage peer to peer interaction. 
The teacher input in these exchanges would take a few minutes, not the majority of a 20-min cycle. Thus, the instructional domain (CLASS-IS) contrasts with the emotional (CLASS-ES) and organisational (CLASS-CO) domains in rating behaviours that are necessarily and appropriately intermittent, not continual. CLASS-IS counts content-events, with high weighting on teacher talk (80% of behavioural indicators in the CLASS-IS manual) but low weighting on child inputs (20% of behavioural indicators relate to "student" actions). Continual rating, therefore, fails to capture non-verbal educator actions (e.g., providing pauses and spaces) and does not adequately capture child inputs.
Houen's analysis (Houen, 2017; Houen et al., 2016) suggests the problem of scaling relates to the expectation of continuous rather than interspersed exchange; of focus on the density of educator inputs rather than the dispersal of exchanges that afford child agency and encourage child input into interactions. Indeed, this is true not only in ECEC contexts. Studies of learning in more formal school settings indicate that the period of instruction and the activities in between (opportunities for reflection, self-directed application) are both necessary for deep learning (Dillon, 2006). Applying this logic, we trialled a rescaling of E4Kids data taking the highest level of exchange rather than the average across time. In so doing we removed the assumption that continual exchange was optimal. The result of the rescaling is presented in Fig. 7.
Fig. 7 Original distribution of CLASS Instructional support for ECEC classrooms in E4Kids compared with rescaled distribution
The rescaled CLASS-IS served not only to shift scores into the moderate range but also redistributed scores, as those educators providing space for child actions were less likely to be penalised by the scoring system. The resulting distribution was more consistent with reports of our fieldworkers. A few rooms were indeed providing no or low levels of instructional verbal interaction across a period of 1-3 h, many were providing moderate levels of educational exchanges in that time and an "awesome" few were scoring at the highest levels. Taking this scaling we then compared patterns across the transition from ECEC setting through Preparatory, Year 1 and Year 2 classrooms using the same E4Kids sample. The results, presented in Fig. 8, show that most CLASS-IS scores, when rescaled, were no longer low but within the moderate range. Increases in CLASS-IS occurred more distinctly beyond the ECEC year, consistent with changes in staff qualification (educators are now uniformly degree qualified teachers) and increasing verbal ability of children. Most notable in the trend across school years is the increase in scores at Preparatory year, consistent with more uniform teacher qualification (all degree qualified), increased focus on intentional teaching, and previously reported biases of CLASS-IS towards whole group formats and literacy and numeracy content.
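The contrast between averaging across time and taking the highest level of exchange can be illustrated with a minimal Python sketch. It assumes one reading of the rescaling, namely the highest cycle score observed in a room; the exact operationalisation is not specified in the text, and the room names and scores below are hypothetical, not E4Kids data.

```python
from statistics import mean

# Hypothetical CLASS-IS cycle scores (1-7 scale) for three rooms.
# Illustrative values only; these are NOT E4Kids observations.
cycle_scores = {
    "room_a": [1, 1, 2, 1],  # little instructional exchange in any cycle
    "room_b": [2, 5, 1, 2],  # one rich exchange, otherwise space for play
    "room_c": [3, 6, 2, 1],  # one very rich exchange, care routines elsewhere
}


def average_score(cycles):
    """Standard summary: mean score across all observation cycles."""
    return mean(cycles)


def rescaled_score(cycles):
    """Assumed rescaling: highest score reached in any single cycle."""
    return max(cycles)


for room, cycles in cycle_scores.items():
    print(room, average_score(cycles), rescaled_score(cycles))
```

Under averaging, all three hypothetical rooms sit at or below 3, whereas the maximum separates the room with no rich exchange from the two rooms with intermittent high-level exchange.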

Discussion
Developing observation measures of ECEC practice is a difficult task that demands a valid representation of 'quality'; that can be reliably collected in large numbers; and that can accurately capture variability of practice. In examining the case of CLASS, we acknowledge the importance of this measure, and similar such standard measures. Collectively, large scale standard measures of ECEC quality have drawn attention to the conceptualisation of ECEC quality, undertaken detailed work in measure development and facilitated large scale research endeavours. Nevertheless, our questioning of potential issues with such measures when "something does not seem right with the results" is important. Questioning contributes to refinement; both of the target behaviour being measured (validity-does CLASS-IS align with conceptualisations of ECEC quality?) and of the methods by which these behaviours are measured (reliability-is the measurement replicable and representative?). Faith in statistical findings without questioning of the underpinning principles of measurement can misdirect policy decisions and practice actions (Hunkin, 2018;Mashburn, 2017;Roberts-Holmes, 2015;Thorpe et al., 2020a) or undermine public trust (Marriner, 2012).
Internationally, considerable energy and resource has been expended in remediating the perceived deficit in early childhood educator pedagogical skills with little effect (Egert et al., 2018; Pianta et al., 2016). Our questioning of measurement in CLASS-IS directs attention to underpinning assumptions about instructional quality in early childhood settings. Our findings based on rescaled CLASS-IS likely deliver a more accurate understanding of the diverse enactment of instruction in Australian settings (NQS Quality Area 1) and should be a catalyst for questioning in other settings internationally.

Representing ECEC quality: learning from the example of CLASS-Instructional Support
Our analysis of CLASS-IS raises critical questions about intent and values in measuring ECEC quality through use of standard measures that are applied without adjustment for context. The intent of a standard measure is to allow absolute comparison. The value of absolute comparison should be to identify inequities and redress these through policy and practice actions. The danger of absolute comparison is the possibility of misunderstanding or misuse. In the case of low CLASS-IS scores, the value has been in directing focus on instructional elements of the ECEC environment, but the cost has been a labelling of all educators as "flunking the test" (Marriner, 2012) and an assumption that there is a uniform deficit in educator skill. The biggest risk is reification of CLASS-IS scores that directs attention to educator deficit without considering the contexts of community or child in interpretation of scores. While CLASS-IS items have strong face validity, aligning with current research that places value on educationally focussed exchanges between child and educator, in practice CLASS-IS places extremely high weighting on the content of the educator's inputs. Thus, the possibility of scoring highly in a community where children have lower language levels (Coley et al., 2019) (or languages other than those used in assessment) or higher behavioural difficulties (Houts et al., 2010) is more limited. In these circumstances a focus on generalised educator deficit may misdirect policy and practice actions to focus on standard educator professional development rather than tailored educator supports to meet contextual needs. Further, other forms of resource such as improved staff ratios and/or specialised assistance (e.g., specialised expertise in behavioural or emotional problems) for educators or families may serve to enable higher rates of interaction and educational content.
To understand the value of a measure and the various components of which it is comprised, one important test is predictive validity; that is, whether the measure maps to an intended outcome. In the case of CLASS-IS we would predict (and intend) that scores would be associated with child learning and development outcomes. Analysing E4Kids data, we have examined the association of CLASS scores with concurrent, short-term (to age 8) and long-term (to age 14) child outcomes controlling for potential confounders (e.g., family background). Consistent with other such analyses (Egert et al., 2018; Hong et al., 2019; Perlman et al., 2016), we found weak prediction by CLASS generally, and by CLASS-IS specifically (Thorpe et al., 2020b). Nevertheless, when we applied the strongest test, long-term prediction, interesting outcomes emerged. Data linkage of CLASS domain and dimension scores at age 3-4 years to school records across ages 5-14 years showed that while the CLASS-IS domain does not reliably predict child outcomes, one of the three Instructional Support dimensions, quality of feedback, predicts achievement in numeracy, literacy, and science across time (Thorpe et al., 2020b). Alongside this, the emotional domain, CLASS-ES, and one of its dimensions, regard for student perspective, emerged as predictors. Together these findings suggest that it is not the content (concept development and language modelling) but the process of interacting with children that has enduring effect, with child inputs and educator responses a significant component of quality. This being the case, attention to the scaling of CLASS-IS to adjust underlying assumptions of continual rather than interspersed input becomes particularly meaningful. Such an adjustment captures the affordance of space for child inputs and thereby better captures educator-child interaction, not simply educator action.


Measuring Instructional quality: learning by questioning CLASS-Instructional Support
Our analysis of CLASS-IS raises critical questions about procedures in measuring ECEC quality that relate to reliability of the measure. The protocols for measuring ECEC quality using CLASS are well-developed. There are stringent training regimes and tests of reliability to ensure maintenance of fidelity to the observation strategy. We note some concerns have been raised about the standard reliability criterion set by the developers and about rater-bias effects (Styck et al., 2020). Reliability in CLASS does not require exact agreement with gold standard codes but rather deems an observer as reliable if they are within 1 score (plus or minus) of the master code. This procedure manifests in reduced scale usage, with an observer unlikely to use the extreme codes (1 and 7). Addressing this issue, however, is outside the remit of this paper. In E4Kids we adopted the standard training and reliability protocols. Considerable effort was expended to ensure ongoing fidelity to the codes. As data collection proceeded, reliability checks occurred against both master-coded video-recordings and a master-coder in the field. Additionally, weekly checks of codes against field notes were undertaken. Our results, therefore, are a true representation of the CLASS measure as intended by the developers.
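The within-1 reliability criterion described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name and the paired codes are hypothetical, and the proportion of agreement required for certification is not stated in the text, so none is assumed here.

```python
def agreement_rates(observer, master):
    """Return (exact, within_one) agreement proportions for paired codes.

    `observer` and `master` are equal-length lists of 1-7 codes.
    'within_one' counts a pair as agreeing when the codes differ by at most 1,
    which is the criterion described in the text.
    """
    if len(observer) != len(master) or not observer:
        raise ValueError("need two non-empty, equal-length lists of codes")
    pairs = list(zip(observer, master))
    exact = sum(o == m for o, m in pairs) / len(pairs)
    within_one = sum(abs(o - m) <= 1 for o, m in pairs) / len(pairs)
    return exact, within_one


# Hypothetical paired codes (not real certification data).
master_codes = [1, 2, 4, 7, 3]
observer_codes = [2, 2, 3, 6, 5]

print(agreement_rates(observer_codes, master_codes))
```

In this made-up example the observer matches the master code exactly only once, yet agrees within 1 on four of five codes, including the cases where 2 and 6 were given in place of the extreme codes 1 and 7.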
A true representation of a measure, however, may not be a true representation of ECEC quality. In the case of CLASS-IS, our analysis suggests the measure may work to under-represent some aspects of ECEC quality, namely child inputs into interactions, and fail to adequately distinguish variability. We argue that while it is pragmatic to measure all three domains of CLASS on the same scale based on a rate/time, the perverse effect is to create a floor effect, reduce variability and represent ECEC instructional quality as uniformly low. Our simple rescaling likely presents a more accurate and useful focus for exploring the associations with child outcomes and targeting interventions.

Interpreting the meaning of findings of low instructional quality for policy and practice
Our analysis of CLASS-IS has three key implications for ECEC, both in the specific context of Australia and internationally.
First and most important, our rescaling of CLASS likely provides a more accurate picture of the quality of educational experiences provided to Australian children across the diversity of ECEC services. Our analysis refutes the claim that Australian ECEC educators are uniformly of low quality. Rather, our results suggest there is a diversity of quality and, on average, moderate levels of instruction. While there is certainly room for quality improvement, not all educators are failing our children as suggested in the headlines of 2012 (Marriner, 2012). Understanding the functioning of the measure and applying rescaling might increase precision in directing resources to those services and educators most in need of support to improve interactional quality. Consideration of the context in which these scores were generated might make policy and practice responses more effective.
Second, our analysis provides direction for improving instruction across the diversity of the ECEC day. In our analyses, Quality of Feedback has emerged as a positive predictor of long-term education outcomes (Thorpe et al., 2020b). This dimension of CLASS-IS focusses on the educator's response to child input rather than educator-initiated input, assessing "the degree to which a teacher provides feedback that expands learning and understanding and encourages continued participation" (Pianta et al., 2008, p. 69). Consistent with Houen's research (Houen, 2017; Houen et al., 2016), this dimension of instruction focusses on child inputs and child agency as a learner, not educator-directed content (concept development) or language (language modelling) alone. Engaging children as active agents in learning conversations is identified as a key focus for professional development to improve ECEC quality, and is one identified by Evidence for Learning Australia, who have recently developed teacher resources based on a systematic review of Australasian evidence (Houen et al., 2019; Evidence for Learning, 2020). Importantly, the opportunity for child-led learning activity is more likely to occur outside the whole group format, yet the available evidence suggests that the whole group format is where higher CLASS-IS scores are achieved. One key focus, therefore, might be activities and times of day when evidence shows that educational exchanges are least likely to happen. For example, mealtimes score very low in instructional quality yet present important opportunities for such educational interactions (Thorpe, Rankin, et al., 2020).
Third, current studies using CLASS utilise the standard CLASS-IS measure that we originally applied in E4Kids. This is scored as a rate/time and therefore carries the implicit assumption that density of educator input equates to quality. The greater variability in the rescaled measure, which removes this assumption, may serve to increase the predictive validity of the measure, as it discriminates more finely between ECEC experiences, with a range of 1-7 compared with a range of 1-5 (Styck et al., 2020). While other conceptualisations and methods of assessing instructional quality may better discriminate quality, for those studies that have already used or that choose to use CLASS, applying the rescaling approach may offer a valuable addition to analyses.

Conclusion
We conclude that while CLASS and other such measures have a place as research instruments in assessing ECEC quality, their underlying assumptions should be rigorously examined and their potential limitations acknowledged. Questioning standard measures that quantify ECEC quality is a key part of responsible interpretation of research results and an essential step before advancing to policy and practice actions.