Introduction

Effective early reading instruction requires explicit understanding of language and literacy skills being taught and of how they can be taught successfully to a diverse group of young learners who bring to the task very different experiences with print prior to school. Over the last 30 years, a substantial body of research has suggested that teachers may lack the necessary knowledge of language and literacy skills to effectively teach children to read (e.g., Moats, 1994; Piasta et al., 2009; Porter et al., 2022). As this knowledge may no longer be intuitive to experienced readers, many teachers also overestimate their knowledge (e.g., Stark et al., 2016); in the absence of generally accepted expectations and assessment tools, even reflective practitioners may find it difficult to know what it is that they don’t know. Further, acquiring the knowledge does not always translate to better instruction (e.g., Arrow et al., 2019; Foorman & Moats, 2004) as the new knowledge has to be amalgamated with the existing knowledge and beliefs about learning and translated into effective lesson plans that are then delivered competently in classrooms. For example, many teachers learned about the three-cuing system in their initial teacher education and may readily add phonics instruction to it but not discontinue encouraging the use of context and syntactic cues. In this study, we examined whether teachers’ knowledge of literacy and language constructs and their perceived ability to teach early literacy skills predicted the quality of their phonics lessons, and further, whether their knowledge and perceived ability affected students’ learning of reading either directly or indirectly via the lesson quality.

Teachers’ knowledge of language and literacy constructs was brought to focus by Moats (1994) who identified significant gaps in teachers’ knowledge of spoken and written language structure. According to Moats, the level of knowledge teachers demonstrated in her study was not sufficient to explicitly teach necessary spoken and written language structure to beginning readers or to struggling older readers. Numerous studies have since replicated the finding of knowledge gaps in multiple countries using diverse measures and examining different aspects of language and literacy knowledge (e.g., Aro & Björn, 2016; Bos et al., 2001; Goldfeld et al., 2021; McCutchen, Abbot et al., 2002a; McCutchen, Harry et al., 2002a, 2002b; Moats & Lyon, 1996; Piasta et al., 2009; Porter et al., 2022; Stark et al., 2016; Washburn et al., 2016). As a result, we can say with some certainty that teachers—whether pre- or in-service—have seldom displayed the knowledge of the structure of spoken and written language that researchers consider necessary for effective early reading instruction (Moats, 1994; Snow et al., 2005). We also know that their instructors in teacher education programs may not have that knowledge either (Binks-Cantrell et al., 2012; Joshi et al., 2009b) and that this information may be lacking or poorly represented in many teacher education programs (Coltheart & Prior, 2006; McCombes-Tolis & Spear-Swerling, 2011; Meeks et al., 2017) and early literacy textbooks (Joshi et al., 2009a). These observations would explain why pre-service teachers do not have the knowledge when they graduate and why this knowledge has to be obtained through ongoing professional learning.

Some of the language and literacy knowledge studies have also indicated that teachers are not particularly accurate at estimating their own knowledge and often overestimate their knowledge and ability to teach early literacy skills (Cunningham et al., 2004; Stark et al., 2016; Washburn et al., 2011). If the necessary knowledge needs to be acquired through ongoing professional learning, the overestimation of knowledge and ability can pose a serious challenge for motivating teachers to seek and benefit from the appropriate learning opportunities. Given the limited knowledge of language and literacy constructs that many teachers may still graduate with from teacher education programs (e.g., Washburn et al., 2016), it seems important that teachers seek and are offered effective professional learning opportunities in early language and literacy instruction.

Studies have also shown, perhaps unsurprisingly, that when teachers are explicitly taught basic language constructs in preservice teacher education or in teacher professional learning, their knowledge improves (e.g., Goldfeld et al., 2021; McCutchen, Abbot et al., 2002a; Moats, 1994; Moats & Foorman, 2003; Spear-Swerling, 2009; Spear-Swerling & Brucker, 2003, 2004). For example, Goldfeld et al. (2021) showed that compared to a control group, 4 days of face-to-face professional learning improved teachers’ knowledge of oral language and literacy constructs and the acquired knowledge was retained 1 year later. There is also some evidence that professional learning focused on language knowledge may improve student achievement (e.g., Bos et al., 1999; Podhajski et al., 2009), although in most studies to date the gains have been limited to only a few (e.g., Foorman & Moats, 2004; McCutchen, Harry et al., 2002a, 2002b; Spear-Swerling & Brucker, 2004) or none of the measured constructs (Goldfeld et al., 2022). Studies of direct effects of teacher knowledge on student performance have similarly shown limited effects (e.g., Carlisle et al., 2009, 2011).

These observations suggest that while knowledge may be necessary, it is not sufficient: Students can only benefit from teachers’ knowledge if it is translated into effective classroom instruction, and this may frequently not be the case (Arrow et al., 2019; Eadie et al., 2022). Piasta et al. (2009) noted that in the earlier teacher knowledge studies classroom instruction was mostly treated as a “black box” with no observations of what the actual instruction entailed. The few studies that have examined the knowledge–instruction interaction have provided somewhat conflicting results. McCutchen, Harry et al. (2002a, 2002b) reported that teacher phonological knowledge was associated with the use of explicit phonological awareness activities but not with the use of comprehension or writing activities. Both the teacher phonological knowledge and more frequent explicit phonological instruction also correlated significantly with reading performance at the end of the year for kindergarten students, but not for Grade 1 and 2 students. Puliatte and Ehri (2018) reported that better phoneme, spelling conventions and morpheme knowledge were positively associated with self-reported spelling strategy instruction but not with how much time teachers reported spending in teaching spelling. In turn, phoneme knowledge and self-reported instructional time and spelling strategy instruction were all positively associated with students’ spelling scores. Piasta et al. (2009) found no association between teachers’ knowledge of language and literacy constructs and the amount (measured in minutes) of explicit decoding instruction they provided. Finally, in their study of early childhood educators and emergent literacy outcomes, Piasta et al. (2020) examined both the direct and indirect effects (mediated by classroom practice) of educator knowledge assessed in Fall on children’s Spring print concept knowledge, letter name knowledge, phonological awareness, and oral language while controlling for children’s Fall performance on the same measures. Their results indicated that teacher knowledge predicted the change in phonological awareness directly and the change in print concept knowledge both directly and indirectly via classroom practice, whereas Spring letter name knowledge and oral language were only predicted by their respective Fall scores. The quality of classroom practice in this study was observed rather than self-reported, and the rubric attempted to capture both the overall quality of instruction and the quantity of instruction in book reading, print and letter knowledge, phonological awareness, written expression, and oral language.

To our knowledge, Piasta et al. (2020) is the only study that has statistically examined a full conceptual model similar to Fig. 1 below and that has implicitly or explicitly guided the earlier studies. However, as Piasta et al.’s (2020) study was conducted in early childhood settings, it did not include explicit phonics instruction that is a critical part of early reading instruction (e.g., Connor et al., 2004; Foorman et al., 2016; National Reading Panel, 2000). Further, their classroom quality measure did not capture teachers’ ability to differentiate the instruction according to the varied performance levels and needs of the students. In early literacy classes, students’ performance levels can vary from fluent reading to knowing only some of the letters and the teachers need to design and adapt instruction to keep all children engaged and learning (Arrow et al., 2015; Connor et al., 2009; see Puzio et al., 2020, for a recent meta-analysis). While most teachers naturally differentiate their instruction in response to individual student needs either by designing differentiated lesson plans or in their interactions with students who struggle, it is likely that the quality of the differentiation varies widely across teachers and could be related to teacher knowledge and student outcomes (e.g., Arrow et al., 2015).

Fig. 1 Hypothesized model for teachers’ knowledge, instruction quality, and children’s outcome. Note: T1 = Time 1; T2 = Time 2

Figure 1 displays the assumed paths between teacher knowledge and perceptions, classroom practice, and student outcomes assessed in this study. The aim of the current study was to examine the associations between teachers’ knowledge of language and literacy constructs, their perceived ability to teach early literacy skills, their instructional practice (quality of reading instruction and differentiation of reading instruction), and students’ early reading outcomes in the context of implementing phonics lessons after attending short workshops (see below for details). We observed literacy lessons twice and deemed the instruction proficient when the teacher supported children’s acquisition and application of phonological awareness and phonics learning in a manner that kept children engaged. Differentiation was assessed as proficient when the teacher was observed effectively modifying the instruction according to the students’ needs (see “Appendix 2” for the rubric). We measured children’s decoding skills at the beginning of the study and again 4 months later with a measure closely aligned with the content of the phonics lessons. We also assessed children’s word reading skills with a standardized test at the end of the study. Teachers’ language and literacy knowledge and perceived ability were assessed at the beginning of the study together with information on years of experience and education, and again at the end of the study. We divided perceived ability into proximal and distal, with the proximal score encompassing teachers’ self-rated ability to teach phonemic awareness, phonics, and reading to typically developing children and children who struggle learning to read, whereas the distal score included teachers’ self-rated ability to teach reading fluency, vocabulary, comprehension, children’s literature, and spelling.

More specifically, we attempted to answer the following five research questions:

  1. Is teachers’ language and literacy knowledge related to their observed early reading instruction quality and their ability to differentiate instruction?

  2. Is teachers’ language and literacy knowledge related to the effectiveness of their early reading instruction as measured by student outcomes?

  3. Are the observed early reading instruction quality and ability to differentiate instruction related to the student outcomes?

  4. Is the relationship between teachers’ language and literacy knowledge and student outcomes mediated by observed quality or differentiation of instruction?

  5. Are the proximal and distal perceived abilities and years of experience related to knowledge, quality of instruction, differentiation of instruction, and student outcomes?

Methods

Participants

The participants used in this study were part of a larger project that aimed to examine the effects of providing Grade 1 teachers professional learning on evidence-based practices including phonological awareness and phonics, and on providing Tier 2 intervention to struggling Grade 1 children (Georgiou et al., 2021). Ethics permission for the project was obtained from the Research Board of the University of Alberta (Pro00091636).

To recruit the sample for this study, we first sent letters containing information about our study to parents/guardians of 1796 children attending Grade 1 in 42 public elementary schools (79 classes) in Edmonton, Canada. The schools were located in different parts of the city to increase representation of different demographics in our study. Parental consent was received for 1526 children. If children experienced sensory or intellectual difficulties, were not willing to participate, were absent at both testing periods, moved to a different school not participating in the study, or had recently immigrated to Canada and did not speak English well enough to follow instructions, they were excluded from the study (45 children). At Time 2, our sample comprised 1481 children (729 females and 752 males; mean age = 6.47 years, SD = 0.44). Eighty-two percent of the children were White, 12% East Asian, 4% Indigenous, 2% European, and 2% Other.

Teachers (n = 79; 76 females, 3 males) participated in the project by attending professional learning on evidence-based practices (see below) and by filling out a questionnaire before the onset of this study on their perceived ability to teach different literacy skills, their highest attained education, the number of years of teaching experience, and their knowledge of language and literacy constructs. The perceived ability and knowledge of language and literacy constructs were assessed again at Time 2. The teachers were observed twice delivering a 40-min reading lesson in order for us to evaluate the quality of their instruction and their ability to differentiate instruction (see below). Their written consent was obtained prior to participating in the study.

Measures

Children

Children were assessed at Times 1 (mid-September) and 2 (mid-January) on two experimenter-developed decoding tasks (one with real words and one with pronounceable pseudowords) and at Time 2 also on the blue form of the WRAT-5 Word Reading task (Wilkinson & Robertson, 2017). The two experimenter-developed decoding tasks comprised 20 items each and a participant’s score was the sum of the scores in the two tasks (max = 40). The items were words or pseudowords made of graphemes covered in the scope and sequence we provided to teachers during their professional learning prior to the start of the school year. Children were first asked to read aloud the list of real words and then the list of pseudowords. The lists were identical at Times 1 and 2, and presentation was discontinued after five consecutive errors. Cronbach’s alpha reliability in our sample ranged from 0.90 to 0.94.

In WRAT-5 Word Reading, children were asked to read aloud a list of 15 lowercase letters and 55 words of increasing difficulty. The test was discontinued after five consecutive errors and a participant’s score was the total number correct (max = 70). Cronbach’s alpha reliability in our sample was 0.92.
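The discontinue rule shared by the decoding tasks and WRAT-5 Word Reading can be sketched as follows. This is a minimal illustration rather than the scoring procedure actually used: the function name `score_with_discontinue`, the coding of responses as booleans, and the example response pattern are all assumptions made here.

```python
def score_with_discontinue(responses, max_consecutive_errors=5):
    """Count correct items, stopping once the run of errors reaches the limit.

    `responses` is a list of booleans in presentation order (True = correct).
    Hypothetical helper for illustration only.
    """
    score = 0
    error_run = 0
    for correct in responses:
        if correct:
            score += 1
            error_run = 0  # a correct response resets the run of errors
        else:
            error_run += 1
            if error_run >= max_consecutive_errors:
                break  # remaining items are not administered
    return score


# Made-up pattern: 3 correct, 5 errors in a row, then 2 items never reached.
example = [True] * 3 + [False] * 5 + [True] * 2
print(score_with_discontinue(example))
```

Under this sketch, items that follow the fifth consecutive error do not contribute to the score, even if the child could have read them.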

Teachers

Teachers filled out a questionnaire prior to the onset of the study and again in January at the conclusion of the study. The questionnaire consisted of three sections (see “Appendix 1”). In Section 1, we collected information on teachers’ background (degree: BEd, BA + After Degree BEd, or graduate studies; and years of teaching experience). In Section 2, we asked teachers to rate their ability to teach different aspects of literacy. These questions were adapted from Washburn et al. (2011). Each question in this section had four choices: minimal, moderate, very good, and expert. We derived two scores from the teachers’ responses in this section. The Perceived Proximal Ability score (max = 16) was the sum of the scores on items asking teachers to rate themselves in teaching (a) reading to typically-developing children, (b) reading to struggling readers, (c) phonemic awareness, and (d) phonics. The second, the Perceived Distal Ability score (max = 20), was the sum of the scores on items asking teachers to rate themselves in teaching (a) reading fluency, (b) vocabulary, (c) comprehension, (d) children’s literature, and (e) spelling. In Section 3, we asked 46 questions assessing teachers’ knowledge about language and literacy constructs. The questions covered syllable awareness, phonemic awareness, phonics, and morphological awareness and were sampled from previous questionnaires on teachers’ knowledge about language and literacy constructs provided to us by Dr. Malatesha Joshi (e.g., Binks-Cantrell et al., 2012; Washburn et al., 2011). The Time 1 Cronbach’s alpha reliability in our sample was 0.71, and Time 1 and 2 questionnaire scores correlated 0.58, indicating reasonable stability.
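The two perceived ability scores can be illustrated with a short sketch. The 1 to 4 coding of the response options is an assumption inferred from the reported maxima (4 items with max = 16; 5 items with max = 20), and the item keys and the example teacher are hypothetical.

```python
# Assumed numeric coding of the four response options; the text gives only
# the labels and the maximum scores (16 and 20).
RATING = {"minimal": 1, "moderate": 2, "very good": 3, "expert": 4}

# Hypothetical shorthand keys for the questionnaire items.
PROXIMAL_ITEMS = ["reading_typical", "reading_struggling",
                  "phonemic_awareness", "phonics"]
DISTAL_ITEMS = ["fluency", "vocabulary", "comprehension",
                "childrens_literature", "spelling"]


def perceived_ability(responses, items):
    """Sum the coded self-ratings over the given items."""
    return sum(RATING[responses[item]] for item in items)


# Made-up responses for one teacher.
teacher = {
    "reading_typical": "very good",
    "reading_struggling": "moderate",
    "phonemic_awareness": "moderate",
    "phonics": "very good",
    "fluency": "very good",
    "vocabulary": "very good",
    "comprehension": "expert",
    "childrens_literature": "moderate",
    "spelling": "moderate",
}

proximal = perceived_ability(teacher, PROXIMAL_ITEMS)
distal = perceived_ability(teacher, DISTAL_ITEMS)
print(proximal, distal)
```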

Teacher professional learning

We offered three professional learning sessions to teachers: one at the end of August (prior to students coming to school in September for the new school year), one at the end of October, and one at the end of January. Each session was about 3 h long and the teachers attended it in groups of 25–30 in a conference room at the school division’s headquarters. In the first session, the focus was first on understanding Phonological Awareness (PA) and Phonics (including the difference between them) and the key competencies on the developmental continuum of PA (Bos et al., 1999). A collection of evidence-based PA activities (e.g., sound identification, blending and segmenting words into sounds) was shared along with video examples. This involved explaining to teachers what each of these concepts is, how they differ from each other, and why they should be taught. They were also given a scope and sequence of 60 grapheme-phoneme correspondences and example lesson plans on teaching phonological awareness and phonics corresponding to the scope and sequence. The scope and sequence were based on the GPC frequencies presented in Vousden et al. (2011), apart from us moving several vowels earlier in sequence to be able to spell more real words. The first six graphemes covered were s, a, t, l, p and i. All materials were made available to the teachers through a shared folder in Google Drive. We also provided the teachers with multiple examples of how to differentiate instruction to meet the needs of all their students. In the second session, teachers reviewed the phases of reading development and orthographic mapping (Ehri, 2014), the scope and sequence for teaching high frequency Grapheme-Phoneme Correspondences (GPCs), and previewed a 30-min Phonics lesson. In this lesson, students are led through a brief PA drill and then taught a target GPC of the day using explicit instruction. Next, they are given time to practice reading the new GPC through an interactive application activity (e.g., Roll and Read). The lesson concludes with shared book reading where they are asked to identify and read words that contain the taught GPC in decodable text (Savage et al., 2018). During this meeting, we also shared with teachers progress monitoring tools for both phonological awareness and phonics and showed them how to use the tools. In the third session, teachers were introduced to the concepts of morphology and Morphological Awareness (MA). They completed a brief training in an instructional approach called Structured Word Inquiry (Bowers, 2009) and were provided with morphology activities from the Florida Center for Reading Research (www.fcrr.org). Both resources offered instruction on teaching morphemes (the smallest unit of meaning in words), morphological structure, and word analysis.

Teaching observations

Eight graduate students were trained to observe reading lessons in each participating class. Two independent raters observed two 40-min reading lessons of each teacher, about 10 days apart. The same raters observed the same teacher at both times. We developed a rubric based on the treatment integrity tool used in Savage et al. (2018; see “Appendix 2”). The rubric included two dimensions of teaching (quality of teaching and differentiation of instruction) that were each scored either as limited (0 points), meets (1 point), or proficient (2 points). The final score for each teacher on the two observed dimensions was the sum score from the two observations. The initial interrater agreement was 0.95 and where disagreements existed, the raters discussed their ratings and came to an agreement on the score.
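A small sketch may help make the rubric scoring concrete. The point values come from the text; everything else is assumed for illustration, including the function names, the example ratings, and in particular the use of simple exact agreement, since the text does not state which agreement statistic was computed.

```python
# Point values as described for the rubric.
RUBRIC_POINTS = {"limited": 0, "meets": 1, "proficient": 2}


def dimension_score(ratings):
    """Sum rubric points for one dimension across observations (0-4 for two)."""
    return sum(RUBRIC_POINTS[r] for r in ratings)


def exact_agreement(rater_a, rater_b):
    """Proportion of ratings on which two raters chose the same category.

    Assumption: exact agreement is used only as an example statistic here.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


# Made-up ratings: one teacher's quality of teaching at the two observations.
quality = dimension_score(["meets", "proficient"])

# Made-up initial ratings from two raters across four observed lessons.
rater_a = ["meets", "proficient", "limited", "meets"]
rater_b = ["meets", "proficient", "meets", "meets"]
print(quality, exact_agreement(rater_a, rater_b))
```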

Procedure

Children were tested individually by trained graduate students in a quiet room at their respective schools during school hours. Testing at both times lasted approximately 15 min. Teachers filled out the questionnaire before receiving any professional learning on evidence-based practices in teaching reading. The observations were carried out in November (about 3 months after the beginning of the school year) on a day and at a time convenient for the teachers. During the observations, the raters sat at the back of the classroom on opposite sides.

Statistical analysis

The analyses were conducted using the multilevel modeling framework (Heck & Thomas, 2009) with Mplus (Version 8; Muthén & Muthén, 1998–2017). The multilevel modeling allows us to differentiate variances in the variables into two components: (a) variation due to differences between teachers (i.e., between-classroom variation) and (b) variation due to individual differences between children (i.e., within-classroom variation). To examine different predictors at both between- and within-classroom levels, we constructed a multilevel mediation model shown in Fig. 1 above. For the between-classroom level, the model included teachers’ years of experience, perceived ability (self-reported), and knowledge about language and literacy as predictor variables, quality of teaching and differentiation of instruction as mediator variables, and children’s performance on the experimenter-developed decoding tasks at Time 2 as an outcome variable. Degree was excluded from the predictors as there were only a handful of teachers with an after-degree B.Ed or a master’s degree. Children’s decoding skills at Time 1 were also included in the model as a control variable. For the within-classroom level, the model included children’s age, gender, and decoding skills at Time 1 as predictor variables and phonics skills at Time 2 as an outcome variable. Next, to test the indirect effects of teacher variables at Time 1 (i.e., years of experience, perceived teaching ability, and knowledge about language and literacy) on children’s decoding skills at Time 2 via quality of teaching and differentiation of instruction, we performed mediation analysis (Hayes, 2018).
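The idea of splitting variability into between-classroom and within-classroom components can be shown with a purely descriptive sum-of-squares decomposition. This is a simplified sketch, not the model-based estimation performed in Mplus; the function name `decompose` and the scores are made up for illustration.

```python
from statistics import mean


def decompose(scores_by_class):
    """Split total sum of squares into between- and within-classroom parts."""
    all_scores = [s for scores in scores_by_class.values() for s in scores]
    grand_mean = mean(all_scores)

    # Between: how far each classroom mean lies from the grand mean,
    # weighted by the number of children in the classroom.
    ss_between = sum(
        len(scores) * (mean(scores) - grand_mean) ** 2
        for scores in scores_by_class.values()
    )
    # Within: how far each child lies from his or her own classroom mean.
    ss_within = sum(
        (s - mean(scores)) ** 2
        for scores in scores_by_class.values()
        for s in scores
    )
    ss_total = sum((s - grand_mean) ** 2 for s in all_scores)
    return ss_between, ss_within, ss_total


# Made-up decoding scores for two hypothetical classrooms.
data = {"class_a": [10, 12, 14], "class_b": [20, 22, 24]}
ss_between, ss_within, ss_total = decompose(data)
print(ss_between, ss_within, ss_total)
```

The two components add up to the total, which is the property the multilevel framework exploits when it assigns teacher-level predictors to the between part and child-level predictors to the within part.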

Little’s Missing Completely at Random test (Little, 1988) showed that our missing data (either due to children missing some tasks due to absences or missing responses in teacher questionnaires) were missing completely at random at both between- and within-classroom levels (between-classroom level: χ2 = 32.7, df = 37, p = 0.67; within-classroom level: χ2 = 12.7, df = 9, p = 0.18). All analyses handled missing data using full information maximum likelihood (FIML) estimation, which allowed the use of all observations in the data set to estimate the parameters in the models (Muthén & Muthén, 1998–2017). Model fit was examined using chi-square values and four fit indices: the comparative fit index (CFI), the Tucker-Lewis index (TLI), the root-mean-square error of approximation (RMSEA), and the standardized root-mean-square residual (SRMR). CFI and TLI values above 0.95, RMSEA values below or at 0.06, and SRMR values below 0.08 indicate good model fit (Kline, 2015).
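The fit criteria listed above can be written as a simple check. The thresholds are those stated in the text; the function name `good_fit` and the example values are hypothetical, and for a two-level model the SRMR check would be applied separately to the within and between values.

```python
def good_fit(cfi, tli, rmsea, srmr):
    """Apply the stated cutoffs: CFI and TLI above 0.95, RMSEA at or below
    0.06, and SRMR below 0.08."""
    return cfi > 0.95 and tli > 0.95 and rmsea <= 0.06 and srmr < 0.08


# Made-up index values for illustration.
print(good_fit(cfi=0.97, tli=0.96, rmsea=0.04, srmr=0.05))
print(good_fit(cfi=0.97, tli=0.96, rmsea=0.04, srmr=0.09))
```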

Results

Descriptive analysis

Table 1 reports the teachers’ perceived ability to teach different reading skills and students as well as the proportion of correct answers they provided to the language and literacy knowledge questionnaire. In general, the perceived ability means are comparable to what has been reported in the literature for North American (Cunningham et al., 2004) and New Zealand (Arrow et al., 2019) teachers, and slightly higher than those reported for preservice teachers (Washburn et al., 2011) using similar questions. The Language and Literacy Knowledge proportion correct score of 0.54 at Time 1 is comparable to 0.57 Arrow et al. (2019) reported for New Zealand teachers, lower than 0.67 Washburn et al. (2016) reported for Canadian preservice teachers, but comparable to their reported values for English, US, and New Zealand preservice teachers.

Table 1 Teachers’ self-reported (perceived) teaching ability

Table 2 reports the descriptive statistics for all measures used in the study. Intraclass correlations (ICCs) of children’s variables showed that between-classroom differences in decoding skills were statistically significant: 6% [90% CI (0.03, 0.10)] and 13% [90% CI (0.09, 0.20)] of the total variability at Times 1 and 2, respectively, reflected shared variance at the classroom level. The Time 2 WRAT standard score mean indicates that our sample of Grade 1 students was reading slightly below the average for what would be expected of North American children at this point. The teachers, on average, had 12 years of experience and their teaching quality and differentiation of instruction were rated slightly above the “meets the standard” criteria.
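For readers less familiar with the ICC, it is conventionally defined as the share of total variance that lies between classrooms. The symbols below are notation introduced here for illustration and do not appear in the original analysis.

```latex
\mathrm{ICC} = \frac{\sigma^2_{B}}{\sigma^2_{B} + \sigma^2_{W}}
```

Here \sigma^2_{B} is the between-classroom variance and \sigma^2_{W} the within-classroom variance. On this definition, the Time 1 value of 0.06 means that about 6% of the variance in decoding lay between classrooms and about 94% between children within classrooms.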

Table 2 Descriptive statistics for the measures used in the study

Table 3 reports the correlations among all the variables in the study. Not surprisingly, children’s decoding skills at Times 1 and 2 were significantly correlated with their Time 2 WRAT performance. Children’s decoding skills at Time 1 were negatively correlated with differentiation of instruction (r = − 0.30), while Time 2 decoding skills were positively correlated with teachers’ years of experience (r = 0.24). Years of experience was also positively correlated with teachers’ perceived teaching abilities (both proximal and distal; rs ranged from 0.29 to 0.36 across time points). Distal, but not proximal, perceived ability at Time 1 correlated with Time 1 decoding and Time 2 WRAT scores. In addition, teachers’ knowledge about language and literacy at Time 1 correlated significantly with the quality of teaching (r = 0.31) but not with the differentiation of instruction (r = 0.19). Time 2 language and literacy knowledge score correlated significantly only with Time 1 score and Time 2 proximal perceived ability.

Table 3 Within-level (below the diagonal) and between-level (above the diagonal) correlations among the observed variables

Multilevel modeling

Figure 2 shows the final multilevel model for teacher variables and children’s decoding skills. The model fit was excellent, χ2(20) = 16.25, p = 0.70, CFI = 1.00, TLI = 1.00, RMSEA = 0.00, SRMRwithin = 0.02, SRMRbetween = 0.06. As expected, decoding skills at Time 1 significantly predicted decoding skills at Time 2 at both the individual and classroom levels (βs = 0.69 and 0.55, ps < 0.001). At the between-classroom level, teachers’ knowledge about language and literacy was associated significantly with the quality of teaching (β = 0.31, p < 0.01), whereas the association with the differentiation of instruction approached significance (β = 0.18, p = 0.08). In addition, children’s higher Time 1 decoding skills were associated with less differentiation of instruction (β = − 0.33, p < 0.01), whereas more differentiation of instruction (β = 0.25, p < 0.05) and teachers’ perceived proximal ability of teaching (β = 0.22, p < 0.05) uniquely predicted decoding skills at Time 2 over and above the autoregressive effect of decoding skills at Time 1. In turn, at the within-classroom level, children’s gender, but not age, predicted decoding skills at Time 2 after controlling for decoding skills at Time 1, indicating that the improvement was slightly larger for girls than for boys. Finally, the results of mediation analysis showed that the indirect effect of teachers’ knowledge on children’s decoding skills at Time 2 via differentiation of instruction was not statistically significant (estimate = 0.05, p = 0.18). As the association of decoding to differentiation changes direction, no estimate for the indirect effect was calculated.
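In a simple mediation model, an indirect effect is commonly estimated as the product of the two path coefficients involved. The sketch below applies that product-of-coefficients idea to the rounded standardized coefficients reported above; it is only a rough plausibility check, because the reported estimate would have been computed from unrounded values and its p value depends on standard errors that are not reproduced here. The function name `indirect_effect` is hypothetical.

```python
def indirect_effect(a, b):
    """Product-of-coefficients estimate of an indirect (mediated) effect."""
    return a * b


a_path = 0.18  # knowledge -> differentiation of instruction (reported beta)
b_path = 0.25  # differentiation of instruction -> Time 2 decoding (reported beta)

estimate = indirect_effect(a_path, b_path)
print(f"{estimate:.3f}")
```

The product comes to about 0.045, which is in line with the reported indirect effect of 0.05.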

Fig. 2 Final model for teachers’ knowledge, instruction quality, and children’s phonics outcome. Note: T1 = Time 1; T2 = Time 2. †p = 0.08; *p < 0.05; **p < 0.01; ***p < 0.001

Figure 3 shows the final multilevel model for teacher variables and children’s WRAT word reading scores at Time 2. The model fit was excellent, χ2(13) = 10.52, p = 0.65, CFI = 1.00, TLI = 1.00, RMSEA = 0.00, SRMRwithin = 0.01, SRMRbetween = 0.06. As above, teachers’ knowledge was associated significantly with the quality of teaching and the association with differentiation of instruction approached significance. For WRAT, however, years of experience instead of perceived proximal ability predicted the Time 2 outcome (β = 0.29, p < 0.05). In turn, the positive association of quality of teaching with Time 2 word reading score approached significance (β = 0.38, p = 0.05), whereas the differentiation of instruction was significantly and negatively associated with the word reading scores (β =  − 0.44, p < 0.05). These last two results indicate that differentiation was more common in classrooms with more struggling readers but when that is controlled, better quality instruction was observed in classrooms with better reading skills. The results of mediation analysis showed that the indirect effect of teachers’ knowledge on children’s word reading skills at Time 2 via quality of teaching approached significance (estimate = 0.13, p = 0.09). When Time 1 decoding score was added to the model (Fig. 4), it was a highly significant predictor (β = 0.61 on teacher and 0.79 on the individual level, both p < 0.001) and explained an additional 25% (teacher) and 61% (individual) of Time 2 word reading variance. Years of experience remained a significant predictor of Time 2 word reading (β = 0.24, p < 0.05), but neither quality nor differentiation of instruction predicted word reading after earlier decoding was controlled.

Fig. 3 A model for teachers’ knowledge, instruction quality, and children’s word reading outcome. Note: WRAT RS, WRAT raw score; T1 = Time 1; T2 = Time 2. †p < 0.10; *p < 0.05; **p < 0.01; ***p < 0.001

Fig. 4 Final model for teachers’ knowledge, instruction quality, and children’s word reading outcome. Note: WRAT RS, WRAT raw score; T1 = Time 1; T2 = Time 2. †p < 0.10; *p < 0.05; **p < 0.01; ***p < 0.001

Finally, we reran the same models as in Figs. 2 and 4 using the teacher’s perceived teaching abilities and knowledge about language and literacy at Time 2 (Figs. 5, 6). The teacher variables at Time 2 were included in the models at the same level with the quality and differentiation of instruction as mediators. Both models showed a good fit (Fig. 5: χ2 = 13.71, df = 19, p = 0.80, CFI = 1.00, TLI = 1.00, RMSEA = 0.00, SRMRwithin = 0.01, SRMRbetween = 0.07; Fig. 6: χ2 = 15.99, df = 21, p = 0.77, CFI = 1.00, TLI = 1.00, RMSEA = 0.00, SRMRwithin = 0.02, SRMRbetween = 0.09). The results showed that years of experience predicted teachers’ perceived teaching abilities (both proximal and distal; βs = 0.27–0.31, respectively). Distal, but not proximal, perceived ability at Time 2 was negatively associated with children’s outcomes at Time 2 (βs =  − 0.22, p < 0.05 and − 0.19, p = 0.07 for decoding and word reading scores, respectively). As the association of years of experience to distal perceived teaching ability changes direction, no estimate for the indirect effect was calculated.

Fig. 5

Final model for teachers’ knowledge at Time 2, instruction quality, and children’s word reading outcome. Note: T1 = Time 1; T2 = Time 2. *p < 0.05; **p < 0.01; ***p < 0.001

Fig. 6

Final model for teachers’ knowledge at Time 2, instruction quality, and children’s word reading outcome. Note: WRAT RS = WRAT raw score; T1 = Time 1; T2 = Time 2. †p < 0.10; *p < 0.05; **p < 0.01; ***p < 0.001

Discussion

In the current study, we examined the associations between teachers’ knowledge of language and literacy constructs, their perceived ability to teach early literacy skills, their instructional practices, and students’ early reading outcomes. The model we tested assumed that teachers’ knowledge of language and literacy skills could affect students’ performance by improving the quality of phonics lessons in general or by improving the differentiation of instruction observed in those lessons. We also included a direct link from teacher knowledge to child outcomes, thus accounting for the possible impact of knowledge on teacher behaviours that our observations did not capture. Perceived ability to teach reading was first divided into proximal (self-reported ability to teach phonics, phonemic awareness, and reading to typically developing children and struggling readers) and distal (self-reported ability to teach reading fluency, vocabulary, comprehension, spelling, and children’s literature) and then included in the model in the same way as teacher knowledge, with both direct and indirect effects on student outcomes. Finally, we also included teachers’ years of experience as a predictor of reading outcomes and controlled for earlier decoding skills.

As is typically the case, the strongest predictor of a skill was the skill itself at an earlier time point. Decoding at Time 1 accounted for most of the variance in decoding and word reading at Time 2, making the analyses of the impact of the remaining variables somewhat conservative. In relation to our first research question, Figs. 2 and 3 show that teacher knowledge was associated with both the quality of instruction and the differentiation of instruction we observed during the reading instruction. It did not, however, directly predict reading outcomes at Time 2, and for any substantive indirect effects to eventuate, the quality of teaching and differentiation of instruction variables needed to be positively associated with the reading outcome variables at Time 2 over and above what the skill itself accounted for. In general, this was not the case in our study. The exception was the effect of differentiation of instruction on Time 2 decoding skills, where the pattern of results suggested that poorer Time 1 decoding skills in the classroom led to more differentiation of instruction, which then contributed to better decoding skills at Time 2. Teacher knowledge also contributed to differentiation, although this association only approached significance.

While this last result is encouraging, as is the fact that teacher knowledge was significantly associated with the quality of instruction, the limited strength of the associations raises questions about the quality and sufficiency of the information that can be obtained about classroom instruction during short visits, such as those used in this study. However, Piasta et al. (2020) recorded a full day of instruction in early childhood classrooms and analysed the recordings using a considerably more complex rating scale than the one we used in this study, yet their results showed no significant correlations between classroom practice and child outcomes. While it is possible that variation between classrooms is small and the teacher impact on variability in learning outcomes limited (e.g., Olson et al., 2014), it is also possible that the current observation schemes do not capture some of the critical elements of early literacy instruction. Specific to the current study, it is also possible that while our observation rubrics resulted in high inter-rater reliability, they were not sufficiently valid for the purpose and, with their three-point scales and limited set of questions, may have failed to capture important aspects of variability in instructional quality and differentiation. One implication of these results would then be that we need more research on what efficient early literacy instruction entails and how it can be assessed in a more fine-grained, yet reliable, manner. At a practical level, our results (together with those of Piasta et al., 2020) question the validity of current assessment methods and encourage further observational studies into the details of effective literacy instruction for diverse groups of students (see, e.g., Pressley et al., 2007).

In line with previous studies, the teachers in this study rated their ability to teach different early literacy content areas somewhere between “moderate” and “very good” as a group, and their ratings improved during the study. Given our students’ mean WRAT standard score of 95, we could argue that, similar to existing results, the teachers may have overestimated their ability to teach at least the word reading skills that WRAT most closely assesses. However, self-rated ability to teach different aspects of early literacy skills was positively correlated with some of the outcome measures. Perceived ability, at both Time 1 and Time 2, was also positively correlated with years of experience, and the proximal ability predicted Time 2 decoding after controlling for Time 1 decoding. These results could be interpreted to indicate that teachers perceive their ability to meet their students’ needs as developing over time, and that this assessment is not entirely incongruent with their actual ability. We should note, however, that the perceived ability scores did not correlate with observed quality of instruction or with language and literacy knowledge, with the exception of the correlation between Time 2 perceived proximal ability and language and literacy knowledge. Together with a large improvement in language and literacy knowledge from Time 1 to Time 2, this last result could indicate an effect of the professional development on what teachers knew about the structure of language and on how they perceived that knowledge as helping them to design better basic reading instruction.

Several existing studies have indicated that teachers lack knowledge of the language and literacy constructs researchers frequently consider necessary for high-quality, explicit, early reading instruction. Not surprisingly, teachers in our sample fared no better with the language and literacy knowledge questionnaire, although their knowledge increased during the study. Teacher knowledge correlated positively with the quality of their instruction, but not with any of the reading measures included in this study. The direct association between teacher knowledge and student outcomes has been difficult to establish (e.g., Carlisle et al., 2009, 2011; McCutchen, Harry et al., 2002a, 2002b), which was one of the reasons for including the assessment of instructional practice in this study. Having knowledge about language and literacy constructs does not necessarily mean that teachers can or will apply that knowledge in their teaching practice. For example, Arrow et al. (2019) noted that in their sample of New Zealand teachers, teachers with high levels of explicit linguistic knowledge did not apply that knowledge to their teaching practice, with the exception of giving slightly more word-level (as opposed to contextual) prompts when children made reading errors. Assuming a connection between language and literacy knowledge and both quality of instruction and student outcomes may be too simplistic without understanding what additional knowledge is needed, and possibly how local communities of practice can shape beliefs (Nuttall, 2010) and constrain how the knowledge teachers bring to the school can be implemented in practice. We clearly need more studies on how and under what conditions knowledge is translated into practice.

We would, however, be negligent if we did not consider an alternative explanation for the lack of significant associations: maybe the knowledge assessed by the questionnaire used in this study (and by those used in earlier studies that included very similar items) is not the knowledge that matters the most. While the existing studies have provided sufficient reliability and construct and content validity information, we could argue that the predictive validity remains to be established. If the goal is to understand what teachers need to know to provide high-quality early literacy instruction and thereby improve student outcomes, the predictive validity of the tool used to assess that knowledge seems of paramount importance. This leaves us with two recommendations for future research: examine both knowledge of content and practice, and examine them broadly in relation to classroom instruction and student outcomes to establish what the critical aspects driving better student performance are.

Finally, we should note several limitations of the current study. First, we did not assess motivational factors that may have significantly affected, in particular, the quality of phonics instruction we observed. The teachers in this study were required to attend the workshops as part of their mandated professional learning, and the workshops were developed by the university researchers without consultation with the teachers. It is possible that co-designing workshops with teachers to address what they consider their most significant needs would lead to different outcomes, and likely to the inclusion of more translational knowledge in the workshops. This is another avenue that professional learning research may want to pursue to establish reliable pathways from knowledge to practice to student outcomes. Second, there clearly is a lot of work to be done in validating the classroom observation tools and making sure they capture the critical content. Since establishing the indirect effects in this study depended on classroom practices being positively associated with the reading outcomes, it is possible that better observation tools would reveal significant indirect effects we were not able to observe.

In conclusion, teachers’ knowledge of language and literacy was associated with the observed quality of their early reading instruction, but neither was predictive of later reading outcomes when earlier reading was controlled for. We suggest that more research is needed both on what knowledge and instructional practices matter the most and on how and under what conditions the knowledge is translated into effective practice.