Introduction

Decades of reading research have led to sophisticated scientific evidence about effective reading instruction. In their best-evidence synthesis on effective reading programs for elementary school children, Slavin et al. (2009a, 2009b) concluded “what matters for student achievement are approaches that fundamentally change what teachers and students do together every day” (p. 1453). But what is it that teachers and students do together every day that needs to be changed? What does business-as-usual (BAU) reading instruction look like, and how different is it from evidence-based reading interventions? Gaining insight into both the content and the structure of BAU reading instruction is important because (a) it helps us interpret and understand findings from randomized controlled trials in which control groups receive BAU reading instruction, and (b) it can reveal which principles of good reading instruction are systematically neglected by teachers. Thus, investigating BAU can help to identify key topics that should be addressed in research, teacher education, and professional development.

Reading instruction always takes place in a specific cultural context and depends on the respective educational system. In this study, we conducted observations in German 2nd-grade classrooms. In particular, we focused on whether BAU includes instructional elements that have been shown to positively influence reading competence, and we investigated whether and how these evidence-based elements cluster into a meaningful structure. In the following, we first highlight the sub-skills that should be promoted in schools because they comprise the construct of reading literacy; then, we present some general criteria for describing the content of BAU reading instruction. Finally, we provide information on some key evidence-based elements and summarize findings on the status quo of reading instruction in primary school.

Reading literacy

Reading literacy is defined as “the ability to understand and use those written language forms required by society and/or valued by the individual. Readers can construct meaning from texts in a variety of forms. They read to learn, to participate in communities of readers in school and everyday life, and for enjoyment” (Mullis et al., 2017). This definition illustrates that reading literacy encompasses numerous sub-skills. While some precursor skills (e.g., phonological awareness) are acquired as early as kindergarten (Carroll et al., 2003), reading instruction in school typically begins with letter knowledge (Bremerich-Vos et al., 2012). Once grapheme-phoneme correspondence is established, young readers must engage in accurate and automatic decoding of words, learn to read at a sufficient pace, and use different reading strategies to eventually gain a deep understanding of a text (e.g., Perfetti et al., 2005). Simultaneously, they also need to make inferences from the written content to produce a coherent mental representation of the text (van Dijk & Kintsch, 1983). These different aspects of reading—accurate reading, fluent reading and reading comprehension—are interdependent: Once students can decode words automatically, they can read more fluently. High reading fluency, in turn, allows them to focus on comprehending the entire text due to released cognitive resources (Wolf & Katzir-Cohen, 2001).

Thus, these performance-related sub-skills of decoding, reading accuracy, reading fluency, and reading comprehension should be fostered in primary school; further, they are complemented by other elements of reading literacy that should also be encouraged. For instance, when planning and implementing reading instruction, teachers should also consider motivational constructs, such as students’ reading motivation (Guthrie et al., 2007), self-efficacy (Bandura, 2013) or academic self-concept (Marsh, 1993), as all three have been found to be positively related to reading achievement (Chapman & Tunmer, 1997; Hebbecker et al., 2019; Retelsdorf et al., 2014; Schiefele et al., 2012; Schunk, 2003). Another aspect to consider is that students differ significantly in their competency in the various sub-skills (Mullis et al., 2017; OECD, 2019), which usually makes it necessary to differentiate instruction for different students or groups of students.

Taken together, reading literacy is multifaceted and highly complex. Teachers face the enormous challenge of not only knowing and understanding all sub-skills involved in reading but also knowing which methods to use to best support the reading development of students with varying levels of proficiency (Cunningham et al., 2009; Joshi et al., 2009).

General aspects of reading instruction

Teachers can design reading instruction using a large variety of methods and materials and can encourage students to engage in reading in various ways. For instance, instruction can be teacher centered, where the teacher acts as a lecturer by presenting information and expecting students to passively receive knowledge or learn skills. Moreover, the teacher can read aloud to students while providing additional explanations and demonstrate the use of strategies. In contrast, students may work on reading-related exercises by themselves or with other students (e.g., with a partner or in a group) and be supported by the teacher as they complete the exercises. Further, students can engage in silent reading, read aloud to other students or become involved in a combination of reading and writing.

Teacher-selected materials may be self-constructed or drawn from books, reading cases placed in the classroom, or specific programs that include multiple types of content (e.g., continuous or discontinuous texts). In addition, teachers can use supplemental learning software as well as (self-)assessments for students. While many different ways to teach reading are available, research points to certain evidence-based instructional approaches that have been shown to foster reading literacy (e.g., Mullis et al., 2003; NICHD, 2000; Slavin et al., 2009a, 2009b). In the following, we review some elements of the empirically evaluated reading instruction methods that have been shown to affect reading outcomes.

Evidence-based elements of reading instruction

Among the best-known effective approaches for promoting reading literacy are Success for All (Slavin et al., 2009a, 2009b), Reciprocal Teaching (Palinscar & Brown, 1984), Concept-oriented Reading Instruction (CORI; Guthrie et al., 2004) and Peer-assisted Learning Strategies (PALS; Fuchs et al., 1997). In all these programs, teachers explicitly teach students to apply reading strategies. Further, all these programs include approaches that are student centered instead of teacher centered, and all of them apply cooperative learning settings. Thereby, once teachers have introduced effective reading strategies, students learn to monitor both their own and their peers’ reading process. Numerous literature reviews and meta-analyses provide support for the effectiveness of these instructional practices. For instance, Slavin et al. (2009a, 2009b) as well as the National Reading Panel (NICHD, 2000) also summarized and recommended approaches that include repeatedly reading aloud, using reading strategies, and providing tasks that can be implemented in cooperative settings. Overall, research suggests that several instructional approaches that have been shown to effectively foster student achievement should be incorporated and that reading instruction should be differentiated according to students’ needs.

In addition, cooperative approaches involve the possibility of employing different approaches for different levels of reading proficiency and adapting instruction to students’ needs, which is necessary due to the large heterogeneity in primary schools. To make accurate decisions on how to differentiate instruction, empirical research suggests that teachers need reliable and valid information on students’ reading achievement and reading progress, as the continuous use of assessment data positively impacts reading achievement (Stecker et al., 2005).

In the early school years, reading instruction should foster skills such as reading accuracy, reading fluency, and reading comprehension. An effective approach to foster reading accuracy is phonics instruction, in which the main goal is to teach children how graphemes (i.e., letters) are linked to phonemes (i.e., sounds), such as by teaching students to convert letters into sounds (NICHD, 2000). Accuracy of reading can also be efficiently promoted by syllable-based reading, for example by underlining or clapping syllables (Müller et al., 2017, 2020).

Reading fluency can be fostered by methods that encourage students to read aloud (NICHD, 2000). Two such well-evaluated methods are repeated reading and paired reading. In repeated reading, students work in pairs and take turns reading short texts aloud until they reach a sufficient level of fluency (Samuels, 1979); such improvements in reading fluency were highlighted in Therrien’s (2004) meta-analysis. In paired reading, which also increases reading fluency (Topping & Lindsay, 1992), students simultaneously read aloud to each other. Further, dyadic partner work allows less fluent readers to be paired with more fluent readers. Overall, repeatedly or simultaneously reading aloud helps students decode words quickly and automatically, which releases cognitive resources to be used for text comprehension.

Once students have gained a sufficient level of reading accuracy and reading fluency, they should learn to apply comprehension strategies, like underlining important content, generating questions in response to a text, and predicting what might happen next in a story (NICHD, 2000). By explicitly and directly teaching these strategies (e.g., by modeling, thinking aloud, and offering guided practice), teachers can encourage their students to reflect on what they are reading and can monitor their comprehension of texts (Block & Lacina, 2009; Duffy, 2002; Fuchs et al., 1997). Subsequently, teachers can withdraw their direct support in a step-by-step manner, thereby allowing students to engage in self-regulated reading.

In addition to providing cognitive reading strategies, teachers should also encourage students to use metacognitive strategies that aim to have students associate thoughts with the written content (e.g., by activating their prior knowledge; NICHD, 2000). For instance, teachers can encourage students to set their own goals, use a training plan, and evaluate their own reading process. Importantly, teachers can promote the use of such metacognitive strategies directly via classroom instruction (Paris et al., 1984); this has been shown to increase students’ reading comprehension (Boulware-Gooden et al., 2007).

Lastly, instructional approaches can also be aimed at increasing primary school students’ reading motivation; such approaches include optimizing student choice, providing support for student collaboration, and setting goals (Wigfield et al., 2014). In addition, reading motivation and reading achievement can be strongly affected by feedback, as both motivation and achievement have been shown to increase when students receive regular feedback on their reading progress (Hattie & Timperley, 2007).

Status quo of reading instruction

Many international observational studies have investigated the content of BAU reading instruction, specifically for students with learning disabilities. In two literature reviews on the topic, the general quality of reading instruction was assessed to be low (Vaughn et al., 2002), and evidence-based elements were not found to be frequently used (McKenna et al., 2015). In another study, Suárez et al. (2018) conducted observations of whole general education classrooms in Spain and found that fewer than half of the instructional approaches used by the observed teachers were recommended by the NICHD (2000). Similarly, Schumm et al. (2000) conducted teacher interviews as well as classroom observations in the US for 29 3rd-grade teachers and found that teachers generally did not differentiate instruction according to students’ needs; instead, they implemented whole-classroom instruction for all students, even in largely heterogeneous classrooms. Another observational study of 20 US-American 2nd-grade teachers and their classrooms conducted by Ness (2011) partly contradicted these findings: on average, teachers incorporated reading strategy instruction 28.9% of the time. For instance, in 7% of the observed instances, students made predictions about what could happen in a story; in 4.3% of instances, students summarized texts.

Further evidence on reading instruction can be found in international large-scale assessments. For example, the Progress in International Reading Literacy Study (PIRLS) revealed that teachers in English-speaking countries (e.g., England, Scotland, and New Zealand) reported forming homogeneous student groups more often than teachers in other European countries (e.g., Germany, Greece, and Italy; Mullis et al., 2003). Teachers in German schools reported that they mainly implement reading instruction that is teacher centered and focuses on the whole class rather than pairing or grouping students according to their needs (Mullis et al., 2003). Surprisingly, results from non-western countries are contradictory and do not fully coincide with evidence-based practices in western countries. For instance, in the Russian Federation or the United Arab Emirates, the PIRLS revealed that letting students read silently was associated with better PIRLS results (Marôco, 2021). In Singapore, the amount of time teachers spent on reading instruction was found to be negatively related to students’ reading achievement.

In Germany, teachers reported that they tend to use a variety of organizational approaches during reading instruction (e.g., teacher-centered and student-centered approaches), and they reported using a computer for reading instruction considerably more (for a longer average duration) than in most other European countries (Tarelli et al., 2012). Also, according to the PIRLS, few teachers reported explicitly teaching students when and how to use reading strategies; this is a central part of the curriculum in Germany. In addition, the PIRLS used student and teacher questionnaires to assess how often teachers implement other features that have been shown to positively influence students’ learning growth (e.g., structured instruction and activities that are cognitively activating); the results indicate that teachers do not employ evidence-based elements on a regular basis. Furthermore, the results of the PIRLS revealed that teachers in Germany implement explicit reading instruction for about 90 h per school year, whereas the international mean is about 160 h.

Apart from these large-scale assessments using teacher self-reports, only a few other systematic observational studies have assessed reading instruction in Germany. In one study, Kleinbub (2010) videotaped reading instruction in 41 classrooms and revealed that 4th-grade teachers in Germany mainly instructed their students to find information that was explicitly stated in texts but did not encourage them to make inferences or to use metacognitive strategies. In another observational study, Lotz (2016) videotaped reading instruction and found that students do not often engage in tasks that are cognitively activating.

Further insights into BAU reading instruction in Germany would be gained from intervention studies in which control groups receive no special support. However, while the treatment integrity of interventions has often been assessed, such studies tend to omit information on what happened during business-as-usual reading instruction in the control groups or the wait-list groups (e.g., Müller et al., 2020; Peters et al., 2021; Schünemann et al., 2013).

Purpose and research questions

Overall, studies report that teachers tend to implement teacher-centered reading instruction and do not differentiate instruction according to students’ needs (e.g., Mullis et al., 2003). Furthermore, results indicate that evidence-based elements to foster reading competence are only rarely implemented in BAU reading instruction (Kleinbub, 2010). Particularly in Germany, however, not much research has been conducted that addresses these concerns. The few studies that have focused specifically on BAU reading instruction have small sample sizes and often included higher primary school grades.

Thus, we aimed to investigate what constitutes BAU reading instruction and to what extent it can be described as evidence based. Knowledge on these issues is important to better qualify effects of intervention studies in which BAU reading instruction serves as the reference control condition (Century & Cassata, 2016). Moreover, such knowledge can serve as a basis for teachers’ professional development. Our main methodological approach was classroom observation (Hoffman et al., 2011), and we used teachers’ self-reports to help interpret the observational findings. In addition, we aimed to analyze whether teachers tend to combine certain evidence-based elements in BAU reading instruction; thus, we investigated whether there are certain clusters of different evidence-based elements. We addressed these exploratory questions for reading instruction in 2nd grade of general primary school in Germany.

Our specific research questions (RQs) were as follows:

  1. What general aspects does BAU reading instruction encompass?

  2. To what extent does BAU reading instruction encompass evidence-based elements?

  3. What clusters of different evidence-based elements to foster reading competence can be identified in BAU reading instruction?

Method

Participants

We contacted and informed schools by phone or with an information letter. The secretaries or principals of the schools then informed the teachers, and if teachers were interested, appointments were made to observe reading instruction. The study involved 52 teachers and their classes from 30 different schools in mid-western Germany. The sample comprised 92.3% female teachers, who were on average 44 years old (M = 44.29, SD = 11.42) and had taught for about 18 years (M = 17.93, SD = 10.64). On average, teachers taught 23 students in each classroom (M = 23.20, SD = 3.30), which is comparable to the average class size in Germany (M = 21.60) and other European countries (M = 21.90; Tarelli et al., 2012). Participation in the study was voluntary. We did not collect data on student demographics or identity. Therefore, we did not obtain informed consent from parents; under institutional regulations, such consent is not mandatory.

Procedure

We employed systematic observation, which allows for analyzing sequences of and associations between different habitual behaviors that are perceivable in a natural context (Argilaga, 2003). Furthermore, it allows for a close link between research and classroom practice to be established (Hoffman et al., 2011). In addition to the observations, all teachers were asked to complete a questionnaire related to their BAU reading instruction.

All nine observers were student research assistants who attended a three-hour training by the first author of this article. The goals of the training were to establish a shared understanding of all aspects to be observed and to ensure standardized and reliable classroom observations. The observers received information on the content of a newly developed observation protocol and a handout with detailed explanations of each item of the protocol. In addition, they viewed videotaped sequences of reading instruction in primary school, which were not part of the dataset of the current study but were used only to prepare for the classroom observations. While watching the video, the observers were asked to rate each item of the observation protocol. The results were then compared, and items for which different ratings were given were discussed to establish a common frame of reference.

Each classroom was observed once. To obtain information about the extent of reliability and objectivity of the observations, the aim was to have as many lessons as possible observed by two student assistants. In total, 28 classrooms were observed by one person (i.e., only the principal observer) and 24 classrooms were observed by two persons (i.e., the principal observer and a co-observer). During each lesson, the observer(s) sat in a designated location designed to reduce distraction (e.g., the corner of the classroom). All observers began by coding background information, such as the date and number of students. Depending on the item, the observers documented how often they observed different aspects of reading instruction. The duration of a lesson in Germany is usually 45 min. Each lesson was split into three parts of 15 min each, and the observers completed one observation protocol for each phase (i.e., T1, T2, and T3). Thus, there were three observation protocols for each observed lesson. After each phase, the observers made a final rating for each observed behavior.

Measures

We developed the observation protocol to collect information on key aspects of reading instruction (as described in the Introduction section). The protocol was divided into two parts: In part A, general and miscellaneous aspects of reading instruction were coded. This part was further divided into social form (e.g., including items that are indicative of whether instruction was teacher centered or students worked with a partner), material (e.g., whether students worked with a book or a computer) as well as student and teacher behavior (e.g., whether students read aloud). In part B, evidence-based elements (e.g., behavior related to strategic reading) were coded (see Table 1). In the observation protocol, each item was labeled using a word or a few words implying which aspect it relates to (e.g., “book” or “syllable-based reading”). For each item, observers documented whether they observed an element. To capture not only whether but also how often an element was employed, observers documented whether it was observed at least two times (by ticking a “yes” box), once (by ticking a “sometimes” box), never (by ticking a “no” box) or could not be observed, e.g., due to too many disturbances or no focus on reading instruction (by ticking a “not codable” box). Thus, neither the specific time (e.g., minutes) nor the quality with which elements were employed was rated, only the extent to which they were employed. For the social form items, it was not possible to choose “sometimes”, as reading instruction always takes place in a specific social form. The items of the observation protocol were rearranged for this article according to content areas. Please view the Open Science Framework (osf.io) for a complete version of the observation protocol, which also contained many items that we did not consider here in the analyses. For instance, observers rated the noise in each classroom, which was not relevant for the research questions of this article.
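The coding rule described above can be illustrated with a minimal sketch. The function names `code_item` and `percentages` are hypothetical and not part of the observation protocol, and the tallies in the usage example are invented. The sketch covers only items that allow a “sometimes” rating, so the social form items would need a separate rule.

```python
from collections import Counter

BOXES = ("yes", "sometimes", "no", "not codable")


def code_item(times_observed, observable=True):
    """Map a tally for one item in one 15-min phase to a protocol box.

    Hypothetical helper: at least two occurrences -> "yes", exactly one ->
    "sometimes", none -> "no"; phases that could not be rated are "not codable".
    """
    if not observable:
        return "not codable"
    if times_observed >= 2:
        return "yes"
    if times_observed == 1:
        return "sometimes"
    return "no"


def percentages(codes):
    """Return the percentage of each box across a list of coded phases."""
    counts = Counter(codes)
    total = len(codes)
    return {box: 100 * counts[box] / total for box in BOXES}


# Invented tallies for one item across four observed phases.
codes = [code_item(3), code_item(0), code_item(1), code_item(0)]
summary = percentages(codes)
```

Because only the ticked box is retained, the resulting percentages describe how often an element occurred at all, not how long or how well it was implemented.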

Table 1 Observed aspects during the classroom observations of reading instruction

Beyond classroom observations, all teachers completed a questionnaire on demographic information, aspects related to their experience, and the content of the observed lesson. They were also asked to state to what extent they usually use methods to differentiate reading instruction and how often they exclusively teach reading (rather than general German) on a scale from 1 (Never) to 4 (Very often). Furthermore, teachers stated whether the observed lesson was typical on a scale from 1 (Untypical) to 4 (Typical), and they made statements on how often they usually attend in-service training in general and specifically on reading instruction on the following scale: 1 (At least once per term), 2 (Once per term), 3 (Every second school year) or 4 (Every third school year or less). Finally, teachers were asked to indicate the objective of the observed lesson using an open-ended question.

Data analysis

To examine interrater reliability, Cohen’s kappa for each item at T1, T2, and T3 was computed. Overall, the principal rater and the co-rater agreed on most items (median of κ = 0.71, range of κ from 0.17 to 1.00); please view the Open Science Framework (osf.io) for all κ values. Therefore, to examine research questions 1 and 2, only observations of the principal observer were included in the final descriptive analyses. To describe BAU reading instruction, total numbers and percentages of “yes”, “sometimes”, “no”, and “not codable” indications for each phase of the lesson were calculated using SPSS Version 27.0 (IBM Corp., 2020) and Excel 2016. For the sake of parsimony, results were aggregated across T1, T2 and T3. In addition, the number of teacher responses to the open-ended question was counted. For the cluster approach, only items of the evidence-based elements and methods (part B of the observation protocol) were used.
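For readers unfamiliar with the agreement index, the following sketch shows how Cohen’s kappa can be computed for a single item rated by two observers. It assumes the unweighted form of kappa, since the text does not state whether a weighted variant was used. The function name `cohens_kappa` and the example ratings are hypothetical and are not taken from the study data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equally long lists of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters need the same non-zero number of ratings.")
    n = len(rater_a)
    # Observed agreement: share of cases in which both raters chose the same box.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions per category.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Invented ratings of one item in six classrooms.
principal = ["yes", "yes", "no", "no", "sometimes", "no"]
co_rater = ["yes", "no", "no", "no", "sometimes", "no"]
kappa = cohens_kappa(principal, co_rater)
```

In this invented example, the raters agree in five of six cases, and kappa corrects that raw agreement for the agreement expected by chance given each rater’s marginal distribution.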

To answer research question 3, we used exploratory graph analysis (EGA; Golino et al., 2020) as implemented in the R package EGAnet (Golino & Christensen, 2020). EGA is a psychometric network approach that allows for empirically determining the number of underlying latent dimensions (i.e., dimensionality reduction); this approach has been shown to outperform other approaches (Golino & Epskamp, 2017). To use all of the available information while also accounting for the ordinal nature of the data, we imputed missing data via multiple imputation as implemented in the package mice (Van Buuren & Groothuis-Oudshoorn, 2011) for the statistical software R (R Core Team, 2020). Specifically, we used the ratings made by the principal observer at the beginning, middle and end of a lesson as auxiliary variables to inform imputation of missing values of the co-observer’s ratings. As recommended in the missing data literature, we used 40 imputed datasets for our analyses (Azur et al., 2011).

Importantly, we set “not codable” items to missing values (median of percentages of missing values across phases and raters was 4.55%; range of percentages was 0% to 22.92%) to run imputation of the ratings on an ordinal scale (“yes”, “sometimes”, “no”). Multiple imputation therefore requires that the missing data attributable to “not codable” items (and missing values in general) be Missing At Random (MAR). By design, a second rater was used only in some classes to check reliability. A second rater was not present in 54.17% of the observed classes. These missing values can be considered MAR because they are a consequence of the availability of a second rater, but not of the unobserved variables (which would imply a Missing Not at Random mechanism; MNAR). In addition, we employed an explicit test of a MAR model vs. a MNAR model. Specifically, we fitted a linear response tree model (Debeer et al., 2017) with four end nodes (“yes”, “sometimes”, “no”, and “not codable”). The linear response tree model incorporated node-specific item-parameters across raters and phases (i.e., item-parameter estimates were constrained to be equal across raters and phases). This model had three internal nodes, with the first internal node differentiating between a “not codable” process and a codable process. Multidimensionality in such tree models can be imposed on the internal nodes, and we fitted a MAR model in which the latent variable underlying the first internal node was modeled as orthogonal to another latent variable underlying the codable process. A MNAR model can then be fitted when the correlation between these two latent variables is freely estimated. Information criteria (smaller values indicate better fit while taking model parsimony into account) and a likelihood ratio test can be used to compare the MAR model with the MNAR model (see Debeer et al., 2017). We found that model fit for the MNAR model did not improve as compared to the MAR model (MAR model: AIC = 2611.5, BIC = 2957.2; MNAR model: AIC = 2612.9, BIC = 2965.1; Δχ²(1) = 0.63, p = 0.427). Hence, the MAR assumption was reasonable for the data subjected to multiple imputation.
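The reported model comparison can be retraced with a short sketch that uses only the values given in the text. The helper name `chi2_sf_df1` is hypothetical. It relies on the fact that a chi-square variable with one degree of freedom is a squared standard normal variable, so its upper-tail probability can be written with the complementary error function.

```python
import math


def chi2_sf_df1(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    # P(Z^2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)) for standard normal Z.
    return math.erfc(math.sqrt(stat / 2.0))


# Values as reported in the text.
mar = {"AIC": 2611.5, "BIC": 2957.2}
mnar = {"AIC": 2612.9, "BIC": 2965.1}
delta_chi2 = 0.63  # likelihood ratio statistic, 1 additional parameter

p_value = chi2_sf_df1(delta_chi2)
delta_aic = mnar["AIC"] - mar["AIC"]  # positive: the MAR model is preferred
delta_bic = mnar["BIC"] - mar["BIC"]  # positive: the MAR model is preferred

print(f"p = {p_value:.3f}, delta AIC = {delta_aic:.1f}, delta BIC = {delta_bic:.1f}")
```

The three indices are mutually consistent: freeing one correlation adds one parameter, so the AIC difference should equal 2 minus the likelihood ratio statistic (about 1.37), which matches the reported difference of 1.4 within rounding.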

To further estimate the consistency between observers based on the imputation approach, we averaged Krippendorff’s α for ordinal variables across the imputed datasets. Inter-rater consistency was good for all time points (T1: average α = 0.85; T2: average α = 0.86; T3: average α = 0.65). Hence, we proceeded with average ranks across observers for each time point to check time stability of the items; this was done using two-way consistency intra-class correlations for average measures. Results indicated that the items were quite stable across time points (median of ICCs = 0.87, range of ICCs from 0.32 to 0.94). Consequently, items were averaged across time points, and Fisher-z-transformed Spearman correlations were averaged across the imputed datasets. The average transformed correlation matrix was then back-transformed to the unit of correlations prior to exploratory graph analysis.
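The pooling step for the correlations can be sketched as follows. The function names `pool_correlations` and `pool_matrices` and the example matrices are hypothetical, and the Spearman correlations are assumed to have been computed beforehand for each imputed dataset. The sketch shows only the Fisher-z averaging and the back-transformation, not the imputation or the exploratory graph analysis.

```python
import math


def pool_correlations(correlations):
    """Average correlations on the Fisher-z scale and transform the mean back."""
    z_values = [math.atanh(r) for r in correlations]  # Fisher z = artanh(r)
    mean_z = sum(z_values) / len(z_values)
    return math.tanh(mean_z)  # back-transform to the correlation metric


def pool_matrices(matrices):
    """Pool square correlation matrices (nested lists) element by element."""
    size = len(matrices[0])
    pooled = [[1.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i != j:  # the diagonal stays 1.0; artanh(1) is undefined
                pooled[i][j] = pool_correlations([m[i][j] for m in matrices])
    return pooled


# Two invented 2 x 2 matrices standing in for two imputed datasets.
imputed = [
    [[1.0, 0.2], [0.2, 1.0]],
    [[1.0, 0.6], [0.6, 1.0]],
]
pooled = pool_matrices(imputed)
```

Averaging on the z scale rather than on the raw correlations gives slightly more weight to larger correlations: in the invented example, the pooled value is about 0.42 rather than the arithmetic mean of 0.40.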

Results

Descriptive statistics of observations

Figures 1, 2, and 3 show the percentages of “yes”, “sometimes”, “no” and “not codable” responses to the items in part A, the general and miscellaneous aspects of reading instruction, as rated by the principal observer (RQ 1). Findings indicate that, on average, over 60% of the observations recorded students working individually, and ~ 55% of the observations recorded teacher-centered instruction. In contrast, cooperative social forms were observed in less than 30% (partner work) and less than 20% (group work) of the observations. Students mostly received material from their teacher or worked with a book or a booklet containing continuous or discontinuous text material. On a few occasions, students received material from their own map or reading cases located in the classroom. Students only rarely worked with a specific program or a computer. Regarding student behavior, it was observed that within the 15-min observation periods, students worked quietly on reading-related tasks in, on average, over 60% of the observations. In about 30% of the observations, students were allowed to choose their own reading content and read aloud to the whole classroom. Cooperative social forms, like reading aloud to a partner or a small group, were observed less frequently. Writing in combination with reading hardly occurred before reading (< 5%), partly during reading (30%) and more after reading (50%). Teachers read aloud to students very infrequently, but they did give additional explanations and supported single students.

Fig. 1

Bar chart displaying indications for the subcategory “Social form” of Part A: general/miscellaneous aspects of reading instruction. Note. “Sometimes” was not an option for these items. TC = Teacher centered, SP = Single-person work, PW = Partner work, GW = Group work

Fig. 2

Bar chart displaying indications for the subcategory “Material” of Part A: general/miscellaneous aspects of reading instruction. Note. TEA = Teacher, MAP = Own map, BOOK = Book, BL = Booklet, CASE = Reading cases, PGM = Specific program, PC = Computer, CTM = Continuous text material, DTM = Discontinuous text material

Fig. 3

Bar chart displaying indications for the subcategory “Student and Teacher Behavior” of Part A: general/miscellaneous aspects of reading instruction. Note. EX = Exercises related to reading, SR = Silent reading, FR = Free reading, SRC = Students read aloud to the whole class, SRG = Students read aloud to small groups, SRP = Students read aloud to partner, SRT = Students read aloud to themselves, TRC = Teacher reads aloud to the whole classroom, TRG = Teacher reads aloud to a small group, TRS = Teacher reads aloud to single students, SWB = Students write before the reading process, SWD = Students write during the reading process, SWA = Students write after the reading process, EXP = Teacher gives additional explanations, SUP = Teacher supports single students

Figure 4 displays percentages for the “yes”, “sometimes”, “no” and “not codable” indications for the Part B items on evidence-based aspects and methods of reading instruction (RQ 2). It was observed that on average, tasks were at least sometimes clearly differentiated by ability in about 50% of observations. Differentiation, however, was rarely based on assessment results, as this occurred in < 20% of the observed classes. A comparatively frequently observed evidence-based activity was teacher feedback.

Fig. 4

Bar chart displaying indications for Part B: evidence-based elements of reading instruction. Note. STR = Material that instructs the use of reading strategies, AB = Material used is assessment based, DIF = Material used is differentiated, SYL = Syllable-based reading, DEC = Explicit strategic processing of the text (declarative), CON = Explicit strategic processing of the text + reference to benefit (conditional), PRO = Explicit strategic processing of the text + illustration of steps (procedural), MOD = Modeling, ATH = Strategy “Attend to heading”, SUM = Strategy “Summarize”, PRE = Strategy “Predict”, CDW = Strategy “Clarify difficult words”, UIC = Strategy “Underline important content”, GQT = Strategy “Generate questions to the text”, GS = Goal setting, TP = Training plan, EVA = Evaluation of the reading process, FB = Feedback, PR = Paired reading, RR = Repeated reading, PAL = Peer-assisted learning strategies (PALS)

Regarding the role of strategies in reading instruction, in about 30% of the observations, material was used that included an explanation and instruction of reading strategies. The explicit instruction of a strategy by the teacher, however, was observed only about half as frequently, in ~ 15% of observations, and modeling the use of a strategy either by a teacher or a student was almost never observed. In general, the use of cognitive reading strategies (e.g., attend to heading, summarize, clarify difficult words) and metacognitive reading strategies (e.g., goal setting, training plan, evaluation of the reading process) was observed, on average, in about 20% of the classes, with no systematic differences in the frequency of cognitive and metacognitive strategies. Systematic, evidence-based approaches like syllable-based reading, paired reading, repeated reading, and PALS only played a minor role, as they were only observed sporadically.

Teacher self-report

Table 2 shows the results of the teacher questionnaire. Most teachers stated that they often use methods to differentiate instruction, that they sometimes exclusively implement reading instruction, and that the observed lesson was typical for what they normally do. Furthermore, most teachers indicated that they usually attend in-service training about once per school year, and in-service training specifically related to reading instruction every other school year. Regarding the goal of the observed lesson, 23 of the 39 responding teachers indicated in the open-ended question that they wanted to promote student comprehension, 9 teachers intended to increase student reading motivation, 5 teachers reported that the goal was to foster student reading fluency, and 4 teachers worked toward differentiated instruction.

Table 2 Results of the teacher questionnaire

Clusters of evidence-based elements

The exploratory graph analysis, shown in Fig. 5, indicates two clusters of evidence-based instructional elements. Refitted as a CFA, the two-cluster structure displayed acceptable fit (χ²(118) = 121.16, p = 0.402; RMSEA = 0.024, SRMR = 0.096, CFI = 0.978, TLI = 0.975). The latent variable correlation was r = 0.51 (SE = 0.14, z = 3.78, p < 0.001), which indicates that evidence-based practices from the two clusters generally tend to be observed together.

Fig. 5

Cluster structure of evidence-based components of reading instruction. Note. Solid lines imply a positive relationship. Dashed lines imply a negative relationship. PR = Paired reading, GS = Goal setting, TP = Training plan, UIC = Strategy “Underline important content”, DEC = Explicit strategic processing of the text (declarative), PRO = Explicit strategic processing of the text (procedural), SUM = Strategy “Summarize”, MOD = Modeling, CON = Explicit strategic processing of the text + reference to benefit (conditional), ATH = Strategy “Attend to heading”, CDW = Strategy “Clarify difficult words”, PRE = Strategy “Predict”, GQT = Strategy “Generate questions to the text”, RR = Repeated reading, EVA = Evaluation of the reading process, SYL = Syllable-based reading, PAL = Peer-assisted learning strategies (PALS)

The first identified cluster is mainly made up of positive relations between explicit strategic processing (both declarative and conditional), using a training plan and evaluation of the reading process (i.e., metacognitive strategies), and underlining important information. A negative association within this cluster was found between paired reading and summarizing. The average variance explained for this cluster was AVE = 0.23, which indicates that some of the items were quite marginal for this cluster (the reliability of this cluster, however, was acceptable, as indicated by ω1 = 0.71; Bollen, 1980). This impression was reinforced by rather small and non-significant factor loadings for syllable-based reading, explicit strategic processing (procedural), goal setting, paired reading, and peer-assisted learning strategies (all of these items also appeared more marginal in Fig. 5).

The second cluster points to strong relations between reading strategies (i.e., paying attention to the heading, clarifying difficult words, making predictions, and generating questions about the text). This cluster indicates that lessons promoting the strategy of paying attention to the heading also tended to promote other cognitive strategies, and vice versa. Additionally, this second cluster contains modeling the strategy and the method of repeated reading. This cluster had better measurement quality than the first cluster. All items of this cluster had significant loadings, and all standardized loadings were > 0.39 (range: 0.40–0.78). Also, average variance explained (AVE = 0.39) and reliability (ω1 = 0.78) were higher than in the first cluster.
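
To make the two measurement-quality indices reported for the clusters more concrete, the following minimal sketch shows how an AVE value and an omega-type composite reliability are commonly computed from standardized factor loadings. It assumes a single-factor model with uncorrelated residuals. The function names and the loadings are hypothetical illustrations, not the estimates from this study, and the exact computation used for the reported values (e.g., for categorical items) may differ.

```python
def average_variance_explained(loadings):
    """Mean of the squared standardized loadings (AVE)."""
    return sum(l * l for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """Omega-type reliability for standardized loadings.

    Assumes one factor and uncorrelated residuals, so each item's
    residual variance is 1 - loading**2.
    """
    common = sum(loadings) ** 2
    unique = sum(1.0 - l * l for l in loadings)
    return common / (common + unique)


# Hypothetical standardized loadings, chosen only for illustration.
hypothetical_loadings = [0.40, 0.55, 0.62, 0.70, 0.78, 0.65]

ave = average_variance_explained(hypothetical_loadings)
omega = composite_reliability(hypothetical_loadings)
print(f"AVE = {ave:.2f}, omega = {omega:.2f}")
```

The sketch illustrates why a cluster with several small loadings has a low AVE even when its reliability remains acceptable: AVE averages the squared loadings, whereas the reliability coefficient depends on the squared sum of the loadings and therefore increases with the number of items.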

Discussion

Summary of the findings

In our study, we investigated which elements constitute BAU reading instruction and to what extent teachers incorporate evidence-based elements of reading instruction in 52 2nd grade classrooms of 30 German primary schools. Furthermore, to gain deeper insights into what constitutes typical reading instruction and to identify aspects that should be addressed in teacher trainings, we examined whether evidence-based elements of effective reading instruction cluster together. We conducted the study in highly ecologically valid settings, and observations were supplemented by teachers’ self-reports, supporting the interpretation and integration of findings.

Regarding RQ 1, the classroom observations indicate that reading instruction in the observed 2nd grade classrooms in Germany is strongly teacher centered. In addition, students were predominantly observed to work individually and silently read continuous texts, which they received from their teacher or from books and booklets. Students often completed reading exercises, and they engaged in writing activities during or after reading. Partner or group work and reading aloud were rarely observed. In some cases, students were allowed to choose their own reading material. Teachers read aloud to students very infrequently, but they did give additional explanations and supported single students.

Regarding RQ 2, the observations revealed that apart from giving feedback, which was observed comparatively frequently, evidence-based elements of effective reading instruction, like the systematic instruction of cognitive or metacognitive reading strategies or structured programs to increase reading fluency and reading comprehension, were observed rarely and were mostly found in < 20% of the observed classrooms. This is in line with other observational studies (e.g., Kleinbub, 2010) and large-scale assessments (e.g., PIRLS, Mullis et al., 2017), which have indicated that reading instruction in German primary schools rarely contains evidence-based elements. Teachers did partly differentiate instruction, but the selection of reading material was only rarely observed to be informed by assessments.

The results of the questionnaire partly contradict the results of the classroom observations, indicating that teachers had a different perception of their own instruction than observers. In slight contrast to the classroom observations, teachers stated that they often differentiate instruction. According to the open-ended question, their main focus was on fostering reading comprehension, which was generally consistent with the observations. The second most frequent focus was on students’ reading motivation, followed by the promotion of reading fluency.

Regarding RQ 3, the exploratory graph analysis revealed that the evidence-based elements of reading instruction group into two clusters. While the second cluster is characterized by many strong and moderately strong relations between elements that are associated with strategy-guided reading, the first cluster is less amenable to a theoretical description and tends to have weaker relations between its elements. Overall, the results of the exploratory graph analysis, especially the second cluster, show that in some classes reading strategies were explicitly taught, and this was associated with the use of reading strategies like paying attention to the heading, making predictions, clarifying difficult words or generating questions about the text. Thus, evidence-based elements of reading comprehension instruction were rarely observed, but when they were, it was likely that the teacher engaged in modeling reading strategies.

Interpretation of the findings

While the observation of cognitive and metacognitive reading strategies in some classrooms certainly is a positive first step, we offer the following for consideration. First, we focus on the question of how much reading instruction is aligned to the specific needs of 2nd graders. Specifically, while some readers at this grade level can read a text effortlessly and focus on the content, others likely still struggle with automatically decoding words and need all of their cognitive resources to read the words. For most 2nd graders, automatization of reading processes is a major learning goal (Wolf & Katzir-Cohen, 2001). Our observations suggest that systematic approaches to foster reading fluency, such as repeated reading or paired reading, were rare. More often, students completed reading exercises, read silently, or only one student read aloud to the whole class. This finding might indicate that teachers consider frequent silent reading to be effective in promoting reading fluency; yet, the effectiveness of independent and silent reading to improve reading achievement has been questioned (Carver & Leibert, 1995). Nevertheless, many 2nd grade students may benefit from instruction that is teacher centered rather than primarily cooperative, as younger primary school students in particular first need to learn how to work with a partner or a group. Further, it was observed that some teachers differentiated instruction according to students’ needs. Possibly, in these classrooms, students used differentiated reading material and were asked to engage in silent reading.

Second, when compared to theoretical models of self-regulated learning (e.g., Boekaerts, 1999), this finding indicates that core elements regarding the regulation of the self are missing. While evaluation of the reading process and following a training plan group together in the first cluster (they were among the items with significant factor loadings), goal setting appeared to be more weakly related to these (goal setting displayed a non-significant loading), and all three (especially evaluation and training plan) were closely connected with explicit strategic processing (both declarative and conditional). Despite these strong relations, elements related to strategy-based reading comprehension instruction were only found in a few classrooms. Given that 2nd graders in Germany are about seven to eight years old, one might not expect instruction to include all elements needed for self-regulated learning; however, the importance of including self-regulatory strategies in a program has been emphasized (Schünemann et al., 2013).

Third, if cognitive and metacognitive reading strategies were observed in a class, the chances were high that several strategies were observed. Using a relatively high number of reading strategies might be overwhelming, especially for 2nd graders. Seuring and Spörer (2010) found that 5th graders benefited more from a reading comprehension training that included three strategies compared to a training containing four strategies. Hence, focusing on a small number of (the most) important strategies and making sure that children feel confident in applying them may be beneficial.

Study limitations and implications

Observing classrooms and asking teachers about their daily teaching practices have advantages, but some limitations should be considered when interpreting the results. First, although the sample size of our study is relatively large, each classroom was observed only once, so each observation represents a snapshot of the observed teacher’s BAU reading instruction. Additionally, participation in our study was voluntary and teachers might have engaged in self-selection. Thus, the results of our study cannot be generalized to all primary schools in Germany and its external validity might be flawed. By their mere presence, observers may have influenced student and teacher behavior; in fact, four teachers’ statement that the observed lesson was “not typical” may have been for this reason. However, most teachers reported that the observed lessons did represent their typically implemented reading instruction. Although all observers received extensive training on how to conduct standardized observations and high interrater reliability was established, the subjective nature of classroom observations persists. To gain more objective insights, using experience sampling methods, videotaped sequences of reading instruction, or conducting more observations of the same classroom at several points in time would have been beneficial. Using these approaches, the time and/or quality with which specific elements were employed could be tracked, too. Further, the observation protocol we used was extensive, and a simpler version might have helped observers by enabling them to focus on specific methods. Possibly, in addition to the facets of reading instruction that we focused on here, we could have investigated more aspects of instruction in general primary school, which may involve topics other than reading. This would correspond with the results of the current study, which indicate that not all teachers exclusively implement reading instruction.

Research has shown that effective practices can be characterized on the school level, teacher level, and student level (Marôco, 2021). It would be interesting to collect more information on each of these levels (e.g., cooperation between teachers or students’ socioeconomic status), thereby enabling a more sophisticated integration of results. Other aspects that could have been examined in more detail include what kind of text material was used (e.g., fictional or non-fictional texts), how much time students spent reading at home, or how much support they received from their parents.

Although such an undertaking would be quite extensive, it would be interesting to study differences between countries and federal states within countries related to the content of reading instruction. Within the scope of large-scale assessments, teachers in English-speaking countries (e.g., England, Scotland, and New Zealand) reported forming homogeneous student groups more often than teachers in other European countries (e.g., Germany, Greece, and Italy), in which instruction is more teacher centered (Mullis et al., 2003). In future research, it would be interesting to study whether these findings generalize to observational studies.

Finally, a strong relation exists between teacher quality and student reading competence (Hairrell et al., 2011), and certain instructional practices are more related to reading achievement than others (Connor et al., 2005). With our study, instructional quality could only be measured indirectly by investigating whether teachers employ evidence-based elements of reading instruction. However, the use of effective approaches to foster students’ reading competence does not necessarily lead to higher reading achievement. Thus, a fruitful approach for future studies could be to measure instructional quality in addition to the elements we observed.

Conclusions for teacher training

Besides implications for future research, our study offers valuable conclusions for future teacher trainings regarding reading instruction:

Notably, (prospective) teachers are educated on theories related to important sub-skills involved in reading. Overall, we interpret our findings to suggest that many teachers lack knowledge on the development of reading sub-skills. This is particularly indicated by the fact that only five teachers aimed to improve students’ reading fluency in 2nd grade reading instruction. The classroom observations also suggest that reading instruction mostly focused on reading comprehension. Thus, we recommend teachers learn that reading accuracy, reading fluency, and reading comprehension are interdependent and develop gradually. Given that even after 2nd grade many students struggle with accurate and fluent reading, teachers need to know how they can efficiently foster these sub-skills and that silent reading is probably not the best way. Instead, teachers must be educated on the use of evidence-based approaches to foster reading accuracy and fluency, such as syllable-based reading (Müller et al., 2020), repeated reading (Samuels, 1979) and paired reading (Topping & Lindsay, 1992). Further, they should learn how to instruct reading strategies to augment student reading comprehension (Block & Lacina, 2009; Duffy, 2002; Fuchs et al., 1997). Good instructional materials can support the acquisition of reading strategies (e.g., Souvignier & Mokhlesgerami, 2006), but teacher trainings should emphasize the importance of the teacher as a role model.

In our study, at least some teachers fostered self-regulated reading by teaching cognitive and metacognitive or self-regulatory strategies. As indicated by the exploratory graph analysis, it is unlikely that teachers covered all ellipses or all phases described in theoretical models of self-regulated learning (cf. Boekaerts, 1999). Thus, it seems necessary to provide teachers with theoretical knowledge about why they should instruct the use of cognitive, metacognitive and self-regulatory strategies.

Especially for young readers, using reading strategies and self-regulating the reading process is challenging. Thus, teachers need to know that the sheer number of reading strategies is less important than the purpose a particular strategy serves, e.g., a strategy helps one understand the text, helps to reduce the information so one can remember the text better, or helps regulate one’s motivation. Learning to use reading strategies is challenging and demanding, so strategies should be learned gradually and one at a time.

Our classroom observations illustrate that the observed reading instruction was mainly teacher centered and that students mostly worked individually. Cooperative learning, however, leads to strong student activity and provides new opportunities for regulating the reading process and for learning to use self-regulatory and metacognitive strategies by taking a tutor role. Moreover, compared to whole-class instruction, cooperative learning provides many opportunities to differentiate instruction to students’ needs.

The second most frequently cited goal of reading instruction was to increase students’ reading motivation. Based on the observations, the fact that students in some classes could choose their own reading material does indeed support the students’ experience of autonomy—an important psychological need according to self-determination theory (Ryan & Deci, 2000). Yet, understanding how to support the experience of autonomy, competence, and relatedness within reading instruction, e.g., by including elements that immediately show one’s progress, like counting the number of words read correctly in repeated reading or by using cooperative learning to increase feelings of relatedness, can help teachers design reading instruction to target motivation.

In sum, these suggestions for improving teacher trainings related to reading instruction are by no means exhaustive. Notably, they lack ideas on how to ensure that teachers regularly receive and accept offers for professional development. In addition, core features of professional development (e.g., active learning and collective participation) as suggested by Desimone (2009) should be considered. Nevertheless, the suggestions given here may serve as an inspiration for both researchers and practitioners.

Overall conclusion

In this study, we gained realistic insights into BAU reading instruction in general primary school from classroom observations and teacher self-reports. Given that BAU reading instruction is the typical benchmark in intervention studies, our findings may serve as a good estimate of what can be expected as a control condition in reading intervention studies in primary school. Century and Cassata (2016) point out that this aspect is often neglected in intervention studies, although interpretation of effect sizes depends on clearly describing the specific intervention as well as the control condition. Overall, our study offers valuable information that can help us make sense of findings from intervention studies, and it highlights which principles of good reading instruction are systematically neglected by teachers. Further, it provides inspiration for future research related to interventions, for teachers’ professional development as well as for observations related to BAU reading instruction.