Introduction

Language is an integral part of science and of learning about science (Fang, 2008; Hand et al., 2010; Phillips & Norris, 2009). Reading and writing, too, are fundamental to science and not just general tools used in the context of science—or, indeed, of any other subject (Norris & Phillips, 2003). Reading scientific text is especially important when students are completing assignments or doing homework. Fang et al. (2008) noted that the argument has often been made to integrate reading explicitly into science teaching, and research has shown that reading infusion can help develop students’ science literacy. However, the language characteristics of scientific text are known to differ from those of non-scientific text (Fang, 2006). In the present study, we explore whether the contribution of various reading comprehension component processes differs between scientific and non-scientific genres. Genre is known to influence reading comprehension (Artelt & Schlagmüller, 2004), so it might be important to investigate whether there are genre-specific aspects of reading comprehension. If these can be identified for science, then reading instruction in science teaching might focus on those aspects rather than on implementing generic comprehension strategies.

Students’ difficulties with problem solving, explaining phenomena, or understanding scientific texts can be due to incomplete previous knowledge, to limited language skill (e.g. difficulty making sense of a sentence despite knowing the words), or to both. While scores on national and international assessments are usually attributed to students’ scientific knowledge or general abilities, the effect of reading comprehension on science test results is also well known. Students can struggle with such tests, for example, when closed-ended concept inventories are used (Clerk & Rutherford, 2000). Among German students, Härtig et al. (2015b) showed that general reading comprehension affects physics assessment scores, especially on constructed-response items. Students who are not native speakers of the language in which science courses are taught seem to be at a disadvantage on tests when compared to native-speaking students (Abedi et al., 2004; Bird & Welford, 1995).

However, these studies tested comprehension with a combination of expository and narrative text passages that differ on multiple dimensions. Prior research thus shows that language skills do influence outcomes on both science and reading tests, but it is not possible to conclude whether this is due to general reading comprehension skill or to science-specific (i.e. expository) reading skill. With respect to teaching, this is an important question, as enhancing general reading comprehension skill may not be enough to improve students’ learning about science. That is, it is not clear which language demands are domain-specific and which are domain-general. From the perspective of scientific literacy, there might be important differences between reading comprehension of narrative and expository texts because of differences between the two text genres. However, little is understood about the specific effects of reading narrative versus expository texts. For this reason, it is important to further explore the relationship between general reading skill and reading comprehension of expository text in science, as this has implications for the role of language in teaching science (cf. Härtig et al., 2015a). A first step is identifying a baseline model that enables testing comprehension of different genres against each other. The aim of this study is to test the applicability of a single model of text comprehension to both narrative and scientific texts and then to compare the results. If differences in the contribution of predictors are found across genres, these differences could be conveyed to teachers to inform science-specific reading instruction.

Models of Reading Comprehension

Several influential models of reading comprehension conceptualize reading as including both personal and text factors, and these may mutually influence each other (e.g. Kintsch & van Dijk, 1978; Schnotz, 2014; Schnotz & Bannert, 2003). The reader tries to build a mental representation of the text’s content, but this mental model is bound to what the reader already knows or believes. Hence, it is unlikely that two readers of the same text will come up with exactly the same situation model (Schnotz & Bannert, 2003) since personal characteristics such as previous knowledge, working memory capacity, motivation, and interest will influence the processes and products of text comprehension (Rauch et al., 2011).

In one such model of the factors influencing adolescent and adult students’ reading comprehension, Cromley and Azevedo (2007) reviewed the experimental literature and summarized their findings in the Direct and Inferential Mediation Model of Reading Comprehension (DIME). The DIME model includes five personal characteristics—previous knowledge, reading comprehension strategies, word recognition, vocabulary knowledge, and inference making—as central factors that either directly or indirectly influence students’ general reading comprehension across different types of text (cf. Cromley, 2005).

Previous Knowledge

It is widely accepted that previous knowledge has a strong influence on what readers can understand when presented with a text (Kintsch & van Dijk, 1978). Consequently, it is not surprising that previous knowledge influences text understanding generally (Kendeou & van den Broek, 2007; McNamara & Kintsch, 1996; Pascual & Goikoetxea, 2014). Depending on the nature of the text, however, one cannot always assume a positive relation between previous knowledge and comprehension. An important counter-example is the so-called expertise-reversal effect: readers with high previous knowledge learn more from low-cohesion text than from highly cohesive text, while readers with low previous knowledge benefit from higher text cohesion (Linderholm et al., 2000; McNamara & Kintsch, 1996). Nevertheless, in all cases previous knowledge is important and influences what is understood from text.

Reading Comprehension Strategies

There is a rich body of literature on the role of reading strategies in reading comprehension. In his meta-analysis, Suggate (2016) found a general positive effect of reading comprehension strategy interventions. Hebert et al. (2013) found in their meta-analysis that the strategy of writing-to-learn promotes reading comprehension. Nesbit and Adesope (2006) highlighted in their meta-analysis the effectiveness of constructing concept maps as a comprehension strategy. Gunn (2008), as well as Ozgungor and Guthrie (2004), presented evidence that strategy interventions can increase students’ ability to draw inferences, which should help readers acquire knowledge from expository text.

Word Recognition

According to Müller and Richter (2014), word recognition is a crucial first step in reading. Perfetti (1984) provided evidence that the ability to recognize words increases reading speed and might also increase reading comprehension. Intervention studies, mostly with students with disabilities or second-language learners, provide evidence that word recognition can be trained and that such interventions are highly effective (Bourassa et al., 1998; Suggate et al., 2014). However, even for Grade 8 students, word recognition still seems to have an important influence on reading comprehension (cf. Ahmed et al., 2016).

Vocabulary

Vocabulary knowledge is of specific interest in the case of expository text. In many studies on text comprehension, knowledge of single words or word associations has been used to indicate conceptual learning from expository text (Duit, 1986; Shavelson, 1971). Fang et al. (2006) emphasized that the same word can be bound to different concepts in different domains (e.g. a literary meaning and a scientific meaning), which can lead to confusion when reading a text. The strong positive correlation found between previous knowledge and vocabulary knowledge (Cromley & Azevedo, 2007; Cromley et al., 2010b) is, therefore, to be expected. From a methodological perspective, this creates a potential confound if researchers use vocabulary assessments that are too highly correlated with previous knowledge.

Inference-Making

A central aspect of the van Dijk and Kintsch (1983) model of text comprehension is that readers must draw inferences. The main idea is that readers build a mental model by reading the text; they select and combine information and, thus, draw inferences. This relation is supported by findings that interventions targeting inference making improve students’ reading comprehension (Best et al., 2005; O’Reilly & McNamara, 2007). Researchers have created standardized measures of bridging inferences in English (Ahmed et al., 2016; Cromley & Azevedo, 2007). Typically, two sentences are presented and students choose the most plausible inference to be drawn from them, for example, a causal or temporal connection. Because such an instrument does not exist in German, one challenge for testing the DIME model in Germany was to create such an instrument.

Studies Testing the DIME Model

The first study of the DIME model (Cromley & Azevedo, 2007) was conducted with US high school students (mean age = 14.2 years). The comprehension measure was a domain-general standardized instrument; the other measures of reading components were based on this general comprehension test (i.e. component measures were not domain-specific); and the fit of a manifest path model was tested. Ahmed et al. (2016) broadened the evidence for the DIME model in three ways: they tested students across a wide range of ages, addressed test method bias, and accounted for measurement error. They also used domain-general standardized instruments for English language skills, but with US school students from Grades 7–12. Compared to the first study, the sample was much larger, more heterogeneous, and contained different age groups to test for developmental effects. Furthermore, not just one but several test instruments were administered to each student to measure each personal characteristic (e.g. multiple vocabulary measures). Most importantly, the DIME model was supported in most regards, across a broad age range, and with a heterogeneous sample. Additionally, the authors found a general factor (g-factor) that explains shared variance among all assessment instruments: as all instruments were based on reading, the g-factor reflects the common method and adds important information to the model. Based on these findings, the DIME model may be regarded as a standard model of general reading comprehension of texts in the English language.

Characteristics of Scientific Text

Research has identified many characteristics of scientific text that distinguish it from narrative text. Although most of the scientific versus narrative text research has been conducted with English text (e.g. Fang, 2006), these differences are also found in German text (e.g. Roelcke, 2010). With regard to typical text characteristics of science textbooks, differences are usually distinguished on three levels: word, sentence, and passage.

At the word level, scientific text in both English and German is characterized by technical vocabulary (i.e. infrequent words and highly specific uses of terms), lengthy nouns, nominalizations, and the use of pronouns such as “it” or “they” (Fang, 2006; Roelcke, 2010). From a teaching and learning perspective, the word level is important as it incorporates both language features and scientific knowledge. Deese (1962) and Shavelson (1971) argued that associations between nouns in a test can indicate conceptual understanding. Building upon this evidence, Duit (1986) and Fang (2006) contended that unfamiliar nouns used as technical terms or concepts are hard for students to learn. Arya et al. (2011) and Hall et al. (2014) found that the use of synonyms and pronoun references negatively influenced students’ reading comprehension.

At the sentence level, Fang (2006) discussed complex sentence structure (i.e. using relative clauses and/or connectives) and the use of the passive voice as characterizing scientific text. The empirical evidence regarding the influence of sentence structure and the passive voice on comprehension is contradictory. While some studies report that using connectives influences reading comprehension (Hall et al., 2014), others indicate no effect (Arya et al., 2011). Regarding connectives, Roman et al. (2016) compared 12 science and 12 social studies texts. They reported that (a) science texts contain more logical connectives, especially inferential connectives (e.g. therefore), than do social studies texts, which favor contrastive connectives (e.g. however), and (b) the rate of logical connectives increases across grade levels in science but not in social studies texts. Consequently, expository texts might differ by domain regarding the usage of specific connectives. On the other hand, this difference might apply to narrative and expository texts in general, not just to science text.

At the passage level, the usage of visual representations (e.g. diagrams, pictures, graphs) and a high density of information are considered typical features of science text (Fang, 2008; Roelcke, 2010). Ozuru et al. (2009) and Schmellentin et al. (2017) analyzed the use of syntactic features as well as the combination of text and graphic representations or subheadings in scientific vs. narrative text; they found that these text differences had an impact on students’ reading comprehension. Subheadings and the structuring of paragraphs divide information and highlight what is important; that is, these features can be considered a kind of scaffolding.

Text Comprehension of Scientific Text

Revisiting the DIME model from the perspective of science text characteristics leads to the question of what level of general reading skill is enough to understand science text and whether there exist science-specific reading skills that reflect more than simply scientific knowledge. For science educators, it would be important to know which language demands are domain-specific and which are not. This differentiation would help science educators understand which aspects of language demands are most closely related to science learning and should be emphasized in science teaching (Brown & Ryoo, 2008).

The DIME model has been tested for its generalizability to science text (Cromley et al., 2010b). We briefly map each science text characteristic onto a variable in the DIME model: the usage of technical terms is related to the vocabulary measure. By contrast, typical vocabulary measures for narrative reading comprehension cover common vocabulary, and technical terms are explicitly not included as they occur too seldom in everyday language. In the DIME model, text aspects at the sentence level are related to the inference measure, but this raises a question about what exactly is measured: Is it a text feature or a learner skill? Similarly, text features at the passage level may be related to the measure of strategy use in the DIME model. In their study on science text, Cromley et al. (2010b) modified (i.e. contextualized) nearly all test instruments; topic-specific—not just domain-specific—measures were developed for reading comprehension and previous knowledge as well as for inference, reading strategy use, reading vocabulary, and word reading fluency. Findings from this biology-specific study were mostly consistent with the DIME model, although some differences were found (Cromley & Azevedo, 2007). Overall, previous knowledge seemed to be more important for science text comprehension than in the original test of the DIME model (Cromley & Azevedo, 2007), whereas vocabulary knowledge seemed to be less important. When domain-specific instruments are used, the g-factor might be accompanied by a second factor that captures previous knowledge. This additional factor might explain the high correlation between vocabulary knowledge and previous knowledge reported in the original version (Cromley & Azevedo, 2007) and in the science version (Cromley et al., 2010b). However, it is difficult to directly compare these DIME studies since all instruments differed in how contextualized they were (i.e. general measures in Cromley & Azevedo, 2007; topic-specific measures in Cromley et al., 2010b). Moreover, no direct comparison can be made with reading comprehension of narrative text because students read passages about one biology topic in the later study (Cromley et al., 2010b).

Another important issue arises when one takes a closer look at the measurement of reading comprehension in the two studies. In their later study on science text, Cromley et al. (2010b) made use of paragraphs from science text, whereas the first study used paragraphs on domain-specific topics as part of a standardized assessment (Cromley & Azevedo, 2007). In that first study, the measure of students’ reading comprehension consisted of short paragraphs (N = 14) from narrative (fictional) as well as expository (non-fictional) texts that were tested using 48 items. The authors argued that they intended to measure reading comprehension independently from previous knowledge. However, based on domain-specific previous knowledge items in the instruments, they found a clear influence of previous knowledge on students’ reading comprehension. While this finding can be considered to support the assumption that general reading comprehension functions similarly across domains, it remains difficult to decide whether there is something like domain-specific reading comprehension arising from domain-specific linguistic text features. From our perspective, a sound comparison of narrative and expository text comprehension based on the same measures of supporting language skills is still missing.

Based on Fang’s (2006) linguistic analysis, it seems important to investigate the interplay between both aspects of literacy—general language and domain-specific abilities—in order to understand the DIME model’s implications for science teaching. In short, we pose the question: To what degree are general reading abilities for narrative text comparable to domain-specific science reading comprehension? This is an important step toward answering the question about domain-specific aspects of text comprehension.

The Present Study

The DIME model has been generalized across different age groups in English (Ahmed et al., 2016), and preliminary evidence has been presented for its application to scientific text specifically (Cromley et al., 2010b). To further investigate how the set of predictors in DIME functions to explain student reading comprehension across text genres, we aim to analyze whether there is a common core of text comprehension between different genres tapping the same supporting language skills.

In the present research, we were first interested in the applicability of the DIME model to narrative texts in German for Grade 8 students (study 1), taking a closer look at a possible language effect (i.e. German vs. English). This is important to test first; otherwise, one could argue that in study 2 we are confounding language and genre effects. In contrast to Cromley and Azevedo (2007), our focus was on narrative text only, not general comprehension across both narrative and expository texts. Thus, in study 1, we test whether DIME provides a sound description of text comprehension of narrative text; in study 2, we analyze to what degree the same skills are needed when reading either narrative or expository texts. While Cromley et al. (2010b) transferred the whole model into domain- or topic-specific contexts by contextualizing all predictors, we applied identical measures in our tests of narrative comprehension and expository comprehension. By doing so, we aimed to determine which components are indeed domain-specific and which might be more generic with respect to the distinction between the fundamental and derived senses of scientific literacy. This is an important step that could lead to future intervention work because it would inform the need for domain-specific (vs. domain-general) interventions.

Consequently, we first applied the DIME model for general reading comprehension of narrative text in German. If results provided evidence for suitability of the DIME model, we reasoned that the model could be used with the same measures for science comprehension as well. By taking this second step, we aimed to directly compare students’ reading comprehension of narrative and expository texts. In summary, the two research questions (RQ) for this study were:

  1. Can the DIME model be applied to lower secondary students’ general reading comprehension of narrative text in German?

    We hypothesize that there are no differences in model fit for the DIME model when comparing findings from our German language sample and previous studies using English language samples.

  2. To what degree are the direct and indirect influences of the corresponding variables in the DIME model comparable between narrative and expository science texts?

    We hypothesize that there are no differences in model fit for the DIME model when comparing findings from narrative text comprehension and expository science text comprehension. This hypothesis requires that our science text not include diagrams, graphs, or other visuals known to influence comprehension.

Method

Participants

Data were collected in Germany at two time points between 2016 and 2017. In study 1, the 443 participants were Grade 8 students (45.3% female; age: M = 14.94, SD = 0.66; both parents born in Germany for 65.1%); 217 attended several grammar schools (higher track), and 226 attended several comprehensive schools (middle track). In study 2, the 261 participants were Grade 8 students (37.5% female; age: M = 14.97, SD = 0.82; both parents born in Germany for 53.3%); 75 attended several grammar schools (higher track), 126 attended several comprehensive schools, and 60 attended a general secondary school (both middle track). There were no important differences between the samples regarding science teaching or general assumptions about students’ language skills. Students at this age were chosen because they would have little previous knowledge of the topic used in the comprehension assessment—and hence little variability in prior knowledge—since that topic (i.e. the Rutherford experiment) had not yet been taught to the participants. Instruments measuring students’ general reading comprehension of narrative text, vocabulary knowledge, inferences, reading strategies, and topic-specific previous knowledge were administered by trained researchers on one day; after a 2-week period, students’ science text reading was assessed. Parental and education ministry permission was obtained in advance.

Measures

Our measures followed those used in previous studies on the DIME model as closely as possible (Ahmed et al., 2016; Cromley & Azevedo, 2007; Cromley et al., 2010b). For this reason, published and standardized test instruments in German were administered whenever possible. However, this selection process led to some differences regarding the specific measures, which are explained in the following sections. To accommodate both the total testing time that schools could allot to the target age group and the study’s parameters regarding the different measures in DIME and their domain specificity, we chose not to administer a measure of word recognition.

Topic-Specific Reading Comprehension and Previous Knowledge

To remain comparable to the general reading comprehension instrument, we adapted a longer text from a high school physics textbook for study 1 and then shortened that text for study 2. The original text focused on atomic physics in the context of the Rutherford experiment and described how scientists discovered the structure of the atom. Atomic structure is a topic typically covered in middle school science. The text was presented at an introductory level to reduce potential effects of previous knowledge. Items were constructed to represent three levels of reading comprehension—surface, bridging inference, and elaborative inference—according to commonly used frameworks (Ozuru et al., 2009). For the first type of question, students were to extract information from single sentences; for the second type, students were to draw inferences between at least two sentences; for the third type, students were to integrate information from the text with their previous knowledge to form a situation model. The same set of items was used to measure students’ previous knowledge before reading as well as their reading comprehension after reading the science text. The items were always presented after reading the text, and the text remained available while answering the questions. Items were coded dichotomously, and scores were analyzed based on the unidimensional Rasch model. Reliabilities estimated with the weighted likelihood estimates (WLE) method were sufficiently high at pretest and posttest (pretest = 0.73, posttest = 0.84). In study 2, only a subset of items could be used because the text was shortened; this resulted in slightly lower but still acceptable reliabilities (Bond & Fox, 2012; pretest = 0.65, posttest = 0.69).
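
For reference, the dichotomous Rasch model underlying this scaling expresses the probability of person p answering item i correctly as a function of the difference between the person’s ability and the item’s difficulty:

\[
P(X_{pi} = 1 \mid \theta_p, \beta_i) = \frac{\exp(\theta_p - \beta_i)}{1 + \exp(\theta_p - \beta_i)},
\]

where \(\theta_p\) denotes the ability of person \(p\) and \(\beta_i\) the difficulty of item \(i\); the WLE ability estimates and reliabilities reported above are derived from this model.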

General Reading Comprehension of Narrative Text

A standardized instrument for German reading comprehension in Grades 8–9 was administered (Bäuerlein et al., 2012). This instrument is one of the few widely tested and validated instruments for general reading comprehension in German that can be applied in highly heterogeneous samples. We administered one subscale of the larger test battery that was based on a narrative fictional story. To assess general reading comprehension, students had 15 min to read the text and answer 14 multiple-choice items. There is an important difference between this text comprehension measure and the measure used by Cromley and Azevedo (2007): the US instrument consisted of 14 short paragraphs that were not related to each other, whereas the German measure included a single longer story, which induces dependency among all items because all are related to the same text. However, as the text is a fictional story, it is thought to be independent of previous specialized knowledge. This might seem counterintuitive, as previous knowledge is usually discussed as one of the most important factors explaining reading comprehension. Given the nature of the text and the questions, however, the fictional text used here is not intended to convey content knowledge. We see this as analogous to using a scene from Shakespeare, such as one from Romeo and Juliet. Each of the 14 items asks for information that can be found in the text and includes five answer options, of which only one is correct. To continue our Shakespeare example, we might ask “Who drank the poison first: Romeo, Juliet, Balthasar, Tybalt, or the Apothecary?” The information needed to find the correct answer could always be found within the text without any additional information. For our samples, WLE reliability was sufficiently high (study 1 = 0.63, study 2 = 0.72).

Vocabulary Knowledge

We used a standardized instrument for vocabulary knowledge breadth in German (Kognitiver Fähigkeitstest 4–13 R [KFT-R]; Heller & Perleth, 2000). The vocabulary knowledge subscale for the KFT-R consisted of 25 items. For each item, students were asked to find a synonym for a stimulus word from a list of five other words (e.g. select the word that most closely means rose: (a) music, (b) flower, (c) nutrition, (d) sight, (e) lasso). Students had seven minutes to complete the test. The complete KFT-R battery was piloted, revised, and normed in large samples over the course of many years (Heller & Perleth, 2000). The WLE reliability for our samples was sufficiently high (study 1 = 0.71, study 2 = 0.75).

Inferences

To measure inference skills, the non-verbal figural analogies subscale from the KFT-R cognitive abilities test (Heller & Perleth, 2000) was used. In each of 25 items, students were presented with a pair of figures that stand in a particular relation, together with a third figure. From a list of five further figures, students were to find the one that matches the third figure in such a way that both pairs illustrate an analogous relation. It is important to note that this instrument differs in important ways from those used by Cromley and Azevedo (2007) and Cromley et al. (2010b): those studies used text-based inference measures, but we did not find a comparable standardized instrument for German. The measure used in this study is not text-based but does measure the ability to draw inferences (Heller & Perleth, 2000). As our goal was to compare text comprehension of narrative and expository texts and as we wanted to use standardized measures, this choice seemed plausible because the measure is another subscale from the same test battery as the vocabulary knowledge measure. Thus, both subscales assess students’ skills in a highly comparable way. The WLE reliability for our samples was sufficiently high (study 1 = 0.78, study 2 = 0.85).

Reading Strategies

The reading strategies measure is a standardized and published instrument for German high school students (Würzburger Lesestrategie-Wissenstest 7–12; Schlagmüller & Schneider, 2007). It has been tested using a large sample from the Programme for International Student Assessment covering Grades 7–12. Based on vignettes of six reading situations, students were asked to indicate their preference for each of a set of different reading strategies on a 6-point Likert scale. Each strategy within each vignette had been rated by experts; students’ answers were scored by comparing their evaluation of each alternative strategy to the experts’ evaluation. A scoring guide links the student’s ratings of the strategies within a vignette to an absolute score for that vignette, and these vignette scores are then summed to create a reading strategies score. In this way, the reading strategy test aims to capture students’ strategy-related reading skills separately from vocabulary and inferences. There was no time limit for this test, but students typically needed 20 to 35 min to finish. The reliability for our samples was adequate (study 1: α = 0.61, study 2: α = 0.69).
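
To make the expert-comparison scoring concrete, the following sketch shows one plausible, purely illustrative scoring rule: within each vignette, the student receives a point for every pair of strategies whose relative ordering in the student’s ratings agrees with the experts’ ordering, and the vignette scores are summed. The function names, ratings, and the pairwise-agreement rule itself are hypothetical stand-ins, not the published WLST 7–12 scoring guide.

```python
from itertools import combinations

def vignette_score(student_ratings, expert_ratings):
    """Count strategy pairs whose relative ordering in the student's
    ratings matches the experts' ordering (illustrative rule only)."""
    score = 0
    for i, j in combinations(range(len(expert_ratings)), 2):
        expert_diff = expert_ratings[i] - expert_ratings[j]
        student_diff = student_ratings[i] - student_ratings[j]
        if expert_diff * student_diff > 0:  # same direction of preference
            score += 1
    return score

def strategy_score(all_student_ratings, all_expert_ratings):
    """Sum the vignette-level agreement scores into one overall score."""
    return sum(vignette_score(s, e)
               for s, e in zip(all_student_ratings, all_expert_ratings))

# One hypothetical vignette with five strategies rated on a 6-point scale
student = [[6, 2, 5, 1, 3]]            # student's ratings
experts = [[5.5, 1.8, 4.9, 1.2, 2.5]]  # experts' mean ratings
print(strategy_score(student, experts))  # -> 10 (all 10 pairs ordered alike)
```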

We conducted two studies, study 1 and study 2, using the same set of predictors but different science texts and different samples from the same student population. In both studies, general reading comprehension was measured with the same narrative text. However, in study 1, we assessed students’ topic-specific reading comprehension of a longer physics text; in study 2, we used a shorter physics text. We first provide evidence regarding RQ1 (reading comprehension of narrative text) by combining the samples from study 1 and study 2. Next, we provide evidence regarding RQ2 by discussing findings from the two separate samples on topic-specific comprehension of science text in more detail.

Data Analysis

We used structural equation modeling (SEM) in Mplus 8.3 to test our hypothesized model. We specified all constructs as latent variables (i.e. analogous to factors in factor analysis). For reading strategies, the six vignette scores were used as manifest (measured) indicators of the reading strategies factor, including an error term for each indicator to represent less-than-perfect reliability. For all remaining measures (i.e. vocabulary, inferences, general reading comprehension, and topic-specific reading comprehension in studies 1 and 2), we used four item parcels—small groups of items—as the manifest indicators for each factor to reduce the complexity of our models (Little et al., 2002). This approach seemed feasible in the context of our study since all constructs were well represented by one factor each (i.e. were unidimensional; Bandalos & Finney, 2001; Little et al., 2002). To obtain parcels that represented the construct equally (i.e. equally weighted), we assigned items to parcels according to their loadings from previously estimated unidimensional factor analyses of each measure: the item with the highest loading was assigned to the first parcel, the item with the second highest loading to the second parcel, and so on through the fourth parcel; the items with the next highest loadings were then assigned to the parcels in reverse order, continuing until all items were assigned (cf. Little et al., 2002). In the next step, we used item response theory scaling. Specifically, using the four resulting parcels per latent variable, we estimated WLEs for each parcel to represent the subjects’ ability scores in four-dimensional item response models using ConQuest (Wu et al., 2003). For the models including previous knowledge, we correlated error terms between the first and second tests for each item parcel over time. Single missing answers were recoded as incorrect answers.
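
As an illustration of the parceling step only (not the scripts actually used in this study), the sketch below assigns items to four loading-balanced parcels in the back-and-forth order described above; the item labels and loadings are hypothetical.

```python
def assign_parcels(loadings, n_parcels=4):
    """Assign items to parcels by descending factor loading, reversing the
    assignment order for every successive group of items (cf. Little et al., 2002).
    loadings: dict mapping item id -> loading from a unidimensional factor analysis."""
    items = sorted(loadings, key=loadings.get, reverse=True)
    parcels = [[] for _ in range(n_parcels)]
    forward = True
    for start in range(0, len(items), n_parcels):
        chunk = items[start:start + n_parcels]
        order = range(n_parcels) if forward else reversed(range(n_parcels))
        for parcel_idx, item in zip(order, chunk):
            parcels[parcel_idx].append(item)
        forward = not forward
    return parcels

# Ten hypothetical vocabulary items with loadings from a one-factor model
loadings = {f"item{i:02d}": l for i, l in enumerate(
    [0.81, 0.78, 0.74, 0.70, 0.66, 0.63, 0.59, 0.55, 0.52, 0.48], start=1)}
for k, parcel in enumerate(assign_parcels(loadings), start=1):
    print(f"Parcel {k}: {parcel}")
```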

In our analyses, one main goal was comparability of the models for narrative and scientific reading comprehension. Consequently, in the first step, topic-specific previous knowledge was not added to the model. The purpose was not to signal any lack of importance of previous knowledge but only to enable us to compare models; in an SEM framework, this requires the variables to be the same in the two models.

Several fit indexes have been suggested to evaluate the goodness-of-fit of SEMs (Marsh, 2007; Schermelleh-Engel et al., 2003; West et al., 2012). Here, we considered the root mean square error of approximation (RMSEA) in combination with its 90% confidence interval, the Tucker-Lewis Index (TLI), the Comparative Fit Index (CFI), and the standardized root mean square residual (SRMR). We also considered a robust χ2 test statistic, which should be nonsignificant in a good-fitting model but may provide biased results due to its dependence on sample size, as well as the significance of parameter estimates. For testing the significance of indirect effects, we used the Mplus default bootstrap method with 10,000 draws. TLI and CFI values greater than 0.90 or 0.95 are typically interpreted to reflect an acceptable or excellent fit to the data, respectively. RMSEA values smaller than 0.05, 0.06, or 0.08 and SRMR values smaller than 0.08 or 0.10 are typically interpreted to reflect a close or a reasonable fit to the data. After treatment of single missing answers in the item response modeling step, there were no missing data remaining in the data set.
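
Purely as an illustrative helper (not part of the Mplus workflow), the cutoff conventions just cited can be expressed as follows; the thresholds are the heuristics named above (simplified to one RMSEA cutoff per label), not strict decision rules.

```python
def judge_fit(cfi, tli, rmsea, srmr):
    """Label each fit index according to the conventional cutoffs cited in the text."""
    return {
        "CFI":   "excellent" if cfi >= 0.95 else "acceptable" if cfi >= 0.90 else "poor",
        "TLI":   "excellent" if tli >= 0.95 else "acceptable" if tli >= 0.90 else "poor",
        "RMSEA": "close" if rmsea <= 0.05 else "reasonable" if rmsea <= 0.08 else "poor",
        "SRMR":  "close" if srmr <= 0.08 else "reasonable" if srmr <= 0.10 else "poor",
    }

# Example: the fit later reported for the narrative-text model (Fig. 1)
print(judge_fit(cfi=1.00, tli=1.00, rmsea=0.01, srmr=0.02))
```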

Results

Descriptive Statistics

As shown in Table 1, all correlations among the variables in study 1 were positive, significant, and moderate to high. The variable showing the strongest correlation with text comprehension was vocabulary as expected from prior research (Cromley & Azevedo, 2007; Cromley et al., 2010b).

Table 1 Means, standard deviations, and latent correlations of the variables in Study 1

As shown in Table 2, in study 2 all correlations among the study variables were positive, significant, and moderate to high. As in study 1 and as expected from prior research, vocabulary correlated most strongly with reading comprehension.

Table 2 Means, standard deviations, and latent correlations of the variables in Study 2

Prediction of General Reading Comprehension of Narrative Text in German (Study 1 and Study 2)

The first research question asked about the applicability of the DIME model to lower secondary students’ comprehension of narrative text in German. Since the general reading comprehension test was used in both studies, analyses were based on the combined sample (N = 704) from both studies. The model showed excellent fit to the data (see Fig. 1): χ2 (129) = 145.20, p = 0.156, CFI = 1.00, TLI = 1.00, RMSEA = 0.01 (90% CI [0.00, 0.02]), SRMR = 0.02. All predictors positively and significantly predicted reading comprehension, with the strongest path from vocabulary to reading comprehension. There were significant indirect effects from vocabulary to reading comprehension through reading strategies (β = 0.07, p = 0.016) and from vocabulary to reading comprehension through inference (β = 0.07, p = 0.009), whereas the indirect effect from vocabulary on reading comprehension through both reading strategies and inference did not reach significance (β = 0.01, p = 0.059). The sum of the indirect effects was also significant (β = 0.14, p < 0.001), indicating that vocabulary has an effect on reading comprehension both directly and through the assumed mediators.
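
For readers less familiar with mediation analysis, the sketch below illustrates, with simulated data and simple OLS path estimates rather than the latent-variable Mplus analysis reported here, how an indirect effect is formed as the product of its component paths and how a bootstrap confidence interval can be used to judge its significance; all variable names and values are hypothetical.

```python
import numpy as np

def standardize(v):
    return (v - v.mean()) / v.std()

def bootstrap_indirect(x, m, y, n_boot=10_000, seed=1):
    """Bootstrap the a*b indirect effect of x on y through mediator m,
    using OLS path estimates on standardized variables."""
    rng = np.random.default_rng(seed)
    data = np.column_stack([x, m, y])
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        xs, ms, ys = (standardize(col) for col in data[rng.integers(0, n, n)].T)
        a = np.polyfit(xs, ms, 1)[0]                        # path x -> m
        design = np.column_stack([xs, ms, np.ones(n)])
        b = np.linalg.lstsq(design, ys, rcond=None)[0][1]   # path m -> y controlling for x
        estimates.append(a * b)
    lo, hi = np.percentile(estimates, [2.5, 97.5])
    return float(np.mean(estimates)), (float(lo), float(hi))

# Simulated stand-ins for vocabulary, reading strategies, and comprehension scores
rng = np.random.default_rng(0)
vocab = rng.normal(size=300)
strategies = 0.4 * vocab + rng.normal(size=300)
comprehension = 0.5 * vocab + 0.3 * strategies + rng.normal(size=300)
print(bootstrap_indirect(vocab, strategies, comprehension, n_boot=2000))
```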

Fig. 1

Adapted DIME model for study 1 and study 2 predicting general reading comprehension of narrative text in German. For the sake of clarity, manifest indicators of the latent variables are not depicted; standardized path coefficients are presented. *p < 0.05, ***p < 0.001

Prediction of Topic-Specific Reading Comprehension of a Longer Science Text (Study 1)

In our second research question, we aimed to compare the direct and indirect influences of the variables in the DIME model between narrative and expository science texts. Therefore, we analyzed to what extent the domain-general variables predict topic-specific reading comprehension. We first tested the model without including previous knowledge as a predictor so as to parallel the model employed in the first analysis and, thus, to compare the results for predicting narrative and expository reading comprehension. In study 1, we drew on a sample of 443 students. For predicting topic-specific reading comprehension without controlling for previous knowledge, the model in Fig. 2a fit the data well: χ2 (129) = 227.77, p < 0.001, CFI = 0.96, TLI = 0.95, RMSEA = 0.04 (90% CI [0.03, 0.05]), SRMR = 0.04.

Fig. 2

a Adapted DIME model for study 1 predicting topic-specific reading comprehension of a science text about atomic physics without controlling for previous knowledge. Note: For the sake of clarity, manifest indicators of the latent variables and correlated residuals are not depicted. Standardized path coefficients are presented. *p < 0.05, **p < 0.01, ***p < 0.001. b Adapted DIME model for study 1 predicting topic-specific reading comprehension of a science text about atomic physics with controlling for previous knowledge. Note: For the sake of clarity, manifest indicators of the latent variables and correlated residuals are not depicted. Standardized path coefficients are presented; the dashed line represents a nonsignificant path. *p < 0.05, **p < 0.01, ***p < 0.001

All predictors positively and significantly predicted topic-specific reading comprehension, with vocabulary as the strongest predictor. Moreover, the same pattern of indirect effects as for the narrative text was found. Vocabulary significantly predicted topic-specific reading comprehension as mediated by reading strategies (β = 0.11, p = 0.001) and by inference (β = 0.08, p = 0.032), whereas the indirect effect from vocabulary on topic-specific reading comprehension through both reading strategies and inference did not reach significance (β = 0.02, p = 0.095). Compared to the model for narrative text, the influence of vocabulary seemed somewhat smaller. The sum of all indirect effects for vocabulary was also significant (β = 0.20, p < 0.001).

When we added previous knowledge as an additional predictor of student reading comprehension, the model in Fig. 2b again fit the data well: χ2 (195) = 322.67, p < 0.001, CFI = 0.96, TLI = 0.96, RMSEA = 0.04 (90% CI [0.03, 0.05]), SRMR = 0.04. The same pattern of results was obtained, except that the path from inference to topic-specific reading comprehension did not reach significance. Moreover, previous knowledge was a positive significant predictor of reading strategies and inference, and it was the strongest predictor of topic-specific reading comprehension. None of the indirect effects on topic-specific comprehension—from vocabulary through reading strategies, from vocabulary through inference, or from previous knowledge through reading strategies or inference—reached significance when controlling for previous knowledge. Again, the influence of vocabulary seemed somewhat smaller.

Prediction of Topic-Specific Reading Comprehension of a Shorter Science Text (Study 2)

In study 2, the identical procedure as in study 1 was followed, except that a shorter physics text on atomic structure was used to assess students’ topic-specific reading comprehension. As in study 1, items were constructed and piloted according to the same framework to assess students’ topic-specific previous knowledge and reading comprehension. In study 2, we drew on a sample of 261 students. In the following section, we first present the results for the model predicting topic-specific reading comprehension of the short atomic physics text without controlling for previous knowledge. The model in Fig. 3a again fit the data well: χ2 (129) = 102.02, p = 0.96, CFI = 1.00, TLI = 1.00, RMSEA = 0.00 (90% CI [0.00, 0.00]), SRMR = 0.03. All predictors except reading strategies positively and significantly predicted topic-specific reading comprehension. Again, vocabulary was the strongest predictor of topic-specific reading comprehension, but its coefficient was closer to that in the model for narrative text than to that in the model for the longer expository text in study 1.

Fig. 3

a Adapted DIME model for study 2 predicting topic-specific reading comprehension of a science text about atomic physics without controlling for previous knowledge. Note: For the sake of clarity, manifest indicators of the latent variables and correlated residuals are not depicted. Standardized path coefficients are presented; the dashed line represents a nonsignificant path. *p < 0.05, **p < 0.01, ***p < 0.001. b Adapted DIME model for study 2 predicting topic-specific reading comprehension of a science text about atomic physics with controlling for previous knowledge. Note: For the sake of clarity, manifest indicators of the latent variables and correlated residuals are not depicted. Standardized path coefficients are presented; dashed lines represent nonsignificant paths. *p < 0.05, **p < 0.01, ***p < 0.001

Only the indirect effect from vocabulary on reading comprehension through inference was significant (β = 0.11, p = 0.002). The model shown in Fig. 3b for topic-specific reading comprehension controlling for previous knowledge again fit the data well: χ2 (195) = 179.25, p = 0.78, CFI = 1.00, TLI = 1.00, RMSEA = 0.00 (90% CI [0.00, 0.02]), SRMR = 0.04. The same pattern of results was obtained—again, the path from reading strategies to topic-specific reading comprehension was not statistically significant. As in study 1, previous knowledge was the strongest positive significant predictor of topic-specific reading comprehension. No indirect effects were significant when controlling for previous knowledge.

Discussion

Reading comprehension is an essential skill for all learning, including the learning of science. It is already known that a lack of general language skills can result in decreased learning or test performance in science (Härtig et al., 2015b). Furthermore, science text includes specific language features that distinguish it from narrative text (Fang, 2006), which leads to the question of whether general reading comprehension of narrative text is comparable to topic-specific reading comprehension of science text or whether there are important skill differences specific to science comprehension. A sound framework for such questions is the DIME model of reading comprehension (Cromley & Azevedo, 2007). Cromley et al. (2010b) presented evidence for DIME in a science domain (i.e. biology) based on science-specific measures for all skills, while Ahmed et al. (2016) generalized the model across a broad range of age groups in a heterogeneous sample. Both of these studies were in English; for this reason, our first question asked about the applicability of DIME in another language.

Furthermore, even though both former studies presented evidence for narrative and expository texts, a direct comparison between these studies is not possible due to methodological differences. The initial test of the DIME model was done with general reading skill, and the second test was done with biology text. As the existing literature cannot answer the question of whether generic skills are equally important for both genres, narrative as well as scientific texts, our second research question addressed this direct comparison. On the one hand, we know that the DIME model fits well for a mix of narrative and expository texts; on the other hand, less is known about the extent to which generic skills support reading comprehension of expository text.

Applicability of DIME for Another Language

The presented results provide evidence for the applicability of DIME to general reading comprehension of narrative text in German, expanding the application of the model beyond general comprehension in English. The influence of vocabulary knowledge and reading strategies on general reading comprehension is comparable to findings from studies on the original DIME model conducted in English. The measurement of inference-making in our study is different because we used a measure that reduces the g-factor in reading as much as possible. Despite this difference, its influence within the model seems to be comparable to results using reading-specific measures of inference. In general, we attempted to use more generic measures for vocabulary knowledge, inferences, and reading strategies, which resulted in differences in the position of these measures in the model and in the predicted paths. This approach was intended to take into account the findings of Ahmed et al. (2016), which indicate a g-factor as a general influence on all reading-based instruments. Considering our approach, we would argue that even when one tries to reduce such a general influence, the relative importance and interplay (indirect effects) of the components in the model do not differ when general reading ability in German is the focus.

Comparison of DIME for Narrative and Expository Texts

Regarding RQ2, we analyzed students’ topic-specific comprehension of science text to gain deeper insights into how reading comprehension of expository text might differ from that of narrative text. From our perspective, this is important as the influence of general reading ability on science learning is still under discussion. We modified the approach taken by Cromley et al. (2010b): in study 1, we used the same set of instruments as for the narrative model, with only the outcome changed from the general reading comprehension measure for narrative text to the topic-specific science text measure. We could thus directly compare general and topic-specific reading comprehension between identical models and can state that comprehension of science text does not differ much from comprehension of narrative text. It is important to note that, while we did not include previous knowledge for the expository text in the first step in order to compare models, we did include previous knowledge in the second step to see what changed. Findings for the first model indicate that the roles of reading strategies and vocabulary knowledge differed slightly between the two genres. In light of the findings of Ahmed et al. (2016), it could be hypothesized that in the approach of Cromley et al. (2010b) an additional g-factor might occur that represents previous knowledge. This factor could explain the high correlation between previous knowledge and vocabulary knowledge in the study of Cromley et al. (2010b) as well as why we found a smaller correlation when vocabulary knowledge is not domain-specific. This explanation is supported by the fact that all path coefficients decreased from the first model to the second model when previous knowledge was added. For this reason, we interpret our results to mean that not only topic-specific measures (as in Cromley et al., 2010b) but also topic-specific learning opportunities are needed for vocabulary, for drawing inferences from expository text, and for using reading strategies. In general, teaching vocabulary already seems to be part of teaching science as it is closely related to conceptual knowledge. However, we do not have a good understanding of the relation between general skills for drawing inferences and topic-specific aspects of inference-making. Nevertheless, it seems important to teach these skills explicitly with science text in science class, since general skills alone might not help much. From a research perspective, a next step might be to integrate and differentiate both general and topic-specific measures at once to determine whether these comprise distinct skills and how they are related.

Based on a shorter text, results from study 2 showed a comparable pattern of findings but with some slight differences, especially regarding the role of strategies and indirect effects. For the first model without previous knowledge, vocabulary seems to be more important with the shorter text than with the longer text. As the shorter text contained less unique information, it also contained fewer conceptual and scientific terms. One possible interpretation of these results is that students can handle comprehension of a shorter text with general skills but that this becomes harder with a longer text. The different pattern for the model including previous knowledge could also point in this direction, as no significant direct paths predicting inferences or strategic knowledge could be found. Building a situation model that relies largely on generic skills may be possible when the amount of new content is not too large. While specific factors may influence reading comprehension only in texts of a critical length, it is difficult to generalize these findings based on the two texts employed in this study. More studies regarding the generalizability of the DIME model are needed to determine what exactly influences the interplay of the different variables when text length, topic, or structure vary. Our results provide a tentative suggestion for teachers: reading long passages of more than a page in a science textbook will be more problematic without training than reading a few paragraphs. However, the DIME model itself seems to be a sound general framework to guide the design of subsequent research studies.

Limitations

Due to methodological choices and other considerations, there were differences between our study and the approaches taken in the original tests of the DIME model (Ahmed et al., 2016; Cromley & Azevedo, 2007) as well as in the test of the DIME model adapted for science text (Cromley et al., 2010b), mostly regarding instrumentation. First, we were not able to include a word recognition test due to the amount of testing time available in the schools. Second, owing to the text and test content, we did not assess previous knowledge of the content of the general reading comprehension instrument; this relates to the nature of the standardized reading comprehension measure for German. As explained above, it does not really measure learning in the sense of drawing inferences from the text based on previous knowledge. Perhaps for this reason, the text is much longer than those used for testing the DIME model in English. A more comparable, standardized instrument is not available for German.

Regarding the reading strategies measure, one could argue that we are measuring students’ strategy knowledge rather than their strategy use. Thus, the measure primarily represents metacognitive knowledge, and such knowledge does not necessarily result in effective strategy use. However, another standardized measure was not available in German. One could generally question why there is no international standardization of reading comprehension measures. There seem to be highly traditional, country-specific approaches to what is meant by reading comprehension, at least when it is operationalized in test instruments. This might have implications for the comparability of studies conducted in different languages, implications that are rarely discussed because the instruments are not directly comparable. Regarding RQ1, the differences in instrumentation and measurement model might indeed have influenced the results. However, all hypothesized paths could be found. We suggest, first, that the DIME model can be used in studies in the German language and, second, that the influence of the g-factor (Ahmed et al., 2016) might be smaller than expected for general reading comprehension.

Regarding the limitations of our study design for interpreting the findings from RQ2, two critical issues should be discussed. Compared to a similar study on reading comprehension of science text (Cromley et al., 2010b), the choice of instruments requires a discussion of generic versus non-generic abilities. In our case, the comparability of the measurement models for general and topic-specific reading comprehension was one of the most important aspects. In contrast to the approach of Cromley et al. (2010b), we did not create a topic-specific vocabulary or inference test for the science text topic. Consequently, our results represent the direct and indirect effects of more generic skills on topic-specific reading comprehension, which differs from what Cromley et al. (2010b) reported. While we cannot comment on the influence of previous knowledge in this respect, the measurement models are highly comparable between narrative and science texts, which allowed us to compare them and provide evidence for the generalizability of the model. Finally, we made use of science texts of different lengths, which led to different results for some instruments as well as for the whole model. As we observed differences related to these factors, more research might be needed on their influence, especially that of text length and topic.

Conclusions

The results indicate that the DIME model can be used as a sound approach to reading comprehension in different languages and for scientific as well as narrative texts. From a science teaching perspective, more information is needed about the interplay of generic and topic-related skills. Our study provides evidence that, for plain text, generic skills are to some degree sufficient. A typical concern for German science teachers is that it is difficult to teach German in addition to the subject content in order to ensure students’ reading comprehension. Regarding general language skills, this does not seem necessary based on the findings presented here, but domain-specific aspects of language deserve special attention. From a science learning perspective, differences regarding the influence of specific abilities indeed make sense; a possible explanation might be that even introductory texts differ regarding their intended learning goals. For example, a text about the life cycle of bugs might be more chronological and fact-based than a text about the conservation of energy. This leads to the question of whether texts from different domains have a higher similarity than different texts within one domain. Further research is needed in which longer topic-specific texts are considered. If we indeed want to determine why specific students struggle, we ought to know which types of text require which generic skills. Even though Ahmed et al. (2016) were able to replicate the DIME model across a broad age range, they measured only general reading comprehension and did so with an age group in which the focus of reading instruction had been vocabulary and syntactic patterns. Science text seems to be different and perhaps more complex because elements such as graphs, diagrams, various kinds of representations, or tables are used. These elements require unique abilities (Cromley et al., 2010a). Future studies should try to gain more insight into the complex interplay of topic-specific and generic abilities along the progression of science teaching.

Nevertheless, the stable influence of vocabulary knowledge and inferences—even when both instruments are not contextualized to the topic—leads to the conclusion that generic skills support expository reading comprehension across topics and text lengths, even if the path coefficients are mostly lower than for general reading comprehension. In the end, it is plausible that most large-scale studies in science find an influence of skills in the language of instruction on learning (Rauch et al., 2016). In general, language teaching seems to be an important prerequisite for learning in science; however, as discussed in the model of scientific literacy, general reading comprehension needs to be both used and transformed into topic-specific reading comprehension, for which specific learning opportunities are needed.