Development of silent reading fluency and reading comprehension across grades 1 to 9: unidirectional or bidirectional effects between the two skills?

This study examines the developmental interplay between silent reading fluency and reading comprehension from Grade 1 to Grade 9 (age 7 to 15) in a large Finnish sample (N = 2,518). Of particular interest was whether the associations are bidirectional or unidirectional. Children’s silent reading fluency and reading comprehension skills were assessed using group-administered tests, at seven time points, in Grades 1, 2, 3, 4, 6, 7, and 9. A random intercept cross-lagged panel model with latent factors was used to identify between- and within-person associations between silent reading fluency and reading comprehension. The use of latent factors allowed for the controlling of measurement error. The model showed that silent reading fluency and reading comprehension correlated at the between-person level, indicating that those who were proficient in one reading skill were typically good at the other also. At the within-person level, however, only some developmental associations emerged: in the early reading acquisition phase (Grade 1–2), silent reading fluency predicted reading comprehension, and in adolescence, reading comprehension weakly predicted silent reading fluency (Grade 7–9). The results thus suggest only weak developmental within-person associations between silent reading fluency and comprehension, although some unidirectional associations emerged with a change in the direction of the associations over time.


Introduction
One of the key objectives of education is to teach young children to read. After learning the basic decoding rules, building up reading fluency is important so that children can use reading efficiently for learning. Reading fluency is a complex construct incorporating multiple skills concerning word recognition and understanding connected text (e.g., Fuchs et al., 2001;Hudson et al., 2009;Kim et al., 2021;Kuhn et al., 2010;Rasinski et al., 2009;Wolf & Katzir-Cohen, 2001). Typically, it is assessed by the fast and accurate reading of words either in the form of lists (e.g., Torgesen et al., 1999) or as text (e.g., Good & Kaminski, 2002). Word reading fluency (i.e., reading unconnected words such as in word lists) and text reading fluency (i.e., reading of connected text) are strongly related (e.g., Fuchs et al., 2001;Hudson et al., 2009;Kim, 2012) as they both require the recognition of individual words fast and accurately but differ because text reading also requires higher-level processing skills (e.g., syntactic parsing, semantic integration) (Jenkins et al., 2003;Kim, 2015;Kim & Wagner, 2015;Stafura & Perfetti, 2017). Consequently, the assessment of reading fluency requires not only quick word recognition in the form of unconnected words but also efficient processing of word sequences (Altani et al., 2020;Protopapas et al., 2018).
The longitudinal association between reading comprehension and reading fluency has traditionally been understood as a unidirectional effect from fluency to comprehension. Accurate and fluent reading skills have been found to facilitate (among other skills) reading comprehension (e.g., Cain & Oakhill, 2009;Torppa et al., 2016). This direction is, of course, plausible during the reading acquisition phase because a certain level of decoding skills is necessary for reading comprehension. It is, however, possible that, once some decoding skills are established, the associations would become reciprocal, with reading fluency being supported by comprehension processes (Nation, 2019). This possibility has not yet been fully explored, because previous longitudinal studies addressing the issue of bidirectionality have typically focused only on the early years of development, when limited decoding skills are likely to form a barrier to or ceiling for the development of reading comprehension (e.g., Lonigan & Burgess, 2017).
The present study fills this gap by investigating the relationship between reading fluency and reading comprehension at seven time points over a critical period of eight years, from Grade 1 to Grade 9 (age 7 to 15), in a transparent orthography (Finnish). In addition, to provide a stronger test of the longitudinal associations, we utilize a random intercept cross-lagged panel model instead of the traditional cross-lagged panel model. The latter has recently been shown to have serious shortcomings because it tends to confound changes within an individual and differences between individuals (Berry & Willoughby, 2017; Curran et al., 2014; Hamaker et al., 2015; Mund & Nestler, 2019), thus making it difficult to interpret the findings meaningfully.


The association between reading fluency and reading comprehension
Many theoretical frameworks have been suggested for understanding the development of reading comprehension, such as the direct and indirect effects model of reading (Kim, 2020), the lexical legacy hypothesis (Nation, 2017), the reading systems framework (Perfetti & Stafura, 2014), the direct and mediational inference model (Cromley & Azevedo, 2007), and the construction-integration model (Kintsch, 1988). However, the model that has been used most broadly is the simple view of reading (SVR) model (Gough & Tunmer, 1986; Hoover & Gough, 1990). Over the past thirty years, the SVR model has received substantial support in empirical studies (e.g., Catts et al., 2006; Hjetland et al., 2019; Torppa et al., 2016; Tunmer & Chapman, 2012; see García & Cain, 2014 for a meta-analysis). According to the SVR model, reading comprehension is based on two broad separable components: decoding and linguistic comprehension. Both decoding and linguistic comprehension are necessary to facilitate reading, and neither is independently sufficient. The contribution of decoding to reading comprehension diminishes over time as decoding becomes automatized, and the importance of linguistic comprehension skills increases (Castles et al., 2018; Nation, 2019). Although the SVR has helped in understanding reading comprehension, in conceptualizing the processes involved, and in showing how these might contribute to individual differences, it may also give a false impression of how complex reading comprehension is (Catts, 2018). The other theoretical frameworks include more factors affecting reading comprehension, such as working memory, perspective taking, lexical legacy, and background knowledge (Cromley & Azevedo, 2007; Kim, 2020; Kintsch, 1988; Nation, 2017; Perfetti & Stafura, 2014). All models, however, include linguistic and decoding components.
One of the relevant questions for the current study regarding the decoding component is whether reading fluency needs to be added to the model or if word reading accuracy is sufficient to capture the variance due to decoding (Kershaw & Schatschneider, 2012; Language & Reading Research Consortium, 2015; Protopapas et al., 2012; Tilstra et al., 2009). The studies on the matter are inconsistent. However, some factors that may explain this inconsistency are the grade in which reading was assessed as well as the language. For example, word reading accuracy was identified as the best predictor of reading comprehension (beyond linguistic comprehension) in grades one and two, but in grade three the best predictor was word reading fluency (Language & Reading Research Consortium, 2015), suggesting that once children become more accurate in word reading, fluency may be a more sensitive indicator of word reading ability. Moreover, in more transparent orthographies, fluency has been shown to be a stronger predictor of word reading ability than accuracy from the beginning of school (Florit & Cain, 2011). Consequently, the grade and the transparency of the orthography may affect the components of the SVR. In the present study, as we are focusing on reading development from Grade 1 to Grade 9 and our sample concerns a highly transparent orthographic context (Finnish), our focal decoding measure is reading fluency and not accuracy. While the SVR model suggests that both components are necessary to facilitate reading comprehension, it does not specify whether the relationship between the two components and reading comprehension is bidirectional or unidirectional. In fact, a recent paper (Nation, 2019) suggested an expanded view of the SVR that also includes bidirectional associations between decoding, linguistic comprehension, and reading comprehension.
On the one hand, good reading fluency can be presumed to support reading comprehension because well-automatized word reading skills reduce the resource demands of cognitive processes (e.g., memory and attention), which can then be devoted to understanding meaning in text rather than identifying and decoding words (Perfetti, 1985, 2007). Furthermore, fluent reading (silent and/or oral) could support reading comprehension because the reader can read the text accurately, quickly, and with proper expression, which facilitates the more efficient construction of a mental representation of the text (National Reading Panel, 2000).
On the other hand, according to the expanded view of the SVR (Nation, 2019), there are bidirectional associations between decoding, linguistic comprehension, and reading comprehension, while according to the direct and indirect effects model of reading (Kim, 2020), text reading fluency and reading comprehension have an interactive or bidirectional relation. In addition, according to the interactive models of reading (Rumelhart, 1977; Stanovich, 1980), reading processes operate in parallel, which could imply bidirectional relationships between fluency and comprehension. Rumelhart (1977) suggested a model in which reading involves the parallel processing of information from various levels of linguistic representation (e.g., phonological, orthographic, lexical, semantic, syntactic) and information from each level can be used as a database for the other levels. The acquisition of the semantic level (i.e., linguistic comprehension) is not considered to be the final phase of the processing but a source of information that interacts with the other levels. An interactive model would thus suggest that various reading processes interact and influence one another in a bidirectional fashion. Good reading comprehension could thus support reading fluency because the reader can use contextual information (such as semantic and syntactic cues), which facilitates both word reading and the prediction of text structure (Fuchs et al., 2001). This interaction may contribute to more accurate and faster reading because higher-order processes would compensate for shortages in lower-order processes (Stanovich, 1980).

Empirical evidence for unidirectional and bidirectional effects between reading fluency and reading comprehension
Previous studies have consistently reported predictive links from reading fluency to reading comprehension (Cadime et al., 2017;Kim et al., 2015;Kim et al., 2011;Santos et al., 2020;Tilstra et al., 2009). The relationship between reading fluency and reading comprehension has been found to be particularly strong during the early school years. , for example, using multilevel growth modelling in Grades 1-3, showed that both the initial level of reading fluency and growth in reading fluency were significant predictors of reading comprehension. The association, however, has been shown to diminish over time as children become "fluent enough" to be able to allocate more cognitive resources to comprehension (Florit & Cain, 2011;Santos et al., 2020;Torppa et al., 2016). For example, in a study of English-speaking children in Grades 3, 7, and 10, the effect of decoding (word reading accuracy and fluency) on reading comprehension decreased across grades (standardized regression weights decreased from 0.38 to 0.06) (Kershaw & Schatschneider, 2012). The association between reading comprehension and reading fluency seems to diminish even earlier in more transparent orthographies. For example, Torppa et al. (2016) showed, using a cross-lagged panel model, that the direct effects of reading fluency on reading comprehension became insignificant among Finnish-speaking children already after Grade 2.
Studies testing bidirectional, rather than unidirectional, associations between the development of reading comprehension and reading fluency remain scant, and so does evidence of bidirectionality. A recent study examining bidirectional effects between reading fluency and reading comprehension among English-speaking children across Grades 1-4 using a twin study design (Little et al., 2017) reported bidirectional associations. However, the associations were not equally strong, because the effects of fluency on comprehension were stronger than those of comprehension on fluency. The study of Santos et al. (2020), conducted in a relatively transparent orthography (European Portuguese) in Grades 2, 3, and 4 using cross-lagged panel models, also reported bidirectional effects between reading fluency and reading comprehension but only between Grades 2 and 3. Bidirectional effects have also been documented in a sample of 5- and 6-year-old Korean-speaking children (Kim, 2015). Another study, conducted with Italian-speaking children (8 to 16 years old), reported significant effects from reading comprehension to reading fluency, which diminished across time (Carretti et al., 2020). It should be noted, though, that this study did not examine bidirectional effects. In this study, we add to the present knowledge on the developmental associations between reading fluency and comprehension in the context of a transparent orthography (Finnish) by using longitudinal follow-up data extending to the later school years (Grade 1 to 9). In addition, the above-mentioned studies have used measures of oral reading fluency, while in the present study we use group-administered measures for the assessment of reading fluency. Furthermore, we use a more advanced analysis method that provides a more stringent test of the cross-lagged paths (random intercept cross-lagged panel model) than previous studies.

Traditional cross-lagged models versus random intercept cross-lagged panel models
Autoregressive cross-lagged panel models (CLPM) are often used to examine unidirectional or reciprocal associations between two or more measures assessed at multiple time points. The CLPM provides estimates of the autoregressive relationships (i.e., stability paths) of two or more measures over time, as well as cross-lagged estimates. Recent studies, however, criticize the use of CLPM because of its clear shortcomings (Berry & Willoughby, 2017; Curran et al., 2014; Hamaker et al., 2015; Mund & Nestler, 2019). One critique of traditional CLPM is that the estimates produced are difficult or even impossible to interpret meaningfully because the cross-lagged estimates represent both within-person changes and differences between individuals' skill levels, without separating these. This is a major mismatch with various theoretical models on development, which typically separate between-person differences from within-person changes. Furthermore, one assumption of the CLPM is that the within-person change of a construct (e.g., reading) remains consistent among all participants over time (Berry & Willoughby, 2017; Mund & Nestler, 2019). However, when the construct is an individual characteristic, this assumption is often unlikely to hold. Consequently, the interpretation of the findings of traditional autoregressive CLPM includes caveats requiring careful consideration.
The random intercept cross-lagged panel model (RI-CLPM) has been proposed as an alternative to the CLPM because it includes the important features of the CLPM and, in addition, is able to separate within- and between-person variance (Hamaker et al., 2015). In the RI-CLPM model, there is a between-person construct representing the variance that exists due to the differences between persons at the overall, across-time level. Moreover, there is a within-person construct representing the variance due to change from the person's overall level at each time point. The RI-CLPM separates these within- and between-person variances so that the within-person level variance can be examined while controlling for the between-person variance. This is important because inferences based on within-person change in a construct and its associations with changes in another construct over time are of particular interest in most developmental theories, as well as this study.
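The decomposition just described can be written compactly. The following is a sketch in the general form given by Hamaker et al. (2015); the symbols are chosen here for illustration and are not taken from the present study. F and C stand for observed fluency and comprehension, i indexes persons, and t indexes measurement occasions.

```latex
\begin{aligned}
F_{it} &= \mu^{F}_{t} + B^{F}_{i} + w^{F}_{it}, \\
C_{it} &= \mu^{C}_{t} + B^{C}_{i} + w^{C}_{it}, \\
w^{F}_{it} &= \alpha^{F}_{t}\, w^{F}_{i,t-1} + \beta^{F}_{t}\, w^{C}_{i,t-1} + u^{F}_{it}, \\
w^{C}_{it} &= \alpha^{C}_{t}\, w^{C}_{i,t-1} + \beta^{C}_{t}\, w^{F}_{i,t-1} + u^{C}_{it}.
\end{aligned}
```

Here the \mu terms are occasion-specific grand means, the B terms are the time-invariant between-person factors (random intercepts), and the w terms are the within-person deviations from each person's own expected score. The \alpha coefficients are the autoregressive paths, the \beta coefficients are the cross-lagged paths, and the u terms are residuals. The between-person association is the covariance of B^{F}_{i} and B^{C}_{i}.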
One of the most notable differences between the CLPM and the RI-CLPM is the estimation of latent factors for the RI-CLPM that represent the between-person stability and the within-person construct that measures the intra-individual fluctuations (change) at each time point. However, there are also differences concerning the interpretation of the structural parameters. As mentioned above, in the CLPM the autoregressive paths reflect the stability of the variables from one measurement occasion to the next (e.g., reading fluency scores in Time 1 to reading fluency scores in Time 2). In contrast, for the RI-CLPM, the autoregressive parameters reflect the amount of within-person carry-over effect. That is, a positive autoregressive path reflects the likelihood that when a person scores above (or below) their average at one occasion, then their following score at the next occasion will again be above (or below) their average score. A negative autoregressive path reflects the likelihood that when a person scores above (or below) their average at one occasion, then reverses their score relative to their average at the next occasion by scoring below (or above) their expected score. The within-level, carry-over effects are also referred to as inertia (Mulder & Hamaker, 2021) to indicate the extent to which individuals return to their expected score (i.e., recovered from their momentary deviations).
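The sign interpretation of the autoregressive paths can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions that are not part of the study: a single long series of within-person deviations instead of a panel with seven occasions, standard normal residuals, and a hypothetical carry-over coefficient `phi`. It counts how often a deviation stays on the same side of the person's average from one occasion to the next.

```python
import random

def same_side_rate(phi, n_steps=50000, seed=7):
    """Share of consecutive occasions on which a simulated within-person
    deviation keeps its sign (stays above or stays below the average)."""
    rng = random.Random(seed)
    w = rng.gauss(0, 1)          # deviation at the first occasion
    same = 0
    for _ in range(n_steps):
        # carry-over of the previous deviation plus new random fluctuation
        w_next = phi * w + rng.gauss(0, 1)
        if (w_next > 0) == (w > 0):
            same += 1
        w = w_next
    return same / n_steps

positive = same_side_rate(0.5)    # positive carry-over
negative = same_side_rate(-0.5)   # negative carry-over
zero = same_side_rate(0.0)        # no carry-over

print(f"phi = 0.5: {positive:.2f}, phi = -0.5: {negative:.2f}, phi = 0: {zero:.2f}")
```

With a positive coefficient the deviation tends to stay on the same side of the average, with a negative coefficient it tends to flip, and with a coefficient of zero the two outcomes are about equally likely.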
There are also differences in the interpretation of the cross-lagged paths. In the CLPM, cross-lagged paths are used to test whether a change in one variable (e.g., reading fluency in Time 1) is related to a change in another variable over time (e.g., reading comprehension scores in Time 2). In contrast, in the RI-CLPM, they reflect the degree to which an individual's change in one measure is predicted by a previous deviation from an individual's score on another measure, controlling for preceding expected scores (Mund & Nestler, 2019). Consequently, cross-lagged estimates from CLPM and RI-CLPM are not directly comparable. Because of the latent factors that capture the between-person variance, the cross-lagged paths in the RI-CLPM reflect whether changes from an individual's expected score on one variable are predicted from preceding deviations on a second variable, that is, a marker of within-person change. In the CLPM, the cross-lagged estimates include both between- and within-person variation, while in the RI-CLPM, they reflect the average within-person change relative to individuals' estimated average level.
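Why the two kinds of cross-lagged estimates are not comparable can be seen in a toy simulation. The sketch below is illustrative only and rests on assumptions that are not from the study: two skills whose stable person-level components correlate at 0.6, purely random within-person deviations with no carry-over and no cross-lagged effect, and unit variances throughout. In real data the within-person deviations are latent and must be estimated by the model; here they are known because the data are simulated.

```python
import random

def cov(a, b):
    """Population covariance of two equally long lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / n

def cross_lag(y2, y1, x1):
    """OLS coefficient of x1 when y2 is regressed on y1 and x1."""
    s_yy, s_xx, s_xy = cov(y1, y1), cov(x1, x1), cov(x1, y1)
    c_y, c_x = cov(y2, y1), cov(y2, x1)
    return (s_yy * c_x - s_xy * c_y) / (s_yy * s_xx - s_xy ** 2)

def simulate(n=20000, trait_corr=0.6, seed=1):
    """Scores = stable person level + random deviation; no true cross-lag."""
    rng = random.Random(seed)
    raw = {"x1": [], "y1": [], "y2": []}
    dev = {"x1": [], "y1": [], "y2": []}
    for _ in range(n):
        bx = rng.gauss(0, 1)  # stable level of skill x
        by = trait_corr * bx + (1 - trait_corr ** 2) ** 0.5 * rng.gauss(0, 1)
        wx1, wy1, wy2 = (rng.gauss(0, 1) for _ in range(3))
        for key, b, w in (("x1", bx, wx1), ("y1", by, wy1), ("y2", by, wy2)):
            raw[key].append(b + w)   # what a CLPM-style regression sees
            dev[key].append(w)       # within-person deviation only
    return raw, dev

raw, dev = simulate()
raw_path = cross_lag(raw["y2"], raw["y1"], raw["x1"])
within_path = cross_lag(dev["y2"], dev["y1"], dev["x1"])
print(f"raw scores: {raw_path:.3f}, within-person deviations: {within_path:.3f}")
```

Although no within-person effect was built into the data, the regression on raw scores returns a clearly positive cross-lagged coefficient, because the correlated stable levels leak into it. The same regression on the within-person deviations returns a coefficient close to zero.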

The present study
The aim of the present study is to investigate whether there are bidirectional associations between reading fluency and reading comprehension development from Grade 1 to Grade 9 (age 7 to 15) in a context of a transparent orthography, Finnish. Given the decoding measures used in this study were based on silent reading tasks, we are using the term silent reading fluency. The specific research questions of this study were: (1) Are differences between individuals in one reading skill associated with differences in the other skill (between-person level association)? (2) Does becoming better in one reading skill predict becoming better in the other skill (within-person level associations)? We use silent reading fluency as the measure of decoding from Grade 1 onwards, rather than reading accuracy, because most Finnish children can read accurately after the first year of formal education and accuracy measures are at a ceiling by that time (Lerkkanen et al., 2004). The use of fluency and comprehension as indicators of reading progress is typical for languages that have a high level of transparency of the orthographic system (Seymour et al., 2003). Finnish is a highly transparent orthography (Aro, 2017), in which the consistency of grapheme-phoneme correspondence is close to 100% in both directions, with every letter almost always having the same sound and every sound almost always being represented by the same letter. The combination of efficient phonics-based reading instruction and the high transparency of Finnish orthography supports faster reading acquisition than in most less consistent orthographies, such as English (e.g., Seymour et al., 2003).
Of particular interest in the present study is the second question, that is, whether the associations are bidirectional or unidirectional. If the associations are only unidirectional, from silent fluency to comprehension, this supports the theoretical accounts that assume unidirectional associations between reading fluency and reading comprehension, such as the SVR (Gough & Tunmer, 1986; Hoover & Gough, 1990), the lexical quality hypothesis (Perfetti, 1985, 2007), and the direct and mediational inference model (Cromley & Azevedo, 2007). If the associations are bidirectional, there is support for models suggesting interactive relations between the two skills such as the direct and indirect effects model of reading (Kim, 2020), the expanded view of the SVR (Nation, 2019), and the interactive models of reading (Rumelhart, 1977; Stanovich, 1980). Based on previous empirical studies that have examined this question, it is difficult to draw firm conclusions due to the mixed results and differences in the language context, age, measurements, and methodological approach (e.g., Santos et al., 2020; Torppa et al., 2016; Little et al., 2017; Kim, 2015; Carretti et al., 2020).
Based on previous research, we expect that silent reading fluency and reading comprehension will be associated at the between level since previous studies both in English and in Finnish have shown that children with good performance in one skill tend to have good performance also at the other (e.g., Nation, 2019;Psyridou et al., 2021). Regarding the association at the within level, we expect to find bidirectional effects between the two skills (e.g., Carretti et al., 2020;Kershaw & Schatschneider, 2012;Little et al., 2017;Santos et al., 2020;Torppa et al., 2016).
The present study is positioned in a transparent orthographic context, a unique circumstance for the analysis of bidirectional associations between reading fluency and reading comprehension, for which some evidence is available only with respect to English (Little et al., 2017), Korean (Kim, 2015), and European Portuguese (Santos et al., 2020). Furthermore, the longer follow-up data extending from the early reading acquisition phase (7 years of age) to the end of comprehensive school (15 years of age) allow us to examine whether there are developmental changes in the associations between silent reading fluency and reading comprehension from the beginning phases of reading to the acquisition of consolidated reading skills. In addition, based on recent findings (Altani et al., 2020; Protopapas et al., 2018), silent reading fluency has been assessed with a word reading task as well as a sentence reading task in order to assess not only quick word recognition in the form of unconnected words but also efficient processing of word sequences. The identification of bidirectional or unidirectional associations between the two skills can also have significant implications for teaching practice. For instance, the existence of significant associations between the two skills would suggest that continued teaching in reading fluency should not be neglected when promoting reading comprehension. In the same vein, reading fluency would benefit from better text comprehension.
We use the RI-CLPM to obtain meaningful cross-domain estimates, which will help us to determine whether and to what extent silent reading fluency and reading comprehension skills predict one another across various time points. An association between the between-person level factors would suggest that participants who are more fluent readers are also better at reading comprehension. Cross-lagged associations for the within-person level factors would suggest that changes from the average level in one construct at one time point, predict changes from the average mean level of the other construct at a later time point (McNeish & Hamaker, 2020). For example, a positive cross-lagged path estimate from a within-person silent reading fluency factor at one time point to a within-level reading comprehension factor at the subsequent time point would suggest that getting more fluent in reading predicts a change towards better reading comprehension. In addition, in the RI-CLPM, autoregressive paths reflect carry-over effects (Hamaker et al., 2015). The positive autoregressive paths suggest that fluctuation from overall level is predicted by a similar difference from the overall level at previous time point. That is, a positive autoregressive path reflects the likelihood that when a person scores above (or below) their average at one time point, then their following score at the next time point will again be above (or below) their average score. On the contrary, a negative autoregressive path reflects the likelihood that when a person scores above (or below) their average at one time point, then reverses their score relative to their average at the next time point by scoring below (or above) their expected score. For example, scoring consistently above (or below) average would be reflected by positive autoregressive paths. 
Significant autoregressive paths would suggest carry-over effects and that individual changes have a cumulative effect on skill development, while non-significant autoregressive paths would suggest higher randomness, as a change at one time point cannot be predicted by a change at a previous time point.

Participants
The present study is part of the Finnish longitudinal First Steps Study, a follow-up of 2,518 children from Grade 1 to Grade 9 (Lerkkanen et al., 2006). Children's silent reading fluency and reading comprehension skills were assessed in Grades 1, 2, 3, 4, 6, 7, and 9. The sample was drawn from four municipalities: two in central, one in western, and one in eastern Finland. In three of the municipalities, the participants form the entire age cohort of children, and in the fourth, the participating children comprised about half of the age cohort. One municipality was mainly urban, one was mainly rural, and two included both urban and semi-rural environments. Of the parents who were contacted, 78-89%, depending on the town or municipality, agreed to take part in the study. The parental education distribution was very close to the national distribution of Finland (Statistics Finland, 2007). The study was approved by the Ethical Committee of the University of Jyväskylä, and at the beginning of the study, the children's parents and teachers provided informed written consent to participate.

Measures
Reading skills of all participants taking part in the First Steps Study were assessed with three group-administered tasks as described below.

Silent reading fluency
Because of the large number of participants, individual testing was not possible. Therefore, two group-administered silent reading fluency tests were administered at each grade (from Grade 1 onwards) by trained testers, a word reading fluency task and a sentence reading task. The Cronbach's alphas for the silent reading fluency sum scores were 0.86 for Grade 1, 0.82 for Grade 2, 0.82 for Grade 3, 0.83 for Grade 4, 0.78 for Grade 6, 0.81 for Grade 7, and 0.80 for Grade 9.
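For readers unfamiliar with the statistic, Cronbach's alpha for a sum score built from several tasks can be computed as follows. This is a minimal sketch with made-up scores for six hypothetical children; the source does not state whether the two task scores were standardized before summing, so raw scores are assumed here.

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha for a sum score.

    items: list of equally long score lists, one list per task.
    """
    k = len(items)
    # each person's sum score across the k tasks
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(pvariance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / pvariance(totals))

# Hypothetical scores, not data from the study.
word_task = [12, 18, 25, 30, 22, 15]
sentence_task = [10, 17, 22, 29, 24, 13]
alpha = cronbach_alpha([word_task, sentence_task])
print(round(alpha, 2))
```

Alpha approaches 1 when the tasks rank children almost identically and drops toward 0 as the tasks become unrelated.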
Word reading fluency task
The word reading fluency task used in Grades 1, 2, 3, 4, and 6 is a subtest of the nationally normed reading test battery (ALLU-Alaasteen lukutesti [ALLU-Reading Test for Primary School]; Lindeman, 2000). Each of the 80 items consists of a picture with four phonologically similar words attached to it. The child silently reads the four words and then draws a line to connect the picture with the word that semantically matches it. The words are frequently used and, like the pictures, familiar to young children. For example, an item consists of a picture of a bunny (in Finnish, pupu) and the correct word, along with three distractors (English word is in parentheses): pipo (cap), papu (bean), and apu (help). Completing the test requires very accurate and fluent decoding with a minimum weight placed on comprehension. The score is the number of correct answers within a 2-min time limit. Because of the nature of this timed test, the score reflects both the child's fluency in reading the stimulus words and his or her accuracy in making the correct choice from among the alternatives. A similarly structured word reading fluency task with phonologically more difficult words was used in Grades 7 and 9 (YKÄ-test, Lerkkanen et al., 2018).
Sentence reading fluency task
The Test of Silent Reading Efficiency and Comprehension (TOSREC; Wagner et al., 2010; Finnish version by Lerkkanen et al., 2008) was used to assess silent reading efficiency in Grades 1, 2, 3, and 4. Children were given 3 min to read a maximum of 60 sentences and verify the truthfulness of as many sentences as possible. In Grade 6, the Finnish version of the Salzburg Lese-Screening test (Mayringer & Wimmer, 2003) was used, which is highly similar to the Woodcock-Johnson sentence verification task (Woodcock et al., 2001). Children were given 2 min to read a maximum of 69 sentences and verify the truthfulness of as many sentences as possible. In Grades 7 and 9, a standardized Finnish sentence-reading test for lower secondary school students was used (YKÄ test; Lerkkanen et al., 2018). In this test, children were given 2 min to read a maximum of 70 sentences and verify the truthfulness of as many sentences as possible. The sum score for all tasks was the number of correct answers given within the time limit. All three tests had the same aim, the same instructions, and similar items, but a different number of items. In all grades, the sentences used were very short (e.g., "milk is yellow") and easy to read and comprehend, thereby intentionally minimizing requirements for comprehension such as syntactic parsing or semantic integration. The correlations between the tests used at the different ages corresponded closely with the across-age stability correlations within the tests, suggesting that the same skill was assessed despite changes in the test items.

Reading comprehension
A group-administered subtest of a nationally normed reading test battery (ALLU test; Lindeman, 2000) was used to assess reading comprehension in Grades 1, 2, 3, 4, and 6. The children silently read the given text at their own pace and then answered eleven multiple-choice questions and one question in which they had to arrange five statements in the correct sequence based on information gathered from the text. For each correct answer, 1 point was given (maximum = 12). The test used in Grades 7 and 9 was a similar standardised reading comprehension test developed for the lower secondary grades (YKÄ test; Lerkkanen et al., 2018). The tests had the same aim and the same instructions, as well as the same number of tasks, but different texts and items. The Kuder-Richardson reliabilities from the test manual were

Statistical analysis
The analyses were carried out within the structural equation framework of the Mplus statistical package (Version 7.4; Muthén & Muthén, 1998). Full information maximum likelihood estimation with robust standard errors (MLR) and scale corrected chi-square value was used. The model fit was tested using chi-square values and a set of fit indices as follows: (a) the comparative fit index (CFI), (b) Tucker-Lewis index (TLI), (c) root-mean-square error of approximation (RMSEA), and (d) standardized root-mean-square-residual (SRMR). Good model fit is indicated by a small, preferably non-significant χ², CFI > 0.95, TLI > 0.95, RMSEA < 0.06, and SRMR < 0.08 (Hu & Bentler, 1999). Because the chi-square test depends on sample size and is sensitive to a large sample size, the chi-square statistics were not regarded as conclusive.
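The cutoff values quoted above can be expressed as a simple check. This is only an illustrative helper with a hypothetical name and hypothetical fit values; it is not part of the Mplus analysis and the numbers are not results from the study.

```python
def meets_cutoffs(cfi, tli, rmsea, srmr):
    """Report which of the Hu and Bentler (1999) cutoffs quoted in the text are met."""
    return {
        "CFI": cfi > 0.95,
        "TLI": tli > 0.95,
        "RMSEA": rmsea < 0.06,
        "SRMR": srmr < 0.08,
    }

# Hypothetical fit values, not results from the study.
checks = meets_cutoffs(cfi=0.97, tli=0.96, rmsea=0.04, srmr=0.05)
good_fit = all(checks.values())
print(checks, good_fit)
```

Such cutoffs are conventions and not strict tests, which is consistent with the decision above not to treat the chi-square statistic alone as conclusive.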
An RI-CLPM was estimated (Fig. 1), as suggested by Hamaker et al. (2015). In addition to the basic model, however, we used latent factors to handle measurement error (Mulder & Hamaker, 2021). For silent reading fluency, we used a latent variable at each grade (composed of the two measures, with the factor loadings for each measure set equal across grades). Because reading comprehension had only one measure at each time point, we calculated the correction of attenuation using the Kuder-Richardson reliability estimates for reading comprehension in each grade from the test manual (Lindeman, 2000) for Grades 1-3 and Revelle's omega reliabilities for Grades 4-9. We used Revelle's omega instead of Cronbach's alpha because it provides more unbiased results when the assumptions of Cronbach's alpha are violated (McNeish, 2018). In this way, measurement error could also be specified for reading comprehension. We also tested the model without the correction of attenuation, and the two models were very similar. The model without the correction of attenuation is provided in Appendix 1. The RI-CLPM included two between-person factors (one for silent reading fluency and one for reading comprehension in Grades 1-9). The between-person factors represent the stable interindividual differences across Grades 1 to 9. In addition, there were seven latent factors for silent reading fluency and seven for reading comprehension (one for each time point), which represent the within-person changes around the participant's overall level. For the estimation of the model, all stability and cross-lagged paths, the correlation between the two between-person factors, and correlations between the residual variances of the within-person changes at each time point were included. In addition, the cross-lagged paths between the within-person factors of silent reading fluency and comprehension at two consecutive time points were compared with one another. The comparisons were made one by one. 
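The single-indicator correction described above can be sketched as follows. This is a minimal Python illustration that assumes the common practice of fixing the indicator's error variance to (1 - reliability) times its observed variance; the text does not spell out the exact computation, so the function names and the numeric inputs are hypothetical.

```python
import math


def fixed_error_variance(observed_variance: float, reliability: float) -> float:
    """Error variance to fix for a single-indicator latent factor.

    Assumes error variance = (1 - reliability) * observed variance.
    """
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must lie in (0, 1]")
    if observed_variance < 0.0:
        raise ValueError("variance cannot be negative")
    return (1.0 - reliability) * observed_variance


def disattenuated_correlation(r_observed: float, rel_x: float, rel_y: float) -> float:
    """Classical correction for attenuation of an observed correlation."""
    return r_observed / math.sqrt(rel_x * rel_y)


# Hypothetical inputs, not values from the study.
error_var = fixed_error_variance(observed_variance=4.0, reliability=0.75)
r_true = disattenuated_correlation(r_observed=0.40, rel_x=0.80, rel_y=0.80)
print(error_var, round(r_true, 3))
```

A lower reliability implies a larger fixed error variance, so more of the observed variance is treated as noise rather than as true variation in reading comprehension.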
The Satorra-Bentler chi-square difference was calculated by setting the model with equal cross-lagged paths as the stricter model and the model with free paths as the less strict model. The Mplus input for the model is available as supplemental material.
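For readers unfamiliar with the scaled difference test, the computation can be sketched as below. This is a minimal Python sketch of the standard Satorra-Bentler scaled chi-square difference for models estimated with MLR; the function names are hypothetical and the inputs in the example are invented, because the scaling correction factors of the compared models are not reported in the text. The p-value helper covers only the 1-df case, which matches the one-by-one comparisons described above.

```python
import math


def sb_scaled_difference(t0, c0, d0, t1, c1, d1):
    """Satorra-Bentler scaled chi-square difference test statistic.

    t0, c0, d0: scaled chi-square, scaling correction factor, and df of the
                stricter (nested) model, e.g. equal cross-lagged paths.
    t1, c1, d1: the same quantities for the less strict model (free paths).
    Returns (scaled difference, difference in df).
    """
    df_diff = d0 - d1
    if df_diff <= 0:
        raise ValueError("the stricter model must have more degrees of freedom")
    cd = (d0 * c0 - d1 * c1) / df_diff  # scaling correction for the difference
    if cd <= 0:
        raise ValueError("non-positive difference scaling correction")
    trd = (t0 * c0 - t1 * c1) / cd
    return trd, df_diff


def p_value_df1(chi_square):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2.0))


# Invented inputs for illustration only.
trd, df_diff = sb_scaled_difference(t0=100.0, c0=1.2, d0=11, t1=90.0, c1=1.1, d1=10)
print(round(trd, 3), df_diff, round(p_value_df1(trd), 4))
```

A significant scaled difference indicates that constraining the two cross-lagged paths to be equal worsens the fit, that is, that the paths differ in size.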

Descriptive statistics
Descriptive statistics for the two tasks of silent reading fluency and for reading comprehension are presented in Table 1. Table 2 reports the correlations between tasks across Grades 1-9. The cross-domain correlation coefficients between the Word reading task and the Reading comprehension task were moderate in Grades 1 and 2 (0.37-0.48) and weaker (0.23-0.31) at each time point and between the subsequent time points after Grade 2. The cross-domain correlation coefficients between the Sentence reading task and the Reading comprehension task were moderate to strong (0.49-0.58) in Grades 1 and 2 and moderate (0.37-0.39) at each time point and between the subsequent time points after Grade 2.

Random intercept cross-lagged panel model
An RI-CLPM for silent reading fluency and reading comprehension across Grades 1-9 was estimated (Fig. 1). The model fitted the data well: χ²(142) = 376.18, p < 0.001, RMSEA = 0.03, CFI = 0.99, TLI = 0.99, SRMR = 0.03. The two between-person factors, one for silent reading fluency and one for reading comprehension, which represent the stable differences between individuals, were positively correlated with one another (0.85).
The within-person factors reflect the fluctuations of each individual around their overall level (denoted with F1-F9 for silent reading fluency and C1-C9 for reading comprehension in Fig. 1). The autoregressive paths between the within-person factors were positive, ranging from 0.78 to 0.91 for silent reading fluency and from 0.04 to 0.29 for reading comprehension, which suggests carry-over effects: individual changes in each skill had a cumulative effect on skill development. That is, individuals who scored above (or below) their own average tended to score above (or below) it at the next time point as well. For silent reading fluency, the autoregressive paths were statistically significant for all time points, while for reading comprehension, they were significant from Grade 4 onwards.
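The distinction between the between-person and within-person levels can be illustrated with a small simulation. The sketch below is a Python illustration under simplified assumptions and is not the model estimated in this study: each simulated child has two stable trait levels that correlate 0.85, plus wave-specific deviations that follow independent autoregressive processes with no cross-lagged effects. All parameter values and function names are hypothetical, and person-mean centering is used only as a rough stand-in for the latent random intercepts of an RI-CLPM.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_persons=3000, n_waves=7, trait_corr=0.85,
             ar_fluency=0.8, ar_comprehension=0.2, seed=1):
    """Simulate scores = stable trait + within-person deviation (no cross-lags)."""
    rng = random.Random(seed)
    fluency, comprehension = [], []
    for _ in range(n_persons):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        trait_f = z1
        trait_c = trait_corr * z1 + math.sqrt(1 - trait_corr ** 2) * z2
        w_f = w_c = 0.0
        row_f, row_c = [], []
        for _ in range(n_waves):
            w_f = ar_fluency * w_f + rng.gauss(0, 0.5)        # carry-over in fluency
            w_c = ar_comprehension * w_c + rng.gauss(0, 0.5)  # carry-over in comprehension
            row_f.append(trait_f + w_f)
            row_c.append(trait_c + w_c)
        fluency.append(row_f)
        comprehension.append(row_c)
    return fluency, comprehension


def person_center(rows):
    """Subtract each person's own mean across waves from their scores."""
    centered = []
    for row in rows:
        mean = sum(row) / len(row)
        centered.append([value - mean for value in row])
    return centered


def flatten(rows):
    return [value for row in rows for value in row]


fluency, comprehension = simulate()
raw_r = pearson([row[0] for row in fluency], [row[0] for row in comprehension])
within_r = pearson(flatten(person_center(fluency)), flatten(person_center(comprehension)))
print(f"raw correlation at wave 1: {raw_r:.2f}")
print(f"pooled within-person correlation: {within_r:.2f}")
```

In this toy setting the raw scores correlate substantially because of the correlated traits, while the person-centered deviations are close to uncorrelated, which is the kind of pattern the RI-CLPM is designed to separate.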
Interestingly, there were also cross-domain associations between the within-person factors. The Grade 1 silent reading fluency and reading comprehension factors were statistically significantly correlated, suggesting that children scoring above (or below) their average in one reading skill in Grade 1 had scores above (or below) their average in the other reading skill at the same time point. There were also predictive cross-lagged paths from one reading skill to the subsequent assessment of the other. Between Grades 1 and 2, there was a significant unidirectional association from silent reading fluency to reading comprehension, predicting 5.76% of the variance. The association was positive, suggesting that the participants who showed higher than expected silent reading fluency performance in Grade 1 were likely to show a change towards better reading comprehension performance in Grade 2. The cross-lagged path estimates between Grades 1 and 2 differed significantly: Satorra-Bentler corrected Δχ²(1) = 6.23, p < 0.05. This suggests that the direction of the effects between Grade 1 and 2 reading skills is from silent reading fluency to comprehension, rather than the other way around. Between Grades 2 and 7, there were no significant cross-domain associations between the within-person factors. The estimated paths explained only 0.01-1.21% of the variances in reading skills.
Between Grades 7 and 9, there was a significant unidirectional path from reading comprehension to silent reading fluency, predicting 1% of the variance. The path was positive, suggesting that the participants who showed higher than expected performance in reading comprehension in Grade 7 were also likely to show a change towards better performance in silent reading fluency in Grade 9. However, the difference between the estimates of the cross-lagged paths between Grades 7 and 9 was not significant: Satorra-Bentler corrected Δχ²(1) = 1.61, p > 0.05.
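The percentages of variance reported above are consistent with squaring a standardized path coefficient. The text does not state how they were computed, so the following Python sketch rests on that assumption; the function names are hypothetical, and the path magnitudes it prints are back-calculated from the reported percentages rather than taken from the model output.

```python
import math


def percent_variance_explained(standardized_path: float) -> float:
    """Percentage of variance implied by a standardized path (assumes beta squared)."""
    return 100.0 * standardized_path ** 2


def implied_path_magnitude(percent: float) -> float:
    """Absolute standardized path implied by a percentage of explained variance."""
    if percent < 0.0:
        raise ValueError("percentage cannot be negative")
    return math.sqrt(percent / 100.0)


# Percentages reported in the text: 5.76% (Grades 1-2), 1% (Grades 7-9),
# and the 0.01-1.21% range for the non-significant paths.
for pct in (5.76, 1.0, 0.01, 1.21):
    print(f"{pct}% of variance -> |path| of about {implied_path_magnitude(pct):.2f}")
```

Seen this way, even the largest within-person cross-lagged effect corresponds to a small standardized coefficient, in line with the interpretation of the associations as modest.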

Discussion
The present study examined the developmental interplay between silent reading fluency and reading comprehension from Grade 1 to Grade 9 (age 7 to 15) in the context of a transparent orthography (Finnish). Of particular interest was the question of whether the associations between the two domains of reading are bidirectional or unidirectional. We applied a stringent test to this question by adopting a random intercept cross-lagged panel modelling approach (RI-CLPM). The use of this approach was a critical addition to the literature on this issue because it overcame some of the problems of the traditional cross-lagged panel models and allowed us to focus on the within-person level of changes in development. We aimed to examine whether becoming better in one reading skill predicts becoming better in the other skill (within-person level associations), in addition to whether differences between individuals in one reading skill are associated with differences in the other skill (between-person level association). The model showed that silent reading fluency and reading comprehension correlated strongly at the between-person level and, thus, those who were better at one reading skill were typically good also at the other. At the within-person level, however, only some developmental associations emerged: in the early reading acquisition phase (Grade 1-2), silent fluency predicted comprehension. There was also some evidence to suggest that, in the adolescent phase, comprehension weakly predicted silent fluency (from Grade 7 to 9). These findings suggest that, at least in a transparent orthography, such as Finnish, the developmental pathways in silent reading fluency and reading comprehension diverge early on. There seem to be, however, some unidirectional associations from silent fluency to comprehension in the early years, which may function in the opposite direction in adolescence. These findings reflect the differential developmental associations between the reading skills at different stages of reading development; slow and laborious reading can act as a bottleneck for reading comprehension during the early years while comprehension processes may later on also promote reading fluency.
Fig. 1 (caption, continued): … which represent the stable differences between individuals. WR_G1-9 and SR_G1-9 are the two silent reading fluency measures that were used. WR_G1-9 are the word reading fluency measures and SR_G1-9 are the sentence reading fluency measures. Please note that error covariances between the Word reading and Sentence reading measures at subsequent time points were added to the model based on the modification indices, although these are not visible in the figure. * p < 0.05, ** p < 0.01, *** p < 0.001
Regarding the between-person association, that is, the question of whether children with good silent reading fluency skills have better reading comprehension skills than children with poor silent reading fluency skills, the model strongly indicated that silent reading fluency and reading comprehension are positively associated with one another. This finding supports many previous studies (e.g., Florit & Cain, 2011; Kershaw & Schatschneider, 2012; Lonigan et al., 2018; Psyridou et al., 2021; Santos et al., 2020; Torppa et al., 2016). Our results add to the previous literature by showing that the associations between the two skills exist but seem to diminish across Grades 1-9. This is noteworthy because most previous studies include only data from the early school grades (e.g., Cadime et al., 2017; Santos et al., 2020) and tend to be conducted among English-speaking children (e.g., Kershaw & Schatschneider, 2012; Lonigan et al., 2018). Moreover, the positive autoregressive paths suggested that individuals who scored above (or below) their average scores tended to score above (or below) their average scores at the next time point as well. The autoregressive paths for silent reading fluency were significant from Grade 1 onwards, while for reading comprehension they became significant only from Grade 4 onwards, suggesting more randomness in the fluctuation around the individual level during the early school years.
However, the main focus of the present study was not on the between-person correlation or the autoregressive paths but the cross-lagged within-person level associations across time. We found that, during Grades 1-2, a cross-lagged path ran from silent reading fluency to comprehension. In the later grades, however, the association vanished and only one small path from reading comprehension in Grade 7 to silent reading fluency in Grade 9 was significant. At no time point did we identify significant bidirectional cross-lagged paths between the two skills. Thus, our analyses using a long-term follow-up sample and a stringent test of the associations indicated that some small unidirectional predictive effects exist and that there seems to be a change in the direction of the associations across time.
The finding of a predictive effect of silent reading fluency on reading comprehension during the early grades is consistent with previous studies (e.g., Cadime et al., 2017; Florit & Cain, 2011; Kim et al., 2015; Kim et al., 2011; Santos et al., 2020). The present findings show that the predictive association is found even for the group-administered and less commonly used silent reading fluency measures we used in the present study and even for the within-person variance. In other words, we can infer that the children who in Grade 1 were particularly fluent (in comparison to their overall level across Grades 1-9) were improving fast in reading comprehension from Grade 1 to 2, irrespective of the differences between individuals at the overall level (between-person variance). This early association is understandable because, during the early grades, children must acquire at least some level of decoding before reading comprehension is possible. These findings, thus, support the theoretical accounts that suggest an association from reading fluency to reading comprehension, e.g., the SVR (Gough & Tunmer, 1986; Hoover & Gough, 1990), the lexical quality hypothesis (Perfetti, 1985, 2007), and the direct and mediational inference model (Cromley & Azevedo, 2007). The lexical quality hypothesis, for example, explains the association through the following process: well-automatized word reading skills reduce the resource demands of cognitive processes (e.g., memory and attention), which can then be devoted to understanding meaning in text (Perfetti, 1985, 2007). Furthermore, fluent reading could support reading comprehension because the reader can decode the text accurately, quickly, and with proper expression, which facilitates the construction of a mental representation of the text (National Reading Panel, 2000). 
However, the findings did not support some of the previous longitudinal studies that identified bidirectional effects between reading fluency and comprehension (Kim, 2015; Little et al., 2017; Santos et al., 2020). In our study, we found only unidirectional effects from silent reading fluency to reading comprehension (Grades 1-2) and from reading comprehension to silent reading fluency (Grades 7-9). Thus, our findings did not lend support to models suggesting interactive relations between the two skills (e.g., the direct and indirect effects model of reading (Kim, 2020), the expanded view of the SVR (Nation, 2019), the interactive models of reading (Rumelhart, 1977; Stanovich, 1980)). Over the longer developmental period, however, a significant, albeit small, predictive path from reading comprehension to silent reading fluency was also obtained. This suggests that eventually comprehension processes may also support fluency. However, given that reading comprehension predicted only 1% of the variance in silent reading fluency and our fluency measures were not fully independent of reading comprehension processes, this effect needs to be confirmed by other studies.
It is likely that the orthographic depth of the target language affects the pace at which the association between silent reading fluency and reading comprehension diminishes over time. Orthography could also be a reason for the different findings between our study and those reporting bidirectional effects. In the context of the highly transparent Finnish language, a strong early association between reading fluency and comprehension is expected because reading acquisition is fast. The majority of Finnish children learn to read fluently already in Grade 1 (Lerkkanen et al., 2004; Soodla et al., 2015). Therefore, it is understandable that we found a significant cross-lagged association in Grades 1-2 but not afterwards. During the early grades, when decoding is not an automatized skill, decoding plays a strong role in reading comprehension, whereas later, when reading becomes "fluent enough", linguistic comprehension begins to have a greater contribution to reading comprehension than decoding (Castles et al., 2018; Nation, 2019; Torppa et al., 2016). The prior literature, thus, has also documented a developmental shift in the factors that contribute to reading comprehension. After achieving proficiency in decoding, reading comprehension is less limited by decoding skills, and the influence of linguistic comprehension increases. In transparent orthographies, such as Finnish, this shift takes place earlier than in opaque orthographies, such as English, because decoding skills are learned more quickly (Caravolas et al., 2013; Florit & Cain, 2011; Joshi et al., 2015).
During the later grades, there was a significant positive effect from reading comprehension to silent reading fluency. In particular, having better reading comprehension in Grade 7 predicted a change towards more fluent silent reading in Grade 9. Although the effect was modest, it should be noted that the model with autoregressive controls and the inclusion of within-person variance only in the cross-lagged part of the model provides a stringent test of the associations. This finding supports the idea that reading automatization is also promoted by reading comprehension. Good reading comprehension could support reading fluency, for example, because the reader can use contextual information (such as semantic and syntactic information), which facilitates both word reading and the prediction of text structure (Fuchs et al., 2001). This can lead to more accurate and faster reading. In this case, higher-order processes could compensate for shortages in lower-order processes (Stanovich, 1980). A previous study (Perfetti et al., 1979) using reaction times suggested that readers use context to assist with word recognition and that a better understanding of context leads to improved word-reading speed and accuracy. In addition, Carretti et al. (2020) showed that reading comprehension was a significant predictor of text reading fluency, but the role of reading comprehension declined over time. However, other studies have drawn mixed conclusions on the matter (Bowey, 1984; Jenkins et al., 2003).
In contrast to previous studies (Kim, 2015; Little et al., 2017; Santos et al., 2020), at no time point did we identify significant bidirectional cross-lagged paths between the two skills. As discussed above, orthography could be one of the reasons for the differences compared to the previous studies. It is likely, though, that the use of the RI-CLPM is also one reason for these differences. The RI-CLPM separates the within- and between-person variances so that the within-person level associations can be examined while controlling for the between-person variance. That is, as the differences in the overall reading level between individuals were controlled in the model, the cross-lagged effects were between a child's changes over time in reading comprehension and silent reading fluency. This suggests that regardless of the individual's overall level of silent reading fluency and reading comprehension, improvement in silent reading fluency predicted improvement in reading comprehension in the early grades, while improvement in reading comprehension predicted improvement in silent reading fluency, but only in the later grades. Consequently, a model with autoregressive controls and the inclusion of within-person variance only in the cross-lagged part of the model does not muddle within-person and between-person variance together and provides a more stringent test of the cross-domain associations.
In addition, measurement differences may explain the difference. In this study, reading fluency was assessed with silent, group-administered measures in a classroom setting. As a result of the large sample, we could not have the more commonly used tasks for the assessment of reading fluency, namely the individually administered oral word list/pseudoword list/oral text reading tasks for the full sample. Overall, it is difficult to have a "clean" measure for the assessment of reading fluency, as even in the more commonly used measures something else is also involved in addition to decoding (e.g., articulation, motor response). Our measures included words and short sentences which involved some semantic processing. The tasks also required motoric planning and execution which may affect the results. However, a sub-sample of our initial sample (N≈200-350) participated in individual assessments in addition to the classroom assessments in Grades 1, 2, 3, 4, 7, and 9. As a part of the individual assessment they read aloud a word list, a pseudoword list, and a text. As shown in the correlation table (Appendix 2), the group-administered tasks correlated well with the oral word list reading, pseudoword list reading and text reading tasks and as such tapped much of the same variance as those task types. In addition, importantly for the current research questions, the correlations between the group-administered silent reading fluency tasks and reading comprehension were very similar to those between the individually administered oral reading fluency tasks and reading comprehension. This suggests that had we been able to conduct the rather complex RI-CLPM model with the smaller sample, the result would have likely been very similar.
Some limitations concerning this study need to be addressed. Firstly, we used only one measure to assess reading comprehension at each time point. Although we calculated the correction of attenuation in each grade to control for measurement error, having more measures for the assessment of reading comprehension would have increased the strength of our model. By having more texts and items in the reading comprehension assessment, we could have increased the reliability of the reading comprehension assessment. Unfortunately, given a frequently repeated data collection with multiple measures collected from approximately 2,000 children, it was not possible to include more reading comprehension tasks, which can be quite lengthy. Secondly, the two tasks used for the assessment of silent reading fluency examined word- and sentence-level reading fluency. In comparison to text reading, in such tasks, a reader can use less contextual information to support reading fluency. In theory, text reading fluency tasks could create stronger associations with reading comprehension tasks than the tasks we used. However, the correlations (Appendix 2) suggest very similar correlations between the different reading fluency measures and reading comprehension. Future studies are needed to investigate the association between reading fluency and reading comprehension while also including, for example, language comprehension and reading accuracy (e.g., Tobia & Bonifacci, 2015). For example, poor accuracy could affect comprehension because of possible misunderstandings. It should be noted, though, that in transparent orthographies, such as Finnish, problems in accuracy are rare and accuracy measures are at a ceiling already after the first year of formal education (Lerkkanen, 2004). Finally, assessments at Grades 5 and 8 are missing, which results in longer time intervals between some assessments.
In conclusion, the results suggest that, although reading comprehension and silent reading fluency levels over time are strongly correlated, the predictive within-person associations are modest. That is, although the children with high performance on one reading skill tend to have good performance on the other also, within-person changes in either of the skills are not systematically predicted by the other (e.g., becoming stronger in one reading skill does not predict improvement in the other). The two modest associations identified at the within-person level could suggest that, in addition to Grade 1 silent reading fluency predicting Grade 2 reading comprehension and Grade 7 reading comprehension predicting Grade 9 silent reading fluency, many other more specific components may explain changes in children's reading skills, such as instruction, reading amount, linguistic comprehension, working memory, perspective taking, lexical legacy, and background knowledge, which are worth examining in future studies (Cromley & Azevedo, 2007; Gough & Tunmer, 1986; Hoover & Gough, 1990; Kim, 2020; Kintsch, 1988; Nation, 2017; Perfetti & Stafura, 2014). In practice, the existence of the significant association from silent reading fluency to reading comprehension suggests that supporting reading fluency is important when promoting reading comprehension and that, potentially, over the long run improved reading comprehension may promote reading fluency. The two cross-lagged associations that were found were unidirectional and varied as a function of time. The findings showing a unilateral association from silent reading fluency to comprehension were in line with the lexical quality hypothesis (Perfetti, 1985, 2007) rather than the interactive models of reading (Rumelhart, 1977; Stanovich, 1980). However, during the later grades, there was a small but significant positive predictive association from reading comprehension to silent reading fluency. 
Because the cross-lagged effects were found at the within-person level, the findings are not due to differences between individuals but to changes over time within each child. This means that, regardless of the individual's overall level of silent reading fluency and reading comprehension, good silent reading fluency can promote reading comprehension in the early grades, while good reading comprehension may promote silent reading fluency in the later grades.

Appendix 1 RI-CLPM for silent reading fluency and reading comprehension development without using the correction of attenuation for reading comprehension
See Fig. 2.   Fig. 2 All path estimates are standardized. The paths with dashed lines represent non-significant coefficients. The paths with solid lines represent significant coefficients. The factors Level_RF and Level_RC are between-person factors, which represent the stable differences between individuals. WR_G1-9 and SR_G1-9 are the two silent reading fluency measures that were used. WR_G1-9 are the word reading fluency measures and SR_G1-9 are the sentence reading fluency measures. Please note that error covariances between the Word reading and Sentence reading measures at subsequent time points were added to the model based on the modification indices, although these are not visible in the figure. * p < 0.05, ** p < 0.01, *** p < 0.001

Appendix 2
See Table 3. Table 3  Correlation table between