Vocabulary, text coverage, word frequency and the lexical threshold in elementary school reading comprehension

Vocabulary knowledge is one of the most important elements of reading comprehension. Text coverage is the proportion of known words in a given text. We hypothesize that text comprehension increases exponentially with text coverage due to network effects and activation of prior knowledge. In addition, the lexical threshold hypothesis states that text comprehension increases faster above a certain amount of text coverage. The exponential relationship between text coverage and text comprehension, as well as the lexical threshold, are at the heart of text comprehension theory and are of great interest for optimizing language instruction. In this study, we first used vocabulary knowledge to estimate text coverage based on test scores from N = 924 German fourth graders. Second, we compared linear with non-linear models of text coverage and vocabulary knowledge to explain text comprehension. Third, we used a broken-line regression to estimate a lexical threshold. The results showed an exponential relationship between text coverage and text comprehension. Moreover, text coverage explained text comprehension better than vocabulary knowledge, and text comprehension increased more quickly above 56% text coverage. From an instructional perspective, the results suggest that reading activities with text coverage below 56% are too difficult for readers and likely inappropriate for instructional purposes. Further applications of the results, such as for standard setting and readability analyses, are discussed.


Introduction
Reading comprehension is a prerequisite for lifelong learning and one of the key goals of elementary education (e.g., Artelt et al., 2003). It is a multi-faceted construct that involves multiple components (e.g., Graesser et al., 2004). Vocabulary knowledge is one of the most influential determinants of reading comprehension during elementary school (e.g., McElvany et al., 2009; Quinn et al., 2015). According to the Simple View of Reading (Gough & Tunmer, 1986), reading comprehension involves two components: word recognition and language comprehension. Vocabulary knowledge is related to both language comprehension and word recognition (Duke & Cartwright, 2021), as it provides a link between phonology, orthography, and word meanings (e.g., Ehri, 2014).
Based on the hierarchical relations among reading sub-components (e.g., Kim, 2020), problems with lower-order reading components, such as word recognition and vocabulary, result in problems with higher-level components, such as inference-making. Consistent with this, Wang et al. (2019) found a minimum level of word recognition fluency that is necessary for higher-level reading processes. Based on the Model of Lexical Quality (Perfetti, 2007), they suggested that efficient word recognition clears the way for higher-level reading processes, and therefore, problems in word recognition eventually lead to problems in higher-level processes (see also Karageorgos et al., 2020). Similarly, O'Reilly et al. (2019) argued with regard to vocabulary knowledge that the activation of prior knowledge only spreads properly if a critical number of known content words is present in a text. Thus, text comprehension increases above a certain level of known words in the text.
In this article, we first discuss the relationship between vocabulary knowledge (i.e., the overall number of words a person knows) and text coverage (i.e., the number of known words in a specific text). Second, we examine the linear and non-linear relationship between text coverage and text comprehension. Third, we identify thresholds that can help improve instruction and assessment of reading comprehension.

Vocabulary knowledge and reading comprehension
Vocabulary knowledge is a multi-faceted construct (e.g., Perfetti & Hart, 2002) that is highly associated with the ability to read fluently and comprehend texts (Perfetti, 2007). Two important sub-dimensions of vocabulary knowledge are vocabulary breadth, i.e., the number of words known, and vocabulary depth, i.e., how much knowledge about semantic, orthographic, and phonological aspects of a word is available (Li & Kirby, 2015). Previous research has shown that vocabulary breadth is more strongly associated with reading comprehension than vocabulary depth (e.g., Li & Kirby, 2015; Ouellette, 2006). Additionally, semantic knowledge has a stronger association with reading comprehension than orthographic or phonological knowledge (Richter et al., 2013). Studies generally report strong associations between vocabulary knowledge and reading comprehension (e.g., English: Quinn et al., 2015; German: Richter et al., 2013). Thus, it seems that knowing the meaning of many words increases the probability of correctly recognizing and comprehending the words in a particular text (Perfetti, 2007).

Vocabulary knowledge and text coverage
Text coverage is usually defined as the proportion of words in a text that are known by a particular reader (Hsueh-Chao & Nation, 2000). More specifically, text coverage can be understood as the intersection between the words in a given text and a reader's vocabulary knowledge. In most texts, it takes relatively few unique words to reach a relatively high text coverage (Hsueh-Chao & Nation, 2000). According to Zipf's law, when the words in a text are ordered by frequency, their probability of occurrence is inversely proportional to their rank on the frequency list (Piantadosi, 2014). Thus, in authentic texts, a small number of words occur very often and many words occur very rarely. Corpus analyses with large samples of texts show that knowledge of only the 2000 most frequent words is sufficient to achieve an average text coverage of 90.6% for narrative texts and 78.4% for academic texts (Nation & Waring, 1997). Text coverage for academic texts is lower because such texts include more rare words. Additionally, the relationship between text coverage and the length of the frequency-ranked word list (FRWL) is logarithmic; for instance, the 1000 most frequent words provide 72% text coverage, whereas the next 1000 add only 7.7 percentage points (Nation & Waring, 1997).
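The diminishing-returns relationship between FRWL length and text coverage can be illustrated with a short sketch. This is a toy illustration using a synthetic Zipf-distributed corpus, not the corpus data discussed in the article; the function name is our own:

```python
# Toy illustration (synthetic data, not a real corpus): coverage of the
# k most frequent word types in a Zipf-distributed corpus.

def coverage_of_top_k(token_counts, k):
    """Share of running words (tokens) covered by the k most frequent types."""
    ranked = sorted(token_counts, reverse=True)
    return sum(ranked[:k]) / sum(ranked)

# Zipf-like token counts: the frequency at rank r is proportional to 1/r.
counts = [1_000_000 // r for r in range(1, 50_001)]

cov_1000 = coverage_of_top_k(counts, 1_000)
cov_2000 = coverage_of_top_k(counts, 2_000)
# The second thousand words adds far less coverage than the first thousand,
# mirroring the logarithmic relationship described above.
assert cov_2000 - cov_1000 < cov_1000 / 5
```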
To our knowledge, no previous study has examined the relationship between text coverage and readers' actual vocabulary knowledge for a representative sample of texts and/or participants. In a FRWL, the frequency of a word deterministically decides whether the word is included in the list. For vocabulary knowledge, this relationship is not deterministic but probabilistic, as frequent words are more likely to be known than rare words (e.g., for a review: Brysbaert et al., 2018). Overall, the correlation between the probability of knowing a word and its frequency is high (r = 0.74 for German third and fourth graders: Trautwein & Schroeder, 2018). Therefore, the text coverage based on a given FRWL and that based on actual vocabulary knowledge may be very similar.
Figure 1, panel a, illustrates the probabilistic relationship between vocabulary knowledge and word frequency. Students with larger vocabularies are more likely to know rare words than students with smaller vocabularies (Brysbaert et al., 2018).
Figure 1, panel b, illustrates the logarithmic relationship between vocabulary knowledge and text coverage. This relationship should have a logarithmic shape, similar to the relationship between a FRWL and text coverage. Additionally, given the same vocabulary knowledge, text coverage should be lower for a text with a lower average word frequency than for a text with a higher one.

Text coverage and reading comprehension
Text comprehension substantially depends on text coverage. According to the construction-integration model (Kintsch, 1988), readers' mental representation of a text is an associative network of concepts and propositions. In this network, concepts represent nodes and associations represent links. The more words are known, the more concepts and the more prior knowledge can be activated, and the number of possible associations between concepts grows exponentially with the number of activated concepts. It is much easier for readers to disambiguate the meaning of a text if the words in the text immediately activate the correct concepts, and disambiguating text meaning is highly important for integrating text information with prior knowledge (Richter & Schnotz, 2018). Readers are usually able to comprehend texts even when these contain some unknown words, because readers can infer the meaning of unknown words from contextual information if the network of associations between the known concepts is strong enough (Share & Stanovich, 1995). However, such contextual inferences require additional cognitive resources and can lead to false interpretations, which makes text comprehension more challenging when text coverage is low (Cain et al., 2004). Indeed, drawing inferences from the context and building up an understanding of the text is only possible once text coverage reaches a certain level. According to the lexical threshold hypothesis (Hsueh-Chao & Nation, 2000), text comprehension is significantly impaired below a certain amount of text coverage.

Lexical threshold hypothesis
The lexical threshold hypothesis states that text comprehension increases faster above a certain amount of text coverage (Hsueh-Chao & Nation, 2000). Relatively few and heterogeneous findings exist about the lexical threshold hypothesis. For instance, Hsueh-Chao and Nation (2000) found that individuals need to know the meaning of 98% of the words in a fictional text for comprehension in a reading-for-pleasure situation, where unknown words were assessed by self-report. In another study, Laufer (1989) reported that reading comprehension increased more rapidly if individuals knew at least 95% of the words in a text; here, individuals were required to translate a vocabulary list in order to determine their text coverage. Laufer and Ravenhorst-Kalovski (2010) suggested two thresholds, an optimal one at 98% and a minimum at 95%. Their analysis was based on participants with high prior knowledge as well as standardized tests of vocabulary size and reading comprehension. More recently, O'Reilly et al. (2019) found that the reading comprehension of ninth- to twelfth-graders increased rapidly when they knew more than 59% of the critical content words in a text; in this study, knowledge of critical content words was assessed with a multiple-choice test. By contrast, Schmitt et al. (2011) were not able to determine a clear lexical threshold in a carefully designed study with a word/nonword recognition test and a standardized reading comprehension test.

Summary of the theoretical background
Vocabulary knowledge, text coverage, word frequency, and text comprehension are theoretically related: Vocabulary increases text coverage logarithmically, and this relationship depends on the word frequencies in the text (word frequency effect: Brysbaert et al., 2011; Zipf's law: Piantadosi, 2014). Text comprehension theory (i.e., the construction-integration model) assumes exponential growth in connectivity and activation of prior knowledge, which means that increasing text coverage should exponentially improve text comprehension (Share & Stanovich, 1995). The lexical threshold hypothesis states that text comprehension increases faster above a certain threshold of text coverage.
Figure 2 summarizes the described relationships. The larger dashed circles represent a person's vocabulary knowledge and the smaller solid circles represent texts. The intersection between the two circles (i.e., the area with diagonal lines) is the text coverage. Texts with many rare words are more likely to be covered by readers with larger vocabulary knowledge. The discontinuous color scale from white (upper left corner) down to almost black represents the degree of text comprehension: Text comprehension increases with text coverage, but it takes a certain amount of text coverage before comprehension increases more rapidly.

Research question
Although the relationships between (1) vocabulary knowledge and reading comprehension, (2) vocabulary knowledge and text coverage, and (3) text coverage and reading comprehension have been investigated in separate contexts, they have rarely been researched using an integrative approach.
In this study, we used data from a vocabulary knowledge test and a text comprehension test administered to a large number of fourth graders participating in a reading support program. We analyzed word frequencies from the vocabulary test items and the reading comprehension texts to estimate text coverage for each participant and each text. We compared linear and non-linear models of text coverage explaining text comprehension. In addition, we investigated whether vocabulary knowledge or text coverage was better able to predict children's text comprehension. Finally, we determined potentially relevant amounts of text coverage in order to define various thresholds.
The study's three central research questions (RQ) can be summarized as follows:

RQ1: What is the shape of the relationship between text coverage and text comprehension?
We hypothesize that text comprehension increases exponentially rather than linearly, due to network connectivity effects in the propositional network and the activation of prior knowledge.
RQ2: Does text coverage explain text comprehension better than vocabulary knowledge?
We hypothesize that text coverage explains text comprehension better because it more accurately describes the words known in a given text.
RQ3: Is there an amount of text coverage that can be defined as a lexical threshold?
We hypothesize that text comprehension increases faster above a certain level of text coverage.

Participants
The children who participated in the study attended 4th grade and were tested at the beginning of the second half of the school year. Fourth graders are typically required to comprehend texts independently, so vocabulary knowledge (i.e., knowing the meaning of words) and especially text comprehension become important. This study is a program evaluation of a project to promote language and literacy skills among fourth graders at public schools in six German federal states. The program provided teachers with scientifically grounded teaching materials and handouts. Only students with parental consent were included in the present analysis. The study involved N_i = 949 fourth graders from N_c = 64 classes and N_s = 35 schools. About half of the participants (52.05%) were female, and children were on average M = 10.28 years old (SD = 0.52). Overall, 64.91% of the students reported exclusively speaking German at home. The program was conducted in federal states where the share of public school students from immigrant backgrounds ranged from 28.9% to 50.1% (Stanat et al., 2017, p. 299). Thus, participants are relatively representative of these federal states. Nevertheless, we conducted robustness checks to assess the impact of language background and discuss this in the limitations. We excluded 25 (2.63%) participants because they answered fewer than 50% of the items on either the vocabulary or the text comprehension test. Thus, we analyzed the test results of N_i = 924 participants.

Vocabulary knowledge test
Vocabulary knowledge was assessed with the synonym-based vocabulary knowledge test KFT 4-12+R V1 (Heller & Perleth, 2000). We used this test because we considered it a good measure of 'knowing' the meaning of words, in line with the theory that words represent nodes in an associative network. This paper-pencil test comprised 25 items presented in a fixed order and administered under low time constraints; thus, most students responded to all items. Each item consisted of one stem word and five response options with one key (see Fig. 3). The distractors were orthographically similar (e.g., curved versus covered) and/or semantically related (e.g., antonyms or meronyms), but not synonyms.

Text comprehension test
The standardized text comprehension test was the Aspects of the Learning Situation and Learning Development Test (LAU; Lehmann et al., 2002). This test includes four texts with multiple-choice (MC) items: "Mosquito" (124 words, 11 sentences, 4 items), "Candle" (106 words, 8 sentences, 7 items), "I am not blind" (206 words, 8 sentences, 7 items), and "Plastic duck" (125 words, 7 sentences, 7 items). Figure 4 shows an example item. The test was administered with low time constraints; thus, most students completed all items.

Word frequencies
We derived word frequencies for the vocabulary test items and the reading comprehension texts from the childLex corpus (www.childlex.de), which includes 500 books classified as appropriate for children 6-12 years of age, comprising overall 9.85 million running words (i.e., "tokens") and 182 thousand unique words (i.e., "types"). Normalized lemma frequencies were used for all analyses.
All words in the vocabulary test, but not all words in the reading comprehension texts, were part of the childLex corpus (see Table 1). The words not found in the corpus were proper nouns or compound words, and their frequencies were interpolated using Laplace approximations (Diependaele et al., 2013). Based on the interpolated normalized lemma frequencies, so-called Zipf values were computed (Van Heuven et al., 2014). This scale is logarithmic and scaled such that a value of 3 corresponds to a word occurring once per million words, a value of 4 to ten times per million words, a value of 5 to 100 times per million words, etc. The word frequencies in the vocabulary test were on average M = 3.99 (SD = 0.56) and ranged from 2.77 to 4.87. The "Mosquito" and "Candle" texts had similar word frequency distributions and a similar length. The "Plastic duck" text was similar in length to these two texts but encompassed more infrequent words. The "I am not blind" text was longer and encompassed more infrequent words than "Mosquito" and "Candle". The differences in word frequency distributions between texts indicate that the texts had different vocabulary knowledge requirements.
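As a sketch, the Zipf scale with Laplace smoothing can be computed as follows (following the Van Heuven et al., 2014, formulation; the corpus sizes are rounded childLex figures, and the helper name is our own):

```python
import math

# Sketch of the Zipf scale with Laplace smoothing (Van Heuven et al., 2014).
# Corpus sizes roughly match childLex: ~9.85M tokens and ~182k types.
TOKENS = 9_850_000
TYPES = 182_000

def zipf_value(lemma_count):
    """Zipf value = log10 of the smoothed frequency per billion words."""
    # The +1 (numerator) and +TYPES (denominator) form the Laplace
    # correction, so words missing from the corpus still get a finite value.
    per_billion = (lemma_count + 1) / ((TOKENS + TYPES) / 1e9)
    return math.log10(per_billion)

# A word seen ~10 times in a ~10M-token corpus is ~1 per million: Zipf ≈ 3.
assert abs(zipf_value(9) - 3.0) < 0.01
# Unseen words land near 2.00, matching the not-found tokens in Table 1.
assert abs(zipf_value(0) - 2.0) < 0.01
```

Note that the Laplace correction is what places not-found tokens at a Zipf value of 2.00 rather than at negative infinity.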

Procedure
The study was conducted in the morning hours in all classes and administered in paper-pencil format. First, the text comprehension test was administered (30 min).

Table 1 Overview of word frequencies by text. Notes: Token = running words in the text; types = unique words in the text; n = number of tokens; % = relative proportion of tokens. ¹ Zipf value with Laplace transformation = log('lemma frequency + 1' / 'number of unique lemmas in corpus + number of words in the corpus'). ² Lower boundary defined as larger than, and upper boundary as smaller than or equal to. ³ Not-found tokens were assigned a value of 2.00 based on the Laplace transformation and were counted in the interval '2-3'.

Data quality
In a preparatory step, we conducted an item fit analysis, because misfitting items in the text comprehension and vocabulary tests might lead to false interpretations of the test results. We applied the Rasch model (Adams & Wu, 2007) to the response data for the text comprehension and vocabulary tests using the package Test Analysis Modules (TAM; Robitzsch et al., 2021) within R (R Core Team, 2021). We identified three items in the text comprehension test with an outfit or infit below 0.7 or above 1.3 (Gustafsson, 1980). An inspection of these items suggested that they had somewhat ambiguous answers: Even readers with otherwise high reading comprehension abilities did not answer them correctly. We decided to exclude these three items from the text comprehension test since they might not actually measure comprehension. No items were excluded from the vocabulary knowledge test based on this analysis. The relationship between item difficulty in the vocabulary knowledge test and the item's word frequency was very important for the text coverage estimation. For the original 25 items, item difficulty and the minimum word frequency of the synonym pair correlated only with r(23) = −0.33. However, after excluding five items with highly synonymous and orthographically similar distractors, the correlation was r(18) = −0.64. We considered this to be more consistent with previous findings on word frequency effects in German 4th graders (e.g., r = −0.74: Trautwein & Schroeder, 2018).
The overall rate of missing (i.e., omitted) responses was low (LAU: 2.97% and KFT: 5.65%).Missing responses were treated with the full information maximum likelihood method (FIML).

Modeling vocabulary knowledge and text coverage
Text coverage is the intersection between a text's words and the reader's vocabulary knowledge. We estimated text coverage values for each child and each text. The rationale behind the estimation process was to map children's vocabulary knowledge test scores onto the word frequency level of words they are likely to know, and then to determine which words in a text were likely to be known by each child.

Based on these results, we transformed the Rasch scale, N(0, 1), so that the item parameters were on the same scale as the expected word frequency of each item. We refer to this as the Zipf scale because it represents students' vocabulary knowledge as a function of word frequency, N(4.59, 1.30).
Figure 5, panel a, shows vocabulary knowledge on a Rasch scale with a mean of 0 and a standard deviation of 1. Negative values represent low vocabulary knowledge because the probability of answering a vocabulary test item correctly is low; positive values represent high vocabulary knowledge because this probability is high. Figure 5, panel b, shows the linear relationship between item difficulty and word frequency: An item of average difficulty has an expected word frequency of 4.59, a difficult item (i.e., M − 1 SD) an expected word frequency of 3.29, and an easy item (i.e., M + 1 SD) an expected word frequency of 5.90. Figure 5, panel c, shows the distribution of vocabulary knowledge on the Zipf scale. On this scale, an average person has a value (θ^wf_p) corresponding to the expected word frequency of an item with average item difficulty. The probability of each person knowing each word was then calculated with a logistic function of the difference between the word's Zipf value and the person's Zipf-scale estimate, such that the probability is 50% when the two values coincide, as described by Brysbaert et al. (2018) and illustrated in Fig. 6. The text coverage is the average probability of knowing each word in the text, i.e., the proportion of words estimated to be known out of the total number of words in the text.
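A minimal sketch of this estimation step, assuming a simple one-parameter logistic link (the exact scaling used in the study may differ, and the function names and example Zipf values are ours):

```python
import math

# Sketch (assumed functional form, not the authors' exact code): a logistic
# function links a word's Zipf value to a person's Zipf-scale vocabulary
# estimate; the probability of knowing the word is 50% when the two coincide.

def p_known(word_zipf, person_theta):
    """Probability that a person with Zipf-scale estimate person_theta knows
    a word with Zipf value word_zipf (higher value = more frequent word)."""
    return 1.0 / (1.0 + math.exp(-(word_zipf - person_theta)))

def text_coverage(word_zipfs, person_theta):
    """Text coverage = average probability of knowing each token in a text."""
    return sum(p_known(wf, person_theta) for wf in word_zipfs) / len(word_zipfs)

# Hypothetical Zipf values for a short text. On this scale a LOW theta means
# HIGH vocabulary knowledge, so the skilled reader (3.3) covers more text.
text = [7.7, 6.1, 5.2, 4.0, 2.8]
assert p_known(4.59, 4.59) == 0.5
assert text_coverage(text, 3.3) > text_coverage(text, 5.9)
```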

Modeling text coverage and text comprehension
We addressed RQ1 by comparing linear and exponential models of text coverage explaining text comprehension. We used a latent regression Rasch model (De Boeck & Wilson, 2004), which in the baseline model explains the probability of correctly solving a test item based on random effects for item difficulty and a random effect for person ability. For the explanatory models, we additionally included linear and quadratic terms for text coverage as a text-by-person covariate, or vocabulary knowledge as a person covariate.
A latent regression Rasch model has the advantage that the regression coefficients represent the relationship between text coverage and measurement error-adjusted text comprehension. This modeling approach increases interpretability, as the hypotheses relate to a text comprehension measure that is free of measurement error, and increases the reproducibility of our estimates, as imperfect reliability of the text comprehension test should bias the regression coefficients much less. The model was specified within the generalized linear mixed-effect model framework (GLMM) and fitted using the package 'lme4' (Bates et al., 2014) in the R environment (R Core Team, 2021).
The difference in random variance in person ability (σ²_θ) between a model without text coverage (i.e., the baseline model) and the explanatory models with text coverage was used to calculate the explained variance in person ability, R²_θ = (σ²_θ,baseline − σ²_θ,text coverage) / σ²_θ,baseline. We used marginal R² (mR²) to estimate the variance in the responses explained by the fixed effects (Nakagawa & Schielzeth, 2013).
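As a sketch, the explained-variance computation reduces to a simple ratio (the variance values below are hypothetical and for illustration only):

```python
# Sketch: explained variance in person ability from the reduction in the
# random person variance between the baseline and an explanatory model.

def r2_theta(var_theta_baseline, var_theta_model):
    """R²_θ = (σ²_θ,baseline − σ²_θ,model) / σ²_θ,baseline."""
    return (var_theta_baseline - var_theta_model) / var_theta_baseline

# Hypothetical values: a drop from 1.00 to 0.75 means 25% explained variance.
assert abs(r2_theta(1.00, 0.75) - 0.25) < 1e-12
```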
We compared model fits using the Bayesian Information Criterion (BIC). The BIC is a goodness-of-fit indicator that prevents overfitting by penalizing the number of parameters in the model (in contrast to deviance or pseudo-R²) and can be used to compare nested and non-nested models. Lower values correspond to a better goodness of fit, and fit differences of 5-10 points can be considered substantive (Burnham & Anderson, 2002). Additionally, we used Akaike weights (w_i), which can be directly interpreted as conditional probabilities for each model (Wagenmakers & Farrell, 2004).
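The Akaike weights can be computed from the BIC values as follows (a sketch with hypothetical BICs; the transformation follows Wagenmakers & Farrell, 2004):

```python
import math

# Sketch: Akaike weights from BIC values (Wagenmakers & Farrell, 2004).
# Each weight is the conditional probability that the corresponding model
# is the best one among the compared set.

def akaike_weights(bics):
    deltas = [b - min(bics) for b in bics]      # BIC difference to best model
    raw = [math.exp(-0.5 * d) for d in deltas]  # relative likelihoods
    total = sum(raw)
    return [r / total for r in raw]

# Hypothetical BICs differing by 42 points (as between the quadratic and
# linear text coverage models): the better model's weight is essentially 1.
w = akaike_weights([1000.0, 1042.0])
assert w[0] > 0.999 and abs(sum(w) - 1.0) < 1e-12
```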
We addressed RQ2 by estimating similar linear and quadratic models using vocabulary knowledge as the predictor variable. We then evaluated which of the models (i.e., out of all text coverage and vocabulary knowledge models) fit the data better using the same fit indicators described above.

Modeling the lexical threshold
We addressed RQ3 using a broken-line regression (Muggeo, 2008) with average text coverage predicting the sum score on the text comprehension test. Broken-line regression is a statistical method that identifies a changepoint in a linear regression and provides a significance level and confidence interval for the changepoint (i.e., the threshold). Instead of estimating one regression slope, as in linear regression, broken-line regression estimates two regression slopes, divided at the identified changepoint. This method has been used in related research (O'Reilly et al., 2019; Wang et al., 2019). Based on our theoretical background regarding activation and propositional network connectivity, we expected an exponential increase and not necessarily a linear relationship with a changepoint. However, the changepoint is important from a practical perspective because decisions about factors such as text alignment are often binary (i.e., is the text too difficult for a particular student or not?).
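A broken-line fit can be sketched with a simple grid search over candidate changepoints. This is a toy stand-in for the segmented approach of Muggeo (2008), fitting separate lines on each side of each candidate changepoint; all data and names are hypothetical:

```python
# Toy sketch of broken-line regression (not the 'segmented' R package):
# grid-search the changepoint that minimizes the squared error of two
# separately fitted line segments.

def simple_ols(xs, ys):
    """Slope and intercept of an ordinary least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def sse_segmented(x, y, psi):
    """Squared error of separate fits below and above changepoint psi."""
    total = 0.0
    for seg in ([(a, b) for a, b in zip(x, y) if a <= psi],
                [(a, b) for a, b in zip(x, y) if a > psi]):
        xs, ys = zip(*seg)
        slope, icpt = simple_ols(xs, ys)
        total += sum((yy - (icpt + slope * xx)) ** 2 for xx, yy in zip(xs, ys))
    return total

def fit_changepoint(x, y, candidates):
    """Candidate changepoint with the lowest segmented squared error."""
    return min(candidates, key=lambda psi: sse_segmented(x, y, psi))

# Synthetic data with a true kink at a text coverage of 0.56.
x = [i / 100 for i in range(20, 100)]
y = [2 * xx if xx <= 0.56 else 2 * 0.56 + 8 * (xx - 0.56) for xx in x]
psi = fit_changepoint(x, y, [i / 100 for i in range(30, 90)])
assert abs(psi - 0.56) < 0.02
```

Unlike this sketch, the segmented approach also constrains the two lines to meet at the changepoint and yields standard errors and a confidence interval for it.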

Vocabulary knowledge and reading comprehension
The descriptive results on raw score means and standard deviations, ranges, reliabilities, and correlations indicate that the vocabulary and text comprehension tests worked in the intended way (Table 2). Both the vocabulary knowledge (Rel_wle = 0.67) and text comprehension (Rel_wle = 0.69) tests had an acceptable reliability for a large-scale assessment context. The vocabulary and text comprehension tests correlated highly with each other, r(922) = 0.61, t = 23.2, p < 0.001.

Vocabulary knowledge and text coverage
The process of text coverage estimation is best demonstrated with an example and a visual overview. Table 3 shows the words from one sentence of the "Mosquito" text together with the probability that each specific word is known by children with varying levels of vocabulary knowledge. A high-frequency word such as "die [the]" (WF = 7.7) had a high probability of being known by both high-skilled (100%) and low-skilled readers (92%). The probability that a low-frequency word such as "Stechmücken [mosquitoes]" (WF = 2.8) would be known was 36% for a high-skilled reader and 2% for a low-skilled reader. A word with intermediate frequency such as "saugen [suck]" had a 79% probability of being known by a high-skilled reader and 11% by a low-skilled reader. Thus, the difference between children with high and low vocabulary knowledge was more pronounced for low- and average-frequency words. The text coverage is the average probability of knowing each word in a text.
The text coverage estimation yielded average text coverage scores ranging from 65% for "Plastic duck" to 74% for "Mosquito". Figure 7 shows the text coverage relative to vocabulary knowledge for each text. The estimated text coverage increases with vocabulary knowledge up to the mean + 2 SD and then nears 100% for all texts.
For instance, the "Plastic duck" text has the most infrequent words and "Candle" the fewest. The differences in text coverage between texts are larger for students with high vocabulary knowledge than for students with mean or low vocabulary knowledge: Text coverage for students with high (i.e., M + 1 SD) vocabulary knowledge ranged from 78.27% for the "Plastic duck" text to 86.60% for the "Mosquito" text, while text coverage for students with low (i.e., M − 1 SD) vocabulary knowledge was around 42% for each text.

Table 3 Example sentence with probabilities of knowing each word relative to the WF of words and vocabulary knowledge. Notes: Sentence from the "Mosquito" text. ¹ Ability estimates on the Zipf scale, N(4.59, 1.30): high = 3.3 (M − 1 SD), mean = 4.59, low = 5.9 (M + 1 SD). On the Zipf scale, low values correspond to high vocabulary knowledge because the value describes the probability of knowing infrequent words. The probability of knowing a word is 50% when the value of the vocabulary knowledge estimate equals the value of the word.

Text coverage and reading comprehension

RQ1: shape of the relationship between text coverage and text comprehension
The upper part of Table 4 summarizes the results comparing the baseline model (BLM) to a linear and a quadratic text coverage model. Both the linear and the quadratic model had a substantively better fit than the BLM, as indicated by a much lower BIC, Δ_BLM−TCL(BIC) = 430, Δ_BLM−TCQ(BIC) = 488. However, the quadratic text coverage model fit significantly better, χ² = 51.32, p < 0.001, and had the lowest BIC, Δ_TCQ−TCL(BIC) = 42. Although the variance explained by the quadratic term was rather small, ΔR²_θ = 0.026, the fit indices suggest that the quadratic text coverage model fits better than the linear model.
The model parameters for the quadratic model are provided in Table 5. Both the linear, β₁ = −1.26, SE = 0.63, p = 0.046, and the quadratic trend, β₂ = 3.90, SE = 0.54, p < 0.001, were significant. The signs of the predictors show that reading comprehension increased with text coverage, but that the effect leveled off in the low text coverage range.

RQ2: the better predictor for text comprehension
We performed the same analysis including linear and quadratic terms for vocabulary knowledge to determine whether text coverage was a better predictor of text comprehension than vocabulary knowledge. Both vocabulary knowledge models were better than the baseline model. As expected, both vocabulary knowledge models explained a significant amount of variance in the responses and thus in reading comprehension ability. In contrast to text coverage, the quadratic model for vocabulary knowledge was not substantively better than the linear model, χ² = 3.34, p = 0.068. This was most clearly indicated by the only marginally lower BIC, Δ_VKQ−VKL(BIC) = −6, suggesting that the gains in explained variance and goodness of fit were due to overfitting. The linear vocabulary model showed a significant linear trend, β₁ = 0.63, SE = 0.03, p < 0.001, that was in line with previous findings in terms of size and direction (see Table 5).
In a direct comparison of the two text coverage and two vocabulary knowledge models, the quadratic text coverage model turned out to be the best model, as indicated by its much lower BIC. For better interpretability, we calculated the w_i, which represents the probability of each model being the best out of the five models (Wagenmakers & Farrell, 2004). The w_TCQ > 0.999 implies that there was an above 99.9% chance that the quadratic text coverage model was the best model among the five. In terms of explained variance, the difference between the quadratic text coverage model and the linear vocabulary knowledge model was small but significant, ΔR²_θ = 0.011.

RQ3: amount of text coverage that defines the lexical threshold
A broken-line regression explained text comprehension scores significantly better than a linear regression, F(2, 920) = 11.121, p < 0.001. Below the threshold, text comprehension increases at a rate of β1<0.56 = 6.59, SE = 1.28, p < 0.001, and above the threshold, it increases at a rate of β1>0.56 = 7.89, SE = 1.68, p < 0.001. The expected test score at the threshold was 9.56, slightly below the mean test score of M = 11.1. Thus, the threshold occurs at an average reading comprehension level (Fig. 8).
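A broken-line regression with an unknown changepoint can be estimated by profiling the changepoint over a grid and refitting by ordinary least squares, as sketched below. The data are simulated, and the slope change is deliberately exaggerated for illustration; the coefficients are not the study's estimates.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated reader-level data: average text coverage (x, 0-1) and a
# comprehension score (y) with a kink at 56% coverage. The slope change
# is exaggerated relative to the reported estimates for illustration.
x = rng.uniform(0.2, 0.9, 900)
y = 2.0 + 6.6 * x + 6.4 * np.maximum(x - 0.56, 0) + rng.normal(0, 1.0, x.size)

def fit_broken_line(x, y, c):
    """OLS fit of y = b0 + b1*x + b2*max(x - c, 0) for a fixed changepoint c.
    b1 is the slope below c; b1 + b2 is the slope above c."""
    X = np.column_stack([np.ones_like(x), x, np.maximum(x - c, 0)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = float(np.sum((y - X @ beta) ** 2))
    return beta, sse

# Profile the changepoint over a grid and keep the best-fitting value
grid = np.linspace(0.30, 0.80, 101)
c_hat = min(grid, key=lambda c: fit_broken_line(x, y, c)[1])
beta_hat, _ = fit_broken_line(x, y, c_hat)
```

The hinge parameterization makes the two-slope interpretation explicit: the coefficient on the hinge term is the change in slope at the changepoint, so a significant b2 corresponds to a significant kink.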

Discussion
In the present study, we investigated the relationships between (1) vocabulary knowledge and reading comprehension, (2) vocabulary knowledge and text coverage, and (3) text coverage and text comprehension, as well as the associated lexical threshold.
In line with previous studies, our findings show a strong association between vocabulary knowledge and reading comprehension. As expected, the association between text coverage and comprehension was best described by a non-linear relationship (i.e., exponential or broken-line). Text comprehension increases with text coverage exponentially rather than linearly, and we were able to identify a threshold at 56% text coverage, above which text comprehension increases more rapidly. Overall, text coverage outperformed vocabulary knowledge as a predictor of text comprehension.
Our study provides strong evidence for the view that text coverage and text comprehension are non-linearly related. This is in line with the construction-integration model (Kintsch, 1988), which conceptualizes text comprehension as building an associative network of links and nodes. A network loses connectivity exponentially when nodes are missing. Hence, a certain amount of text coverage is necessary to activate relevant background knowledge, enable contextual inference, and subsequently build comprehension.
Our estimation of the lexical threshold differed from previous studies. Previous studies reporting higher values used self-reports (Hsueh-Chao & Nation, 2000) or word translations (Laufer, 1989). The study with the most comparable research method found the most similar results, but only considered content words (59%; O'Reilly et al., 2019). Due to methodological differences, there is probably no 'one' lexical threshold, as the estimation crucially depends on how word knowledge and text comprehension are assessed. As a consequence, thresholds should only be used within the context in which they were defined.
Our findings also have implications for research and practice. In particular, the relationship between text coverage and text comprehension can be used to determine theoretically and statistically justified cut-off values to inform the selection of adequate learning material and standard-setting procedures.
Selection of appropriate reading materials could benefit from the text coverage model and thresholds if text coverage estimation were implemented in a software tool or within a readability analysis. In most situations, readers benefit most from reading activities that are neither too difficult nor too easy (e.g., Wolfe et al., 1998). Reading activities that are too difficult may be more detrimental to motivation and reading engagement than activities that are too easy (Kahmann et al., 2022). The lexical threshold might be helpful for identifying too little text coverage. It is probably advisable to match readers to texts so that text coverage is above 56%.
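At the token level, the coverage check behind such reader-text matching is straightforward. The word list and text below are toy English examples, not the study's materials; in the study itself, coverage was estimated from vocabulary scores and word frequencies rather than from an explicit list of known words.

```python
def text_coverage(tokens, known_words):
    """Share of running words (tokens) in a text that the reader knows."""
    tokens = [t.lower() for t in tokens]
    return sum(t in known_words for t in tokens) / len(tokens)

LEXICAL_THRESHOLD = 0.56  # changepoint estimated in this study

known = {"the", "cat", "sat", "on", "a", "mat"}
text = "The cat sat on the ominous mat".split()
cov = text_coverage(text, known)     # 6 of 7 tokens known (~0.86)
suitable = cov > LEXICAL_THRESHOLD   # above the threshold
```

A readability tool built on this idea would replace the explicit word set with a reader's estimated vocabulary level and the words' corpus frequencies.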
A similar application is standard-setting. In the context of educational monitoring, it is often of great interest to determine which score on a vocabulary test best represents "core vocabulary". Core vocabulary for reading is the vocabulary that allows students to understand a text on a basic level. Core vocabulary has been defined as the vocabulary that covers some (high) percentage of texts in corpora (e.g., Chujo & Utiyama, 2005). Thresholds for core vocabulary are usually defined based on expert ratings or norm values (Brown & Kappes, 2012). However, the text coverage function could provide a model-based way to determine these thresholds. Wang et al. (2019) found a decoding threshold that is stable between Grades 5 and 10. They suggested that such thresholds are a function of skill rather than grade or age. According to our model, readers with a vocabulary corresponding to a text coverage above 56% start to gain comprehension. Whether this threshold is consistent across grades needs to be investigated in further research.

Limitations and outlook
One of the major limitations of the present study is that our results are based on a large group of participants but only a small number of texts. Our results should be replicated with a larger and more representative sample of authentic texts, different vocabulary tests, and other participants in order to test the generalizability of the reported results.
The difference in explained variance between the non-linear and linear model of the relationship between text coverage and text comprehension was significant but very small. On the one hand, this small effect could still be relevant, because it could improve reader-text matching without requiring more test time and is based on already available information (vocabulary test and text word frequencies). On the other hand, several aspects of this study might have led to a particularly small effect. First, the vocabulary test used in the present study was not ideal for the purpose of estimating students' text coverage. Although it was a widely used, standardized instrument, its items were not selected systematically based on their word frequency. Additionally, the test had relatively few items and a relatively low reliability. Future researchers may be advised to use tests that systematically manipulate word frequency and are more reliable (e.g., the subtest "lexicon" of the SET 5-10; Petermann, 2012). Second, the non-linear relationship between text coverage and text comprehension might have been more pronounced if we had investigated a broader range of texts, including texts with only frequent words or texts with very rare words, and a broader range of students, for instance, second to fifth graders.
There was relatively little precise information about the students' language background. About 35% of the students reported that they did not primarily speak German at home. This group of students could include recently arrived non-native speaking students (about 4% of German fourth graders at the time of the study) or bilingual students. However, we performed robustness checks using language spoken at home as a moderator of the relationship between text coverage and text comprehension and did not find a significant difference in effects. Thus, the relationship we described is probably generalizable to students with diverse language backgrounds. However, future studies should take a more in-depth look at language background-specific effects.
There are also some theoretical problems that might need to be addressed in the future to further develop this technique. In particular, the present framework does not take into account the context in which words are encountered. Frequent words are usually useful in many contexts, whereas infrequent words are more context-specific. This issue is not addressed in our model. It is also not clear how different psychometric aspects such as measurement error, guessing, and slipping influence the text coverage estimation. However, these aspects might primarily influence the absolute text coverage scores.
Despite these limitations, our study demonstrated that text coverage and the lexical threshold are useful concepts that are not yet well established in elementary school reading research and could help to align reading materials with readers and to conduct standard-setting. Future research should refine the method further and test whether the thresholds are actually useful.
Funding Open Access funding enabled and organized by Projekt DEAL.

Declarations
Conflict of interest We have no known conflict of interest to disclose.

Fig. 1
Fig. 1 Diagram illustrating relationships between word frequency, vocabulary knowledge, and text coverage. Note Illustrative diagram (no actual data displayed). Panel a is analogous to Brysbaert et al. (2018); panel b is analogous to Chujo and Utiyama (2005).

Fig. 2
Fig. 2 Diagram summarizing the relationship between word frequency, vocabulary knowledge, text coverage, text comprehension, and the thresholds. Note Solid circles: text; dashed circles: vocabulary knowledge; area with diagonal lines: text coverage; position of the solid circle on the y-axis: number of rare words in a text; increasing diameter of the dashed circles along the x-axis: increase in vocabulary knowledge from left to right. Color gradient from white (upper left corner) = low text comprehension to black (lower right corner) = high text comprehension.

Fig. 3
Fig. 3 Example item of the vocabulary knowledge test ("Which word has the most similar meaning to the bold word?"). Note Illustrative example of a typical item from the vocabulary knowledge test. This item was not actually in the test. The vocabulary knowledge test is protected by copyright.

Step 1: referencing the vocabulary test score to word frequency
The vocabulary test responses were modeled with a Rasch model using the TAM package (Robitzsch et al., 2022) within R (R Core Team, 2021). Then, we regressed the minimum word frequency of each synonym pair on the item difficulty parameter σ (WF = b0 + b1σ + ε). The regression revealed significant regression coefficients of b0 = 4.59, SE = 1.51, p = 0.007 and b1 = −1.31, SE = 0.37, p = 0.003. The intercept implies that an average item (σ = 0) has an expected word frequency of WF = 4.59. The slope indicates that a difficult item (σ = 1) has an expected word frequency of WF = 3.28 and an easier item (σ = −1) has an expected WF = 5.90.
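The two expected values at σ = ±1 follow directly from the reported coefficients. A minimal check of that arithmetic, using only the intercept and slope given above:

```python
# Reported regression coefficients from Step 1: WF = b0 + b1 * sigma
b0, b1 = 4.59, -1.31

def expected_wf(sigma):
    """Expected (Zipf) word frequency for an item with Rasch difficulty sigma."""
    return b0 + b1 * sigma

expected_wf(0)    # average item
expected_wf(1)    # harder item maps to a rarer word
expected_wf(-1)   # easier item maps to a more frequent word
```

The negative slope captures the core idea of Step 1: the harder a vocabulary item, the rarer the word it tests, which is what allows test scores to be referenced to the word-frequency scale.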

Fig. 5
Fig. 5 Distribution of vocabulary knowledge before and after the linear transformation. Note Panel a shows the distribution of vocabulary knowledge before the linear transformation, panel b shows the relationship between vocabulary knowledge test scores on the z-standardized and Zipf scales, and panel c shows the distribution of vocabulary knowledge after the linear transformation.

Fig. 6
Fig. 6 Relationship between word frequency and the probability of knowing a word relative to vocabulary knowledge. Note Figure analogous to Fig. 2 in Brysbaert et al. (2018). Vocabulary: High = 1 (M + 1 SD), Mean = 0, Low = −1 (M − 1 SD). The probability of knowing a word is 50% when vocabulary knowledge on the Zipf scale is equal to the frequency of a word.
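The 50% property stated in the note is the defining feature of a logistic (Rasch-type) link between a word's Zipf frequency and the reader's vocabulary knowledge expressed on the same scale. A minimal sketch of such a link follows; the slope value is an illustrative assumption, not a reported estimate.

```python
import math

def p_know(word_zipf, vocab_zipf, slope=1.0):
    """Logistic probability of knowing a word: exactly 0.5 when the word's
    Zipf frequency equals the reader's vocabulary knowledge on the Zipf
    scale; more frequent words (higher Zipf values) are more likely to be
    known. The slope value is an illustrative assumption."""
    return 1.0 / (1.0 + math.exp(-slope * (word_zipf - vocab_zipf)))

p_know(4.0, 4.0)  # 0.5 at the match point
```

Because vocabulary knowledge on the Zipf scale marks the frequency at which words become 50% likely to be known, readers with higher knowledge (a lower Zipf value, see Fig. 7) have a higher knowing probability for any given word.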

Fig. 7
Fig. 7 Text coverage for each text in relation to vocabulary knowledge. Note Percent text coverage estimate (y-axis); vocabulary knowledge on the z-standardized scale and the Zipf scale (x-axis). The Zipf scale is inverse to a z-standardized scale: low values correspond to high vocabulary knowledge because the scale refers to the word frequencies individuals are likely to know. Vertical lines indicate low vocabulary knowledge (M − 1 SD), mean vocabulary knowledge, and high vocabulary knowledge (M + 1 SD).
Note to Table 4: Sample: N = 924; items: I = 20; texts: T = 4; observations = 17,932 (924 × 20 = 18,480; difference due to 2.97% omitted and not-reached responses). np_i = number of estimated parameters for model i; log(L_i) = natural logarithm of the maximum likelihood for model i; BIC = Bayesian information criterion; Δ_i BIC = BIC_i − min(BIC); w_i(BIC) = rounded Schwarz weights; R²_θ = person variance explained by fixed effects, obtained as (σ²_θ of the baseline model − σ²_θ of model i)/σ²_θ of the baseline model.

Fig. 8
Fig. 8 Relationship between text coverage and comprehension. Note Broken-line regression with a changepoint at 56% text coverage. x-axis: average text coverage estimate for a student across texts; y-axis: text comprehension test score with a maximum of 20. The grey area is the 95% confidence interval of the expected text comprehension test score. The dots represent the distribution of the test scores.
V1). We only use the text comprehension test and the vocabulary knowledge test in the analysis. The tests were administered in accordance with their test manuals.

Table 2
Results of the … Note The categorization (high, mean, low) was only used to derive illustrative examples and did not influence the actual estimation.

Table 4
Model comparisons between models with linear and quadratic terms explaining the probability of a correct answer in the text comprehension test