When we read, a lot needs to happen for us to grasp what a text is saying: from understanding words, to drawing connections, to monitoring when comprehension goes astray (Castles et al., 2018). For most of us who are reading this specific manuscript, all these things will need to happen in our second language (L2), making this task even more remarkable. And yet, some lower-level processes, such as word reading, are so automatic that skilled readers cannot stop them even in their L2, as famously demonstrated in the Stroop task (Braet et al., 2011; Jensen & Rohwer, 1966). Even complex reading processes, such as monitoring one’s comprehension, may arise without additional effort, but only as long as high-quality word knowledge allows smooth word processing, as laid out by the Reading Systems Framework (Stafura & Perfetti, 2014). However, most theories agree that reading requires not only automatic, but also active and resource-demanding processes that may particularly matter when re-analysing coherence breaks (O’Brien & Cook, 2016; Singer et al., 1994; van den Broek & Helder, 2017). In contrast to the Reading Systems Framework, these latter theories would then predict increased active processing in the face of difficulties (see for instance van den Broek & Helder, 2017), rather than a reduction of reading processes under high load (as follows from Stafura & Perfetti, 2014). Just as word knowledge lies at the heart of efficient word processing, strong executive functions lie at the heart of active processing (Kaakinen et al., 2003; van Moort et al., 2018). Additionally, strong executive control may also allow readers to compensate for less than smooth word processing (Hamilton et al., 2016). In the current study, we therefore ask whether executive control and word processing difficulty interact in their influence on adolescents’ comprehension monitoring.
This question is particularly relevant for adolescent L2 learners, for whom online monitoring is a strong predictor of their overall comprehension (Mulder et al., 2021). Additionally, word processing tends to be more difficult when reading in one’s L2 vis-à-vis L1 (Cop et al., 2015), which impacts sentence and higher-level processing (Hopp, 2017; Lim & Christianson, 2014; Pérez & Bajo, 2019; van den Bosch et al., 2018; Whitford & Titone, 2012). In this context, strong executive control abilities have been found to be more influential, arguably because they allow readers to compensate for difficulties (Prehn et al., 2018; Raudszus et al., 2017). It is thus pivotal to understand how adolescents’ (here, aged 13–17 years) comprehension monitoring emerges in this interplay of word processing difficulty and executive control when reading in the L2.

Readers engage in comprehension monitoring whenever they notice and correct mismatches between what they currently read and what they have read before or know to be true. Comprehension monitoring has frequently been studied offline, that is, based on readers’ explicit reflections on a text. Typically, this is done by inserting inconsistent information (be it a syntactic mismatch between a pronoun and a verb, or a semantic mismatch between sentences) and asking participants to verbalize or underline any inconsistencies they notice (Baker, 1984; Kim & Phillips, 2014; Oakhill et al., 2005). These offline studies have demonstrated a close link between comprehension monitoring and general reading comprehension (Kim & Phillips, 2014): older children (e.g., 9- to 11-year-olds) and skilled comprehenders monitor their comprehension more skilfully (that is, they notice and identify more and also harder-to-detect types of errors) compared to younger children (e.g., 5- to 7-year-olds) and less-skilled comprehenders (Baker, 1984; Oakhill et al., 2005; Zabrucky & Ratner, 1989).

Offline studies alone cannot, however, reveal how comprehension monitoring proceeds without prompting, nor are they free of secondary tasks that require metacognitive and linguistic efforts. To study comprehension monitoring directly while it happens, we need to look at the reading process itself using online methods. A prime method for this is eye-tracking, or the recording of readers’ eye movements on the text. Eye-tracking captures reading behaviour while leaving readers free to roam the text with no secondary tasks. In the eye movement record, comprehension monitoring shows up as increased rereading of inconsistent compared to consistent information, a sign that readers notice and re-analyse the inconsistent information (Connor et al., 2015; Hessel et al., 2021; Joseph et al., 2008; Rayner et al., 2004). This increased rereading is important for comprehension: more pronounced slow-downs on inconsistencies are associated with stronger reading comprehension (Mulder et al., 2021) and comprehension suffers when rereading is made impossible (Schotter et al., 2014). Thus, rereading of inconsistencies provides an excellent window into ongoing monitoring.

A key question is what readers need to successfully monitor their comprehension or indeed, what may be lacking when they fail to do so. A prominent theoretical position is provided by the Reading Systems Framework. It suggests that higher-level comprehension rests primarily on the efficiency of word processing, which in turn depends on high-quality word knowledge (Perfetti, 1985; Perfetti & Stafura, 2014). The prediction is that when word meanings are unclear or slow to retrieve, drawing higher-level connections becomes difficult to impossible. Support comes from eye movement studies that find that stronger vocabulary and comprehension skills go hand in hand with children’s and adolescents’ more pronounced monitoring, as visible in increased and targeted rereading of inconsistent compared to consistent information (Connor et al., 2015; Eilers et al., 2018; Hessel et al., 2021). Thus, eye movement evidence supports the idea that comprehension monitoring relies on efficient word processing.

However, individual difference studies can only indicate but never directly pinpoint the source of reading differences, as more and less skilled readers always vary on multiple dimensions (Castles et al., 2018). For this reason, experimental manipulations of word processing efficiency have been used to pinpoint the source of differences between L1 and L2 readers (Hopp, 2016; McDonald, 2006) or between more and less successful comprehension monitoring (Hessel & Schroeder, 2020). In these studies, readers are presented with texts where the phenomenon of interest (for example a certain syntactic structure or semantic inconsistency) is surrounded by either easy (that is, short and frequent) or difficult (that is, longer and less frequent) words. Both lowering a word’s frequency and increasing its length are known to increase word and text processing difficulties (Inhoff & Rayner, 1986; Joseph et al., 2013; Tiffin-Richards & Schroeder, 2015). Of interest is how syntactic processing or monitoring changes in response to the variations in word processing difficulty. Using such an experimental design, adult L2 learners have been found to indeed reduce their monitoring under higher word processing difficulty (Hessel & Schroeder, 2020), supporting the idea that effortful word processing directly impacts higher-level monitoring. So far, however, this experimental evidence is limited to university students in the lab who differ from L2 learners at school in both age and language experience. The current study takes the same experimental paradigm into the field by investigating how word processing influences comprehension monitoring amongst younger, adolescent L2 readers.

Word processing difficulty is, however, not the only variable of interest when studying comprehension monitoring. Most reading theories agree that beyond the passive spread of activation from word knowledge to textual coherence (Perfetti & Stafura, 2014), reading at times also requires more active, reader-initiated processing (O’Brien & Cook, 2016; Singer et al., 1994; van den Broek & Helder, 2017). While passive processes such as word recognition are thought to be mostly automatic and effortless, reader-initiated processes require executive control in order to store information (Just & Carpenter, 1980), bind word meanings into a coherent whole (Hagoort, 2005), and focus attention on relevant information (Engle, 2002). This link is supported by studies that report associations between stronger executive control and better text (but not word) comprehension (Sesma et al., 2009), more elaborative inference-making (Calvo, 2001), better metacognitive monitoring on difficult texts (Ikeda & Kitagami, 2013), and reading tailored to text structure (Hyönä et al., 2002) or reading purpose (Kaakinen et al., 2003; Linderholm & Van den Broek, 2002). In the context of the current study, we define executive control as a domain-independent cognitive resource that allows people not only to store and rehearse information, but also to focus their attention on task-relevant information (Engle & Kane, 2003). In line with this definition, we chose the automated operation span task as our measure of executive control, as it is a complex task that does not involve reading comprehension (as opposed to the reading span task for example; Engle & Kane, 2003).

When it comes to online monitoring, however, the exact influence of executive control is hard to predict. While one study found that readers with stronger executive control adapted their reading less to inconsistencies compared to those with weaker control, which the authors interpreted as more efficient re-analysis (van Moort et al., 2018), another study reports that given momentarily stronger control capacities (thanks to the absence of secondary tasks), readers adapted their reading more to inconsistencies compared to when taxed by secondary tasks, which the authors interpreted as more thorough re-analysis (de Bruïne et al., 2021). That is, stronger executive control was linked to either more or less pronounced monitoring across studies. Beyond differences in study design (individual differences vs. experimental manipulation), a reason for the differing results could be that active, reader-initiated processes may only kick in when comprehension falls below a reader’s expectations (van den Broek & Helder, 2017). If this were true, stronger executive control would only lead to increased monitoring when a reader judges their comprehension to be too low. Support for this notion comes from studies that find that stronger executive control is particularly beneficial when comprehension is impeded due to weaker L2 abilities (Prehn et al., 2018), slower decoding (Hamilton et al., 2016), or higher text difficulty (Ikeda & Kitagami, 2013). Note that these findings are in line with the Reading Framework which predicts a switch from passive to active processing in the face of difficulties (van den Broek & Helder, 2017). In the current study, we therefore ask how adolescents’ comprehension monitoring in the L2 emerges in the interplay of word processing difficulty and executive control.

The present study

We wanted to know whether word processing difficulty and executive control would interactively influence adolescent readers’ comprehension monitoring in their L2. In our study, German adolescents read short expository texts in English while their eye movements were monitored. Texts contained an inconsistency manipulation (consistent vs. inconsistent information) crossed with a manipulation of word processing difficulty (difficult vs. easy context words). The inconsistency manipulation served to tap comprehension monitoring (by observing the main effect of reading inconsistent words for longer than consistent words), while the word processing difficulty manipulation served to see whether word processing difficulty (as visible in longer reading of words and texts with higher word processing difficulty) would reduce such signs of comprehension monitoring. We made the following predictions: we expected that comprehension monitoring would show in longer and increased (re)reading of inconsistent over consistent information. We further expected adolescent readers to monitor their comprehension less strongly under higher word processing difficulty (which would show in reduced inconsistency rereading), in line with the idea that more effortful passive word processing limits higher-level monitoring (Perfetti & Stafura, 2014). Moving on to the exploratory part of our study, we also measured participants’ individual differences in executive control. Given the complexity of previous evidence (de Bruïne et al., 2021; van Moort et al., 2018), we took a more explorative approach to the role of executive control by merely asking if online comprehension monitoring was additionally influenced by the readers’ executive control, which would indicate the involvement of active, reader-initiated processing (van den Broek & Helder, 2017).



Thirty-nine adolescent L2 learners were recruited from 9th and 10th grade at two German secondary schools (see Footnote 1). We secured consent from their legal guardians and the participants themselves. Two participants were excluded because their German language skills were not sufficient to understand the German-language task instructions of our experiment. Another participant was excluded for having learned English as their family language, as confirmed through our language questionnaire. Three further participants were unable to finish the experiment, two due to technical problems and one due to a developmental reading difficulty. The final sample thus comprised 34 participants (n = 20 female, mean age of 14.5 years, SD = 0.8). All participants had normal or corrected-to-normal vision.

In order to describe our sample, we asked participants about their language learning history in a language background questionnaire (all materials are available on the Open Science Framework, OSF). All participants had grown up learning German; additionally, 11 participants had grown up learning one or two non-German home languages (two each with Russian, Croatian, and Portuguese, and one each with Arabic, Indonesian, Romanian, Polish, and Vietnamese) and had started learning German by the age of 2.0 years (SD = 3.3) on average. As to English, all but one participant had learned the language since primary school and all of them learned English at secondary school. Roughly half of the sample (n = 17) were receiving English-medium instruction in subjects such as biology or science, and four had had additional English instruction through voluntary school clubs or language study trips. Participants also reported considerable exposure to English in their free time: the three most common activities were playing English-language video games (with n = 32 playing daily and n = 24 mostly in English), engaging with English-language social media (with n = 29 using social media daily and n = 14 mostly in English), and watching English-language videos (with n = 26 watching daily and n = 10 mostly in English).

To further describe our sample, we also assessed our participants’ English-language abilities using standardised tests of vocabulary and word reading (see materials section for a description of tasks). We chose vocabulary and word reading to tap our participants’ L2 proficiency because both measures are widely used for this purpose and tap important facets of L2 comprehension (cf. de Cat, 2020; Lervåg & Aukrust, 2010), namely, knowledge of the written form and of word meanings (Torgesen et al., 2012; Wiig & Secord, 1992). On average, participants in our sample had word reading skills and vocabulary knowledge comparable to those of 13- and 10-year-old L1 learners, respectively (see Table 1).

Table 1 English language learning abilities in our sample

Reading task


Thirty-six 3- to 5-sentence texts were adapted from a previous study (Hessel & Schroeder, 2020). They had been developed based on short texts from instructional websites (Word Generation, Using English, BBC Bitesize, or TOEFL test preparation materials) and checked by two native speakers of British English for acceptability (Hessel & Schroeder, 2020). To adapt the reading task for use with adolescent readers within a school lesson of 45 min, we shortened the texts (to M = 69, SD = 8 words per story) and reduced their number to 36. Furthermore, we had the texts checked by an experienced German English-language teacher who identified and replaced grammatical constructions and words beyond the language skills of our target population. Finally, we piloted the reading task with two adolescent L2 readers to confirm its suitability. The final texts cover a range of topics, including climate change, education, vegetarian diet, and pets. An example is shown in Table 2 (for a full list of stimuli, see the project’s online repository on the OSF).

Table 2 Example story and question

Comprehension monitoring was assessed using an inconsistency manipulation which resulted from a combination of either consistent or inconsistent pre-target and target words. To reiterate, successful comprehension monitoring would then be visible in increased (re)reading of inconsistent vis-à-vis consistent target words. Pre-targets were created by replacing key words with a plausible alternative that, however, did not match the following target word (e.g., Spanish was replaced with Russian). As pre-targets were plausible, target words were the first inconsistent words in each text (e.g., when Spanish appeared in a text that previously talked about Russian). Pre-targets were placed at equal distance before targets (12 words and 1 sentence) and matched across conditions in length (< 2 letters difference) and frequency (M = 4.7, SD = 0.67 Zipf frequency for consistent and M = 4.8, SD = 0.6 for inconsistent pre-targets in the SUBTLEX-UK). Target words were high-frequency nouns or verbs (M = 4.7, SD = 0.6 Zipf frequency in the SUBTLEX-UK; Heuven et al., 2014) with an average length of 6 letters (SD = 1.4). To avoid interference through clause or sentence wrap-up, target words were followed by at least three words. Eye movement signatures of comprehension monitoring were subsequently only analysed on the target words, allowing us to study monitoring on the same word (here, Spanish), free of any confounds from word characteristics.
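For readers unfamiliar with the Zipf scale used above, the following minimal sketch shows the standard conversion between Zipf values and frequency per million words (Zipf = log10 of frequency per million, plus 3). The function names are ours and purely illustrative; the published SUBTLEX-UK values additionally involve a smoothing correction that is omitted here.

```python
import math


def zipf_from_per_million(freq_per_million: float) -> float:
    """Convert a frequency per million words to the Zipf scale."""
    return math.log10(freq_per_million) + 3.0


def per_million_from_zipf(zipf: float) -> float:
    """Convert a Zipf value back to a frequency per million words."""
    return 10 ** (zipf - 3.0)


# A word occurring once per million words sits at Zipf 3;
# the target words' mean of Zipf 4.7 corresponds to roughly 50 per million.
once_per_million = zipf_from_per_million(1.0)
target_mean_fpm = per_million_from_zipf(4.7)
```

On this scale, values of about 4 and above are conventionally treated as high frequency, which is consistent with the description of the target words as high-frequency nouns or verbs.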

Word processing difficulty was manipulated by replacing five words in each text with a less frequent and longer alternative (e.g., by replacing want with prefer), thereby creating a difficult and an easy version of each text (see Table 3). As a reminder, increased word processing difficulty was expected to be visible in longer reading times on words and texts with higher word processing difficulty, and furthermore, to reduce signs of comprehension monitoring in the eye movement record. Words were spread across texts to create an overall difference in difficulty. Differences in word frequency were confirmed using two English-language corpora that tap main sources of our participants’ English-language input, the SUBTLEX-UK (Heuven et al., 2014) that stems from films and series and the British National Corpus that is based on written texts (2007).

Table 3 Matching of easy and difficult word processing difficulty manipulated words

We split our texts into four lists following a Latin square design where each participant read all 36 texts once, 18 in the consistent and 18 in the inconsistent condition. Inconsistency was crossed with word processing difficulty by having participants read 9 consistent texts with easy words and 9 consistent texts with difficult words, and likewise for the inconsistent condition. To encourage reading for meaning, each text was followed by one local and one global comprehension question that required students to rate a statement as either true or false. On average, participants answered about four in five questions correctly (M = 79%, SD = 3%), confirming that they had read for meaning.
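The counterbalancing described above can be sketched as a simple rotation of the four condition combinations across texts and lists. This is an illustration under our own assumptions (a cyclic rotation, zero-based text indices, and the names `CONDITIONS` and `build_lists`); the actual assignment of texts to lists in the study is not specified here.

```python
from collections import Counter
from itertools import product

# The 2 x 2 design: inconsistency crossed with word processing difficulty.
CONDITIONS = list(product(("consistent", "inconsistent"), ("easy", "difficult")))
N_TEXTS = 36


def build_lists(n_texts=N_TEXTS, conditions=CONDITIONS):
    """Return one dict per list, mapping text index -> condition.

    Hypothetical rotation: text t appears in condition (t + list) mod 4,
    so every text is seen in every condition across the four lists.
    """
    n_lists = len(conditions)
    return [
        {text: conditions[(text + lst) % n_lists] for text in range(n_texts)}
        for lst in range(n_lists)
    ]


lists = build_lists()
# Number of texts per condition within the first list.
cell_counts = Counter(lists[0].values())
```

With 36 texts and four cells, each list contains 9 texts per cell, and each text occurs exactly once in every cell across the four lists.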


The texts were presented via the Experiment Builder software (SR Research, 2009) and eye movements were recorded using a laptop-mounted Eyelink Duo eye tracker at a sampling rate of 1000 Hz. Texts were formatted in black, Consolas font, size 22, with double spacing, and on a white background. Participants read the texts silently from a Dell Latitude E5530 15-inch laptop monitor, set at a refresh rate of 60 Hz with 1920 × 1080 resolution, at a viewing distance of 55 cm. Participants’ eye movements were tracked for the whole duration of the reading task. Although our participants read with both eyes, only the right eye was monitored, unless problems (e.g., reflections on glasses) made it necessary to switch to the left eye to gain better data quality. A chin and forehead rest was used to minimize head movements. The camera was calibrated on a five-point grid and each trial started with drift corrections to make sure calibration was sufficiently precise (< 0.5° of accuracy). Calibration was repeated after each break and as necessary.


Executive control test

Executive control was measured using the automated operation span task as it is implemented in the testing software Inquisit (Inquisit4, 2015) on the Millisecond Library (Engle & Kane, 2003). In the automated operation span task, participants are asked to remember letter sequences that are presented one-by-one. Each letter is followed by a simple mathematical statement which participants have to judge to be true or false. The final absolute operation span score is accuracy-based and records the number of letters recalled within correct sequences. To adapt the automated operation span task for use with adolescents, we simplified its instructions and tested only 3 trials for each sequence length, which ranged from 2–6 letters. Additionally, we used only single-step mathematical equations with numbers ranging from 1 to 5 (Gradisar et al., 2008). To check the task’s reliability, we assessed its split-half reliability (adjusted using the Spearman-Brown prophecy formula). Cronbach’s α was 0.82, similar to the operation span’s reliabilities with adult test-takers (Engle & Kane, 2003). On average, our participants recalled 38 letters in correct sequence (SD = 12).
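To make the scoring concrete, the sketch below computes an absolute span score (letters counted only for perfectly recalled sequences) and applies the Spearman-Brown correction to a split-half correlation. The function names and the example trials are hypothetical and serve only as an illustration; they are not taken from the Inquisit implementation or from our data.

```python
def absolute_span_score(trials):
    """Sum the lengths of all sequences that were recalled perfectly.

    `trials` is an iterable of (presented, recalled) letter sequences.
    """
    return sum(
        len(presented)
        for presented, recalled in trials
        if list(presented) == list(recalled)
    )


def spearman_brown(r_half):
    """Step a split-half correlation up to full-test reliability."""
    return 2 * r_half / (1 + r_half)


# Hypothetical trials: the second sequence is recalled in the wrong order
# and therefore contributes nothing to the absolute score.
example_trials = [("FK", "FK"), ("HJL", "HLJ"), ("QRST", "QRST")]
example_score = absolute_span_score(example_trials)

# Maximum score with 3 trials at each sequence length from 2 to 6.
max_score = 3 * sum(range(2, 7))
```

Under the adapted design described above (3 trials per sequence length, lengths 2 to 6), the maximum attainable absolute score is 60.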

Word reading test

Word reading was assessed using the Test of Word Reading Efficiency 2 (TOWRE-2; Torgesen et al., 2012) where participants are asked to read out loud as many words and non-words as possible within 45 s.

Vocabulary test

Vocabulary was assessed using two subtests from the Test of Word Knowledge (TOWK; Wiig & Secord, 1992) where participants had to choose either the correct synonym (for the synonyms subtest) or antonym (for the word opposites subtest) out of three or four answer options. The synonym and opposite subtests of the TOWK proved highly reliable in our sample, with Cronbach’s α of 0.91 and 0.89, respectively.


Each participant took part in one individual and one group test session, each lasting one school lesson or 45 min. During individual sessions, participants completed the reading task: after a short instruction phase where participants also read one trial text, they read at their own pace and moved from one text to the next by pressing a button on a gamepad. During one of their two breaks from reading, participants completed the TOWRE-2. During group sessions, all participants from one class (4–10 students each time) worked side-by-side but individually on one laptop each. They completed the vocabulary test, the executive control task, and the background questionnaires. All procedures and measures were approved by the departmental ethics committee.

Eye movement pre-processing

The eye movement data were cleaned and analysed using popEye (Schroeder, 2019). During pre-processing, gaze drift was corrected semi-automatically; fixations shorter than 80 ms were first merged within 1 letter distance, and fixations shorter than 40 ms were then merged within 2 letters distance. Eye movements on inconsistency targets (e.g., Spanish) and word processing difficulty manipulated words (e.g., want or prefer) were distilled into early measures (gaze duration, or the time spent initially fixating a word, thought to tap word identification) and late measures (such as go-past time, or the time spent (re)fixating a word and its preceding context before continuing to the right; rereading time, or the time spent refixating a word; as well as regression probabilities, or the likelihood of refixating a word; all thought to reflect text-integration processes). To tap overall word processing difficulty, we additionally computed global measures that summarise reading behaviour on the entire text, both in terms of the time spent with initial reading (first-pass text reading time) and time spent rereading words (second-pass text reading time). Note that it is common in eye-tracking research to compute and analyse both early and late measures of reading as a means to provide a complete picture of the reading process, even if, as in our case, predictions for inconsistency target words focused on later measures, in particular rereading.
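As an illustration of how these word-level measures relate to one another, the following sketch derives them from a simplified fixation record. It is not the popEye implementation: the data format (a time-ordered list of word index and duration pairs), the function name, and the operationalisation of rereading time as total time minus gaze duration are assumptions made for this example.

```python
def reading_measures(fixations, target):
    """Compute word-level measures for `target` from time-ordered fixations.

    `fixations` is a list of (word_index, duration_ms) tuples.
    Returns None if the target was never fixated or was skipped in the
    first pass (i.e. a later word was fixated before the target).
    """
    words = [w for w, _ in fixations]
    if target not in words:
        return None
    first = words.index(target)
    if any(w > target for w in words[:first]):
        return None

    # Gaze duration: consecutive fixations on the target at first encounter.
    gaze, i = 0, first
    while i < len(fixations) and fixations[i][0] == target:
        gaze += fixations[i][1]
        i += 1

    # Go-past time: everything from first entry until gaze moves to the right.
    go_past, j = 0, first
    while j < len(fixations) and fixations[j][0] <= target:
        go_past += fixations[j][1]
        j += 1

    total = sum(d for w, d in fixations if w == target)
    regression_in = any(
        prev > target and cur == target for prev, cur in zip(words, words[1:])
    )
    return {
        "gaze_duration": gaze,
        "go_past_time": go_past,
        "total_time": total,
        "rereading_time": total - gaze,  # assumed definition, see lead-in
        "regression_in": regression_in,
    }


# Hypothetical trial: the reader regresses from word 2 to word 1, returns,
# moves on to word 3, and later regresses back into word 2.
example = reading_measures(
    [(0, 200), (1, 180), (2, 250), (1, 150), (2, 100), (3, 220), (2, 300), (4, 210)],
    target=2,
)
```

In this hypothetical trial, go-past time exceeds gaze duration because it also includes the regression to the preceding word, and the later return from word 3 counts as a regression into the target.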


The data were analysed in R (R Core Team, 2021) using linear and binomial mixed effect models from the lme4 package version 1.1.21 (Bates et al., 2015). All data and the analysis script are available on the project’s online repository on the OSF. Before running the models, the data were cleaned as follows. First, trials were deleted if data quality was too poor or if the target word had been skipped in the first run, which was the case for 3.3% and 22% of all trials, respectively. Then, extremely long and short reading times (gaze durations below 50 ms or above 800 ms; go-past time, rereading and total times above 4000 ms; whole text reading times above 30 s for first-pass and above 60 s for second-pass reading) were deleted, followed by model-based deletions based on a cut-off of 2.5 standard deviations from the person and item means for each reading time measure. This removed on average 2.2% (SD = 3.4%) of the data. Reading times were log-transformed. We then ran models with the fixed effects inconsistency (consistent vs. inconsistent), word processing difficulty (easy vs. difficult words), individual differences in executive control, and their interactions to predict reading times on both inconsistency targets (to tap comprehension monitoring) and words manipulated in their difficulty (to tap the effects of word processing difficulty). Note that these three fixed effects (inconsistency, word processing difficulty and executive control) were present in all computed models to allow us to test our predictions. Although executive control was not correlated with L2 vocabulary (r = 0.12, ns), we further wanted to ensure that the impact of executive control was independent of that of English-language vocabulary. To this end, we reran all our models while controlling for L2 vocabulary, a measure widely used as a proxy of overall proficiency (cf. de Cat, 2020; Lervåg & Aukrust, 2010). These additional models confirmed all results bearing on our research questions on inconsistency, executive control and word processing difficulty; we therefore continue to report the original models in this manuscript.
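The cleaning steps for reading times can be illustrated with a deliberately simplified sketch for gaze durations. It applies the absolute cut-offs named above, log-transforms the remaining values, and then trims values more than 2.5 standard deviations from the mean. Note the simplifications, which are our assumptions for this example: trimming is done around a single mean rather than model-based around person and item means, and the function name and example values are hypothetical.

```python
import math
from statistics import mean, stdev


def clean_gaze_durations(durations_ms, low=50, high=800, sd_cut=2.5):
    """Return log-transformed gaze durations after two trimming steps.

    Step 1: drop values below `low` or above `high` (absolute cut-offs).
    Step 2: drop log values further than `sd_cut` SDs from the mean
            (simplified stand-in for the model-based deletion).
    """
    kept = [d for d in durations_ms if low <= d <= high]
    logs = [math.log(d) for d in kept]
    if len(logs) < 3:
        return logs
    centre, spread = mean(logs), stdev(logs)
    if spread == 0:
        return logs
    return [x for x in logs if abs(x - centre) <= sd_cut * spread]


# Hypothetical durations: 30 ms and 900 ms fall outside the absolute bounds.
cleaned = clean_gaze_durations([30, 200, 220, 240, 900])
```

The same logic would apply to the other measures with their respective upper bounds (for example 4000 ms for go-past, rereading, and total times).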

Crossed random intercepts and slopes by inconsistency and word processing difficulty were included for participants and items. Item random effects were estimated by stimulus texts (for models predicting reading on inconsistency targets) or by word processing difficulty targets (for models predicting reading on the words manipulated for word processing difficulty). This random structure was only reduced when high inter-correlation and singularity caused convergence problems (Barr et al., 2013; see Footnote 2). Assumptions of linearity and normality of residuals as well as homoscedasticity of variance were confirmed through visual inspection of residual plots and histograms of residuals (Winter, 2013). Effect coding and Type II model comparisons were used to determine the significance of the fixed effects using the Anova function of the car package (Fox et al., 2013). Random effects were tested using likelihood ratio chi-square comparisons as implemented in the lmerTest package for linear models (Kuznetsova et al., 2017) and the anova function of the stats package (R Core Team, 2021) for binomial models. Model fit was assessed with reference to marginal and conditional R² (Nakagawa et al., 2017). Post-hoc tests were computed using cell means models and single degree of freedom contrasts implemented in the glht function of the multcomp package (Hothorn et al., 2008).

Inconsistency main effects

We begin by reporting L2 learners’ reading of consistent and inconsistent target words as an indication of their comprehension monitoring. The data are summarized in Tables 4 and 5 and model results are summarized in Tables 6 and 7. The models showed that inconsistent targets were reread for longer and more often than consistent targets, confirming that L2 learners had monitored their comprehension on inconsistent words during their reading (as evident in longer rereading and total times and higher rereading and regression-in probabilities, see Fig. 1). Contrary to our predictions, however, the adolescent readers in the current study did not read inconsistent targets for longer in early reading times (inconsistency effect ns for gaze duration and go-past time).

Table 4 Model means for reading times of target words by inconsistency, word processing difficulty and executive control
Table 5 Model means for reading times of word processing difficulty manipulated words (e.g., written or prepared) and whole texts by inconsistency, word processing difficulty and executive control
Table 6 Models predicting the reading of target words (e.g., Spanish) from inconsistency, word processing difficulty, and executive control
Table 7 Models predicting reading of word processing difficulty manipulated words (e.g., written or prepared) and whole texts from inconsistency, word processing difficulty and executive control
Fig. 1 Main effect of inconsistency on the reading of target words. Reading times are model means in ms. Note: Results stem from mixed effect models that predicted reading times from the fixed effects inconsistency, executive control, word processing difficulty, and their interactions

Word processing difficulty main effects

Word processing difficulty affected L2 learners’ reading of manipulated words as well as their reading of the whole texts. As intended, difficult words and texts containing them were read for longer than easy words and texts, both in early and late reading measures (as evident in longer gaze durations and longer go-past, rereading, and total times as well as first-pass text reading). Difficult words also received more rereading and more direct regressions in, as summarized in Fig. 2.

Fig. 2 Word processing difficulty main effects on the reading of word processing difficulty manipulated target words. Reading times are model means in ms. Note: Results stem from mixed effect models that predicted reading times from the fixed effects inconsistency, executive control, word processing difficulty, and their interactions

Interactions between word processing difficulty and inconsistency

We also examined how word processing difficulty influenced L2 learners’ monitoring on inconsistencies. Against our predictions, the two variables did not interact (interaction effects were not significant in go-past time, rereading time, or rereading and regression-in probability), despite clear main effects of inconsistency and word processing difficulty in all these reading measures.

Effects of individual differences in executive control

We also investigated the effect of individual differences in executive control. While we found no main effects, there were clear interactions with both inconsistency and word processing difficulty that indicated a relationship of comprehension monitoring with word processing difficulty and executive control. We explored these interactions by evaluating predicted reading times for adolescents at 1 SD above or below the sample mean of executive control (see Footnote 3). This approach allowed us to study individual differences based on the entire variation of our continuous variable without having to assign participants to groups (Halekoh & Højsgaard, 2009), which would be associated with a loss of data precision (Gelman & Hill, 2007). For ease of reading, we will however refer to these model-based estimates as the reading times of adolescents with stronger (+1 SD) or weaker executive control (−1 SD), respectively.
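The logic of evaluating an interaction at ±1 SD of a continuous predictor can be sketched as follows. The coefficients are purely hypothetical and chosen for illustration only; they are not estimates from our models. The sketch further assumes a z-scored executive control score and effect coding of inconsistency as −0.5 (consistent) and +0.5 (inconsistent), and it omits word processing difficulty and the random effects for brevity.

```python
def predicted_log_rt(inconsistent, ec_z, b0, b_inc, b_ec, b_int):
    """Predicted log reading time from a simplified fixed-effects equation."""
    inc = 0.5 if inconsistent else -0.5  # assumed effect coding
    return b0 + b_inc * inc + b_ec * ec_z + b_int * inc * ec_z


def inconsistency_effect_at(ec_z, b_inc, b_int):
    """Inconsistent-minus-consistent difference at a given z-score."""
    return b_inc + b_int * ec_z


# Hypothetical coefficients on the log scale (NOT results from this study).
coefs = {"b0": 6.0, "b_inc": 0.10, "b_ec": -0.02, "b_int": -0.04}

effect_weaker = inconsistency_effect_at(-1.0, coefs["b_inc"], coefs["b_int"])
effect_stronger = inconsistency_effect_at(+1.0, coefs["b_inc"], coefs["b_int"])
```

Because executive control enters the model as a continuous predictor, no participant is assigned to a group; the ±1 SD values are simply two points at which the fitted interaction is evaluated.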

We found several two-way interactions of executive control with inconsistency or word processing difficulty, which showed that, in general, adolescents with weaker executive control were more strongly affected by inconsistencies and difficult words. Compared with their peers with stronger executive control, they took longer to read texts that contained both inconsistencies and difficult words (z = 2.5, p < 0.05 for first-pass text reading), to read difficult words (z = 2.1, p < 0.05 for go-past time and, marginally, z = 1.9, p = 0.058 for total time), and to read difficulty-manipulated words in inconsistent texts (z = 2.1, p < 0.05 for rereading time). They also took longer to read texts that contained both inconsistent and difficult words (z = − 2.1, p < 0.05 for first-pass text reading), as confirmed by a three-way interaction on overall text reading.

Further three-way interactions indicated that adolescents adapted their rereading of inconsistent information differently to easy or difficult texts, depending on their executive control (see Fig. 3): adolescents with weaker executive control took significantly longer to read inconsistent as opposed to consistent targets only in texts with easy words (z = 2.4, p < 0.05 for rereading time, z = 2.3, p < 0.05 for rereading probability, and z = 3.2, p < 0.01 for total time), but not in difficult texts (all effects ns). Adolescents with stronger executive control, however, did the exact opposite: they regressed more directly into target words (z = 2.1, p < 0.05) and reread inconsistencies for longer when texts were difficult, as visible in significant interactions on rereading time (z = − 2.7, p < 0.01), rereading probability (z = − 2.7, p < 0.01), and total time (z = − 2.3, p < 0.05). This interaction effect seemed almost reversed on texts containing easy words, but this difference remained non-significant across all reading measures. In summary, while adolescents with weaker executive control showed the expected smaller adaptations to inconsistencies under increased word processing difficulty, adolescents with stronger executive control adapted their rereading more (instead of less) clearly to inconsistencies when reading difficult texts, indicating that comprehension monitoring under increased word processing difficulty varied with executive control abilities.

Fig. 3

Interaction between word processing difficulty, inconsistency, and executive control on the (re)reading of target words. Reading times are model means in ms. Note: Results stem from mixed-effects models that predicted reading times from the fixed effects inconsistency, executive control, word processing difficulty, and their interactions.


We had investigated how adolescents’ comprehension monitoring in the L2 arises in the interplay of word processing difficulty (manipulated at the text level) and individual executive control abilities. We had expected successful comprehension monitoring to show in increased rereading of inconsistent vis-à-vis consistent information, indicating re-analysis. We had further expected that higher word processing difficulty would reduce this re-analysis, in line with the bottleneck idea that effortful word processing limits higher-level processing (Hessel & Schroeder, 2020; Perfetti & Stafura, 2014). Finally, we were interested in seeing whether adolescents’ comprehension monitoring on inconsistent words (and the way it decreased with difficult word processing) would also vary with adolescents’ executive control, which would indicate an additional influence of active, reader-initiated processing (van den Broek & Helder, 2017). The overall emerging picture is that while increased word processing difficulty can reduce comprehension monitoring in the L2, sufficient executive control allows readers to compensate for and even reverse this influence. As predicted, we found clear signs of adolescents’ comprehension monitoring in their slowing down on and rereading of inconsistent information. We further saw clear effects of increased word processing difficulty on the manipulated words and on overall text reading times. Contrary to our predictions, however, not all adolescents reduced their comprehension monitoring under higher word processing difficulty. Instead, we found that while adolescents with weaker executive control monitored their comprehension less clearly under high word processing difficulty, adolescents with stronger executive control monitored their comprehension more clearly under higher difficulty, indicating a switch to active, reader-initiated monitoring on demanding texts. In the following, we consider these key results in relation to theories and previous evidence.

We had measured comprehension monitoring online, by recording adolescents’ eye movements on inconsistencies. We found clear signs that adolescents were able to monitor their comprehension when reading in the L2, as visible in increased rereading of inconsistent information. This finding is akin to other evidence with children and adults reading in their L1 (Connor et al., 2015; Hessel et al., 2021; Joseph et al., 2008; Rayner et al., 2004) and extends the evidence base on online monitoring in the L2 by adding detailed insights into adolescent readers. Previous L2 studies have either focused on younger children or adults (Hessel et al., 2021; Hessel & Schroeder, 2020) or been limited to less detailed reading time data from self-paced reading (Mulder et al., 2021). On a more general note, this finding could be considered encouraging news for all language practitioners, as it indicates that adolescent learners can reliably process information from expository texts, even when reading in their L2.

We had also examined under which conditions students would be able to monitor their comprehension successfully. To find out, we had purposefully altered word processing difficulty across texts. As we had expected, texts designed to trigger higher word processing difficulty (that is, those that contained longer and less frequent and thus more difficult words) slowed down adolescents’ reading. These results are in line with previous studies on word length and frequency and show that our manipulation of word processing difficulty worked as intended (Gagl et al., 2015; Inhoff & Rayner, 1986; Joseph et al., 2009, 2013; Juhasz & Rayner, 2006; Tiffin-Richards & Schroeder, 2015). However, higher word processing difficulty did not influence comprehension monitoring in the predicted way. As a reminder, we had expected that all students would be less able to monitor their comprehension under higher difficulty due to a processing bottleneck, which would have been visible in reduced rereading of inconsistencies (Perfetti & Stafura, 2014). However, only a sub-group of students showed this predicted reading pattern, namely, those with relatively weaker executive control. For those adolescents, higher word processing difficulty limited their monitoring, as visible in their adapting their rereading less to inconsistent information when texts contained difficult words. In line with the Reading Systems Framework (Perfetti & Stafura, 2014) and previous evidence with adults (Hessel & Schroeder, 2020), we can say that for adolescents with weaker executive control, less efficient word processing appears to have started a chain reaction that ended in reduced higher-level coherence building.

For adolescents with stronger executive control, however, the relationship between word processing difficulty and monitoring was reversed: they engaged in more (instead of less) pronounced monitoring when word processing was difficult, as visible in increased rereading of inconsistencies. While running contrary to the mechanisms described in the Reading Systems Framework (Perfetti & Stafura, 2014), this reading pattern aligns with theories that emphasise active processing, such as the Reading Framework (van den Broek & Helder, 2017). If it is true that readers switch from passive to active processing in the face of insufficient comprehension (van den Broek & Helder, 2017), the observed pronounced monitoring on difficult texts could be adolescents’ active response to increased comprehension demands. However, such active repair seems to have been available only to L2 readers with stronger executive control.

Note that numerically, adolescents with stronger executive control also appeared to spend more time rereading consistent information (rather than inconsistencies, as predicted) when reading easy texts. Importantly, this difference failed to reach significance and is thus hard to interpret without corroborating evidence. With that caveat in mind, we would like to suggest that, if this effect were found to be reliable, one explanation could be that readers with stronger executive control felt overly confident in their comprehension of easy texts. This could have led to shallow processing and overlooking coherence breaks, as previously found for metacognitive monitoring (Ikeda & Kitagami, 2013). However, this suggestion remains purely speculative without further supporting evidence.

Our results only partially replicate previous ones: in another study with comparable reading materials, more difficult word processing reduced readers’ monitoring in a clear two-way interaction (Hessel & Schroeder, 2020). This was not the case in the current data. We would like to suggest two explanations for this incongruence, one based on reader differences and the other on task differences. Regarding reader differences, the adolescents in the current study were less skilled readers than the adults in the previous study, as visible in fewer years of English language experience (about 4 years for adolescents as opposed to 13 years or more for adults) and lower word reading abilities (cf. the average score of 121 for adolescents as opposed to 136 for adults). For adolescents, reading in English was thus plausibly more difficult. On the one hand, more effort overall could arguably have increased (and not decreased) the impact of word processing difficulties on monitoring in the current study (Perfetti & Stafura, 2014), making our results seem even less intuitive. On the other hand, increased difficulties would arguably also increase the likelihood of active processing (van den Broek & Helder, 2017), which could explain why those adolescents capable of it switched to an active processing style in the current study. Differences in reading tasks may have additionally supported such a switch, as task effects on reading, and on rereading in particular, are well documented in adults (Kaakinen & Hyönä, 2005; Kaakinen et al., 2003; Weiss et al., 2018) and young readers (Kaakinen et al., 2015). While adults in the previous study had only been asked to recall texts (Hessel & Schroeder, 2020), adolescents in the current study answered local and global comprehension questions. Recall has been found to tap shallower comprehension compared to multiple-choice questions (Cao & Kim, 2021). Plausibly, the reading task in the current study thus triggered higher comprehension standards. Together with higher comprehension difficulties, these increased standards could be expected to prompt active processing in those readers who are capable of it (van den Broek & Helder, 2017), just as we observed.

At this point, three aspects of our study must be acknowledged that limit the conclusions we can draw. First, as in any school-based study, our sample size was limited by response levels as well as the occasional technical and data quality issues that result from testing at schools. All our results and conclusions would benefit from replication in another, larger sample. Second, while we investigated the impact of higher word processing difficulty experimentally, we tapped executive control through individual differences. The latter thus remains subject to possible entanglements with related individual differences (Castles et al., 2018) until tested in a targeted experimental manipulation (such as in de Bruïne et al., 2021). Finally, we tested comprehension monitoring by observing adolescents’ (re)reading of textual inconsistencies. While the inconsistency paradigm is a widely used approach in online studies of comprehension monitoring (Connor et al., 2015; Hessel et al., 2021; Joseph et al., 2008; Rayner et al., 2004) and while there is widely accepted evidence in favour of rereading as a critical support of comprehension (Mulder et al., 2021; Schotter et al., 2014), it is only fair to say that understanding the rereading of inconsistencies as online monitoring remains an interpretative step that relies on the evidence from these previous studies as its foundation.

More generally, our findings contribute to a more nuanced view of what adolescents need to engage in successful comprehension monitoring in their L2. Specifically, ease of word processing is often highlighted as the key driver of higher-level processing (Perfetti & Stafura, 2014), while complementary theories predict additional situation-specific support through active processing (van den Broek & Helder, 2017). Yet, we know little about the tipping point at which readers shift from passive to active processing during moment-to-moment monitoring. Our study illustrates that tipping point by showing how comprehension monitoring in the L2 may be reduced under higher word processing load, while adolescents can compensate for these effects through active reading given sufficient executive control. Future work could home in further on this interplay of active and passive reading, for example by observing monitoring across reading tasks that vary in comprehension standards, word difficulty, and executive control load.

For teaching, our results hold both good news and new directions. The good news is that overall, adolescents who read expository texts in their L2 are capable of monitoring their comprehension from moment to moment, an important message at a time when content teaching in the L2 is on the rise worldwide (Dearden, 2015). As for new directions, our findings indicate that comprehension monitoring in the L2 is carried by both fluent word processing and active monitoring. In principle, this gives teachers a larger toolkit that includes both vocabulary exercises (Beck et al., 1982) and active monitoring exercises (Burton & Daneman, 2007). We believe that our understanding of higher-level reading comprehension will benefit from more insights into the complex interactions between active and passive processing, such as those we have provided here for adolescents’ comprehension monitoring when they read in the L2.