Introduction

Processing text information for learning purposes can be strongly influenced by learning activities such as question-answering. Questions can be understood as specific relevance instructions that help students generate reading goals, as well as access, locate, and retrieve relevant sources of information in a systematic and strategic way (Cerdán et al., 2009; McCrudden & Schraw, 2007). Thus, questions can guide students on what text information to focus on and challenge them to infer relations between queried and answered information (Dirkx et al., 2015; Olson et al., 1985). Given the impact that questions have on text processing, when to present them in a question-answering scenario becomes an important issue.

Students’ processing may differ when questions are presented while reading (i.e., inserted questions) or after reading the whole text (i.e., post-reading questions), and these differences in processing may affect their final learning outcome. In this regard, some studies have analyzed the effect of question timing on students’ comprehension and learning (Andre & Womack, 1978; Cerdán et al., 2009; Kapp et al., 2015; Peverly & Wood, 2001; Philips et al., 2020; van den Broek et al., 2001; Weinstein et al., 2016), but their findings are mixed. While most researchers found that inserted questions contributed to greater understanding and learning compared to post-reading questions, some found no effect of question timing on students’ learning. In addition, no study has recorded and examined in detail how students process the text and the questions under each condition. Thus, the goal of this study is to investigate how the timing of comprehension questions (i.e., inserted versus post-reading) influences college students’ text processing and learning of complex conceptual knowledge.

Our study compares a condition in which the adjunct questions were presented after reading the whole text with another condition in which the questions were inserted in the text after reading question-relevant segments. In both cases, students had the text available while answering the questions. We used two science texts with challenging ideas about natural phenomena (e.g., the differences between heat and temperature) and recorded online processing. Our procedure is inspired by the paradigm of adjunct question research. Several reviews concluded that adjunct questions have positive effects on students’ learning (Hamaker, 1986; Hamilton, 1985; Roediger & Karpicke, 2006), the so-called adjunct question effect. We assumed that inserted questions might improve the students’ learning over post-reading questions, as they may support relevant text processing and inference making while building a mental representation of the text.

The role of adjunct questions for prose learning

Answering adjunct questions with an available text is a frequent learning activity in classrooms (Ness, 2011). Three main design features of adjunct question research are generally assumed to be related to learning (Hamaker, 1986). The first is the cognitive level of the adjunct questions. Several taxonomies have been proposed (Goldman & Duran, 1988; Ozuru et al., 2007; Tawfik et al., 2020), but all of them share the distinction between low- and high-level questions: while low-level questions focus the reader’s attention on the recall and understanding of specific text ideas, high-level questions require the reader to analyze, apply, or evaluate textual information. Most studies conclude that high-level adjunct questions improve deep comprehension and learning (Cerdán et al., 2009; Jensen et al., 2014), whereas others claim that a mix of question types offers the best results (Agarwal, 2019). The second feature is question placement. Recent research has compared questions inserted in the text with questions presented after reading the whole text, with mixed results. Whereas many studies found that inserted questions were more effective than massed post-questions (Kapp et al., 2015; van den Broek et al., 2001; Peverly & Wood, 2001; Philips et al., 2020), other studies found that both produced comparable benefits on final learning (Uner & Roediger, 2018; Weinstein et al., 2016). The last feature concerns the relation between the adjunct questions and the transfer test. The adjunct question effect has mainly been found for the retention of information asked in the adjunct questions, and is rarely observed for the retention of unrelated information (Chan et al., 2006; Dirkx et al., 2015; Hamilton, 1985; van den Broek et al., 2001).

Apart from the discrepancies pointed out earlier, adjunct question research has some limitations. First, most of the classic research did not use long texts requiring the construction of a coherent mental model of the whole passage (e.g., Hamaker, 1986; Hamilton, 1985). When adjunct question research did use long texts, the text sections were relatively disconnected from one another, so understanding one section was not necessary to understand any other (e.g., Cerdán et al., 2009; Uner & Roediger, 2018). Second, few studies allowed students to reread the texts while answering the questions, which differs from usual conditions in school settings (Ness, 2011). Third, most of the questions used were quite limited in scope; for instance, low-level questions referred to proper names, places, or the like, which demanded extremely superficial processing rather than understanding ideas that required semantic processing (e.g., Weinstein et al., 2016). Further, most of the higher-order questions required students to understand ideas explicitly stated in the texts, but not to apply text information to a new situation or to make deep inferences (e.g., van den Broek et al., 2001). Finally, no studies have collected online processing measures that could explain the effect of question timing. Our study was designed to overcome these important limitations.

Constructing text meaning for learning: how adjunct questions modify processing strategies

It is assumed that understanding rests on the construction of a coherent text representation, which involves representing the explicit text information (i.e., a text-based representation) plus incorporating text ideas into the reader’s prior knowledge, called a situation-model level of understanding (Kintsch, 1998). Considering that learning results from the modification of the reader’s knowledge structures for the domain, we can assume that it occurs when readers reach an appropriate situation-model level of understanding (Coté et al., 1998).

According to these ideas, when constructing meaning from a text, readers may use processing strategies such as paraphrasing and elaboration (Coté et al., 1998). Paraphrases remain close to the explicit text meaning and reflect relatively superficial levels of comprehension, whereas elaborations involve generating text-connecting inferences and going beyond the text by integrating prior knowledge (McNamara, 2004). Therefore, paraphrasing may lead to a good-quality text-based representation, whereas inferences are needed to achieve deep comprehension and learning (Coté et al., 1998). However, although both processing strategies contribute to text comprehension, incorrect paraphrases have been associated with lower comprehension levels, and incorrect elaborations, while compatible with a text-based level of understanding, do not contribute to deep comprehension (McNamara, 2004).

When reading expository texts, engaging in deep comprehension processes is not common, not even for college students (Endres et al., 2017; Linderholm & van den Broek, 2002). In this context, adjunct questions are frequently used to help the students construct a coherent mental representation of the text meaning by promoting inferences during and after reading. To understand how questions influence comprehension, it is important to place question-answering processes within the goal-focusing model (McCrudden & Schraw, 2007). This model explains how questions affect text processing, as they may be considered relevance processing instructions. When reading a question, readers form a task model in their mind, which is a representation of the question goal and the means to achieve it (Rouet et al., 2017). This representation guides the student in finding question-relevant information and strategically processing the text (Cerdán & Vidal-Abarca, 2008; Dirkx et al., 2015). Reading (or rereading) the text with a goal in mind increases the accessibility of relevant background knowledge (McCrudden & Schraw, 2007), making readers more likely to elaborate on text information and increasing the likelihood of deep comprehension.

Assuming that the benefits of adjunct questions rely on their ability to direct the students’ attention to relevant information, inserting questions after reading one or two paragraphs (i.e., while constructing a representation of that information) would have more profound effects on the processing strategies than questions placed after reading the whole text, when the representation has already been constructed and is more resistant to modifications (van den Broek et al., 2001; van Oostendorp & Goldman, 1999). However, although it is clear that adjunct questions have an impact on text processing, no studies have explored how the position of these questions (i.e., inserted vs. post-reading) may modify the processing strategies underlying learning.

Reading behavior when answering adjunct questions

Comprehension processes can be passive (i.e., automatic) or reader-initiated (i.e., controlled). Passive processes always take place through unrestricted spread-of-activation mechanisms, whereas reader-initiated processes operate in a restricted manner as a function of the reader’s standards of coherence and the information returned by passive processes (van den Broek & Helder, 2017). A reader’s standards of coherence are the (often implicit) criteria that a reader has for what constitutes adequate comprehension and coherence in a particular reading situation (van den Broek et al., 2011; van den Broek et al., 1995). Standards can be modified by specific reading instructions. For instance, providing instructions to learn from a text elicits more coherence-building processes (e.g., connecting text ideas and elaborative inferences) than reading for entertainment (van den Broek et al., 2001). Answering adjunct questions with the text available may also modify the standards of coherence, as it may lead to specific reader-initiated processes (e.g., rereading decisions while searching the text).

Several studies have found that rereading decisions while searching predict text comprehension (Cerdán et al., 2009, 2011; Gil et al., 2015; Mañá et al., 2009; Máñez et al., 2022). For example, Mañá et al. (2009) found that the number of visits to relevant information and its use explained significant variance in question-answering performance beyond general comprehension skills. Similarly, Máñez et al. (2022) found a strong correlation between question-answering performance and the percentage of relevant information selected to answer the questions.

Nevertheless, reading and search processes during question-answering may differ depending on question timing (McCrudden & Schraw, 2007). When answering inserted questions, the short delay between reading and questioning facilitates the access and retrieval of text information (e.g., Carrier & Fautsch-Patridge, 1981; Hamaker, 1986; Rickards & Di Vesta, 1974; Schumacher et al., 1983). This may lead students to spend less time searching the text, given that the relevant information is still active in their memory. Furthermore, students can become very efficient at locating and selecting the relevant ideas due to the proximity between the question and the initial reading of the text. In this sense, some studies have found that inserted questions lead to increases in reading time for question-relevant information (Lapan & Reynolds, 1994; Reynolds et al., 1979). However, when answering post-reading questions, the interval between reading relevant information and answering the question may mean that this information is no longer active in the reader’s memory. Consequently, readers might spend more time rereading the text to find the relevant information, and their search processes could be relatively inefficient, making them waste time reading information that is not relevant to the question’s goal.

Something similar may occur during task model formation, i.e., when reading an adjunct question for the first time, which involves representing the question goal and a set of means for achieving it (Rouet & Britt, 2014). Students not only have to understand the question but also build a plan to give a response, which implies recalling the relevant text information or developing a plan to search for question-relevant content (Máñez et al., 2022). Therefore, question timing may affect the time students spend on task model formation. When answering inserted questions, students might still hold in memory the information relevant for task model formation; however, when the questions are presented at the end of the passage, the question-relevant information is no longer active in the students’ memory, making it more difficult to represent the question goal and the means to achieve it.

Finally, question timing may also affect the initial reading of the text. Several experiments have found that junior high school and college students read expository texts relatively quickly when they expect to have the text available to answer post-reading questions (Ferrer et al., 2017; Higgs et al., 2017). Readers may believe that a superficial reading of the text will be enough to form a schema of text information to be used while searching for relevant information. However, when answering inserted questions, students may read the text more slowly and carefully, which may especially favor the generation of inferences between relevant text ideas while reading (Olson et al., 1985). That is, under this condition, students may be likely to adopt studying as their primary task, rather than answering the adjunct questions (Hamaker, 1986). In addition, this sort of reading may make searching more efficient in comparison to the post-reading question condition, since text information is still active in the reader’s memory.

The current study

This study aims to examine the effect of question timing (i.e., questions inserted in the text versus presented after reading the whole text) on students’ processing strategies and online reading behavior when studying a long passage composed of many interconnected text ideas. We used a between-subjects design and assumed that inserting the questions would assist readers during text processing compared to post-reading questions. First-year college students studied two science texts while answering inserted versus post-reading adjunct questions. Online processing data were recorded while students read the texts and answered the questions. Five days later, students’ learning was assessed using a test with short-answer questions closely related, but not identical, to the information covered by the adjunct questions.

We formulated five hypotheses regarding the effect of the two experimental conditions on students’ text processing and final learning. First, we expected readers in the inserted question condition (hereafter referred to as inserted condition) to spend more time reading the text initially than those in the post-reading question condition (hereafter referred to as post-reading condition) (H1).

Second, students in the post-reading condition would spend more time reading the questions for the first time than students in the inserted condition, as task-model formation processes would be more time-consuming in the former (H2).

Third, we expected that students in the inserted condition would allocate their resources more efficiently during the search process than students in the post-reading condition. More specifically, we expected an interaction effect between condition and relevance of rereading (H3a). The interaction was expected to result from two patterns: first, students in the inserted condition would spend more time rereading question-relevant information than post-reading students, whereas the opposite would be true for rereading non-relevant information; second, students in the inserted condition would spend more time rereading relevant information than non-relevant information, whereas the opposite would be true for students in the post-reading condition. We also predicted that students in the post-reading condition would reread more text segments while looking for question-relevant information than students in the inserted condition, which is indicative of inefficiency in the search process (H3b).

Fourth, regarding the processing strategies used when answering the questions, we predicted that students in the inserted condition would make more correct elaborations and fewer incorrect elaborations than students in the post-reading condition (H4a), due to the better task model formation and search processes in that condition. However, no significant differences between the two experimental conditions were expected regarding correct and incorrect paraphrases (H4b).

Fifth, as a consequence of the processing mentioned, the inserted condition was expected to be more effective than the post-reading condition for learning (H5).

Method

Participants and design

The total sample consisted of 84 freshmen from the Faculty of Teacher Training at the University of Valencia, Spain. We excluded eight students because of missing sessions. The final sample included 76 participants (M age = 18.89, SD = 2.41; 76.30% female): 39 in the inserted condition and 37 in the post-reading condition. At least 34 participants per condition were needed to detect a medium-to-large effect size (Cohen’s d = 0.81) at an alpha error of .05 and statistical power of 0.95. This effect size was based on studies with similar outcomes (e.g., van den Broek et al., 2001). All participants were native Spanish speakers.
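The reported sample size can be approximated with a standard a priori power computation. The sketch below is illustrative rather than the authors’ actual procedure: it assumes a one-tailed two-sample t test and uses a normal approximation, which lands close to the 34 participants per condition reported above.

```python
from math import ceil
from statistics import NormalDist  # stdlib inverse-normal CDF

def n_per_group(d: float, alpha: float = 0.05, power: float = 0.95) -> int:
    """Approximate per-group n for a one-tailed two-sample t test,
    via the normal approximation n = 2 * ((z_alpha + z_beta) / d)^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)

print(n_per_group(0.81))  # → 33
```

Dedicated tools (e.g., G*Power, or R’s pwr package) solve the noncentral t distribution exactly, which for small samples nudges the result slightly upward, consistent with the 34 per condition reported.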

Participants were randomly assigned to one of the two experimental conditions, ensuring equality in prior knowledge between conditions, measured with a test (see below). No difference in prior knowledge between inserted condition (M = 8.46; SD = 4.64) and post-reading condition (M = 8.59; SD = 4.58) was apparent, t(74) = 0.13, p = .900, d = 0.03.

Materials

Materials included a test on prior knowledge, texts and adjunct questions, and a transfer test. These materials were validated in previous studies (e.g., Máñez, 2020).

Test on prior knowledge

It included 30 items about science with three options (i.e., True/False/I don’t know). Items were about general knowledge in science (e.g., “Density is the proportion of mass to volume”), and specific knowledge related to both text topics, i.e., Atmospheric Pressure and Heat Transmission, not included in the texts (e.g., “The thermometer is the instrument with which we measure heat”). Cronbach’s alpha revealed reasonable reliability, α = .74.

Texts

Two science texts, about Atmospheric Pressure and Heat Transmission, were used, with their order counterbalanced. Although the students had some prior background knowledge (since the topics are addressed in the secondary school curriculum), we selected these texts because they remained sufficiently challenging that students would still struggle to comprehend them. The Atmospheric Pressure text was 965 words long, distributed across four sections: the weight of the air, Torricelli’s experiment and the discovery of the barometer, the influence of altitude and temperature on atmospheric pressure, and the origin of the wind and its displacement. The Heat Transmission text was 895 words long, distributed across two sections: differences between heat, temperature, and internal energy, and thermal conductivity plus the different thermal sensations depending on the material type. A group of experts divided both texts into segments by idea-units, yielding 37 segments for the Atmospheric Pressure text and 26 segments for the Heat Transmission text. A segment may include only one sentence, e.g., “Atmospheric pressure is the force exerted, at a given point, by the weight of the column of air extending above that point, up to the upper limit of the atmosphere”, or several sentences closely related by the same idea-unit, e.g., “The Earth is surrounded by a layer of gases that separates it from the space that constitutes, for the most part, the Universe. This layer is called the atmosphere and is made up of a mixture of gases that we call air” (see Appendix 1).

Adjunct questions

We developed a set of open-ended questions referring to the above-mentioned ideas. After testing the questions in a pilot study, we selected five high-level questions (e.g., “If Torricelli’s experiment were replicated at the top of Everest, would more or less mercury come out of the tube into the bucket? Why?”) and five low-level questions (e.g., “What happens when there are high-pressure air masses next to low-pressure air masses?”). Please note that the low-level questions required understanding text ideas, rather than identifying factual information (e.g., names, locations, etc.), whereas the high-level questions required making inferences by applying text information to new situations not described in the text. The distribution was three low-level and two high-level questions for “Atmospheric Pressure”, and the opposite for “Heat Transmission”. In the inserted condition, both types of questions were inserted immediately after reading all the information needed to answer the question. Note, however, that the last segment read did not always provide all the information needed to answer the question, especially for high-level questions, in which the relevant information was located in several segments that were not necessarily contiguous. Both the texts and the questions had been used in previous experiments with good results.

Transfer test

It included 10 open-ended questions that addressed the same key information as the adjunct questions, so they were near-transfer questions (e.g., “In which direction will the wind move when there are nearby areas with different atmospheric pressure?” and “Someone says to you: ‘If you replicated Torricelli’s experiment on top of a mountain, less mercury would come out of the tube.’ Would you agree, and why?”). There were five high- and five low-level questions, with the same distribution as the adjunct questions.

Measures

The measures are divided into three categories: processing strategies, online reading behavior, and transfer test. Both the processing strategies and the transfer test were manually coded by two examiners. After several training sessions, they independently coded approximately 15% of the sample. After disagreements were resolved, the first examiner coded the remaining responses. Note that data from both texts were aggregated to obtain the measures.

Processing strategies

We distinguished between paraphrases and elaborations in the students’ responses to the adjunct questions. Paraphrases were counted when students’ responses included idea-units from the text. A paraphrase was coded as correct if the text idea was correctly reported, and as incorrect if the idea-unit implied a misunderstanding. Elaborations referred to idea-units not present in the text and, depending on their meaning, were likewise coded as correct or incorrect. Please note that a student’s response may include a combination of correct and incorrect paraphrases and elaborations. The total numbers of paraphrases and elaborations were computed by adding correct and incorrect strategies. For example: “Atmospheric pressure is a force (1st correct paraphrase) caused by the weight of the column of air on a point (2nd correct paraphrase). The amount of pressure will depend on the length of the air column (1st correct elaboration)”. Students’ responses could also be coded as non-analyzable (NA) when they were too short, incomplete, or incongruent in meaning. Cohen’s kappa indicated high inter-rater agreement (κ = .81, p < .001).
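Cohen’s kappa for two coders can be computed directly from the paired code labels. The snippet below is a generic stdlib sketch; the code labels (CP/IP for paraphrases, CE/IE for elaborations, NA for non-analyzable) and the toy data are made up for illustration, not taken from the authors’ coding pipeline.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on nominal codes."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical codes: CP/IP = correct/incorrect paraphrase,
# CE/IE = correct/incorrect elaboration, NA = non-analyzable.
a = ["CP", "CP", "IE", "CE", "NA", "CE", "IP", "CP"]
b = ["CP", "CE", "IE", "CE", "NA", "CE", "IP", "CP"]
print(round(cohens_kappa(a, b), 2))  # → 0.84
```

With 7/8 observed agreement and chance agreement of 15/64, kappa here is about .84, in the same range as the values reported for this study.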

Online reading behavior

We used the following indices:

(a) Total number of text segments read while searching: the sum of segments unmasked while searching the text to answer the questions. A segment was counted more than once when a student consulted it several times.

(b) Time reading the questions for the first time: the sum of the time spent during the first unmasking of each question.

(c) Time reading the text for the first time: the time spent prior to accessing the questions. For the post-reading question condition, this measure corresponds to the time reading text segments before moving to the question screen; for the inserted question condition, it corresponds to the sum of the time reading the text segments before accessing each question.

(d) Time rereading relevant information and time rereading non-relevant information while searching to answer the questions. The protocol for classifying rereading time by relevance was adapted from Vidal-Abarca et al. (2010). In the output file provided by Read&Learn, every text segment was classified beforehand as relevant or non-relevant for each question, depending on the question’s goal. A segment includes one or several sentences that are unmasked at a time, and the system registers each rereading as relevant or non-relevant depending on the question. Consequently, a segment may be reread before answering questions 1 and 2 but be relevant only for question 1; the time spent on it would then count as time rereading relevant information for question 1 and as time rereading non-relevant information for question 2. The measure used here is the sum of relevant and non-relevant search time across questions, taking into account which information is key for each question’s target.
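The relevance-classification logic just described can be sketched as a small aggregation over the interaction log. The row format, names, and values below are illustrative assumptions for this sketch, not Read&Learn’s actual output format.

```python
# Hypothetical log rows: (question_id, segment_id, seconds the segment
# was unmasked while searching to answer that question).
log = [
    (1, "s03", 12.4), (1, "s07", 5.1),
    (2, "s03", 8.0), (2, "s11", 9.6),
]
# Expert-defined relevance: which segments are relevant for each question.
relevance = {1: {"s03", "s04"}, 2: {"s11"}}

def rereading_times(log, relevance):
    """Sum search time into relevant vs. non-relevant totals; the same
    segment may count as relevant for one question but not another."""
    totals = {"relevant": 0.0, "non_relevant": 0.0}
    for question, segment, seconds in log:
        key = "relevant" if segment in relevance[question] else "non_relevant"
        totals[key] += seconds
    return totals

print(rereading_times(log, relevance))
```

Here segment "s03" counts as relevant time for question 1 but as non-relevant time for question 2, mirroring the example in the text.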

Learning

It refers to the students’ outcome on the transfer test. Each correct response scored 1 point, partial responses scored 0.5, and incorrect responses scored 0, so the maximum score was 10. The coding was carried out using an answer key developed by experts in the field. Cohen’s kappa indicated high inter-rater agreement (κ = .85, p < .001).
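The scoring rule amounts to a simple weighted sum over the 10 coded responses; a minimal sketch (the code labels are illustrative, not the authors’ answer-key format):

```python
# Points per coded response on the 10-item transfer test.
SCORES = {"correct": 1.0, "partial": 0.5, "incorrect": 0.0}

def transfer_score(coded_responses):
    """Total transfer-test score for one student (maximum 10)."""
    assert len(coded_responses) == 10
    return sum(SCORES[code] for code in coded_responses)

codes = ["correct"] * 6 + ["partial"] * 3 + ["incorrect"]
print(transfer_score(codes))  # → 7.5
```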

Apparatus

Adjunct questions were answered on a computer web-based system called Read&Learn (Vidal-Abarca et al., 2018). Students in the inserted condition completed the task on a single screen because questions were inserted in the text (see Fig. 1), whereas text and questions were displayed on two separate screens for the post-reading condition (see Fig. 2a, b). Please note that text and questions displayed in Figs. 1 and 2 were masked. To read a masked segment, students had to click on it. This segment remained unmasked until the participant clicked on another segment, then the previous segment was masked again. Through this masking procedure, it was possible to record the students’ reading actions. In both conditions, students could reread or search for information in the text at any time.

Fig. 1
figure 1

Screenshot of inserted question condition: text screen and question unmasked

Fig. 2
figure 2

Screenshots of post-reading question condition: (2a) Text Screen With a Text Segment Unmasked; (2b) Question Unmasked

Procedure

The experiment was conducted in three sessions, each with a time limit. In the first session, students completed the prior knowledge test in paper-and-pencil format. In the second session (i.e., the study phase), participants read the two texts and answered the adjunct questions. Beforehand, students were instructed on how to use Read&Learn for the experimental task. Whereas students in the post-reading condition were asked first to read the text and then to answer the questions, students in the inserted condition were instructed to read and answer questions in a continuous, sequential way. After 5 days, students completed the transfer test in paper-and-pencil format (i.e., the assessment phase).

Data analyses

Statistical analyses were conducted using R (R Core Team, 2019). Normality assumptions were tested using the Shapiro–Wilk test. The homogeneity of variance assumption was examined using Levene’s test when normality of the response variable was met, and the Fligner–Killeen test otherwise. The assumption of multivariate normality was examined with Shapiro–Wilk tests of multivariate normality, and the homogeneity of covariance matrices assumption was tested using Box’s M test. When assumptions were not met, robust estimation was conducted (Wilcox, 2013).

Descriptive analyses and Spearman correlations were performed as preliminary analyses. Mixed ANOVAs and unpaired Student’s t tests, or their robust equivalents, were performed to test the differences between the inserted and post-reading conditions in processing strategies, online reading behavior, and learning. Effect sizes were also computed: partial eta squared for ANOVAs and Cohen’s d (d; Cohen, 1988). The robust Cohen’s d (dR; Algina et al., 2005) was provided when the assumptions for the unpaired t test were not met.
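To make the effect-size and robust-test choices concrete, the following stdlib sketch shows classical Cohen’s d (pooled SD) and the trimmed-mean t statistic underlying Yuen’s test, the robust alternative implemented by WRS2’s yuen() in R. This is a simplified illustration with made-up data, not the actual analysis code; the p-value (which requires the noncentral t distribution) and the robust effect size dR are omitted.

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(x, y):
    """Classical Cohen's d with pooled standard deviation (Cohen, 1988)."""
    nx, ny = len(x), len(y)
    pooled = sqrt(((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2)
                  / (nx + ny - 2))
    return (mean(x) - mean(y)) / pooled

def yuen_t(x, y, trim=0.2):
    """Yuen's t statistic on trimmed means with winsorized variances."""
    def trimmed(v):
        v = sorted(v)
        g = int(trim * len(v))
        return v[g:len(v) - g]

    def winsorized_var(v):
        v = sorted(v)
        g = int(trim * len(v))
        w = [min(max(u, v[g]), v[len(v) - g - 1]) for u in v]
        m = mean(w)
        return sum((u - m) ** 2 for u in w) / (len(v) - 1)

    hx, hy = len(trimmed(x)), len(trimmed(y))
    dx = (len(x) - 1) * winsorized_var(x) / (hx * (hx - 1))
    dy = (len(y) - 1) * winsorized_var(y) / (hy * (hy - 1))
    return (mean(trimmed(x)) - mean(trimmed(y))) / sqrt(dx + dy)

x, y = [1, 2, 3, 4, 5], [3, 4, 5, 6, 7]  # toy data
print(round(cohens_d(x, y), 2))  # → -1.26
print(round(yuen_t(x, y), 2))    # → -1.73
```

Trimming (20% by default) and winsorizing make the statistic far less sensitive to the skew and outliers that violated the normality assumption in several of the measures below.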

Additional to system library, R packages employed in the analyses were: car (v3.0–10; Fox & Weisberg, 2019), effsize (0.8.1; Torchiano, 2020), WRS2 (1.1–1; Mair & Wilcox, 2020), reshape2 (1.4.4; Wickham, 2007), rstatix (0.6.0; Kassambara, 2020) and heplots (1.3–8; Fox et al., 2018).

Results

Descriptive statistics and correlations of measures

Table 1 presents the descriptive statistics of the investigated variables for each question condition. Furthermore, Table 2 shows the correlations between processing strategies, online reading behavior, and learning. The results revealed significant correlations between processing strategies and learning in the post-reading condition: positive for correct paraphrases and elaborations, and negative for incorrect paraphrases and elaborations. In addition, in the inserted question condition, time reading the text for the first time correlated significantly and positively with learning, while time rereading non-relevant information correlated negatively.

Table 1 Descriptive statistics of the measures
Table 2 Spearman correlations of the measures

Effect of question timing on students’ reading behavior

A t test was conducted to examine differences in time reading the text for the first time as a function of experimental condition (H1). The Shapiro–Wilk normality test result was W = 0.98 (p = .140), and Levene’s test indicated equal variances (F = 0.04, p = .852). Results were consistent with hypothesis H1: readers in the inserted condition (M = 693.85, SD = 332.32) spent more time reading the text initially than readers in the post-reading condition (M = 493.50, SD = 310.96), t(74) = −2.71, p = .008, d = −0.62.

To test whether students in the post-reading condition would spend more time reading the questions for the first time than students in the inserted condition (H2), a robust t test was conducted. The Shapiro–Wilk normality test result was W = 0.92 (p < .001). The Fligner–Killeen test yielded a statistically non-significant result regarding homogeneity of variance between experimental conditions for time reading the questions for the first time (FK(1) = 1.61, p = .204). Results showed statistically significant differences between the post-reading condition (M = 86.59, SD = 29.40) and the inserted condition (M = 74.38, SD = 24.67), t(53.71) = 2.01, p = .049, dR = 0.43, which was consistent with our prediction.

Regarding relevance of rereading, the Shapiro–Wilk normality test results were W = 0.95 (p = .003) for relevant time and W = 0.85 (p < .001) for non-relevant time. The Fligner–Killeen test yielded statistically non-significant results regarding homogeneity of variance between experimental conditions for time rereading relevant information (FK(1) = 1.15, p = .287) and non-relevant information (FK(1) = 3.11, p = .07). To test whether the inserted condition would lead students to spend more time rereading relevant text information than the post-reading condition (H3a), we conducted a robust two-way mixed ANOVA, with relevance of rereading (i.e., time rereading relevant and non-relevant information while searching the text) as a within-subjects variable and question-answering condition (inserted, post-reading) as a between-subjects variable. Results showed a significant interaction effect between question condition and relevance of rereading, F(1, 32.57) = 15.49, p < .001 (see Fig. 3). Students in the post-reading condition spent more average time rereading non-relevant information (M = 310.09, SD = 258.21) than relevant information (M = 208.66, SD = 139.05), whereas students in the inserted condition spent more average time rereading relevant information (M = 244.62, SD = 146.77) than non-relevant information (M = 183.45, SD = 154.69). In addition, students in the inserted condition spent more time rereading relevant information than post-reading students, whereas time rereading non-relevant information was higher in the post-reading condition than in the inserted condition. These results were consistent with our predictions. There was no statistically significant main effect of relevance of rereading, F(1, 32.57) = 0.02, p = .879, nor of experimental condition, F(1, 43.57) = 0.66, p = .431.

Fig. 3

Interaction effect between question condition and relevance of rereading. Relevant = Total time rereading relevant text information while searching; Non-relevant = Total time rereading non-relevant text information while searching
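For a 2 (condition) × 2 (relevance) mixed design like the one above, the interaction can be probed by comparing per-subject difference scores (relevant minus non-relevant rereading time) across groups, since that between-groups test is equivalent to the interaction test (up to the variance assumption). The sketch below uses synthetic data whose means follow the reported pattern; it is an illustration of the design logic, not the study's robust ANOVA:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 38  # illustrative group size, not the study's actual n

# Synthetic per-subject rereading times (seconds); means mimic the
# reported descriptives
ins_rel = rng.normal(244.6, 146.8, n)
ins_non = rng.normal(183.5, 154.7, n)
post_rel = rng.normal(208.7, 139.1, n)
post_non = rng.normal(310.1, 258.2, n)

# In a 2x2 mixed design, the condition x relevance interaction amounts
# to comparing within-subject difference scores between groups
ins_diff = ins_rel - ins_non    # positive: more time on relevant info
post_diff = post_rel - post_non  # negative: more time on non-relevant info

t, p = stats.ttest_ind(ins_diff, post_diff, equal_var=False)
f_interaction = t ** 2  # the interaction F(1, df) equals t squared
```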

To examine differences between experimental conditions in the total number of text segments read while searching (H3b), a robust t-test and the associated robust effect size were computed. The Shapiro–Wilk test showed a violation of the normality assumption (W = 0.87, p < .001), although the Fligner-Killeen test indicated equal variances, FK(1) = 3.48, p = .06. Students in the post-reading condition searched significantly more text segments (M = 99.16, SD = 70.16) than those in the inserted condition (M = 59.18, SD = 40.02), t(49.89) = 2.99, p = .004, dR = 0.63. This result was consistent with our prediction.
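A common choice of "robust t-test" for skewed data such as these segment counts is Yuen's trimmed-means test, available in scipy via the `trim` argument of `ttest_ind`. The data below are synthetic and right-skewed to resemble the reported pattern; whether this was the study's exact robust procedure is an assumption:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Illustrative, right-skewed segment counts; NOT the study's data
post = rng.lognormal(mean=4.4, sigma=0.6, size=38)
inserted = rng.lognormal(mean=3.9, sigma=0.6, size=38)

# Yuen's trimmed-means t-test (20% trimming from each tail), which
# downweights the influence of extreme values
t, p = stats.ttest_ind(post, inserted, trim=0.2)
```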

Effect of question timing on processing strategies

Shapiro–Wilk normality tests showed incorrect elaborations to be non-normally distributed (W = 0.81, p < .001). A robust two-way mixed ANOVA revealed a statistically non-significant effect of experimental condition (F(1, 39.56) = 3.43, p = .071) and a statistically significant difference between correct (M = 9.21, SD = 4.09) and incorrect (M = 2.14, SD = 2.17) elaborations, F(1, 43.10) = 188.31, p < .001. Moreover, a statistically significant interaction was found between correct/incorrect elaborations and experimental condition, F(1, 43.10) = 24.35, p < .001 (see Fig. 4). More concretely, students in the inserted condition made more correct elaborations (M = 10.44, SD = 3.68) than those in the post-reading condition (M = 7.92, SD = 4.16), and fewer incorrect elaborations (M = 1.28, SD = 1.34) than those in the post-reading condition (M = 3.05, SD = 2.51). This result supported our hypothesis (H4a).

Regarding paraphrases, Shapiro–Wilk tests showed both correct (W = 0.88, p < .001) and incorrect (W = 0.82, p < .001) paraphrases to be non-normally distributed. Fligner-Killeen tests indicated equal variances between the inserted and post-reading conditions for correct paraphrases (FK(1) = 0.81, p = .368) and for incorrect paraphrases (FK(1) = 0.59, p = .442). Consequently, a robust two-way mixed ANOVA was conducted. Results showed neither an effect of experimental condition (F(1, 43.96) = 0.09, p = .766) nor an interaction between condition and correct/incorrect paraphrases (F(1, 42.67) = 1.96, p = .168). However, a statistically significant difference was found between correct (M = 18.46, SD = 6.58) and incorrect (M = 1.34, SD = 1.52) paraphrases, F(1, 42.67) = 742.71, p < .001. This result was in line with our prediction (H4b).

Fig. 4

Interaction effect between question condition and elaborations. Correct = Total number of correct elaborations; Incorrect = Total number of incorrect elaborations

Impact of adjunct question timing on learning outcome

To examine whether the inserted condition would be more effective than the post-reading condition for students’ learning (H5), we conducted an unpaired Student’s t-test with learning as the dependent variable and question-answering condition as a between-subjects variable. The Shapiro–Wilk test result was W = 0.97 (p = .060), and Levene’s test indicated equal variances (F = 0.90, p = .345). Learning was higher for the inserted condition (M = 6.22, SD = 1.81) than for the post-reading condition (M = 5.08, SD = 2.01), t(74) = − 2.59, p = .011, d = − 0.59, consistent with the prediction.
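Because both assumptions held here, a classical pooled-variance test was appropriate. A minimal scipy sketch of this final analysis, again on synthetic data (group sizes and values are illustrative assumptions), including Cohen's d from the pooled standard deviation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Illustrative learning scores; NOT the study's data
inserted = rng.normal(6.22, 1.81, 38)
post = rng.normal(5.08, 2.01, 38)

# Levene's test for equality of variances
lev, p_var = stats.levene(inserted, post)

# Unpaired Student's t-test (equal variances assumed, as Levene's
# test was non-significant in the study)
t, p = stats.ttest_ind(post, inserted, equal_var=True)

# Cohen's d from the pooled standard deviation
n1, n2 = len(inserted), len(post)
sp = np.sqrt(((n1 - 1) * inserted.var(ddof=1) +
              (n2 - 1) * post.var(ddof=1)) / (n1 + n2 - 2))
d = (post.mean() - inserted.mean()) / sp
```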

Discussion

We investigated the effect of question timing on students’ processing strategies and online reading behavior when studying a challenging text, as well as its impact on learning. For this purpose, students answered adjunct questions either after reading the whole text or inserted in the text right after the question-relevant information. We found that answering inserted adjunct questions was more effective for learning than answering post-reading questions, which confirmed our fifth hypothesis. The analysis of online reading behavior and processing strategies may help explain this advantage. First, we analyzed a series of reading behavior indices that are indicative of cognitive processes. We predicted that readers in the inserted condition would spend more time reading the text initially than those in the post-reading condition, which was confirmed. Our interpretation is that inserting questions in the text raised the students’ standards for coherence when constructing meaning from the text, whereas post-reading questions induced a quick reading aimed at forming an outline of the text that enables later location of relevant information (Ferrer et al., 2017; Higgs et al., 2017).

Our second hypothesis predicted that students in the post-reading condition would spend more time reading the questions for the first time than students in the inserted condition, which was confirmed by the results. This can be explained by the processes involved in task-model formation. When reading a question, the student builds a representation of the end goal and the information needed for the answer. When the question is inserted in the text, that relevant information can be easily reactivated. In contrast, when the questions are presented at the end of the passage, the relevant information is no longer active in memory. Therefore, the reader has to think about where it was located and possibly start mental search processes, which takes time.

The results also confirmed our third hypothesis, i.e., students in the inserted condition would allocate their resources more efficiently during the search process than students in the post-reading condition. Inserted-questions students spent more time rereading relevant than non-relevant information, but the opposite was true for students in the post-reading condition. Further, students in the inserted condition spent more time rereading relevant information than post-reading students, whereas the opposite was true for non-relevant information. We had also predicted that students in the post-reading condition would search a higher number of text segments than students in the inserted-questions condition. The effect of questioning in cueing students’ attention toward specific information is well known (Dirkx et al., 2015; McCrudden & Schraw, 2007). However, no research had explored how this effect varies as a function of question timing. The delay between reading relevant information and reading the question in the post-reading condition, plus the quick initial reading of the text, makes searching for relevant information difficult for the students. In contrast, that search is facilitated when the delay is very short (i.e., the inserted condition) and the text has been read more carefully.

Regarding the processing strategies when answering questions, our fourth hypothesis predicted that students’ responses in the inserted condition would include more correct elaborations and fewer incorrect elaborations compared to students in the post-reading condition. Our results were consistent with this prediction. When answering inserted questions, especially high-level questions, students are challenged to apply text information to new situations. Students may easily reread relevant information, which activates their prior background knowledge to make the appropriate inferences. As a result, correct elaborations are quite likely. However, when answering post-reading questions, students struggle to find relevant information, which interferes with the activation of prior background knowledge and the corresponding correct inferences. As a result, the probability of correct elaborations decreases, while that of incorrect elaborations increases. In contrast, we predicted no significant differences between the two experimental conditions regarding correct and incorrect paraphrases, which was confirmed. For freshman students, paraphrasing depends on understanding explicit text ideas, which may be achieved by recalling them or by accessing the relevant information, at least when the text content is accessible, as was the case here. Since the text was available while answering the questions, paraphrasing it was not difficult, independently of question timing.

In summary, the analysis of online behavior variables while reading the text, reading the questions, and making search decisions, as well as of the students’ processing strategies while responding to the questions, contributes to explaining why answering inserted adjunct questions enhanced the students’ learning compared to post-reading questions. Despite this, the present study has several limitations. First, the benefits of inserted questions should not be generalized to all types of texts and questions. They may not be as favorable for less challenging texts, or for texts with sections so disconnected from one another that understanding one section is not necessary to understand any other. We also cannot confirm which condition is better for the retention of non-question information in the study phase (i.e., the general attention perspective). Second, Read&Learn offers additional recording possibilities, but it also has some costs (e.g., unmasking may make scanning the text slightly more difficult). Although Vidal-Abarca et al. (2010) found no significant differences in strategic patterns between eye-tracking and the previous version of Read&Learn (i.e., Read&Answer), it would be advisable to compare the effect of question timing in a more natural environment (i.e., with unmasked texts). Third, we only examined the learning of ideas closely related to those addressed in the adjunct questions. We made this decision because our goal was to promote the students’ learning of important, challenging text ideas. Future studies might examine how the adjunct question effect extends to information less connected to the questions. Other limitations are the students’ age and the absence of a control group. Younger students may not benefit from inserted questions; indeed, van den Broek et al. (2001) found that inserted questions were harmful to young readers reading narratives, because processing text information simultaneously with answering questions overloaded the readers’ working memory. In addition, we cannot say that either timing of questions favors greater learning compared to a condition without questions (e.g., just reading the text). These limitations should be addressed in future studies.

Our findings have important educational implications. Textbook publishers and educational institutions that develop instructional materials should consider when to present questions, either inserted in the text or after it. Electronic materials such as e-textbooks open new possibilities in this regard, and the present study provides arguments for making decisions on this point. Furthermore, if teachers are aware of the underlying question-answering processes, they can teach students comprehension strategies for answering different types of questions depending on when they are presented. At a more global level, educational institutions can incorporate the main conclusion of this study when preparing training courses for in-service and pre-service teachers.