Reading is a social practice defined by one's historical and cultural context (Gee, 2007). Indeed, since writing's origins on clay pots, advances in writing technology (e.g., the printing press) have radically changed the forms that texts can take; how they are shared and accessed; their roles in social, cultural, and economic practices; and expectations for who should be able to read and produce them (Manguel, 1996; Olson, 1994). This gradual evolution has seen rapid developments over the past 30 years, with advances in digital technology now offering near universal access to text documents, and in effect changing what readers read (Mackey, 2020), how they do it (Baron, 2021; Rouet, 2005; Wolf, 2018), and for what purposes (Coiro, 2021; Leu et al., 2019; Magliano et al., 2018). Consequently, what it means to be literate today differs markedly from what it meant even several decades ago (Alexander & The Disciplined Reading and Learning Laboratory, 2012; Britt et al., 2018). It is within this new context that we consider one aspect of literacy that has become increasingly common: intertextual integration.

The matter of integrating multiple information sources has been a topic of great interest in several fields (Alexander & The Disciplined Reading and Learning Research Laboratory, 2012; Braasch et al., 2018; Nelson & King, 2022). Given our interest in applications for K-12 literacy learning and instruction, we focus in this study on intertextual integration as it has been conceptualized within educational psychology. In this context, intertextual integration refers to the mental process of selecting, organizing, corroborating, and synthesizing information from multiple document sources for the purpose of constructing a coherent representation of the situation presented among them (Barzilai et al., 2018; Brand-Gruwel et al., 2009; Britt et al., 1999; Leu et al., 2015; Perfetti et al., 1999; Wineburg, 1991). We focus here primarily on connections among text documents made during reading and writing, rather than connections made between information presented in other types of media (e.g., pictures, videos) or in other forms of discourse (e.g., conversations). Note that this definition of intertextual integration overlaps with the concept of intertextuality (Allen, 2022; Bazerman, 2004; Lemke, 2004). The latter has been defined in a variety of ways but generally refers to connections a person makes between texts and their store of accumulated textual, contextual, and cultural knowledge.

The remainder of this introduction is organized into three parts. First, we discuss the importance of intertextual integration in K-12 educational contexts. Second, we provide a capsule summary of how intertextual integration has been conceptualized in frameworks, theories, and models of multiple document use. Third, we discuss hypothesized predictors of intertextual integration and previous efforts to review this literature.

Intertextual Integration in K-12 Education

Over the past century, reading research and practice have focused largely on processes involved in reading single, carefully curated documents for a small range of academic purposes (Anderson et al., 1985; Gibson & Levin, 1975; Huey, 1908; Perfetti, 1985; Reichle, 2021; Scammacca et al., 2016). This work has shown that reading depends on learning how oral and written language are structured and connected (Adams, 1990; Byrne, 1998; Castles et al., 2018; Ehri, 2014); acquiring rich, well-connected, and easily accessible stores of word, domain, disciplinary, and cultural knowledge (Goldman et al., 2016; Hwang et al., 2022; Lee, 2007; McCarthy & McNamara, 2021; Perfetti & Helder, 2022); and efficiently using cognitive and metacognitive strategies to monitor and construct meaning within a document (Garner, 1987; Graesser, 2007). A large body of work has also shown that reader characteristics (e.g., motivation) interact with features of the document (e.g., length, genre) and task (e.g., form an argument) to influence comprehension processes and outcomes (Afflerbach, 2016; Snow, 2002).

Although skillful single document reading remains vital, present-day readers often encounter tasks involving multiple printed and digital text documents (Baron, 2021). Therefore, beyond the skills just outlined, readers must now also develop facility with a variety of multiple document use skills, including searching for, selecting, navigating, evaluating, and integrating multiple printed and digital documents (Britt et al., 2018). Such skills are critical for academic achievement (Alexander & The Disciplined Reading and Learning Laboratory, 2012; National Assessment Governing Board, 2021; National Governors Association & Council of Chief State School Officers, 2010; National Research Council, 2013), for competitive participation in information-based economies (National Research Council, 2012a, 2012b; Rouet et al., 2021), and for informed civic discourse on matters of science, history, and culture (Goldman et al., 2016; Leinhardt & Young, 1996; List, 2023; Stadtler & Bromme, 2013).

Among the skills of multiple document use, intertextual integration plays a prominent role in current college- and career-readiness standards (National Council for the Social Studies, 2013; National Governors Association & Council of Chief State School Officers, 2010; National Research Council, 2013), in national and international assessment frameworks for K-12 students and adults (National Assessment Governing Board, 2021; Rouet et al., 2021; Sparks & Deane, 2015), and in conceptual frameworks for disciplinary and digital literacy (Coiro, 2021; Goldman et al., 2016; Leu et al., 2019). For example, the Common Core State Standards state that kindergarten students should be able to “…identify similarities in and differences between two texts on the same topic…” (National Governors Association & Council of Chief State School Officers, 2010). Fifth graders are to “[d]raw on information from multiple print or digital sources…” and “[i]ntegrate information from several texts on the same topic…” And by 11th and 12th grade, students are to “[i]ntegrate and evaluate multiple sources of information presented in different media formats…” Similarly, it is expected in K-12 standards for social studies (National Council for the Social Studies, 2013) and science (National Research Council, 2013) that students develop competency with gathering, evaluating, and integrating information from multiple sources.

Tasks involving intertextual integration unfortunately prove difficult for many readers (Cho, 2013; Goldman et al., 2012; Many, 1996; McGrew et al., 2018; Raphael & Boyd, 1991; Rott & Gavin, 2015; Segev-Miller, 2007; Yang, 2002). Indeed, although children and adolescents regularly work with multiple documents in language arts, social studies, and science, they rarely form spontaneous intertextual connections (Cho et al., 2018; Many, 1996; Many et al., 1996; Stahl et al., 1996; VanSledright, 2002; VanSledright & Kelly, 1998; Wolfe & Goldman, 2005). This has been found even for college-level students (Greene, 1993; Kennedy, 1985; McGinley, 1992; Rott & Gavin, 2015; Segev-Miller, 2007; Yang, 2002). As we will discuss in the following sections, many factors have been proposed for why intertextual integration proves so difficult.

As with single document comprehension (Magliano et al., 2023; Wang et al., 2019), it is possible that intertextual integration skills cannot fully develop until foundational reading and writing skills are secured. However, evidence from at least one study suggests that intertextual integration is possible even with impaired word-level reading skills (Andresen, Anmarkrud, Salmerón, et al., 2019). Furthermore, even as children are developing word-level reading skills, they can be capable of performing complex comprehension tasks (Williams et al., 2016). Moreover, by the age of six, children are able to monitor information sources (Drummey & Newcombe, 2002; Lindsay et al., 1991) and even critically evaluate their trustworthiness (Koenig & Harris, 2005). As reflected in current instructional standards (National Governors Association & Council of Chief State School Officers, 2010), it is not unreasonable to expect that even in kindergarten, and certainly by the upper elementary grades, children may be capable of integrating information from multiple text sources. In the following sections, we discuss what is known about intertextual integration and its predictors.

Frameworks, Theories, and Models of Intertextual Integration

Many psychological frameworks, theories, and models have been proposed to account for intertextual integration within the broader context of multiple document use (Afflerbach et al., 2014; Afflerbach & Cho, 2009; Bråten et al., 2020; Britt et al., 2018; Butterfuss & Kendeou, 2021; Cho & Afflerbach, 2017; Leu et al., 2019; List, 2020; List & Alexander, 2017a, 2019; Perfetti et al., 1999; Richter & Maier, 2017; Rouet et al., 2017; Segev-Miller, 2007; Stadtler & Bromme, 2014; van den Broek & Kendeou, 2022; Yang, 2002). In Table A1, we have listed a selection of 12, ranging from generalized to specialized accounts. We have provided summaries in the table of the major points of each and have noted the cognitive and affective factors that are explicitly specified in them. As context for the rest of the article, we provide a detailed description of the Documents Model Framework (DMF; Britt et al., 1999; Perfetti et al., 1999) as it has provided the foundation for much of the theorizing and empirical work that has followed.


The DMF provided the first attempt at a psychological account of how meaning is constructed from multiple documents. It emerged from research on text-based history learning (Perfetti et al., 1995), in which it is common for readers to engage with multiple documents that present discrepant information and often vary in length, layout, format, structure, purpose, and reliability. Within the discipline of history, it is important to evaluate features of the document sources (i.e., sourcing) and to consider how the information presented among them is related (i.e., corroboration; Wineburg, 1991). Notably, intertextual integration also proves important in science and literature (Goldman et al., 2016).

It is proposed in the DMF that integration is accomplished by constructing two representations: an integrated situations model and an inter-text model (Britt et al., 1999; Perfetti et al., 1999). The integrated situations model contains semantic information from each document about the common topic or issue being discussed. The inter-text model comprises meta-document information, such as the source author, publication, date, and purpose. This information is captured in networks of source-document nodes (Perfetti et al., 1999).

The reader’s comprehension will depend on the forms of these two models (Britt et al., 1999). Following Britt et al.'s (1999) framework, three possible outcomes are illustrated in Fig. 1: (a) a mush model, (b) a separate representations model, and (c) a documents model. A mush model is one in which the reader constructs a well-formed integrated situations model but an undeveloped inter-text model. In this case, information from across the documents is well represented but not linked to the sources from which it originated. Moreover, information about the sources and links among them are not captured. The opposite pattern can be seen in the separate representations model, wherein the inter-text model is well developed but the situations model is unformed. In this case, content is recalled and linked to each source but is not cohesively integrated. With the documents model, both the situations model and the inter-text model are well-formed. Information is clearly linked to the documents from which it originated, relations among the documents are captured (e.g., agree, disagree), and a well-integrated representation of the situation is formed.

Fig. 1 Multiple document representations. Adapted from Fig. 3 in List et al. (2019), Fig. 1 in Saux et al. (2021), and Fig. 4.2a in Perfetti et al. (1999)
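
To make these representational forms concrete, the sketch below (our own illustration in Python, not a structure proposed by the DMF or any study reviewed here) renders a documents model as a small graph: source nodes hold meta-document information (the inter-text model), content nodes hold situation-level ideas (the integrated situations model), and links connect sources to their content and to one another. All documents, claims, and relations in it are hypothetical.

```python
# Illustrative sketch only (our own, not taken from the DMF): a documents model
# represented as a tiny graph. Source nodes carry meta-document information
# (the inter-text model); content nodes carry situation-level ideas (the
# integrated situations model); links tie sources to one another and to content.
# All documents, claims, and relations below are hypothetical.

sources = {
    "doc_A": {"author": "Author A", "year": 1885, "genre": "diary"},
    "doc_B": {"author": "Author B", "year": 1970, "genre": "textbook"},
}

claims = {
    "claim_1": "The event was driven mainly by economic pressures.",
    "claim_2": "The event was driven mainly by political rivalry.",
}

source_to_source = [("doc_A", "disagrees_with", "doc_B")]   # inter-text links
source_to_claim = [("doc_A", "asserts", "claim_1"),
                   ("doc_B", "asserts", "claim_2")]          # source-content links

# A "mush model" would retain only `claims`; a "separate representations model"
# would retain both stores but drop the links; a documents model keeps all three.
documents_model = {
    "sources": sources,
    "claims": claims,
    "links": source_to_source + source_to_claim,
}
print(documents_model["links"])
```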

Forming intertextual connections is highly effortful, requiring both bottom-up and top-down processes (Kurby et al., 2005; van den Broek & Kendeou, 2022). Accordingly, these connections may not be formed in cases where the reader views them as unnecessary or is unable to form them. In certain contexts (e.g., history class) and for certain tasks (e.g., a research report), though, it may be important to construct well-formed intertextual representations (e.g., a documents model). Certain tasks can also promote greater intertextual integration than others (e.g., arguments vs. summaries; McNamara et al., 2023; Wiley & Voss, 1999). However, in other contexts, and for other tasks, the high cognitive effort may outweigh the need. Therefore, the likelihood that a reader will construct a particular representational form (e.g., a mush model) depends in part on the context and task at hand (Frederiksen, 1975; Rouet & Britt, 2011; Rouet et al., 2017; Spivey, 1995; Van Dijk, 1979). In the following section, we turn to the role played by individual differences in shaping intertextual integration.

Individual Differences in Intertextual Integration

A sizeable literature has examined the role that individual differences play in intertextual integration (Barzilai & Strømsø, 2018). However, few of these studies have used intervention, longitudinal, or computational designs, which greatly limits what can be concluded about the existence, direction, or magnitude of any potential causal relations. Predictors of intertextual integration have, however, been studied in many cross-sectional investigations. As an initial step toward synthesizing this literature, we conducted a systematic review of studies reporting concurrent associations between individual differences factors (e.g., working memory) and intertextual integration.

Previous efforts to review this literature have revealed many useful insights (Alexander & Disciplined Reading and Learning Research Laboratory, 2020; Anmarkrud et al., 2022; Barzilai & Strømsø, 2018; Bråten et al., 2011; List & Sun, 2023; Richter & Maier, 2017; Tarchi et al., 2021) but are limited by several factors. First, only a few reviews have used systematic and transparent search procedures (Anmarkrud et al., 2022; List & Sun, 2023; Richter & Maier, 2017; Tarchi et al., 2021). Such procedures are critical for reducing sampling error and for ensuring replicability (Alexander, 2020; Cumming et al., 2023). Second, previous reviews have synthesized results from studies involving K-12, college, and adult readers (Anmarkrud et al., 2022; Barzilai & Strømsø, 2018). This approach has the benefit of comprehensively mapping the terrain. However, combining such a broad range of ages can obscure potential developmental differences. Moreover, it complicates interpretation for both K-12 and college educators. Finally, most previous reviews have examined a wide range of multiple document use processes (e.g., source selection, source evaluation, intertextual integration) without clearly distinguishing how each is associated with particular individual differences factors.

Despite these limitations, previous reviews have provided useful insights into the roles that individual differences play in intertextual integration. For example, Barzilai and Strømsø (2018) identified a broad range of individual differences factors (cognition, metacognition, motivation, affect, and sociocultural factors) that are associated with searching for, selecting, evaluating, and integrating information from multiple documents. Furthermore, they found evidence that these individual differences factors interact with one another, the reading task, and the context to produce different comprehension outcomes. However, few sources were cited for most factors, and there were often large differences in the populations, documents, and tasks that were studied. This may point to limitations within the literature.

Other reviews have adopted narrower approaches. For example, Richter and Maier (2017) examined how individual differences in cognition and affect influence how readers identify and resolve discrepancies encountered across multiple documents. Using systematic search procedures, they identified 18 studies involving adolescent and adult readers. They found that readers’ prior beliefs about a topic and their epistemic monitoring determine whether they detect belief-inconsistent information. Then, once an inconsistency is detected, the reader’s epistemic goals and beliefs, background knowledge, working memory resources, and store of metacognitive strategies influence whether they simply form a belief-consistent representation of the controversy or a more balanced mental model. Bråten et al. (2011) also examined how beliefs influence multiple document comprehension. With a focus on expository texts, they found that epistemological beliefs (i.e., simplicity, certainty, source, justification) play important roles in creating and updating a task model, assessing information from each source and its relevance for the task, processing source contents, and creating and updating intertextual representations.

Finally, Tarchi et al. (2021) examined how executive functions (e.g., inhibition) are associated with multiple document use and comprehension. However, by including a wide variety of executive functions (working memory span, working memory reading span, working memory updating, problem solving, strategic processing, regulation, fluid reasoning), a mix of multiple document comprehension processes (intertextual integration, sourcing), and a broad age range (secondary school, undergraduate, adults), they were unable to find any robust patterns of association.

Present Study

The present study advances beyond previous reviews in four ways. First, to reduce sampling error and to ensure transparency, we used systematic search, selection, and coding procedures (Alexander, 2020; Cumming et al., 2023). Furthermore, to reduce publication bias, we included both peer-reviewed journal articles and dissertations/theses. Second, whereas others have combined results for K-12, college, and adult populations, we focused only on K-12 students. As noted, although intertextual integration is a cognitively taxing process, it is an integral component of K-12 education (National Governors Association & Council of Chief State School Officers, 2010). However, in comparison to undergraduate and adult readers, K-12 students have been studied far less in research on multiple document use (Anmarkrud et al., 2022). By focusing exclusively on this age range, implications for K-12 educators may be more clearly understood.

Third, whereas most other reviews have examined the roles played by individual differences factors in several components of multiple document use (e.g., search, selection, evaluation, integration), we focused here only on intertextual integration. To this end, we included only studies that clearly measured intertextual integration, rather than, as in other reviews, combinations of searching for, selecting, evaluating, and integrating sources. In doing so, we provide a clearer assessment of associations between individual differences and intertextual integration. We decided to focus exclusively on intertextual integration for several reasons. First, intertextual integration is clearly referenced in K-12 standards and assessment frameworks (National Assessment Governing Board, 2021; National Council for the Social Studies, 2013; National Governors Association & Council of Chief State School Officers, 2010; National Research Council, 2013). Therefore, there is clear practical importance in understanding the factors associated with this skill. Second, although individual differences in multiple document sourcing have been the subject of a recent and comprehensive systematic review (Anmarkrud et al., 2022), previous reviews of intertextual integration have either not used systematic search procedures or focused more narrowly on specific individual differences factors or task types.

Fourth, whereas previous reviews have combined findings from studies with markedly different designs (e.g., cross-sectional, longitudinal, intervention), we examined only concurrent associations between individual differences factors and intertextual integration. This allows for a clearer interpretation of the results. In these four ways, this review provides a systematic and transparent assessment of the nature, features, and volume of research that has examined associations between individual differences factors and intertextual integration among K-12 students.

Research Questions

Our review was guided by four primary questions.

1. First, what are the characteristics of the participants involved in research on individual differences in intertextual integration? We predicted that more studies would involve secondary- than elementary-level students. Furthermore, we predicted that studies would involve participants from a broad range of Western countries, language backgrounds, and socio-economic backgrounds. Finally, we predicted that few studies would involve participants with educational disabilities (e.g., dyslexia).

2. Second, what are the features of studies that have examined intertextual integration among K-12 students? Specifically, what kinds of tasks (e.g., form an argument, synthesize the information), documents (e.g., domain, genre, number of texts), and measures (e.g., essay, multiple choice) have been used? Based on previous reviews (Anmarkrud et al., 2022; Primor & Katzir, 2018), we predicted that a wide range of tasks, document types, and measures would be represented. We further predicted that argumentation tasks, essay measures, and informational documents would be the most common, and that narrative tasks, oral response measures, and narrative documents would be used infrequently.

3. Third, which types of individual differences factors have been studied? Based on previous reviews (Anmarkrud et al., 2022; Barzilai & Strømsø, 2018) and the frameworks, theories, and models of multiple document use and comprehension listed in Table A1, we predicted that the following types of factors would be examined: language and literacy skills (e.g., word-level reading); cognition and metacognition; motivation, emotion, and personality; and knowledge and beliefs. Given the selected age range, we predicted that language and literacy skills would be studied the most and personality the least. We further expected that semantic knowledge (e.g., domain, topic) and cognitive processes (e.g., working memory, attention) would be well represented, whereas emotion and epistemic beliefs would not.

4. Fourth, what is the direction and strength of associations between individual differences factors and intertextual integration? We predicted that certain epistemic beliefs (e.g., belief in authority) would be negatively and moderately correlated with intertextual integration (Bråten et al., 2011). We predicted that all other factors would be positively correlated with intertextual integration, with literacy and knowledge being strongly correlated and cognitive skills and emotions being moderately correlated (Barzilai & Strømsø, 2018).

Method

We conducted two rounds of searches. The first was in May 2022 and the second in February 2024. For the first round, we used three methods: (1) an electronic database search, (2) a backward search, and (3) a forward search. For the second round, we used a snowballing approach.

Electronic Database Search

We consulted a university librarian to identify databases and search terms to capture relevant records. On 2 May 2022, we searched PsycInfo, ERIC, and ProQuest Dissertations and Theses Global databases for articles and theses/dissertations published prior to 2022. We used the following terms to search titles and abstracts: (noft("multiple text" OR "multiple texts" OR "multi-text" OR ("multiple source" OR "multiple sources") OR ("multiple document" OR "multiple documents") OR ("intertext" OR "intertexual")) AND ti,ab,su("read*" OR "write*" OR "process*" OR "navigat*" OR "integrat*" OR "teach*" OR "instruct*" OR "histor*" OR "scien*" OR "socioscientific") AND noft("elementary" OR "primary" OR "middle*" OR "high* " OR "secondary*")). These terms reflect the various ways multiple text use and comprehension have been defined (Goldman & Scardamalia, 2013), the disciplines in which this work has been conducted (Barzilai & Strømsø, 2018), and the targeted age range (i.e., K-12). The search yielded 4,667 sources.

Backward, Forward, and Snowballing Search

We also conducted a backward search of references from seven reviews and conceptual articles on multiple document comprehension (i.e., Anmarkrud et al., 2022; Barzilai et al., 2018; Bråten et al., 2020; List & Alexander, 2019; Nelson & King, 2022; Primor & Katzir, 2018; Saux et al., 2021). Then, using Google Scholar, we conducted an initial forward search for any records that had cited these reviews. Through these methods, we identified 294 sources.

In February 2024, we updated our search using the SnowGlobe application (McWeeny et al., 2021). SnowGlobe uses a snowballing approach to search the references and citations of selected records. As the seed set, we used the 23 records identified through our initial searches. Using SnowGlobe, we identified 1115 records, of which 186 were duplicates. We used the inclusion and exclusion criteria described in the following section to screen studies in both rounds of searches.
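
For readers unfamiliar with how duplicate records are typically flagged across search rounds, the following is a minimal sketch under our own assumptions (hypothetical titles, title-only matching); it is not the authors' actual deduplication procedure, which screening tools generally handle automatically and which would normally also compare DOIs, authors, and years.

```python
# Minimal illustration of flagging duplicate records by normalized title.
# The record titles below are hypothetical.
import re

def normalize(title: str) -> str:
    """Lowercase a title, replace punctuation with spaces, and trim whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

first_round = ["Multiple-Text Comprehension in Grade 5", "Sourcing and Integration"]
second_round = ["Multiple text comprehension in grade 5!", "Epistemic Beliefs and Reading"]

seen = {normalize(t) for t in first_round}
duplicates = [t for t in second_round if normalize(t) in seen]
new_records = [t for t in second_round if normalize(t) not in seen]

print(f"duplicates: {len(duplicates)}, new records: {len(new_records)}")
```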

Inclusion and Exclusion Criteria

We included only sources that met the following six criteria in this review:

1. The record was a peer-reviewed journal article, dissertation, or thesis. We excluded technical reports, conference abstracts, books, and book chapters. If the study was published as both a dissertation/thesis and a journal article, we selected the journal version.

2. Study participants were enrolled in kindergarten through 12th grade (i.e., elementary, middle, or secondary/high school). We excluded studies that only involved undergraduate, graduate, or adult participants. We also excluded studies that did not disaggregate K-12 and adult populations. However, we did include studies involving participants attending prevocational programs.

3. The study reported original data from a cross-sectional, longitudinal, or group experimental design. We excluded studies that used qualitative or single-case designs.

4. The study included at least one measure of participants’ intertextual integration. We excluded studies that measured a dimension of multiple document use (e.g., search, selection, evaluation) but not intertextual integration, as well as studies that only measured intertextual integration in combination with another skill (e.g., source evaluation).

5. The study reported at least one direct (i.e., not a partial correlation) and concurrent correlation between an intertextual integration measure and an individual differences measure. If the study used an experimental or quasi-experimental design, measures had to be collected prior to the intervention. Data from any measures collected during or after the intervention were excluded.

6. The study was published in English prior to February 2024.

Screening and Data Extraction

We used Covidence, a commercial web-based platform, for screening and data extraction. As illustrated in Fig. 2, the process involved three successive stages: (a) title and abstract screening, (b) full text screening, and (c) data extraction.

Fig. 2 PRISMA diagram

Title/Abstract and Full-Text Screening

Results from the three search procedures produced 4961 records. After removing duplicates (k = 512), we (the first and second authors) independently screened the titles and abstracts of the 4449 records identified in the first round of searches. The first author independently screened the titles and abstracts of the 1115 records identified in the second round. For title and abstract screening, we used the inclusion and exclusion criteria noted above. When it was unclear whether a record met the inclusion criteria, we advanced it to full-text screening for further review. We addressed all discrepancies through discussion.

Based on the exclusion criteria, we deemed 5561 irrelevant. We independently reviewed the full texts of the remaining 293 from the first search and six from the second. For the first search, we met after coding the first 20 records and then again after the next 50 to discuss discrepancies. After coding the initial 70, we revised wording for several of the exclusion criteria to improve clarity. We met again after the next 100 and then once all the records had been screened to discuss discrepancies. Total agreement for title/abstract screening was 95%, and Cohen’s \(\kappa\) was 0.63. For full-text screening, total agreement was 94% and Cohen’s \(\kappa\) was 0.67. We excluded 274 for various reasons reported in Fig. 2. This resulted in 25 records that met the full inclusion criteria. These are listed in the Appendix.
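
As an aside for readers less familiar with these reliability statistics, the sketch below illustrates how percent agreement and Cohen's kappa can be computed from two raters' include/exclude decisions; the decisions shown are hypothetical and do not reproduce the screening data reported above.

```python
# Illustrative computation of percent agreement and Cohen's kappa from two
# raters' include/exclude screening decisions. Decisions below are hypothetical.
from sklearn.metrics import cohen_kappa_score

rater_1 = ["include", "exclude", "exclude", "include", "exclude", "exclude"]
rater_2 = ["include", "exclude", "include", "include", "exclude", "exclude"]

# Percent agreement: share of records on which the two raters made the same call.
agreement = sum(a == b for a, b in zip(rater_1, rater_2)) / len(rater_1)
# Cohen's kappa: agreement corrected for the level expected by chance.
kappa = cohen_kappa_score(rater_1, rater_2)

print(f"percent agreement = {agreement:.2f}, Cohen's kappa = {kappa:.2f}")
```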

Data Extraction

We developed a codebook based on variables included in several previous reviews of single and multiple document comprehension (Anmarkrud et al., 2022; Barzilai & Strømsø, 2018; Primor & Katzir, 2018; Toste et al., 2020). The codebook included general information (study ID, publication year, country where the study was conducted, type of source, research question), information about participants (inclusion criteria, exclusion criteria, sample size, demographic information), documents information (number, length, format, mode, type, agreement, difficulty), the intertextual integration task (directions, genre, format), and individual differences in literacy skills (e.g., phonological awareness, rapid naming, decoding, single-text comprehension, written composition), cognition and metacognition (e.g., working memory, attention, processing speed, metacognition), motivation and emotion (e.g., self-efficacy, interest, emotional reactivity), personality (e.g., conscientiousness, need for cognition), knowledge and beliefs (e.g., content/topic knowledge, epistemic cognition), sourcing ability (e.g., trustworthiness), and demographics (e.g., age, sex/gender). We independently coded 24% (k = 6) of the sources. Total agreement was 87%, and Cohen’s \(\kappa\) was 0.73. We handled all disagreements through discussion. We made several revisions to the codebook to capture the full range of individual differences that were studied. The first author then independently coded the remaining 19 sources. Coded studies are available through the Open Science Framework (https://osf.io/xqb4h/). 
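
As a rough illustration of how such a codebook can be operationalized for extraction, the sketch below stores one study's codes in a structured record; the field names and example values are our own simplification and do not reproduce the authors' actual coding form.

```python
# Illustrative sketch of a study-level extraction record mirroring the codebook
# categories described above. Field names and example values are hypothetical.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class StudyRecord:
    study_id: str
    publication_year: int
    country: str
    source_type: str                      # e.g., "journal article" or "dissertation"
    sample_size: int
    grade_levels: str                     # e.g., "4-5"
    n_documents: Optional[int] = None
    task_genre: Optional[str] = None      # e.g., "argumentative essay"
    correlations: dict = field(default_factory=dict)  # factor name -> r value

example = StudyRecord(
    study_id="hypothetical_001", publication_year=2020, country="Italy",
    source_type="journal article", sample_size=94, grade_levels="4",
    n_documents=3, task_genre="argumentative essay",
    correlations={"single_document_comprehension": 0.23},
)
print(example.correlations)
```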

Results

Participant Characteristics

Records included 23 peer-reviewed journal articles (92%) and two dissertations/theses (8%), all published between 2009 and 2022. Each record reported results from a single study with an independent sample. Across the studies, 5600 participants were included (median = 99, range = 44–1434). Twenty-two studies (96%) reported sex or gender; across these studies, 53% of participants were female. Studies were conducted in six countries or regions: Norway (k = 9, 36%), Italy (k = 6, 24%), the USA (k = 5, 20%), the Netherlands (k = 3, 12%), Israel (k = 1, 4%), and Hong Kong (k = 1, 4%). Mean participant age was reported in 21 studies (84%) and ranged from 9 years 7 months (Florit, Cain, et al., 2020) to 18 years 6 months (Strømsø et al., 2010). Grade levels ranged from fourth to twelfth, with five studies (20%) involving students at the elementary level (grades K-5) and 20 (80%) at the secondary level (grades 6–12). Eight (32%) of the secondary-level studies involved students enrolled in college preparatory and prevocational programs. Although we consider these as part of the K-12 span, specific grade-level correspondences are unclear.

Authors in 19 (76%) and 20 (80%) of the studies did not report inclusionary or exclusionary criteria, respectively. Exclusionary criteria included having a disability (k = 2, 8%), being a minority language speaker (k = 1, 4%), and having poor eye movement registration (k = 2, 8%). Twenty studies (80%) did not report the number of disabled participants included. The two (8%) that did so included students with developmental dyslexia (n = 27 across the two studies; Andresen et al., 2019a, 2019b; de Ruyter, 2020). The number of multilingual participants was reported in 15 studies (60%), with percentages ranging from 0% (Florit, Cain, et al., 2020; Florit, De Carli, et al., 2020) to 72% (Davis et al., 2017). Sixteen studies (64%) reported participants’ socioeconomic status. Of these, 14 reported that their samples were largely homogeneous and from middle-class families. In the other two studies, participants came from lower socioeconomic families.

Features of Tasks, Measures, and Documents

In each study, participants performed a task requiring them to independently read a set of multiple documents. The most common task (k = 16, 64%) involved having participants read the documents for the purpose of writing an essay. These included writing arguments (k = 9, 36%), opinions (k = 1, 4%), or combinations of arguments and summaries/syntheses (k = 5, 20%). In other studies, participants answered verification questions (k = 6, 24%), oral open-ended questions (k = 2, 8%), multiple-choice questions (k = 1, 4%), or application questions (k = 1, 4%). In 16 of the studies (64%), participants were provided with information about the document source (e.g., author, publication, date).

The number of documents per set ranged from 2 to 10 (mean = 4.72, SD = 1.86), with an average of 2.6 (SD = 0.54) at the elementary level and 5.25 (SD = 1.68) at the secondary level. In several studies, participants read multiple sets of documents (e.g., Beker et al., 2019; Mason et al., 2020), bringing the total number of documents read to between 2 and 40. In all the studies, the documents were informational rather than narrative. Most studies (k = 23, 92%) involved document sets that addressed socioscientific topics (e.g., human impact on climate change) and presented a combination of complementary and contrasting viewpoints (k = 18, 72%). Of the 20 (80%) studies that reported document format, nine (36%) used only digital documents, ten (40%) used only print documents, and one (4%) used both. Ten (40%) studies reported whether multimedia documents (e.g., text with image) were used. Of these, multimedia documents were used in six (24%) and text-only documents in four (16%). Seventeen studies (68%) reported efforts to evaluate text difficulty. In each case, a readability formula was used. Several also reported consulting with content-area experts in developing the texts.

Generally similar tasks, measures, and document formats were used at the elementary and secondary levels. At both levels, argumentative essays and inference verification questions were used to measure intertextual integration. However, application questions were used only at the elementary level, and multiple-choice questions, open-ended questions, and summary essays were used only at the secondary level. Print and digital formatted documents were used at both levels. Finally, the average number of documents per set differed across these two developmental spans, with a mean of 2.6 (SD = 0.55) at the elementary level and 5.25 (SD = 1.68) at the secondary level.

Associations with Individual Differences

Tables 1–4 report zero-order correlations between performance on measures of intertextual integration and measures of individual differences in language and literacy, cognition and metacognition, knowledge and beliefs, and motivation, emotion, and personality. Results are discussed in the following four sections. To avoid redundancy, we provide information about the participants (e.g., nationality, age, sample size) and measures only once for each study.
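
As a brief reminder, a zero-order correlation is simply the bivariate Pearson correlation between two measures, with no other variables controlled; the sketch below (using hypothetical scores, not data from the reviewed studies) shows how such a coefficient is computed.

```python
# Illustrative computation of a zero-order (bivariate Pearson) correlation
# between hypothetical working memory and intertextual integration scores.
from scipy.stats import pearsonr

working_memory = [12, 15, 9, 20, 14, 17, 11, 18]
integration = [3, 5, 2, 7, 4, 6, 3, 6]

r, p = pearsonr(working_memory, integration)
print(f"r = {r:.2f}, p = {p:.3f}")
```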

Table 2 Zero-order correlations between cognition, metacognition, and intertextual integration

Language and Literacy

Eighteen studies (72%) examined associations among intertextual integration, language, and literacy (see Table 1). Single-document comprehension was the most studied (k = 12, 48%), followed by word-level reading (k = 6, 24%), and then a variety of other factors (e.g., strategy knowledge). In the following sections, we discuss results first for elementary- and then secondary-level participants.

Elementary-Level. Four studies (16%) examined associations between intertextual integration and literacy factors among elementary-level students. Three of these examined associations with single document comprehension. Beker et al. (2019) had fourth- and sixth-grade Dutch children (n = 105) read 20 brief (i.e., about eight sentences) expository text pairs (40 individual texts in total). In each pair, the second text contained an internal inconsistency. Across pairs, they randomly varied whether the first text contained an explanation that could help to resolve the inconsistency presented in the second text. To assess intertextual integration, they asked participants open-ended questions after every fourth pair of texts. They assessed single document comprehension by having participants answer multiple-choice questions about several brief texts. They found an association of r = 0.39 (p < 0.001) between these two factors.

In two studies, Florit and colleagues (Florit, Cain, et al., 2020; Florit, De Carli, et al., 2020) had Italian participants complete two multiple document tasks that each involved reading three texts about a socioscientific issue (e.g., “Are videogames beneficial?”) and then writing an argumentative essay. They assessed single-document comprehension by having students answer inferential and literal questions about an informational text. Florit, De Carli, et al. (2020) found associations of r = 0.06 (p > 0.05) and r = 0.15 (p < 0.05) between single document comprehension and intertextual integration for a sample of fourth- and fifth-grade students (n = 184).

With a separate sample of fourth-grade students (n = 94), Florit, Cain, et al. (2020) found associations of r = 0.23 (p < 0.05) and r = 0.47 (p < 0.01). In the same study, they also assessed students’ comprehension monitoring with a task that measured their ability to detect inconsistencies in 24 six-sentence texts. Correlations between comprehension monitoring and intertextual integration were r = 0.22 (p < 0.05) and r = 0.35 (p < 0.01).

Three studies involving Italian children also examined associations between word reading fluency and intertextual integration. Florit, Cain, et al. (2020) had fourth-grade students (n = 94) read 112 words and 48 non-words as quickly as they could; for their analyses, they combined scores from these two tasks. Correlations between word-level reading and intertextual integration were r = 0.14 (p > 0.05) and r = 0.36 (p < 0.01). Florit, De Carli, et al. (2020) tested a separate sample of fourth- and fifth-grade students (n = 184) with the same word-level reading and intertextual integration tasks, and found correlations of r = 0.10 (p > 0.05) and r = 0.16 (p < 0.05). Finally, Raccanello et al. (2022) found a correlation of r = 0.21 (p < 0.001) between fourth- and fifth-grade students’ (n = 334) word reading fluency and intertextual integration, as measured by an argumentative essay based on a document set.

Secondary-Level. Fourteen studies (56%) examined associations between intertextual integration and literacy factors among secondary-level students. Nine (36%) measured students’ single document comprehension, three of which were conducted by Bråten and colleagues and involved Norwegian students enrolled in college preparatory or prevocational courses. Bråten et al. (2018) had participants (n = 127) complete a cloze comprehension measure involving narrative and expository texts. They randomly divided students into two groups, each assigned to read a document set of ten texts about either climate change or nuclear power. To measure intertextual integration, they had students in both conditions write argumentative essays based on the document sets. Correlations between single document comprehension and intertextual integration ranged from r = 0.20 (p > 0.05) to r = 0.28 (p < 0.05).

Strømsø and Bråten (2009) had participants (n = 282) read a document set comprising seven texts on different aspects of climate change. After reading, they assessed students’ single document comprehension for each of the texts using sentence and intratextual inference verification measures; scores were summed across texts for analyses. They also had students complete an intertextual inference verification task based on the document set that they had read. They found an association of r = 0.52 (p < 0.001) between intertextual integration and sentence verification (single document comprehension), and a correlation of r = 0.54 (p < 0.001) between intertextual integration and intratextual inference verification (single document comprehension). Finally, Strømsø et al. (2010) had participants (n = 233) complete intratextual and intertextual inference verification tasks based on the same set of seven text documents and found a correlation of r = 0.57 (p < 0.01) between them.

de Ruyter (2020) had Dutch prevocational students (n = 83) write argumentative essays after reading a set of four digital documents. They measured students’ single document comprehension by having them read five texts and, after each, answer a series of multiple-choice and open-ended questions. They found a correlation of r = 0.28 (p < 0.05) between these two variables. Finally, Wang et al. (2021) had US ninth- through twelfth-grade students (n = 1107) read four digital text documents and measured their intertextual integration with multiple-choice and yes/no questions. To measure single document comprehension, they had students read 10 short passages and answer questions about key ideas and details for each. The correlation between these two variables was r = 0.57 (p < 0.01).

Four studies examined associations between single document comprehension and intertextual integration among seventh-grade students. Mason and colleagues conducted three with Italian students (samples ranged from 47 to 104). In each, they used a multiple-choice task to measure single document comprehension and an argumentative essay task to measure intertextual integration. Across these studies, correlations between these two variables ranged from r = 0.18 (p > 0.05; Mason et al., 2017) to r = 0.48 (p < 0.01; Mason et al., 2020). Mason et al. (2017) also measured single document comprehension with a sentence verification task and found a correlation of r = 0.13 (p > 0.05) between it and intertextual integration. Forzani (2016) had US seventh-grade students (n = 1434) complete an online research and comprehension assessment. As part of this, they assessed students’ ability to synthesize information (i.e., intertextual integration) by writing argumentative essays. They also assessed students’ single document comprehension by having them read several brief passages and answer multiple-choice questions about them. The correlation between single document comprehension and intertextual integration was r = 0.37 (p < 0.01).

Six studies examined associations between word-level reading and intertextual integration. In each, researchers used measures that involved reading from word lists with accuracy and speed (i.e., fluency). Three involved Norwegian students and used a word chain task involving 360 words arranged in 30 rows. Across these studies, correlations between word chain reading (i.e., word-level reading) and intertextual integration ranged from r = 0.16 (p > 0.05) to r = 0.43 (p < 0.001). Bråten et al. (2013a, 2013b) had tenth-grade students (n = 65) read six digital text documents and assessed their intertextual integration by having them write brief essays. The correlation between word-level reading and intertextual integration was r = 0.43 (p < 0.001). Andresen et al. (2019a, 2019b) also examined the association between word-level reading and intertextual integration with tenth-grade students (n = 44), half of whom were neurotypical in reading (i.e., > 20th percentile on a national standardized reading test) and half of whom had school-based diagnoses of developmental dyslexia. Participants viewed three webpages containing information presented through video, text, and pictures. Intertextual integration was assessed by having students orally respond to two open-ended questions about the document set. They found an association of r = 0.45 (p < 0.01) between word reading and intertextual integration for the entire sample. Finally, Braasch et al. (2014) had college preparatory students (n = 59) read six printed documents and assessed intertextual integration with an inference verification task. The correlation between this and word-level reading was r = 0.22 (p > 0.05).

Beyond word-level reading and single document comprehension, researchers also examined associations between intertextual integration and several additional language and literacy skills. For example, Davis et al. (2017) examined associations between US fifth- to seventh-grade students’ intertextual integration and their receptive (r = 0.47, p < 0.01) and productive syntax (r = 0.24, p < 0.05). To measure intertextual integration, they had students read two informational texts about a scientific issue and then complete a sentence and inference verification task. Wang et al. (2021) found an association of r = 0.53 (p < 0.01) between US ninth- through twelfth-grade students’ sentence processing and intertextual integration.

Finally, Cheong et al. (2019) examined the association between intertextual integration and written composition among secondary-level students in Hong Kong (n = 415). They presented students with two parallel-structured multiple document tasks in Chinese (L1) and English (L2). For both tasks, students read six documents. They measured intertextual integration and written composition in Chinese and English with inference verification and argumentative essay tasks, respectively. Associations between Chinese intertextual integration and written composition in Chinese (r = 0.41, p < 0.01) and English (r = 0.28, p < 0.05) were similar to associations between English intertextual integration and written composition in Chinese (r = 0.30, p < 0.01) and English (r = 0.41, p < 0.01).

Cognition and Metacognition

Fourteen studies (56%) examined associations between intertextual integration, cognition, and metacognition (Table 2). Of these, five (20%) examined associations between verbal working memory and intertextual integration, with correlations ranging from r = 0.06 (p > 0.05) to r = 0.538 (p < 0.001). Two were conducted with elementary-level students. Beker et al. (2019) measured Dutch fourth- and sixth-grade children’s (n = 105) verbal working memory with a task adapted from Daneman and Carpenter (1980). They had students listen to sets of unrelated sentences, answer a comprehension-related question about one of them, and then recall the last word of each sentence. Performance on this task correlated r = 0.13 (p = 0.183) with intertextual integration, as measured by the open-ended questions task described in the preceding section (Beker et al., 2019). Florit, Cain, et al. (2020) measured fourth-grade Italian students’ (n = 94) verbal working memory with a task involving six lists of nouns. For each list, students had to remember and write down the nouns representing the three smallest objects in the list, in the order in which they were presented. Correlations between performance on this working memory task and the two intertextual integration measures described above were r = 0.18 (p > 0.05) and r = 0.22 (p < 0.05).

The three studies that examined associations between intertextual integration and verbal working memory with secondary-level students all measured working memory with a version of the task developed by Daneman and Carpenter (1980). As described above for Beker et al. (2019), Andresen et al. (2019a, 2019b) and Braasch et al. (2014) had students listen to sets of sentences, answer comprehension-related questions about them, and then recall the last word of each. Andresen et al. (2019a, 2019b) found a correlation of r = 0.54 (p < 0.001) between Norwegian tenth-grade students’ (n = 44) working memory and intertextual integration, as measured with the oral open-ended questions task described in the preceding section. Braasch et al. (2014) found a correlation of r = 0.29 (p < 0.05) between Norwegian secondary-level students’ (n = 59) working memory and intertextual integration, as measured by the inference verification task also described above. Mason et al. (2017) adapted the working memory task to have students read the sentences themselves. They found that performance on this task correlated r = 0.06 (p > 0.05) with intertextual integration, as measured with the previously described argumentative essay task.

Twelve (48%) studies examined associations between metacognition and intertextual integration. Florit, Cain, et al. (2020) measured Italian fourth-grade children’s (n = 94) comprehension monitoring by assessing their ability to detect inconsistencies in stories with and without inconsistent sentences. They found that comprehension monitoring was weakly but significantly associated with intertextual integration (rs = 0.22–0.35, p < 0.05), as measured by two essay tasks. Braasch et al. (2022) examined the association between sixth-grade US students’ (n = 54) metacognitive awareness and performance on an argumentative essay task. They measured metacognitive awareness by having students rate items designed to measure their knowledge and regulation of cognition. They found that metacognitive scores were weakly correlated with students’ inclusion of belief-consistent (r = 0.11, p > 0.05) and belief-inconsistent (r = 0.10, p > 0.05) ideas in their writing. Davis et al. (2017) examined associations between intertextual integration and fifth- through seventh-grade US students’ (n = 83) comprehension strategy knowledge (i.e., predicting and verifying predictions, previewing, purpose setting, self-questioning, using background knowledge, and summarizing; r = 0.23, p < 0.05) and strategy awareness/use (r = 0.05, p > 0.05). Bråten et al. (2014) surveyed Norwegian students in the first year of secondary school (n = 279) about their use of strategies for comparing, contrasting, and integrating multiple texts, finding a correlation of r = 0.36 (p < 0.001) between this and intertextual integration. Cheong et al. (2019) surveyed students’ use of self-regulatory, discourse synthesis, and test-taking strategies before, during, and after writing. They found this to be weakly correlated with intertextual integration measured in Chinese (r = 0.10, p < 0.05) and English (r = 0.19, p < 0.01). Finally, Stang Lund et al. (2019) assessed secondary-level Norwegian students’ (n = 86) knowledge and potential use of reading strategies, such as how they would deal with information about scientific issues presented in various media sources. They found a moderate association between students’ comprehension strategies and their intertextual integration (r = 0.29, p < 0.05), as measured by a verification task.

Ten (40%) studies examined associations between intertextual integration and document sourcing skills. Only one involved students at the elementary level (Florit, Cain, et al., 2020). In that study, students’ essays were coded for intertextual integration and the inclusion of source-content links, with the correlation between them being r = 0.24 (p < 0.05). A variety of sourcing dimensions were examined in nine (36%) studies involving secondary-level students. These included students’ source selection (Bråten et al., 2018; Forzani, 2016), source evaluation (Braasch et al., 2014; Forzani, 2016), links made between sources and their contents (Braasch et al., 2022; Mason et al., 2017, 2018), and memory for sources (Stang Lund et al., 2019; Strømsø et al., 2010). Intertextual integration was measured with essay tasks in six of these studies and with verification tasks in three. Associations between intertextual integration and sourcing were all positive and small to moderate in magnitude (range = 0.05 to 0.41).

Knowledge and Beliefs

Nineteen studies (76%) examined associations among students’ knowledge, beliefs, and intertextual integration (Table 3). At the elementary level, Florit, De Carli, et al. (2020) assessed Italian fourth- and fifth-grade students’ topic knowledge, general vocabulary knowledge, and theory of mind. Correlations with their two measures of intertextual integration were as follows: topic knowledge = 0.08–0.15, vocabulary = 0.24–0.27, and theory of mind = 0.15–0.26. Davis et al. (2017) assessed US fifth- through seventh-grade students’ topic knowledge, general vocabulary knowledge, morphological knowledge, and two measures of epistemic beliefs: stability and structure of knowledge. Associations among intertextual integration and performance on these measures were as follows: topic knowledge = 0.42 (p < 0.01), general vocabulary knowledge = 0.56 (p < 0.01), morphological knowledge = 0.52 (p < 0.01), stability of knowledge =  − 0.10 (p > 0.05), and structure of knowledge = 0.27 (p < 0.05).

Table 3 Zero-order correlations between knowledge, beliefs, and intertextual integration

At the secondary level, researchers examined associations among intertextual integration and domain knowledge (k = 1), topic knowledge (k = 18), epistemic beliefs (k = 7), and vocabulary and morphological knowledge (k = 2). Wang et al. (2021) assessed US ninth- through twelfth-grade students’ domain knowledge with 25-item multiple-choice tests in history and science. Correlations with performance on an intertextual integration task about American football were r = 0.53 (p < 0.01) for history and r = 0.57 (p < 0.01) for science. They also assessed students’ knowledge of football, which was slightly less correlated with intertextual integration (r = 0.50, p < 0.01). Topic knowledge was assessed in ten studies with multiple-choice tests (Andresen et al., 2019a, 2019b; Bråten et al., 2013a, 2013b, 2014, 2018; Stang Lund et al., 2017; Strømsø & Bråten, 2009; Strømsø et al., 2016; Wang et al., 2021), in five with open-ended questions (Barzilai & Ka’adan, 2017; Braasch et al., 2014; Mason, 2018; Mason et al., 2017, 2020), in one with a verification task (Davis et al., 2017), and in two with rating scales (Braasch et al., 2022; Griffin et al., 2012). Correlations with intertextual integration ranged between − 0.03 (Strømsø et al., 2016) and 0.50 (Wang et al., 2021). Given the variation in how topic knowledge and intertextual integration were measured across these studies, patterns in the magnitude of these correlations are unclear. Notably, the smallest and largest correlations were both found with secondary-level students, and both studies used multiple-choice tests to measure topic knowledge (Strømsø et al., 2016; Wang et al., 2021). However, these studies differed in the number (4 vs. 5) and format (digital vs. print) of the texts, the topic (American football vs. health), the language (American English vs. Norwegian), and how intertextual integration was measured (multiple choice vs. essay).

Additionally, two studies (8%) examined associations among intertextual integration, vocabulary, and morphological knowledge. Results from Davis et al. (2017) were discussed above. Wang et al. (2021) found correlations of r = 0.50 (p < 0.01) between intertextual integration and morphological knowledge and r = 0.54 (p < 0.01) between intertextual integration and vocabulary.

Epistemic beliefs were assessed in eight studies (32%) with a variety of methods. For example, Barzilai and Ka’adan (2017) used a scenario-based approach to assess students’ topic-specific perspectives (i.e., absolutism, multiplism, evaluativism) about the nature, sources, certainty, validity, and justification of knowledge. In contrast, Strømsø et al. (2016) assessed students’ beliefs in the justification of knowledge by personal accounts, authority figures, or multiple sources. Across the studies, correlations with intertextual integration ranged from r =  − 0.04 (Strømsø & Bråten, 2009) to r =  − 0.43 (Bråten et al., 2013a, 2013b). This wide range is expected, as the direction and magnitude of association is thought to vary by belief type (Bråten et al., 2011).

Motivation, Emotion, and Personality

Fourteen studies (56%) examined associations among students’ intertextual integration, motivation (k = 11), emotion (k = 4), and personality (k = 2) (Table 4). At the elementary level, Raccanello et al. (2022) examined how Italian fourth- and fifth-grade students’ performance on an intertextual integration task was associated with how they valued the task (r = 0.18, p < 0.01) and their level of boredom with it (r = − 0.05, p > 0.05). At the secondary level, studies examined associations between intertextual integration and several dimensions of motivation: self-efficacy (Bråten et al., 2013a, 2013b; de Ruyter, 2020), task value (Bråten et al., 2013a, 2013b), effort (de Ruyter, 2020), engagement (de Ruyter, 2020), and interest (Bråten et al., 2014, 2018; Griffin et al., 2012; Stang Lund et al., 2017; Strømsø & Bråten, 2009; Strømsø et al., 2010; Wang et al., 2021). Correlations were all small and in the range of − 0.04 to 0.39.

Table 4 Zero-order correlations between motivation, emotion, personality, and intertextual integration

Three studies (12%), all involving Italian seventh-grade students, examined associations between intertextual integration and emotion (Mason, 2018; Mason et al., 2017, 2020). In each study, students’ emotional reactivity was assessed during reading with a heart rate monitor. Correlations with intertextual integration ranged from r = 0.02 (p > 0.05; Mason et al., 2017) to r = 0.25 (p > 0.05; Mason, 2018). Two studies, both with secondary-level Norwegian students, examined associations between intertextual integration and personality. Bråten et al. (2014) found an association of r = 0.14 between students’ need for cognition and their intertextual integration performance. Braasch et al. (2014) found an association of r = 0.02 (p > 0.05) between intertextual integration and an entity theory of intelligence (i.e., fixed mindset); the correlation with an incremental theory of intelligence (i.e., growth mindset) was r = 0.19 (p > 0.05).

Discussion

We begin this section by discussing characteristics of the participants that have been studied (research question 1) and features of the tasks and materials that have been used (research question 2). We then discuss what types of individual differences factors have been studied (research question 3) and what has been learned about their associations with intertextual integration (research question 4). We conclude by offering four recommendations for future research.

Research Questions 1 and 2: Participant Characteristics and Materials

Given the many potential ways that multiple document comprehension has been studied, we were interested in first taking stock of the types of participants, tasks, measures, and documents that have been used. As predicted, there was wide variation in each of these parameters. However, certain grade levels, tasks, measures, and types of documents were examined more frequently than others.

Participant Characteristics

As expected, we located more studies involving secondary- than elementary-level participants. At the secondary level, participants ranged from grade 6 to grade 12. Although it was encouraging to find that intertextual integration had been studied with elementary-level children, only five studies did so, and none involved children below grade 4. Furthermore, few authors reported clear inclusion/exclusion criteria and demographic information.

Nonetheless, several findings are important to discuss. First, individuals with disabilities were clearly included in only two studies, both of which were at the secondary level. Moreover, in several studies, disabled participants were purposely excluded. Second, in contrast to most other reading research (Share, 2008), relatively few studies involved English L1 participants. Finally, of the few studies that reported information about participants’ socioeconomic status, most involved participants from middle-class backgrounds. We were limited, though, by our decision to include only studies reported in English. It may be that studies reported in other languages include more diverse populations. Nonetheless, we can draw two clear conclusions. First, improved demographic reporting is needed to better understand to whom these findings may apply. Second, efforts are needed to study more diverse populations, including individuals with disabilities, those from different socioeconomic strata, individuals from non-Western societies, and those who speak multiple languages.

Tasks, Measures, and Documents

Consistent with Primor and Katzir’s (2018) findings, we found that intertextual integration was measured in a variety of ways, with essay tasks being the most widely used. The reason for this is unclear. It may be that essays are viewed as a particularly valid form of assessment. However, few studies clearly defined intertextual integration or provided a justification for why and how they selected their measure(s). Perhaps as a result, even with the stringent inclusion criteria that we set, we are not confident that the same construct was measured in each of the studies. For the field to advance, work is needed to better define intertextual integration and to develop more standardized and comparable methods for measuring it (Flake & Fried, 2020).

In contrast, there was consistency in the genres, structures, and topics of the documents. This allowed for much clearer comparisons across studies, particularly when considering associations with topic, domain, and disciplinary knowledge. Interestingly, the documents were nearly evenly split between print and digital formats. Although print documents still have a place in formal educational settings, they are quickly being replaced by digital ones. There is a pressing need, then, to better understand how young readers process and comprehend information distributed across multiple digital documents.

Research Questions 3 and 4: Individual Differences Factors and Intertextual Integration

In research questions 3 and 4, we asked what types of individual differences factors had been examined and the direction and strength of their associations with intertextual integration. As predicted, a wide range of individual differences factors had indeed been examined. However, few appeared in more than one study. Therefore, although nearly all were positively correlated with intertextual integration, variation in the samples, tasks, measures, and documents greatly limits what can be concluded about these relations.

Language and Literacy

Language and literacy were investigated in 72% (k = 18) of the studies. This was somewhat unexpected; although language and literacy are acknowledged as obviously important factors in intertextual integration, they have received less attention than other factors in frameworks, theories, and models of multiple document use (see Table A1). This is likely because most theorizing has been based on secondary, undergraduate, and adult readers, for whom word-level reading and single document comprehension skills are likely assumed to be well developed. Nonetheless, it would appear safe to assume that intertextual integration depends on both skillsets.

Only six studies examined word-level reading skills. Results indicated small associations with intertextual integration (0.14 to 0.36) in the upper elementary grades and small to moderate associations (0.22 to 0.45) at the secondary level. Word-level reading has been clearly shown to affect single document comprehension performance (Perfetti & Helder, 2022). Although six studies are too few to support strong conclusions, they do suggest a role for word-level reading in intertextual integration as well. Interestingly, the three elementary-level studies were conducted in Italian. In comparison to English, Italian has a relatively transparent alphabetic writing system (i.e., clear and consistent mappings between units of sound and print), which, on average, makes learning to read and write easier (Job et al., 2006). The three secondary-level studies were conducted in English and Norwegian, which, in comparison to Italian, have considerably more opaque alphabetic systems (Hagtvet et al., 2006; Perfetti & Harris, 2017). Consequently, in addition to developmental differences, these writing system differences further complicate how we might interpret the results.

Across the 12 studies (48%) that examined single document comprehension, the magnitude of associations with intertextual integration ranged widely (0.06 to 0.57). A variety of factors undoubtedly account for this variation. Among them, the different ways that both constructs were measured likely played a role. For example, the smallest association was found in a study involving fourth- and fifth-grade Italian students (Florit, Cain, et al., 2020). Intertextual integration was assessed with an argumentative essay task and single document comprehension with a standardized measure of inferential and literal questions. In the two studies with the largest associations (Strømsø et al., 2010; Wang et al., 2021), there was much closer alignment between the single and multiple document comprehension measures. In both studies, for example, the single document comprehension and intertextual integration measures involved texts written about the same topics and used the same response formats. In one of the studies (Strømsø et al., 2010), the same texts were even used for both measures. Different approaches to measuring single document comprehension tap different underlying skills (Cutting & Scarborough, 2006; Keenan et al., 2008), and therefore, individual indicators of single document comprehension should not be assumed to be interchangeable (Clemens & Fuchs, 2022). The same likely holds for measures of intertextual integration, which introduce even greater complexity. In sum, we can conclude from these results only that there is emerging evidence of associations between intertextual integration and literacy skills at both the word and comprehension levels.

Cognition and Metacognition

Given the complexity involved in intertextual integration, many cognitive and metacognitive skills have been hypothesized to play important roles (Follmer & Tise, 2022; Tarchi et al., 2021; see also Table A1). Cognition and metacognition were examined in a little over half (56%, k = 14) of the studies, with cognition examined in five (20%) and metacognition in nine (36%).

From among the multitude of cognitive variables (e.g., working memory, attention, intelligence) and operations (e.g., searching, monitoring, assembling, rehearsing, translating; Winne, 2018) that have been discussed within the multiple document literature, only verbal working memory was examined in these studies. Across the studies, associations between verbal working memory and intertextual integration ranged widely, from r = 0.05 to r = 0.54. Although similar methods were used to measure verbal working memory, the number of studies was small and there was wide variation in the sampled populations, documents, and tasks. Accordingly, the results provide little insight into how individual differences in cognition, whether broadly or narrowly defined, are associated with intertextual integration. This stands in stark contrast to the extensive body of work that has examined associations between cognition, word-level reading, and single document comprehension (Butterfuss & Kendeou, 2018; Follmer, 2018; Peng et al., 2018, 2022).

Metacognition was examined in 12 studies (48%). In two of these, metacognitive strategy knowledge was assessed (Braasch et al., 2022; Davis et al., 2017), with results showing small associations with intertextual integration. Results were mixed, though, for the three studies that examined metacognitive strategy use. In one study, the association between strategy use and intertextual integration was small and nonsignificant (Davis et al., 2017). However, in the other two, the relations were moderate and significant (Bråten et al., 2014; Florit, Cain, et al., 2020). These results suggest that while knowledge of metacognitive strategies is unrelated to intertextual integration, actual metacognitive strategy use may be related. However, with only four studies, and large differences in the populations and materials used among them, any conclusions are at best tentative.

Sourcing skills were examined in ten studies (40%). As noted above, a variety of sourcing skills were examined, including source selection, source evaluation, source-content links, and source memory. Results ranged from almost no association (r = 0.05) to one of moderate magnitude (r = 0.41). This wide range is likely due to variation in the dimensions of sourcing that were examined, the different ways intertextual integration was measured, and differences in the populations that were sampled. In sum, it appears that various dimensions of sourcing may be differentially associated with intertextual integration.

Knowledge and Beliefs

Knowledge and beliefs appeared in 19 studies (76%), the most frequent of the four individual differences categories. Correlations with intertextual integration ranged widely (−0.39 to 0.57). The bulk of studies involved secondary-level participants and examined associations between topic/domain knowledge and intertextual integration. Vocabulary and morphological knowledge were examined in four studies (16%). Various dimensions of knowledge, spanning from word to disciplinary, have been discussed extensively in theories of single and multiple document comprehension (Alexander, 2005; Goldman et al., 2016; Kintsch, 1988; Perfetti & Helder, 2022; Perfetti et al., 1999). As Kintsch (1974, p. 10) observed, for example, “understanding a text…consists of assimilating it with one’s general store of knowledge…[s]ince every person’s knowledge and experience is somewhat different…the way in which different people understand the same text may not always be the same.” Knowledge in its various forms is generally understood to facilitate comprehension and learning (Ackerman, 1991; Alexander et al., 1994; Cabell & Hwang, 2020; Hwang et al., 2022). However, in certain cases, it can be an impediment (Simonsmeier et al., 2022). For instance, one’s knowledge of a topic can stand stubbornly in the way of learning new concepts (Vosniadou, 1992). Furthermore, possessing relevant knowledge does not ensure that it will be effectively used to aid comprehension. Indeed, Wolfe and Goldman (2005) found that adolescents sometimes use their background knowledge to make irrelevant elaborations that do not help in building a coherent multiple document representation.

As many have noted, there is unfortunately little consistency in how knowledge is conceptualized, defined, and measured (Alexander et al., 1991; McCarthy & McNamara, 2021; Murphy et al., 2012, 2018). This was certainly what we found. Although the bulk of studies examined topic knowledge, they did so in a variety of ways. Moreover, the specific topics that were examined varied across studies. It is unclear, then, how comparable scores from these different measures may be. Additionally, few studies examined more than one dimension of knowledge. In any study, it is perhaps impossible to measure the full extent of a single domain or discipline, much less multiple ones. However, the narrow focus on topic knowledge in this literature provides little insight into how other dimensions (e.g., domain, disciplinary) may influence intertextual integration. Nonetheless, it appears from these studies that knowing something of the documents’ topic, domain, or discipline is associated with better intertextual integration.

Epistemic beliefs were examined in seven studies (28%). Much research has shown that epistemic beliefs are an important factor in learning (Mason, 2010) and that variation in them is associated with literacy performance (Bråten et al., 2016; Lee et al., 2016). This may be the case particularly when engaging with multiple documents of varying quality and perspectives (Bråten et al., 2011; Strømsø & Kammerer, 2016). Consistent with previous research (Bråten et al., 2011), results indicated that endorsing certain beliefs is associated with better intertextual integration than endorsing others. For example, Davis et al. (2017) found that the belief that knowledge is singular and absolute was more weakly associated with intertextual integration (r = −0.10) than the belief that there are multiple forms of knowledge but that certain forms are more valid than others (r = 0.27). There were similar patterns in several of the other studies (Barzilai & Ka’adan, 2017; Bråten et al., 2013a, 2013b; Strømsø & Bråten, 2009). Using a slightly different framework, Bråten et al. (2013a, 2013b) found that intertextual integration was negatively associated with beliefs that knowledge is justified by personal knowledge (−0.43) and authority (−0.08), but positively associated with the belief that it is justified by multiple sources (0.17). In sum, whereas absolutist beliefs about knowledge (i.e., “knowledge come from an external source and is certain,” Kuhn, 1999, p. 23) appear to be negatively associated with intertextual integration, multiplist or evaluativist beliefs (“knowledge comes from human minds and is uncertain,” Kuhn, 1999, p. 23) appear to be positively associated. However, due to the variability in how epistemic beliefs were measured and considerable differences in the tasks, document sets, and participants involved, more research is needed to confirm the robustness of these findings. It is also important to note that, in addition to varying across individuals, beliefs can vary across cultures (Buehl, 2008). Further research is therefore needed to systematically examine how epistemic beliefs may shape multiple document use and comprehension across different conditions, points in development, and cultural contexts.

Motivation, Emotion, and Personality

The final category represents a broad mixture of affective processes and personality variables. Motivational factors (e.g., self-efficacy, interest, task value) appeared the most frequently (k = 11, 44%). Whereas there is no evidence that self-efficacy is meaningfully associated with intertextual integration (Bråten et al., 2013a, 2013b; de Ruyter, 2020), several studies reported small to moderate associations with interest and task value. As has become a common theme in this review, however, wide variation in the tasks, documents, and participants makes it unclear what can be claimed about any potential patterns.

Emotion was examined in three studies (12%), all with Italian seventh-grade participants, representing a rare case of systematic replication. Correlations in these three studies ranged from −0.28 to 0.02. Not much can be concluded from this, other than that certain emotions, such as boredom, appear to be negatively associated with intertextual integration. Future work should examine how motivation and emotion interact with specific types of documents and tasks over microperiods (e.g., minutes, hours) and macroperiods (e.g., days, months, years) of development (e.g., Neugebauer & Gilmour, 2020).

Personality was examined in only two studies (8%), both of which involved Norwegian secondary-level participants. Given that they measured different facets of personality, not much can be said beyond what is reported in the individual studies. Although personality is a major branch of psychology (Burger, 2015) and has been shown to be associated with a broad range of human functioning (Anglim et al., 2022; Komarraju et al., 2011), it has not been a topic of much interest in reading research. The small correlations observed in these two studies do not inspire much confidence that this is a promising area for further investigation.

General Discussion

Much has been proposed about the roles that various individual differences factors play in intertextual integration (see Table A1). However, much of this theorizing has been based on undergraduate and adult readers. It is unclear, then, how existing conceptualizations may apply to younger readers. In this review, we did not attempt to answer that question. Rather, we examined the nature, features, and volume of research that has examined associations between individual differences and intertextual integration. These associations are shown in Fig. 3 with an evidence gap map (Polanin et al., 2022). The map reveals that many of the factors specified in the frameworks, theories, and models listed in Table A1 have been examined alongside intertextual integration with K-12 students. Notably, though, the coverage is uneven and incomplete. This greatly limits what can be concluded about the reliability and generalizability of these findings. Indeed, even in the two-dimensional map, we can see that many cells are blank and that few are shaded darker than the lightest tone. Were dimensions added for contexts, tasks, documents, and additional participant characteristics, the map would reveal even sparser coverage. Although there are few fields for which such a map would be fully darkened, it is clear that far more research is needed.

Fig. 3

Evidence gap map of associations among individual differences factors and intertextual integration. Shading corresponds to the number of studies that have examined an association between a particular individual differences factor (listed along the vertical axis) and intertextual integration at a particular grade level (listed along the horizontal axis). As the number of studies increases, the shading darkens. White cells indicate that no study has examined the association between that factor and intertextual integration at that grade level. Studies that involved unspecified secondary-level students are not represented. SD, single document. ESs, effect sizes (i.e., zero-order correlation coefficients)
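For readers who wish to build a comparable evidence gap map for their own syntheses, the following minimal sketch shows one way to render a factor-by-grade matrix of study counts as a shaded grid. The factor labels, grade levels, and counts below are placeholders chosen for illustration only; they are not the values underlying Fig. 3.

```python
# Minimal sketch of an evidence gap map: a grid of study counts per
# individual differences factor (rows) by grade level (columns).
# All values below are placeholders, not the data behind Fig. 3.
import numpy as np
import matplotlib.pyplot as plt

factors = ["Word-level reading", "SD comprehension", "Working memory",
           "Topic knowledge", "Epistemic beliefs", "Interest"]
grades = ["4", "5", "6", "7", "8", "9", "10", "11", "12"]

# Hypothetical counts of studies examining each factor at each grade level.
counts = np.random.default_rng(0).integers(0, 4, size=(len(factors), len(grades)))

fig, ax = plt.subplots(figsize=(7, 3.5))
im = ax.imshow(counts, cmap="Greys", vmin=0)   # darker = more studies; white = none
ax.set_xticks(range(len(grades)), labels=grades)
ax.set_yticks(range(len(factors)), labels=factors)
ax.set_xlabel("Grade level")
fig.colorbar(im, ax=ax, label="Number of studies")
plt.tight_layout()
plt.show()
```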

Recommendations

As for how the field might productively advance, we offer four recommendations. First, as has been observed for the broader social sciences (Shrout & Rodgers, 2018), more replication is needed. Indeed, as we have noted throughout this review, no associations have been studied under similar enough conditions to discern reliable patterns. Moreover, none of the factors reviewed have been studied across the full span of K-12 development. To build a fuller and more reliable understanding of how individual differences are associated with intertextual integration, a program of replicated research is needed that carefully examines relations among specific factors across the K-12 developmental span with well-specified populations and comparable tasks, measures, and document sets.

Our second recommendation is that greater attention be paid to how intertextual integration is measured. There are several examples in the literature of well-validated measures of intertextual integration and of multiple document use more broadly (Goldman et al., 2013; Hastings et al., 2012; Leu et al., 2015). However, most of the studies in this review used unvalidated measures, with little attention to matters of dimensionality, reliability, or validity. These issues are characteristic of the broader field of psychology (Flake & Fried, 2020). Although developing a measure involves many considerations (Lane et al., 2016), we focus here on four points that have gone largely unaddressed.

First, research is needed to examine the dimensionality of intertextual integration. The definition we supplied in the introduction indicates that multiple subprocesses may be involved. In all the studies we reviewed, though, intertextual integration was treated as unidimensional, and efforts were generally not made to test this assumption. Moving forward, this should be a priority. Care should also be taken to ensure that measures are invariant across participant subgroups, that individual items are unbiased, and that tests and items are well calibrated for their targeted populations. Second, greater attention is needed to score reliability. Although many of the studies reported internal consistency estimates, few considered other types of reliability (American Educational Research Association et al., 2014). Moreover, some of the measures suffered from poor internal consistency. The implications of this are not trivial. For the simple bivariate relations examined in this review, measurement error can attenuate effects. In multivariate models, the effects are less predictable (Bollen, 1989). Accordingly, developing measures that produce reliable scores is imperative. Third, it will be important to consider the validity of different types of intertextual integration measures (e.g., inference verification, essays). That is, are certain measures better at capturing intertextual integration than others? Furthermore, how should performance on different measures be interpreted? Finally, and related to our third recommendation, research is needed to understand which measures are best suited for assessing intertextual integration at different points in development and for tracking short- and long-term growth.
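To illustrate the attenuation point with a hypothetical example (the reliabilities and correlation below are not drawn from any of the reviewed studies), classical test theory implies that an observed correlation equals the true correlation scaled by the square root of the product of the two measures’ reliabilities:

$r_{\text{observed}} = r_{\text{true}}\sqrt{\rho_{XX'}\,\rho_{YY'}}$

Under this formula, if a topic knowledge measure and an intertextual integration measure each had a reliability of .70, a true correlation of .40 would be observed as approximately .40 × .70 = .28, a reduction large enough to change how the strength of an association is interpreted.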

Our third recommendation is for research examining the precursors and early development of intertextual integration. We found only five studies that involved elementary-level participants, and none with children below grade 4. The complex reasoning involved in many multiple document tasks may be viewed as too challenging for young children. However, current instructional standards make clear that, beginning in kindergarten, children are expected to engage with multiple documents for a variety of purposes (National Governors Association & Council of Chief State School Officers, 2010). Although this initially involves being read multiple documents, the expectations quickly shift to children themselves reading, searching for, selecting, evaluating, and integrating multiple documents. Current theories and empirical findings provide little insight into how children first develop these skills and the factors that give rise to early, and perhaps ongoing, performance differences.

A large body of work has shown that even preschool-aged children can adopt a critical stance when presented with information from multiple informants (for reviews, see Harris et al., 2018; Mills, 2013). This work has yet to involve textual sources, however, and it often requires participants to select the testimony of one of several informants rather than integrate information across them. Nonetheless, the insights garnered from research on the early development of source memory and evaluation may provide an entry point into understanding how children later develop multiple document use skills.

Young children are already exposed to multiple text documents in many early literacy programs. Take for instance a first-grade lesson from a language arts curriculum (Author, 2013) used in approximately 20% of US schools (Kaufman et al., 2017). In the lesson, children are read different versions of the same fable and are then asked to compare them (Author, 2013). In other programs, young children are read multiple informational texts on the same topic (Kim et al., 2021; Language and Reading Research Consortium et al., 2014). There is not an explicit focus in these programs on sourcing or intertextual integration. However, attention to these skills could easily be incorporated. Such research could provide useful insights into the early development of intertextual integration, particularly if studied alongside other literacy, cognitive, and affective factors.

Our fourth and final recommendation is to develop more formalized theories of intertextual integration. The proposals listed in Table A1 have proven useful for guiding and interpreting research on the role that individual differences play in shaping intertextual integration. Indeed, many have proposed how particular factors (e.g., working memory) are involved in forming intertextual representations, with some offering hypotheses about the direction and general magnitude of these relations. However, these verbal theories are best suited for significance testing, which alone cannot corroborate a theory (Meehl, 1978; Robinaugh et al., 2021). Greater specificity is needed for riskier tests that could falsify or advance a theory (Frankenhuis et al., 2023; Gershman, 2019; Meehl, 1978; Robinaugh et al., 2021).

Formal theories and models have been used extensively in research on word-level reading and to a lesser extent on single-document comprehension (Goldman et al., 2007; Reichle, 2021). This approach has proven critical for testing how specific factors (e.g., phonological processing, background knowledge) are associated with reading (Harm & Seidenberg, 1999; Van Den Broek et al., 1996) and for understanding psychological and inter-agent processes more generally (Sun et al., 2005). Similar advantages may be found in developing formalized theories of intertextual integration and multiple document use. For instance, a formalized theory of intertextual integration could be used to construct a computational model. It would then be possible to test the effects of particular individual differences factors through experimental manipulations with actual and simulated data (Goldman et al., 2007; J. A. Greene, 2022; Robinaugh et al., 2021). Such work might provide useful insights into key pressure points in the processes and development of intertextual integration that could be targeted for assessment and intervention. To these ends, we recommend supplementing the proposals listed in Table A1 with formalized theories that more clearly state their constraints and explicate the circumstances under which they may be falsified.
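As a concrete, if deliberately simplified, illustration of what such a simulation-based test might look like, the sketch below implements a toy model in which the probability that a simulated reader links a pair of ideas across two documents increases with working memory capacity. Every parameter (the number of candidate cross-text links, the baseline linking difficulty, and the strength of the working memory effect) is a hypothetical value chosen for illustration, not a quantity drawn from the reviewed studies or from any published model.

```python
# Toy simulation of a hypothetical formalized theory of intertextual integration.
# Assumption: the probability of forming a cross-text link is a logistic
# function of working memory capacity. All parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(42)
n_readers = 500
n_idea_pairs = 20            # candidate cross-text links in a two-document set

# Individual difference: standardized working memory capacity.
working_memory = rng.normal(loc=0.0, scale=1.0, size=n_readers)

base_logit = -0.5            # baseline difficulty of forming a link
wm_weight = 0.8              # assumed strength of the working memory effect
p_link = 1.0 / (1.0 + np.exp(-(base_logit + wm_weight * working_memory)))

# Intertextual integration score = number of cross-text links actually formed.
integration_score = rng.binomial(n_idea_pairs, p_link)

# Model-implied zero-order correlation between working memory and integration.
r_implied = np.corrcoef(working_memory, integration_score)[0, 1]
print(f"Model-implied correlation: {r_implied:.2f}")
```

Comparing model-implied correlations such as this one, and how they shift as parameters are varied, against correlations observed in empirical samples is one way a formalized theory could be subjected to the riskier tests described above.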

Limitations

As with any study, this review is limited by several factors. First, although we placed no restrictions on the types of individual differences included, we adopted a much narrower approach than has been used in most previous reviews by focusing on only one component of multiple document comprehension (i.e., intertextual integration). Second, we examined only concurrent associations and included only studies that involved K-12 participants. In doing so, we omitted work that has examined other components (e.g., sourcing) and involved other populations. Furthermore, by examining only concurrent associations, we were unable to consider matters of causality, which is critical for an explanatory theory. However, we believe our approach provided for a more precise and measured assessment of findings relevant for K-12 students than has been offered in the past. Even with this more circumscribed approach, though, limitations within the literature made it difficult to find clear and reliable patterns.

Conclusion

In theorizing about how readers use and make sense of multiple documents, much has been hypothesized about the roles played by a wide range of cognitive and affective factors (see Table A1). We found that many of these have been examined in the context of K-12 students’ intertextual integration. However, much remains to be learned about how variation in them gives rise to differences in multiple document processing across the span of K-12 development. Although associations were generally positive and small to medium in magnitude, design and measurement issues greatly limit the conclusions we can draw. As Underwood (1975) noted, individual differences stand as, “…a critical test of theories as they are being born” (p. 130). We agree and believe that to better understand and ultimately improve multiple document comprehension among K-12 students, research is needed that (a) more systematically examines individual differences factors, (b) places a greater focus on early development, and (c) builds more formalized and testable theories.