Introduction

Within twenty-first century literacy research, sourcing is regarded as a hallmark of advanced literacy skills (Bråten, Stadtler, & Salmerón, 2018; Britt et al., 2013; Goldman & Brand-Gruwel, 2018; Magliano et al., 2018). In this article, following Bråten, Stadtler, and Salmerón (2018), we define sourcing as attending to, representing, evaluating, and using information about the sources of document content. Relevant source features may include the author, the document genre or type, the venue, and the place and date of document creation or update (Britt & Aglinskas, 2002). Presumably, considering such source features in the reading process helps individuals become critical readers and learners rather than passive consumers of information (Bråten & Braasch, 2017). Important insights have been gained regarding how sourcing is related to learning and comprehension and how it can be promoted through instructional interventions at different educational levels (for reviews, see Brante & Strømsø, 2018; Bråten, Stadtler, & Salmerón, 2018). Additionally, contextual influences on students’ sourcing have been highlighted in recent years (Britt et al., 2018). In the current review, however, we focus on the role of individual difference factors in sourcing when people read to learn or comprehend document content.

Within literacy research, the process of sourcing has mainly been discussed by theoreticians interested in readers’ ability to handle multiple documents, that is, multiple document literacy (Strømsø & Bråten, 2013). However, existing conceptualizations of multiple document literacy vary substantially with respect to the individual difference factors that they consider, and these factors are more or less directly related to the process of sourcing. Further, the inclusion of particular individual differences may sometimes seem unsubstantiated and even somewhat speculative, with empirical backing that is far from clear. Against this backdrop, it seems highly pertinent to conduct a systematic review of the empirical grounding (or lack thereof) of prevailing assumptions about relationships between individual differences and sourcing. Hopefully, such a review will not only interest researchers in the field due to its potential contribution to theoretical clarification and refinement, but also provide educators and policy makers with important insights into what it takes to be a critical reader and learner and how the teaching of critical literacy may have to be adapted to individual differences within the student population.

Although sourcing is but one component of multiple document literacy, with content integration considered another important aspect (Braasch et al., 2018), it has long been regarded as crucial by researchers in this area (Rouet, 2006). For example, sourcing may allow readers to prioritize information from competent, unbiased, vetted, and updated sources, which, in turn, may help them build a more appropriate, higher-quality mental representation of the issue discussed across documents (Delgado et al., 2020). In particular, sourcing may facilitate the integration of content across documents because it promotes understanding of the reasons for different views and perspectives on the same issue (e.g., because authors differ with respect to their competencies or motives; Bråten et al., 2019). However, research on sourcing has not only been reviewed (Brante & Strømsø, 2018) and compiled (Scharrer & Salmerón, 2016) by researchers in reading more recently; it has also been increasingly conducted and discussed in areas such as science education (Duncan et al., 2018; Sinatra & Lombardi, 2020) and civic reasoning and discourse (Chinn et al., 2021; McGrew, 2021). Still, no prior review has focused on the potential influences of diverse individual differences on this specific, yet indispensable aspect of multiple document literacy.

Conceptualizing Individual Differences within Multiple Document Literacy with a Focus on Sourcing

Thirty years ago, Wineburg (1991) described how expert historians differed from high school students when trying to make sense of the same set of documents on a particular historical event. One of the most salient differences noted by Wineburg (1991) was that the historians engaged heavily in a sense-making activity that he termed “sourcing,” in which they not only studied the contents of the documents but also considered information about the documents themselves. Thus, before reading the content of a document, the historians were observed paying attention to document information such as the author, document type, and place and date of its creation to judge its evidentiary value. When further processing the document, the historians considered such information in interpreting its content. The high school students, in contrast, were observed to generally disregard document information and simply rely on the textbook’s description of the historical event.

Wineburg’s (1991) landmark study can be considered ground zero within research on sourcing in the reading process (Goldman & Scardamalia, 2013). This study did not lead to a wave of studies concerning sourcing within literacy research in the 1990s, however, and hardly did so in the first decade of the new century (see Footnote 1). Arguably, the next important step in this area of research was Perfetti and colleagues’ attempt to conceptualize how readers mentally represent multiple documents that deal with the same topic or issue (Perfetti et al., 1999). Thus, without any empirical basis except for Wineburg’s (1991) early study and their own preliminary empirical work (Britt et al., 1999; Perfetti et al., 1995; Rouet et al., 1996, 1997), these authors proposed a documents model framework in which attention to source information played a crucial role.

Perfetti et al. (1999) built on the influential construction-integration model of Kintsch (1988), which explains the comprehension of a single text, and proposed that readers mentally represent related semantic content from multiple documents as a “situations model,” that is, as an integrated understanding of the situation described across documents (see Footnote 2). However, in addition to this content-based representation, they suggested that readers of multiple documents represent information about the source of each document and links between such information and important document content, as well as links between the sources of different documents. Taken together, the source-content links and the source-source links described above constitute the intertext model within the documents model framework, with this model presumably required to understand the contribution of each source to an integrated representation of the situation, to qualify or judge content information in light of its source, and to understand relationships between sources. No complete documents model can therefore be achieved without the situations model being supplemented and integrated with an intertext model (Perfetti et al., 1999).

In accordance with Wineburg’s (1991) findings and their own preliminary work (Perfetti et al., 1995; Rouet et al., 1997), Perfetti et al. (1999) suggested that individual differences in disciplinary expertise, including “specific domain knowledge and trained experiences with texts in that domain” (p. 117), would influence readers’ representation of source-content and source-source links (i.e., intertext model construction). Likewise, later conceptualizations building on and expanding the documents model framework have highlighted the potential importance of readers’ domain, document, or disciplinary expertise for adaptive sourcing in multiple document literacy contexts (Britt et al., 2018; Rouet & Britt, 2011). In addition, these conceptualizations have featured a host of other individual differences presumably relevant to sourcing in the reading process.

Thus, in the multiple-document task-based relevance assessment and content extraction (MD-TRACE) model, which is a process model of multiple document use, sourcing is considered to be involved in the selection and processing of documents to build a complete documents model including source-content and source-source links (i.e., an intertext model; Rouet & Britt, 2011). In addition to knowledge about relevant source features (i.e., source knowledge) and document or disciplinary expertise, which are directly linked to sourcing in the model, Rouet and Britt (2011) highlighted the importance of “permanent internal resources” related to reading skills and strategies, prior knowledge about the content of the documents, and working memory/executive control. These reading skills and cognitive factors were not directly related to the subprocess of sourcing, however.

More recently, Britt et al. (2018; see also, Rouet et al., 2017) proposed a model of reading as problem solving (RESOLV). In particular, this model further elaborates and differentiates the first processing steps of the MD-TRACE model in suggesting that individuals construct mental representations of the reading context as well as the reading task, with these representations, in turn, guiding their processing of the documents (including documents model construction). In constructing these representations and further processing the documents, readers are assumed to draw on a broad array of internal personal resources, including not only reading skills and strategies (including metacognitive, self-regulatory skills), prior knowledge, and working memory/executive control, but also resources related to motivation and personality. Thus, achievement goal orientations (mastery, performance), individual interest, task values (attainment, utility, intrinsic), and self-concept of ability (self-efficacy, expectancy for success, perceived competence) were described as motivational factors most relevant to reading as problem solving, along with personality factors such as conscientiousness, openness to experience, need for cognition, and growth versus fixed mindset (see Footnote 3). Finally, the RESOLV model acknowledged that readers’ beliefs, both regarding the topic discussed in the documents (i.e., topic beliefs) and regarding the nature of knowledge and the process of knowing (i.e., epistemic beliefs), are likely to influence reading as problem solving. Although the RESOLV model is distinguished by its broad discussion of the potential role of individual differences in reading, cognition, motivation, personality, and beliefs, it remains unclear whether or to what extent each of these factors is associated with reading as problem solving in particular, let alone with the subprocess of sourcing involved in documents model construction. Arguably, most of these individual difference factors seem as relevant to learning and comprehension in general as they are to reading as problem solving and the specific process of sourcing.

List and Alexander (2019) introduced an integrated framework of multiple texts (IF-MT) based on several previous conceptualizations within multiple document literacy (including the documents model framework, the MD-TRACE, and the RESOLV), most notably on their own cognitive affective engagement model of multiple source use (CAEM; List & Alexander, 2017). In this framework, List and Alexander (2019) described three stages of multiple document use: a preparation stage in which readers establish a stance toward the reading task, an execution stage in which they strategically process the documents, and a production stage in which they construct a task product. In terms of individual differences, the preparation stage is especially important. In this stage, readers are assumed to adopt a particular stance or orientation to task completion that involves a combination of their individual interests and attitudes concerning the domain or topic with their pre-established habits in dealing with multiple documents in terms of content integration and source evaluation (i.e., their multiple document proficiency/expertise). Specifically, when individual interest is low, attitudes weak, and multiple document proficiency low, readers are assumed to adopt a disengaged stance. Conversely, when individual interest is high, attitudes strong, and multiple document proficiency high, readers are assumed to adopt a critical analytic stance. Regarding the two stance profiles that fall in between, readers with high individual interest, strong attitudes, and low multiple document proficiency are assumed to adopt an affectively engaged stance, and readers with low individual interest, weak attitudes, and high multiple document proficiency are assumed to adopt an evaluative stance. Most importantly, these four stance profiles are also assumed to differ in terms of their sourcing behavior.

Thus, readers adopting evaluative and critical analytic stances to task completion are assumed to frequently engage in sourcing when working with multiple documents. However, only readers adopting a critical analytic stance are likely to judge the credibility of content information in light of the sources and also use source information in trying to reconcile textual conflicts, as well as to remember both content and source information in an organized and integrated way. In comparison, readers adopting an evaluative stance are likely to routinely access source information in judging content credibility and to accurately recall source information and source-content links, but these readers are not assumed to use source information in the service of meaning making and integration because of their low engagement (i.e., low interest and weak attitudes) in the topic or the task. Finally, both disengaged and affectively engaged readers are assumed to engage infrequently in sourcing (List & Alexander, 2017).

In addition to highlighting the importance of individual differences in readers’ interest, attitudes, and multiple document proficiency/expertise for adaptive sourcing, List and Alexander (2019) proposed that prior knowledge is likely to influence sourcing by supporting the adoption of a critical analytic stance. Similarly, epistemic beliefs about justification for knowing, in particular about appropriate sources of knowledge and methods for justifying knowledge claims, were assumed to influence sourcing by supporting the adoption of evaluative and critical analytic stances to task completion (List & Alexander, 2019).

Although not highlighted as an individual factor in their framework, List and Alexander (2019) discussed several forms of strategic competence of potential importance for readers’ multiple source use, including behavioral, cognitive (intratextual, intertextual), and metacognitive and regulatory strategies. Such strategies are described in great detail in Cho and colleagues’ (Afflerbach & Cho, 2009; Cho & Afflerbach, 2017; Cho et al., 2018) taxonomies of constructively responsive reading comprehension strategies (CRRCS) in reading multiple and digital texts (including hypertext).

The CAEM (List & Alexander, 2017) and the IF-MT (List & Alexander, 2019) are distinguished from other models of multiple document literacy by specifying how individual difference factors—in particular, profiles of individual differences based on individual interest, attitudes, and multiple document proficiency—are related to the subprocess of sourcing. Thus far, these assumptions must be regarded as hypotheses that are fairly loosely grounded in empirical evidence, however.

Finally, more specialized theoretical models that focus on situations in which different sources disagree also have relevance to the issue of individual differences in sourcing. One such model is the discrepancy-induced source comprehension (D-ISC) model (Braasch & Bråten, 2017; Braasch et al., 2012; Bråten & Braasch, 2018). Essentially, this model states that, when readers encounter conflicting information about a particular topic or issue and thereby experience a break in situational coherence, they may strategically pay attention to source information (i.e., who said what) in an effort to understand the conflict and restore coherent understanding of document content. Recently, the D-ISC model has been extended to contexts in which the content of a document contradicts readers’ pre-existing beliefs about a particular topic or issue (Braasch & Bråten, 2017; Bråten & Braasch, 2018). In such contexts, document content will be considered less plausible and individuals may therefore seek support from source information to try to make sense of the content (Bråten & Braasch, 2018). This explanation is consistent with the two-step model of validation by Richter and Maier (2017, 2018). According to these authors, when readers detect a conflict between content information and their prior topic beliefs, they may take strategic action and engage in elaborative processing of conflicting information, given that they are motivated and cognitively capable of doing so. Presumably, such elaborative processing of conflicting information may also include attention to the source of that information. To what extent perceived conflicts between content information and readers’ prior beliefs about the topic influence the process of sourcing, in particular, is an issue for further research, however.

While the D-ISC model seems to be consistent with an emphasis on individual differences in readers’ beliefs about the topic of the documents, the content-source integration (CSI) model described by Stadtler and Bromme (2014), which also focuses on the reading of conflicting information, brings to the forefront individual differences in prior knowledge. According to the CSI model, readers may try to resolve conflicts encountered in documents by asking themselves “what is true?,” in which case they direct their efforts toward validating textual claims in light of their prior knowledge or toward evaluating the quality of the argumentative reasoning underlying the claims. However, when readers do not possess sufficient prior knowledge or argumentative reasoning skills to evaluate textual claims directly, they may resort to the strategy of asking themselves “who to believe?,” in which case they pay attention to and evaluate the sources (e.g., the authors) presenting the claims.

Interestingly, then, the CSI model of Stadtler and Bromme (2014) seems to suggest that less prior knowledge may be associated with more sourcing, whereas other conceptualizations, such as the frameworks proposed by List and Alexander (2017, 2019), seem to suggest that more prior knowledge will increase sourcing, for example through its contribution to the adoption of a critical analytic stance to task completion. Needless to say, further clarification of whether prior knowledge actually is positively or negatively related to readers’ sourcing (if at all) seems to be an important aim for a systematic review of the role individual differences play in sourcing.

The Present Review

In summary, a range of individual difference factors have been highlighted within conceptualizations of multiple document representation and use during the last decade. These individual differences can be categorized into reading skills and strategies, cognitive factors, motivation and engagement, beliefs, personality, and expertise. As indicated in Table 1, different conceptualizations have emphasized the importance of these individual difference factors for multiple document representation and use, including for the subprocess of sourcing, to varying degrees. It is currently not clear, however, to what extent these individual difference factors actually have been included in research on sourcing, nor is it clear to what extent proposed relationships between these individual difference factors and sourcing have received empirical backing.

Table 1 Individual differences highlighted in conceptualizations of multiple document representation and use

Therefore, we set out to review the empirical work investigating associations between individual differences and sourcing in the reading process. The review was restricted to studies published between 1991 and 2020, that is, the period from the publication of Wineburg’s (1991) groundbreaking study to the present. As noted previously, we adopted a broad definition of sourcing as attending to, representing, evaluating, and using information about the sources of document content, such as the author, publication, or type of document (Bråten, Stadtler, & Salmerón, 2018). By building on this definition, we expected to identify studies that operationalized and measured sourcing in different ways, varying from noting and remembering source information to evaluating the credibility of sources and including source information in task products (e.g., in essays). Hopefully, this approach would also give us the opportunity to explore whether any associations between individual difference variables and sourcing might vary with the way sourcing was measured. Finally, given that we expected to identify studies that varied substantially with respect to domain or topic, we wanted to explore whether the role of individual differences in sourcing might vary across the domains or topics addressed in the reading materials.

Specifically, our systematic review was guided by the following four questions:

  1. To what extent have the individual difference factors highlighted within conceptualizations of multiple document representation and use been included in the empirical research on sourcing?

  2. To what extent have proposed relationships between individual differences and sourcing been supported by the empirical research?

  3. Are there any indications that relationships between individual differences and sourcing may vary with the way sourcing is measured?

  4. Are there any indications that relationships between individual differences and sourcing may vary with the domain or topic addressed in the reading materials?

Method

Search Strategy

We worked with a research librarian in the systematic review team of the university library at our institution to develop the search protocol and conduct the searches. The main search was conducted in January and February 2019, with additional searches conducted in October 2019 and the beginning of October 2020. The search included three databases: ERIC (Ovid), PsycINFO (Ovid), and ISI Web of Science. Through several meetings with the systematic review team, an extensive search algorithm was developed and tested. The search algorithm was built from 91 search terms and their combinations. In addition to traditional searches in keywords indexed by authors and journals, the search algorithm made it possible to identify records based on the proximity of particular terms in the articles. For example, if the terms “information” or “text*” occurred in a proximity of three or fewer words from terms such as “source” or “sources” in an article, that article was included for abstract review even though none of these terms were indexed as keywords. The complete search algorithm that was used is included in the online supplementary material. We also performed a manual search of highly relevant journals for studies examining sourcing and individual differences. In addition, the third author examined the publication lists of researchers considered to be central contributors within this line of research. The database search, the manual journal searches, and the publication list examination were all restricted to articles published between January 1, 1991, and October 1, 2020.
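The proximity rule described above can be illustrated with a minimal Python sketch. It is not the authors' search algorithm, which is documented in their supplementary material and ran inside the bibliographic databases. The function name `within_proximity`, the regular-expression patterns, and the reading of "three or fewer words" as a difference of at most three word positions are all assumptions made for illustration.

```python
import re

# Hypothetical patterns mirroring the example in the text:
# "information" or "text*" (truncation) near "source" or "sources".
LEFT_TERMS = [r"information", r"text[a-z]*"]
RIGHT_TERMS = [r"source", r"sources"]


def within_proximity(text, left_terms, right_terms, max_gap=3):
    """Return True if any left term lies within max_gap word positions
    of any right term (assumed interpretation of the proximity rule)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    left_positions = [
        i for i, tok in enumerate(tokens)
        if any(re.fullmatch(p, tok) for p in left_terms)
    ]
    right_positions = [
        i for i, tok in enumerate(tokens)
        if any(re.fullmatch(p, tok) for p in right_terms)
    ]
    return any(
        abs(i - j) <= max_gap
        for i in left_positions
        for j in right_positions
    )


near = within_proximity(
    "Students evaluated the sources of the information they read.",
    LEFT_TERMS, RIGHT_TERMS,
)
far = within_proximity(
    "The text was long and dense, and nobody in the sample ever mentioned a source.",
    LEFT_TERMS, RIGHT_TERMS,
)
print(near, far)
```

Under these assumptions, the first sentence would be flagged for abstract review and the second would not, regardless of which keywords were indexed for the article.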

Study Selection and Data Extraction

To be eligible for the review, the studies had to fulfill the following inclusion criteria:

  1. The studies should examine the comprehension or evaluation of written resources, either text alone or text in combination with other representations.

  2. The studies should report empirical research (e.g., no reviews or theoretical papers were included).

  3. The studies should include actual reading. Thus, studies such as survey studies in which participants were asked to rank the trustworthiness of different types of media outlets in general without reading texts from these outlets were not included.

  4. The studies should include at least one individual difference variable and one measure of sourcing.

  5. The studies should examine typically developed populations (e.g., no participants with learning disabilities).

  6. The studies should have a quantitative design and report results through numerical data.

  7. The studies should be published in English.

Next, we developed a coding scheme to extract the following information from the selected studies: (a) country and participants (i.e., sample size, gender distribution, age, and grade level); (b) study design (i.e., experimental, quasi-experimental, or correlational); (c) instructions and learning material (e.g., text type, length, and topic; time limit for reading; reading medium; and availability of material during the response); (d) individual difference measures; (e) sourcing measures; (f) relationship(s) between individual difference variable(s) and sourcing variable(s); (g) type of statistics; and (h) field of research.

Results

Using the search protocol described in the method section and displayed in the supplemental material, the initial searches in the selected databases yielded 3891 results. After removing duplicates, 2891 records were screened with the abstract screening software Rayyan (Ouzzani et al., 2016). The first, second, and fourth authors collaboratively screened the abstracts of 250 (8.65%) records to establish a common understanding of the inclusion and exclusion criteria. The first and third authors screened the remaining 2641 abstracts. Of the 2891 records that were screened, 2618 were removed from the review for the following reasons: no relevant variables measured (1672 records), incorrect study design (430 records), incorrect population (193 records), incorrect publication type (316 records), or incorrect language (7 records). The remaining 273 records, together with 36 records identified through a hand search in relevant journals and an examination of publication lists, were eligible for a full-text examination. Thus, 309 records, comprising 324 studies, were examined in full text by the first and third authors. Of these 324 studies, 252 were excluded for the following reasons: sourcing was not measured (n = 131), individual differences were not measured (n = 49), no data were provided on the relationship between sourcing and individual differences (n = 38), no reading was involved in the study (n = 12), the full text could not be found (n = 7, all doctoral dissertations), the study design was incorrect (n = 5), or the paper was theoretical (n = 10). The fourth author randomly drew 54 of these 252 excluded studies (21.4%) and found 100% agreement on whether the study should be included and 96.3% agreement on the reason for exclusion. The remaining 72 studies were then coded by the first and third authors using the detailed coding scheme previously described. Figure 1 displays an overview of the entire coding process.

Fig. 1 Overview of the coding process
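The screening counts reported above can be re-derived with a short Python sketch. All numbers are taken from the text; the variable names are hypothetical and chosen only for illustration. Note that records and studies are different units (309 records comprised 324 studies), so the number of studies examined in full text is entered as reported rather than computed.

```python
# Counts as reported in the Results section.
records_screened = 2891
calibration_abstracts = 250

abstract_exclusions = {
    "no relevant variables measured": 1672,
    "incorrect study design": 430,
    "incorrect population": 193,
    "incorrect publication type": 316,
    "incorrect language": 7,
}
hand_search_records = 36
studies_in_full_text = 324  # the 309 full-text records comprised 324 studies

full_text_exclusions = {
    "sourcing not measured": 131,
    "individual differences not measured": 49,
    "no data on the relationship": 38,
    "no reading involved": 12,
    "full text not found": 7,
    "incorrect study design": 5,
    "theoretical paper": 10,
}
audited_exclusions = 54

# Abstract screening stage.
excluded_at_abstract = sum(abstract_exclusions.values())
records_after_abstracts = records_screened - excluded_at_abstract
full_text_records = records_after_abstracts + hand_search_records

# Full-text stage.
excluded_at_full_text = sum(full_text_exclusions.values())
included_studies = studies_in_full_text - excluded_at_full_text

# Shares reported as percentages in the text.
calibration_share = round(100 * calibration_abstracts / records_screened, 2)
audit_share = round(100 * audited_exclusions / excluded_at_full_text, 1)

print(excluded_at_abstract, records_after_abstracts, full_text_records)
print(excluded_at_full_text, included_studies)
print(calibration_share, audit_share)
```

Each derived value matches the corresponding figure in the text: 2618 records excluded at the abstract stage, 273 retained, 309 examined in full text, 252 studies excluded, and 72 included.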

These 72 studies, which are described in terms of participants, text materials, measures, and main results in Table S1 in the online supplementary material, included 8529 participants with a substantially skewed gender distribution. Five studies (Britt & Aglinskas, 2002, study 1; Lucassen et al., 2013; Potocki et al., 2020; Rouet et al., 1997; Wiley et al., 2020), representing 582 participants, did not report gender distribution. Among the remaining 7947 participants, 62.74% (n = 4986) were female, and 37.26% (n = 2961) were male. Participants represented all educational levels, with seven studies including 797 participants examining elementary school students, five studies including 313 participants examining middle school students, 12 studies including 1325 participants examining high school students, 17 studies including 2538 participants examining undergraduates, two studies including 98 participants examining master/graduate students, 20 studies including 2573 participants examining mixed or unspecified levels of university students, and nine studies including 885 participants examining samples across educational levels or samples outside the educational system. Thus, 5209 (61.07%) participants across the 72 studies were university/college students. The majority of studies (n = 52) were conducted in Europe in the following countries: Germany (n = 17), Norway (n = 15), Italy (n = 7), Spain (n = 5), France (n = 5), and the Netherlands (n = 3), with one study conducted in both Germany and Spain. The remaining 20 studies came from the USA (n = 14), Israel (n = 5), and Japan (n = 1). This distribution reflects the countries where the data were collected; several of the studies included in this review have author teams representing several countries. As displayed in Fig. 2, the publication trend indicates an increased number of studies in recent years.
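The participant tallies above can be cross-checked with a brief Python sketch. The figures are copied from the text; the dictionary keys and variable names are hypothetical labels introduced for illustration.

```python
# (number of studies, number of participants) per educational level, as reported.
levels = {
    "elementary school": (7, 797),
    "middle school": (5, 313),
    "high school": (12, 1325),
    "undergraduate": (17, 2538),
    "master/graduate": (2, 98),
    "university, mixed or unspecified": (20, 2573),
    "across levels or outside education": (9, 885),
}
university_levels = [
    "undergraduate",
    "master/graduate",
    "university, mixed or unspecified",
]

total_studies = sum(n_studies for n_studies, _ in levels.values())
total_participants = sum(n_people for _, n_people in levels.values())
university_participants = sum(levels[key][1] for key in university_levels)
university_share = round(100 * university_participants / total_participants, 2)

# Gender distribution among participants in studies that reported it.
gender_not_reported = 582
female, male = 4986, 2961
gender_reported = total_participants - gender_not_reported
female_share = round(100 * female / gender_reported, 2)
male_share = round(100 * male / gender_reported, 2)

print(total_studies, total_participants)
print(university_participants, university_share)
print(gender_reported, female_share, male_share)
```

The totals agree with the text: 72 studies, 8529 participants, 5209 university/college students (61.07%), and a 62.74% to 37.26% female-to-male split among the 7947 participants with reported gender.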

Fig. 2 Reviewed studies by publication year

With respect to reading medium, participants read digital documents in 46 and printed documents in 22 of the studies. In one study, participants read both digital and printed documents, and in three studies, reading medium was not specified. In the vast majority of the studies (95.83%, n = 69), participants read a predefined set or library of documents, with the number of documents varying from 1 to 17 (M = 6.15, SD = 3.31). In 17 of these studies, participants had to read all the documents, while it was up to the participants to decide which documents they wanted to read in nine studies. In the remaining 43 studies, there was not enough information in the manuscript to determine whether participants had to read all the documents. In the three studies that did not present participants with a predefined document set, they read on the open Internet.

Inclusion of Individual Differences

Our first research question concerned the extent to which the individual difference factors highlighted within conceptualizations of multiple document representation and use have been included in the empirical research on sourcing. As shown in Table 1, six categories of individual differences (i.e., reading skills and strategies, cognitive factors, motivation and engagement, personality, beliefs, and expertise), covering 19 individual difference constructs, have been proposed as important for multiple document representation and use in contemporary conceptualizations. As displayed in Table 2, 14 of these individual difference constructs were examined in the 72 studies included in this review. Across the 72 studies, 27 different individual difference constructs were examined in relation to sourcing.

Table 2 Individual differences measured in the reviewed studies

Of note is that in Table 2, we implemented a more general terminology than the original terms used to describe the measured individual differences in the reviewed studies (see Table S1 in the supplemental material). For example, some of the studies used the term “prior knowledge” (e.g., Braasch et al., 2014; List et al., 2017), whereas other studies referred to “prior domain knowledge” (e.g., Kammerer et al., 2021) or “topic knowledge” (e.g., Delgado et al., 2020; Peterson & Alexander, 2020), with “domain” and “topic” suggesting that prior knowledge was measured at different levels of specificity (McCarthy & McNamara, 2021). In Table 2, all these knowledge descriptors are referred to by the more general term “prior knowledge.” Other examples are that “claim agreement” used by Bromme et al. (2015) is considered a “topic belief” in Table 2 or that “word-level reading skills” in Table 2 comprises variables such as “word recognition” (Braasch et al., 2014), “word reading fluency” (Florit et al., 2019), and “reading speed” (Kammerer, Meier, & Stahl, 2016). A final example is that, due to the wording of the items in the measures of “topic involvement” (e.g., “value” and “importance”) used by Kang et al. (2011) and Westerwick (2013, study 1), we coded “topic involvement” in the more general category “task values” in Table 2.

Among the individual differences included in conceptualizations of multiple document representation and use, the most frequently examined category in relation to sourcing was cognitive factors (41 studies), followed by reading skills and strategies (29 studies), motivation and engagement (20 studies), beliefs (16 studies), expertise (8 studies), and personality (3 studies). However, the number of studies varied substantially for the individual difference constructs within these categories. For example, of the 41 studies examining cognitive factors, 35 concerned prior knowledge. Further, of the 29 studies examining reading skills and strategies, 16 concerned reading comprehension. Of note is that all the strategies subsumed under reading skills and strategies referred to strategic processes involved in the comprehension of text. Albeit containing fewer studies, the category motivation and engagement was dominated by studies examining interest (n = 12) and attitudes (n = 6). Additionally, two (i.e., achievement goals and self-concept of ability) of the five motivation and engagement constructs highlighted in conceptualizations of multiple document representation and use had not been empirically examined in any of the studies included in this review. Likewise, only two (need for cognition and growth vs. fixed mindset) of the four personality constructs had been examined empirically. Interestingly, personality factors had been examined empirically in relation to sourcing only to a very limited degree (n = 3 studies).

Another related issue is whether there are individual differences not highlighted in the conceptualizations of multiple document representation and use that have been examined in relation to sourcing. Table 2 also shows the 13 individual difference constructs we identified that were not included in any of the conceptualizations. Two of these constructs, vocabulary (Macedo-Rouet et al., 2020) and verbal comprehension (Ulyshen et al., 2015), representing the category of cognitive factors, had been examined in one study each. Two additional motivational and engagement variables were identified in the studies we reviewed. Kobayashi (2014) examined “personal relevance,” a composite variable of motivation that concerned importance, experience, and interest of the topic, whereas List, Stephens, and Alexander (2019) included a variable labeled “persistence” that referred to the reading time and the number of texts read. Reading time and texts read have been used as proxies for behavioral engagement in previous research (e.g., Bråten, Anmarkrud, et al., 2014; Bråten, Brante, & Strømsø, 2018). McCrudden et al. (2016) examined the variable “topic familiarity” based on a combination of measures targeting prior knowledge and interest. Therefore, we did not include this variable in either the cognitive or the motivation and engagement category. Two other studies included in this review referred to measures of “topic familiarity.” Lucassen et al. (2013) based topic familiarity on questions about participants’ interests and disinterests, whereas Van Der Heide and Lim (2016) referred to participants’ experience with an Internet platform in relation to familiarity. Accordingly, in Table 2, these two variables are coded as “interest” and “Internet experience.”

We also identified six studies examining sourcing in relation to a category that we labeled demographic variables, consisting of gender (4 studies), age (1 study), and parental educational level (1 study). Seven studies (e.g., Macedo-Rouet et al., 2013, experiments 1 and 2; Winter & Krämer, 2012) examined participants’ educational level in relation to sourcing. Arguably, educational level can be related to expertise. When, for example, von der Muhlen et al. (2016) compared the sourcing of undergraduates and scientists, the comparison concerned novices and experts and was coded as such, but the comparison was also related to educational level (i.e., undergraduate vs. PhD). However, when Macedo-Rouet et al. (2013, experiment 1) compared the sourcing behavior of fourth and fifth graders and found that fifth graders were better able to identify the most knowledgeable source in texts, the difference was not a matter of expertise since neither of the groups could be considered experts on the topics in question (i.e., global warming, nutrition, or public transportation). Thus, we used “expertise” only in the studies in which one group of participants read texts within their area of specialization, such as in Rouet et al.’s (1997) comparison of graduate students in history (experts) and graduate students in psychology (novices) reading history documents. It is important to note, however, that the construct of expertise is multidimensional and can be assumed to involve a configuration of knowledge, strategies, interest, and beliefs in relation to a particular domain or discipline, as conceptualized within the model of domain learning (Alexander, 1997; Alexander et al., 2012).
Specifically, within this model, experts in a domain or discipline are characterized by an interplay of a well-integrated body of knowledge, a well-established and efficient repertoire of deeper processing strategies, high individual interest, and adaptive beliefs about the nature of knowledge and the process of knowing.

Finally, we identified four studies examining emotional/affective constructs (Mason et al., 2017; Mason et al., 2020; Mason, Scrimin, Tornatora, et al., 2018; Mason, Scrimin, Zaccoletti, et al., 2018), two studies examining academic achievement (e.g., GPA; Hahnel et al., 2019; Mason et al., 2017), four studies examining Internet experience (Kammerer et al., 2013; Macedo-Rouet et al., 2020; Salmerón et al., 2020; Van Der Heide & Lim, 2016), and three studies examining historical thinking skills (Merkt et al., 2017; Merkt & Huff, 2020, experiments 1 and 2).

In summary, the results concerning research question 1 showed that the majority of individual differences highlighted as important for sourcing in theoretical models and conceptualizations have been examined. However, for the majority of the individual difference constructs, the number of studies was very small. The systematic review also identified a number of individual difference constructs not included in these conceptualizations that have been examined in relation to sourcing.

Relationships between Individual Differences and Sourcing

Our second research question concerned the extent to which proposed relationships between individual differences and sourcing have been supported by the empirical research. Table 4 shows the substantial variation in how sourcing was measured in the reviewed studies. In addition, different statistical analyses were performed and different types of results were reported across these studies (for details, see Table S1 in the supplemental material). Thus, it is neither meaningful nor possible to calculate a general and comparable effect size (e.g., Hedges’ g) to examine the strength of the relationship between the various individual differences and sourcing across the studies. Our main findings regarding the second research question are summarized in Table 3, which is based on the information provided in the supplemental material (Table S1).

Table 3 Specific studies (represented by study number) supporting and not supporting relationships between individual difference constructs and sourcing

Within the category of reading skills and strategies, six studies examined the relationship between word-level reading skills and sourcing. Three of these studies (Macedo-Rouet et al., 2013, experiment 1; Macedo-Rouet et al., 2020; Potocki et al., 2020) found a statistically significant relationship between word-level reading and sourcing, whereas three studies could not identify such a relationship (Braasch et al., 2014; Florit et al., 2019; Kammerer, Meier, & Stahl, 2016). These differences in results do not seem to be a matter of the educational level among the participants (i.e., word-level reading skills being more important for younger readers), since students from elementary to high school were represented in both groups of studies (i.e., studies with and without a statistically significant relationship between word-level reading and sourcing). For example, Macedo-Rouet et al. (2013, experiment 1) found a statistically significant relationship between word-level reading and sourcing in their study of fourth and fifth graders’ ability to evaluate information sources, whereas Florit et al. (2019) did not in their study of fourth graders. Likewise, Macedo-Rouet et al. (2020) identified a statistically significant relationship between word-level reading and sourcing in a study of ninth graders, whereas Kammerer, Meier, and Stahl (2016) did not.

Sixteen of the studies included in our review examined reading comprehension in relation to sourcing. Eight studies found a statistically significant relationship between reading comprehension and sourcing (e.g., Hahnel et al., 2019; Paul et al., 2019; Salmerón et al., 2020), and eight did not (e.g., Bråten, Brante, & Strømsø, 2018; Florit et al., 2019; Mason et al., 2017). Paul et al. (2019), for example, examined the effect of sourcing prompts and mutually exclusive claims on sourcing in a sample of 89 German fourth graders reading a text set about nutrition (i.e., whether a particular cereal was healthy). The participants stated whether the cereal in question was healthy and provided a justification for their decision. These justifications were coded for the frequency of source citations as well as for expertise and benevolence evaluations. Sourcing was also measured with memory for source-content links. Reading comprehension was measured with a standardized reading comprehension test. The results showed a main effect of reading comprehension on benevolence evaluations and memory for source-content links. Florit et al. (2019) also examined the relationship between reading comprehension and sourcing in a sample of fourth graders reading two text sets about unsettled topics (i.e., nutrition and video games) containing conflicting information. Sourcing was measured through the frequency of source-content links in post-reading essays. Reading comprehension was measured with a standardized test. The authors did not find a statistically significant relationship between reading comprehension and sourcing. In a clear trend, reading comprehension has primarily been measured in relation to sourcing among younger readers. 
Despite university students being clearly overrepresented across the 72 studies included in this review (i.e., 61.07% of the participants), only three of the 16 studies examining reading comprehension and sourcing included university-level participants, representing 26.74% of the participants in these 16 studies.

Given that students develop reading comprehension over the school years, we explored whether the age of the participants differed systematically in the studies with and without a statistically significant relationship between reading comprehension and sourcing. Both groups of studies included elementary, middle, and high school students as well as university students, and the mean age was similar between the studies with (M age = 15.36) and without (M age = 15.11) a statistically significant relationship between reading comprehension and sourcing. There was a trend towards somewhat larger samples in the studies with (M sample size = 139) than without (M sample size = 81.90) a statistically significant relationship, a difference that can affect the p value because larger samples provide more statistical power to detect a relationship of a given size.
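The role of sample size noted above can be illustrated with a minimal sketch that holds a correlation fixed and varies only the number of participants. The correlation of .20 is a hypothetical value chosen purely for illustration and is not taken from any reviewed study; the sample sizes are the mean sample sizes reported above, rounded to whole participants; and the p value relies on the Fisher z (normal) approximation rather than on any analysis performed in this review.

```python
import math


def approx_two_sided_p(r: float, n: int) -> float:
    """Approximate two-sided p value for a Pearson correlation r observed
    in a sample of size n, using the Fisher z transformation."""
    # Under the null hypothesis, atanh(r) * sqrt(n - 3) is approximately
    # standard normal.
    z = math.atanh(r) * math.sqrt(n - 3)
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-sided tail area.
    return math.erfc(abs(z) / math.sqrt(2))


HYPOTHETICAL_R = 0.20  # assumption: illustrative value, not from the review

p_larger = approx_two_sided_p(HYPOTHETICAL_R, 139)  # mean n, significant studies
p_smaller = approx_two_sided_p(HYPOTHETICAL_R, 82)  # mean n, nonsignificant studies

print(f"n = 139: p = {p_larger:.3f}")
print(f"n = 82:  p = {p_smaller:.3f}")
```

Under these assumptions, the same hypothetical correlation falls below the conventional .05 threshold with the larger sample and above it with the smaller one, which is why differences in sample size complicate comparisons based on statistical significance alone.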

Seven of the studies included in our review examined strategic competence in relation to sourcing. Five of these studies (e.g., Anmarkrud et al., 2014; List, Du, et al., 2019, studies 1 and 2) found statistically significant relationships between the use of comprehension strategies and sourcing, whereas two did not (Florit et al., 2019; Strømsø et al., 2020). For example, Anmarkrud et al. (2014) found a statistically significant relationship between the use of comprehension strategies and sourcing in a think-aloud study in which undergraduates read multiple and conflicting documents about a health issue. Participants’ verbal utterances during reading were coded for instances of strategic processing, and essays written after the reading of the texts were coded for explicit references to sources and number of source-content links. In addition, participants rank-ordered the documents according to trustworthiness. The results showed that strategy use was positively correlated with the numbers of both explicit references to sources in essays and source-content links. In addition, strategy use was associated with low trustworthiness ranking of a low credibility document. Four of the five studies identifying statistically significant relationships between the use of comprehension strategies and sourcing measured participants’ use of rather complex multiple text strategies focusing on the integration of content across documents in samples of university students. The two studies that could not identify a statistically significant relationship between the use of comprehension strategies and sourcing included younger students (fourth-grade and high school students).

Within the second category of individual differences, cognitive factors, 35 studies examined the relationship between prior knowledge and sourcing. Fewer than half of these studies (n = 16) identified a statistically significant relationship. Bråten, Strømsø, and Salmerón (2011) examined the relationship between prior knowledge and sourcing in a sample of 128 undergraduates reading seven partially conflicting texts with varying degrees of credibility on a science topic (i.e., global warming). Prior knowledge was measured by a researcher-made 17-item multiple-choice test, and sourcing was measured by having participants rate the trustworthiness of the different texts and then indicate the extent to which they considered source features when rating the trustworthiness of the texts. The results showed that students with lower levels of prior knowledge placed more trust in biased sources than did students with higher levels of prior knowledge and that the latter put more trust in objective and balanced texts than in biased texts. These findings were corroborated by Mason et al. (2014), who had 134 ninth graders read two text sets on conflicting topics (i.e., the health effects of cell phone use and genetically modified food). Prior knowledge was measured with a combination of open-ended and multiple-choice questions, and sourcing was measured with different rank-order tasks (i.e., ranking documents with respect to reliability) and log data (i.e., number of visits and reading time for most and least reliable documents). Prior knowledge was not related to the number of visits or the time spent on the most and least reliable documents in any of the document sets but was related to the reliability ranking of documents in both document sets.

The studies that did and did not find a relationship between prior knowledge and sourcing exhibited one interesting difference. Eleven of the 16 studies finding a statistically significant relationship between prior knowledge and sourcing measured prior knowledge using multiple-choice measures, whereas this type of measure was used in only six of the 19 studies that did not find a statistically significant relationship. Among the latter studies, prior knowledge was to a larger extent measured with open-ended questions (n = 4; e.g., Bråten, Ferguson, et al., 2014; Mason et al., 2017), true/false questions (n = 2; e.g., Kammerer, Meier, & Stahl, 2016; Ulyshen et al., 2015), or self-reported perceived knowledge (n = 4; e.g., Kammerer, Kalbfell, & Gerjets, 2016, experiments 1 and 2). Thus, how prior knowledge is measured might influence the relationship between prior knowledge and sourcing in hitherto unknown ways.
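The contrast described above can be made explicit with simple arithmetic on the reported counts. The following minimal sketch only restates the counts given in this paragraph as proportions; the function and variable names are illustrative, and no inferential test is implied.

```python
def share(count: int, total: int) -> float:
    """Return count as a proportion of total."""
    return count / total


# Counts as reported in the paragraph above.
mc_share_significant = share(11, 16)     # multiple-choice measures, significant studies
mc_share_nonsignificant = share(6, 19)   # multiple-choice measures, nonsignificant studies

print(f"Multiple-choice measures, significant studies: {mc_share_significant:.0%}")
print(f"Multiple-choice measures, nonsignificant studies: {mc_share_nonsignificant:.0%}")
```

Expressed this way, multiple-choice measures were used in roughly two thirds of the studies that found a statistically significant relationship but in fewer than one third of the studies that did not.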

Five studies examined working memory/executive functions in relation to sourcing. Four of these studies (Braasch et al., 2014; Delgado et al., 2020; Florit et al., 2019; Macedo-Rouet et al., 2020) measured only working memory, whereas Mason, Scrimin, Tornatora, et al. (2018) examined both working memory and self-regulation. Two (Braasch et al., 2014; Macedo-Rouet et al., 2020) of the five studies found a statistically significant relationship between working memory and sourcing. For example, Braasch et al. (2014) examined sourcing in a sample of 59 upper secondary school students reading conflicting documents about a scientific topic (i.e., weather patterns). Working memory was measured with a working memory span task, and sourcing was measured with a rank-order task in which participants ranked the documents from the most to the least reliable. The results showed that working memory capacity predicted the ability to discriminate between texts that varied in terms of reliability. Mason, Scrimin, Tornatora, et al. (2018) examined both working memory and self-regulation in a study of seventh graders reading conflicting information about a health topic. Working memory was measured with a complex reading span task, (psychophysiological) self-regulation was measured with heart rate variability, and sourcing was measured with a reliability ranking of documents and justifications for those rankings. Working memory was not related to sourcing, but students with higher heart rate variability, that is, greater self-regulation, were more accurate in rank-ordering the documents.

In the only study examining argumentative reasoning in relation to sourcing, Mason et al. (2014) had ninth graders read multiple conflicting texts on two health issues. Argumentative reasoning skills were measured with a test on which participants identified informal reasoning fallacies in debates, and sourcing was measured with four tasks: (a) identify the two most and least reliable texts, (b) rank the texts from the most to the least reliable, (c) record the number of visits to the most and least reliable texts, and (d) note the time spent on them. However, argumentative reasoning was not statistically significantly related to any of these sourcing measures.

In the third category, motivation and engagement, two studies (Kang et al., 2011; Westerwick, 2013) examined task values in relation to sourcing, both finding a statistically significant relationship. Both Kang et al. (2011) and Westerwick (2013, study 1) referred to the individual difference they measured as “topic involvement,” but, given the wording of the items used in their measures (e.g., “value” and “importance”), we considered them measures of task values. Westerwick (2013, study 1) had 574 undergraduates read two online articles on a health topic for which the credibility of the source was manipulated (i.e., high, medium, or low credibility). Task values were measured with four items on a Likert-type scale, and sourcing was measured with a scale on which participants rated the believability, accuracy, trustworthiness, bias, and completeness of the sources. Regression analysis showed that task values predicted the ability to discriminate between high-, medium-, and low-credibility sources.

Of the 12 studies addressing the relationship between interest and sourcing, only three (List, Stephens, & Alexander, 2019; Strømsø et al., 2010; Tarchi, 2019) identified a statistically significant relationship between interest and sourcing. Strømsø et al. (2010) had 126 undergraduates read seven conflicting texts on a science topic (i.e., global warming). Topic interest was measured with a 12-item questionnaire, and sourcing was measured through memory for source-content links. List, Stephens, and Alexander (2019) is the only study included in the review that distinguished between situational and individual interest (Hidi, 2001). In their study, 197 undergraduates read six conflicting texts on a social science topic (i.e., the Arab Spring in Egypt) that varied in regard to document type (e.g., essay, blog entry, or newspaper article) and credibility. Individual interest was measured with five items asking participants about their general interest in this topic, whereas situational interest was measured by having them rate the interestingness of each text as they accessed it. Sourcing was measured by the number of citations in the text participants wrote after reading the documents. Situational interest was positively correlated with sourcing, but individual interest was a negative predictor of sourcing in a hierarchical regression analysis. A mediation analysis did not find a mediated effect of situational interest on the number of citations in participants’ written responses via the time participants devoted to text access.

Of the six studies examining the relationship between attitudes and sourcing, four of the studies identified a statistically significant relation. For example, Kobayashi (2014) had 154 undergraduates read two conflicting texts about a health topic (i.e., the relationship between blood type and personality). Attitudes regarding the topic were measured by asking participants to rate four attitude statements on a Likert-type scale, and sourcing was measured with a composite called source acceptability, which was based on the rating of the credibility and persuasiveness of the two texts. In a multivariate analysis of variance, prior attitude was found to be a statistically significant predictor of source acceptability. In another study, van Strien et al. (2016) had 79 university majors read eight conflicting websites on a health topic (i.e., organic food). First, attitudes regarding the topic were measured with a 15-item questionnaire before the strength of these attitudes was measured with three items. Sourcing was measured in two ways: using eye-tracking data, attention to source information was calculated based on total fixation time on website logos and “about us” information, and credibility judgments were measured by having participants rate the trustworthiness, expertise, and convincingness of each website. Attitude strength correlated negatively with fixation time on source information and credibility rating for pages with attitude-inconsistent information. Moreover, attitude strength correlated positively with credibility rating for pages including attitude-consistent information. Additionally, there was an interaction of attitude strength with consistency/inconsistency on fixation time on source information.

Only three studies included in our review examined the category of personality in relation to sourcing. Neither of the two studies (Bromme et al., 2015, experiment 2; Winter & Krämer, 2012) that examined the need for cognition in relation to sourcing found a statistically significant relationship. Braasch et al. (2014) investigated whether a growth versus a fixed mindset (i.e., viewing intelligence as malleable versus stable) was related to sourcing. Participants were 59 upper secondary students reading six conflicting texts with varying credibility on a science topic (weather patterns). Mindset was measured with an eight-item instrument, and sourcing was measured by having participants rank-order the texts from the most to the least reliable. The results showed that a growth mindset was positively and a fixed mindset negatively correlated with the ability to distinguish between texts of varying reliability.

Sixteen studies examined the relationship between the category of beliefs and sourcing. Among the six studies addressing topic beliefs, three (Bråten et al., 2016; Salmerón et al., 2020; Tarchi, 2019) found a statistically significant relationship with sourcing. Salmerón et al. (2020) had 207 fourth to sixth graders read three conflicting web pages about a health issue (i.e., tap or bottled water). Topic beliefs were measured both before and after the reading of the three web pages with five items asking about preferences and beliefs about tap and bottled water. Higher scores on this measure indicated favoring bottled water. Sourcing was measured with the number of explicit references to source features in texts participants wrote after reading the web pages, memory for source features, and source-content links. Topic beliefs measured before reading were not related to any of the sourcing measures. Post-reading topic beliefs correlated negatively with source-content links but not with any of the other sourcing measures. Bråten et al. (2016) had 71 undergraduates read one of two versions of a text about a health topic (i.e., potential health effects of cell phone use). The two text versions were identical, except for the concluding paragraph, in which one text concluded that no health risks were related to the use of cell phones, and the other version concluded the opposite. Topic beliefs were measured with two items asking participants to rate their agreement on whether cell phones should be considered a health risk. To measure sourcing, participants were asked to describe the text they had read and were given points for each source feature they mentioned. The results showed no relation between topic beliefs and sourcing; yet an interactive effect of topic beliefs with text version was found on sourcing.

Concerning epistemic beliefs, six of 10 studies examining the relationship between epistemic beliefs and sourcing found a statistically significant relationship between these variables. For example, Barzilai and Eshet-Alkalai (2015) had 170 undergraduate and graduate students read four blog posts about a scientific topic (i.e., desalination of sea water) that were written by authors varying in professional backgrounds (i.e., economists or hydrologists). The study included two conditions. In the conflicting condition, two of the blog posts were in favor of desalination, whereas the other two opposed it. In the convergent condition, all blog posts were in favor of desalination. Topic-specific epistemic beliefs were assessed with a measure capturing three epistemic perspectives: an absolutist perspective that views knowledge as objective and certain; a multiplist perspective that views knowledge as subjective and justified by personal preferences and judgments; and an evaluativist perspective viewing knowledge as something constructed by people within a particular perspective, grounding this construction on evidence and shared standards. Sourcing was measured with three tasks. First, the participants were asked to connect sentences from the texts to one of the authors. Thus, this author-viewpoint-identification task measured participants’ ability to establish source-content links. Second, participants were asked to describe the purpose and viewpoint of each of the four blogs after being given the name of the author and the title of the blog post. This author-viewpoint-description task assessed participants’ ability to accurately recall and describe the viewpoint of the individual author. Third, participants were given an author-viewpoint-evaluation task consisting of two steps.
In the first, participants rated whether each of the blog posts was believable, accurate, professional, balanced, reliable, correct, true, and trustworthy on a six-point scale, with a blog reliability score calculated based on the mean score on these eight items. In the second step, participants answered the open-ended question, “Why is the blog reliable or not reliable, in your opinion?” for each of the four blogs. Participants had the texts available when providing their answer, and the answers were coded for mentions of the author’s viewpoint as justification. The results showed that neither an absolutist nor an evaluativist perspective correlated with any of the sourcing measures. However, a multiplist perspective correlated negatively and statistically significantly with the author-viewpoint-identification and author-viewpoint-evaluation tasks. Based on the scores on the three sourcing measures, a latent variable called author-viewpoint comprehension was created. In a regression analysis, both absolutist and multiplist perspectives were statistically significant negative predictors of author-viewpoint comprehension, whereas an evaluativist perspective was a positive predictor.

The level of specificity in the measurement of epistemic beliefs seemed to be associated with whether a statistically significant relationship was found with sourcing. Most studies that did not find any relationship between epistemic beliefs and sourcing used either domain-general measures of epistemic beliefs (Ulyshen et al., 2015) or more domain-oriented measures, such as epistemic beliefs about medicine (e.g., Bromme et al., 2015, experiment 1). On the other hand, the six studies reporting a relationship used topic- or task-specific measures to a larger degree (e.g., Barzilai & Eshet-Alkalai, 2015; Barzilai & Zohar, 2012; Strømsø et al., 2011).

Finally, in the last category, expertise, we could not identify any studies examining multiple document proficiency in relation to sourcing. Hence, all eight studies addressing expertise concerned domain/document/disciplinary expertise. Five of these studies (Brand-Gruwel et al., 2017; Herrero-Diz et al., 2019; Lucassen et al., 2013; von der Muhlen et al., 2016; Wineburg, 1991) found a statistically significant relationship between domain/document/disciplinary expertise and sourcing. Brand-Gruwel et al. (2017), for example, examined sourcing in a sample of 19 novices (first semester psychology students) and 16 experts (university teachers with a PhD in psychology) reading web pages about two psychological topics (i.e., human memory and altruism). Based on two search engine results pages (SERPs) with 17 and 18 links, participants were asked to select five web pages on each of the topics and rank (prioritize) them. Sourcing was measured by the percentage of web pages on which source information was scanned (eye-tracking) and the degree to which participants selected and ranked the most trustworthy web pages. The results showed that domain expertise had no effect on the number of web pages on which source information was scanned, but a main effect of domain expertise was found on the ability to identify and select the most trustworthy web pages.

In summary, for research question 2, the results of the systematic review indicated that the empirical backing is rather ambiguous with respect to the relationships between individual differences and sourcing that have been suggested in contemporary conceptualizations of multiple document representation and use.

The Potential Role of Sourcing Measurement

Our third research question concerned the possibility that relationships between individual differences and sourcing might vary with the way sourcing is measured. As shown in Table 4, we identified 21 different measures of sourcing in the 72 studies included in the review, with the most frequently used sourcing measures being various questionnaires asking participants to rate the credibility or trustworthiness of sources (25 studies), citations in written products (e.g., essays, written arguments; 18 studies), and various types of questionnaires examining participants’ representation of source-content links (14 studies). Given the low frequency in the use of some of these types of sourcing measures, one should be careful drawing firm conclusions regarding the role of sourcing measurement. However, in an interesting trend, measures of sourcing that examined participants’ spontaneous sourcing (e.g., citations in a written product, log data, etc.) seemed to differ from measures of sourcing in which participants were prompted to source (e.g., questionnaires on which the participants rank or rate sources, source selection, memory for source features, etc.), with the latter type apparently yielding more positive findings regarding the relationship between individual differences and sourcing.

Table 4 Number of reviewed studies supporting and not supporting relationships between individual differences and sourcing by sourcing measure

The Potential Role of Domain or Topic

Our fourth research question concerned the possibility that relationships between individual differences and sourcing might vary with the domain or topic addressed in the reading materials. Across the 72 studies included in this review, we identified six domains or topics in the textual materials the participants read: health (23 studies; e.g., Barzilai & Zohar, 2012; Bromme et al., 2015, experiment 1; Mason et al., 2014), science (22 studies; e.g., List, Du, et al., 2019, study 1; Strømsø et al., 2010), history (10 studies; e.g., Barzilai et al., 2020, study 1; Merkt et al., 2017; Rouet et al., 1997), social science (6 studies; e.g., Kang et al., 2011; List et al., 2017; Winter & Krämer, 2012), psychology (2 studies; Brand-Gruwel et al., 2017; von der Muhlen et al., 2016), and restaurant reviews (1 study; Van Der Heide & Lim, 2016). In addition, we identified seven studies (e.g., Hahnel et al., 2019; Macedo-Rouet et al., 2020; Salmerón et al., 2016) using reading materials from two or more domains and one study (Herrero-Diz et al., 2019) that did not specify the topic or domain of the reading material. Given the low number of studies within some of these topics, we focused on the studies using reading materials from the three most frequently studied topics: health, science, and history.

As Table 5 shows, no clear general trend indicated that the relationships between individual differences and sourcing varied with the domain or topic addressed in the reading materials. However, some interesting differences can be seen when comparing the studies in which participants read texts about health and science. These two topics are particularly interesting given that the number of studies was comparable: 23 for health and 22 for science. One difference was that reading skills were examined in relation to sourcing to a substantially larger degree in the studies in which participants read about health topics, with this difference driven by reading comprehension, in particular. The majority of the studies (7 of 10) examining reading comprehension and sourcing when participants read about health topics did not find a statistically significant relationship, whereas the one study examining reading comprehension and sourcing when participants read about a science topic identified a significant relationship. The relationship between prior knowledge and sourcing also seemed to differ in regard to these two topics. Only 5 of the 16 studies (31%) examining prior knowledge and sourcing when participants read documents about health topics found a statistically significant relationship, whereas 9 of the 14 studies (64%) found a significant relationship when participants read about science topics.

Table 5 Number of reviewed studies supporting and not supporting relationships between individual differences and sourcing by domain

Finally, the studies in which participants read about historical topics examined very few individual difference factors. Only one study (Wiley et al., 2020) reported results about the relationship between more than one individual difference construct and sourcing.

In-Depth Analyses of Relationships between Reading Comprehension, Prior Knowledge, Interest, Epistemic Beliefs, and Sourcing

Given the mixed findings we obtained regarding relationships between individual differences and sourcing, we further explored relationships between the four most researched individual differences (i.e., reading comprehension, prior knowledge, interest, and epistemic beliefs) and sourcing in a set of more fine-tuned analyses. In these analyses, we carefully attended to how both individual differences and sourcing were assessed in each study, descriptive information about participants’ scores on these assessments, psychometric properties of these scores, and participants’ level of performance. Detailed information about all these dimensions is included in Tables S2-S5 in the supplemental materials. In the following, we summarize the additional patterns we were able to discern based on these in-depth analyses.

Reading Comprehension and Sourcing

Our in-depth analysis of the association between reading comprehension and sourcing indicated that the mixed findings regarding reading comprehension, at least in part, were due to how reading comprehension was measured across studies (see Table S2 for details). In the 16 studies examining reading comprehension in relation to sourcing, eight studies used multiple-choice tests (e.g., Mason et al., 2020; Paul et al., 2018, studies 1 and 2), four used cloze tests (e.g., Bråten, Brante, & Strømsø, 2018; Kammerer, Kalbfell, & Gerjets, 2016, experiments 1 and 2), two used open-ended questions (Macedo-Rouet et al., 2013; Salmerón et al., 2020), one used a statement veracity task (Merkt et al., 2017), and one used a multiple source comprehension task (Hahnel et al., 2019). In a clear trend, studies using open-ended questions and requiring extensive inferencing found a relationship between reading comprehension and sourcing (Hahnel et al., 2019; Macedo-Rouet et al., 2013; Salmerón et al., 2020), whereas reading comprehension was not related to sourcing in studies in which the former construct was measured with cloze tests (Bråten, Brante, & Strømsø, 2018; Kammerer, Kalbfell, & Gerjets, 2016, experiments 1 and 2; see, however, Kammerer, Meier, & Stahl, 2016). This trend seemed to be independent of whether sourcing was prompted or unprompted.

The findings were more mixed when reading comprehension was measured with multiple-choice tests. However, we noted that most of the studies that did not find any relationships measured only unprompted sourcing in essays, with quite low scores obtained by participants. Also, most of these studies did not report the level of inferencing required by the reading comprehension measure.

Taken together, this analysis may suggest that relationships between reading comprehension and sourcing can only be expected to the extent that comprehension measures require the construction of mental representations, preferably at the level of situation(s) model representation (Kintsch, 1988; Perfetti et al., 1999). Further, it may be challenging to demonstrate a relationship when sourcing is unprompted and, consequently, characterized by low score variance.

Prior Knowledge and Sourcing

Our in-depth analysis of the relationship between prior knowledge and sourcing confirmed that how prior knowledge is measured, indeed, seems to matter (see Table S3 for details). Whereas all the studies measured prior knowledge at a topic-specific level, the formats of the assessments varied, with 17 studies using multiple-choice items (e.g., Barzilai et al., 2015; Kammerer et al., 2021; Stang Lund et al., 2017), seven studies using open-ended questions (e.g., Braasch et al., 2014; Bråten, Ferguson, et al., 2014; Mason et al., 2017), six studies using perceived knowledge items (Bråten et al., 2016; Kammerer, Kalbfell, & Gerjets, 2016, experiment 1; van Strien et al., 2016), three studies using a term identification task (List, 2014; List et al., 2017; List, Stephens, & Alexander, 2019), and two studies using true/false measures (Kammerer, Meier, & Stahl, 2016; Ulyshen et al., 2015).

When prior knowledge was measured with multiple-choice items and open-ended questions and participants displayed a certain level of prior knowledge (e.g., on average obtained at least 40% of maximum score), the relationship with sourcing was mostly statistically significant, especially when sourcing was prompted and, thus, resulted in higher scores on the sourcing measures. As an example, Stang Lund et al. (2019), who had 140 upper-secondary students read documents on a health topic, found that students, on average, scored between 74 and 79% of maximum score on a multiple-choice prior knowledge measure and between 51 and 65% of maximum score on a prompted memory for source-content links sourcing measure. In that study, prior knowledge correlated with sourcing (r = 0.31), and path analysis showed a direct effect of prior knowledge on sourcing (β = 0.21) as well as an indirect effect via memory for textual conflicts (β = 0.07). In contrast, in studies in which prior knowledge was very low and studied in relation to unprompted sourcing (e.g., Mason, Scrimin, Zaccoletti, et al., 2018), studies in which only sourcing was very low (e.g., Florit et al., 2019), and studies in which both prior knowledge and sourcing were low (e.g., Bråten, Ferguson, et al., 2014), relationships between prior knowledge and sourcing tended to be statistically non-significant, presumably due to a lack of score variance in prior knowledge, sourcing, or both.

Regarding other types of prior knowledge assessments, none of the studies using true/false measures or term identification tasks, and only two of the six studies measuring perceived prior knowledge (Barzilai et al., 2020, experiments 1 and 2) found statistically significant relationships with sourcing. Although it did not seem to matter whether sourcing was prompted in these studies, participants displayed quite low prior knowledge in the studies using these measures, presumably reducing the functional value of participants’ knowledge in relation to sourcing. Taken together, the results of this fine-tuned analysis thus seem to suggest that higher (List & Alexander, 2019) rather than lower (Stadtler & Bromme, 2014) prior knowledge is conducive to sourcing activities.

Interest and Sourcing

Although the 12 studies that examined interest in relation to sourcing generally found that participants reported substantial individual interest in the topics discussed across documents and measured interest with high reliability, they did not produce much evidence for the importance of individual interest for the subprocess of sourcing (see Table S4 for details). In fact, 10 of the 12 studies found no positive relationships between these variables, and one study (List, Stephens, & Alexander, 2019) even found that individual interest negatively predicted sourcing (i.e., the number of citations in written products) when other relevant variables, including prior knowledge, were controlled for. Further, in the two studies that found relationships between individual interest and sourcing (Strømsø et al., 2010; Tarchi, 2019), correlations were low, and their statistical significance was possibly due to the larger samples included in those studies.

In the only study (List, Stephens, & Alexander, 2019) that measured participants’ situational interest during reading in addition to their individual interest (Hidi, 2001), participants who rated the interestingness of the documents higher during reading were also more likely to include citations to those sources in their post-reading essays. The correlation between situational interest and sourcing was quite low, however (r = 0.15).

Thus, although motivation and engagement recently have been highlighted within models of multiple document literacy (Britt et al., 2018; List & Alexander, 2019), asking participants to self-report their individual interest in the topic of the documents independent of the reading task context does not seem to be a valid indicator of such motivation and engagement. As noted by Bråten, Brante, and Strømsø (2018), possible reasons for this include that students who rate themselves highly on such scales may still not engage much in concrete, challenging multiple document tasks and sometimes even disregard source information because they rely on their own personal opinion about the issue in question. An alternative is therefore to have students rate their task-based or text-based (i.e., situational; Hidi, 2001) interest during reading, as was done by List, Stephens, and Alexander (2019). Another possibility is to focus on their behavioral engagement when working with the documents, that is, on their active, observable involvement in multiple document tasks as typified by time, effort, persistence, and productivity (Bråten, Brante, & Strømsø, 2018; Bråten et al., 2021).

Epistemic Beliefs and Sourcing

Our in-depth analysis of epistemic beliefs in relation to sourcing (see Table S5 for details) indicated that beliefs in absolute certain knowledge, reliance on personal opinions, and oversimplification of complex issues were negatively related to sourcing activities, whereas acknowledging the tentativeness of knowledge, relying on evidence and expertise, and realizing the need to justify knowledge claims by means of relevant resources both internal and external to the individual were positively related to sourcing (Barzilai & Eshet-Alkalai, 2015; Barzilai et al., 2015; Kammerer et al., 2013, 2021; Strømsø et al., 2011). This trend was most salient when epistemic beliefs were measured at a topic-specific level but was also present in some studies in which epistemic beliefs were measured at a domain-general or domain-specific level (Kammerer et al., 2013, 2021). However, we noted that three of the four studies that did not find any relationships between epistemic beliefs and sourcing did not measure epistemic beliefs at a topic-specific level (Bromme et al., 2015, experiment 1; Ulyshen et al., 2015; Wiley et al., 2020), and that only one of these (Bromme et al., 2015, experiment 1) targeted the justification dimension found to be important in other research.

Taken together, the results of this fine-tuned analysis seem to be consistent with the view that beliefs in the justification of knowledge claims by relying on evidence and competent sources and using procedures such as cross-checking multiple sources may promote adaptive sourcing, as recently highlighted within the integrated framework of multiple texts (List & Alexander, 2019; see also, Bråten, Britt, et al., 2011). Further, measuring epistemic beliefs in relation to the specific topic discussed across documents seems to be the better option when investigating links between such beliefs and sourcing.

Discussion

This systematic review sought to advance current knowledge on sourcing by providing answers to four research questions regarding the role of individual differences in attending to, representing, evaluating, and using information about the sources of document content.

Inclusion of Individual Differences

Our first research question addressed the extent to which the reviewed studies examined the individual differences highlighted in the various conceptualizations of multiple document literacy that included sourcing. The results indicated that the majority of these reader characteristics were considered, with cognitive factors being the most prevalent, followed by factors related to reading, motivation, beliefs, expertise, and personality.

The most examined individual difference concerned what readers already know about the domain or topic relevant to the document content, which is understandable given its influence on single-text comprehension (e.g., Ozuru et al., 2009). Further, more than half of the studies that investigated reading-related individual differences unsurprisingly examined the role of reading comprehension of single texts in the comprehension and evaluation of multiple texts. Regarding the motivational factors, it should be noted that two heavily researched individual difference constructs within motivation, achievement goals and self-concept of ability, were not included in the reviewed studies. According to the RESOLV model (Britt et al., 2018), they can be assumed to act as resources in constructing mental representations of the reading context and the task, which guide readers’ processing of the documents. In research on single-text comprehension and learning from text, mastery achievement goals have been shown to contribute to conceptual learning from science texts (Johnson & Sinatra, 2014). Likewise, self-concept of ability has been linked to reading performance (e.g., Colmar et al., 2019; Sewasew & Koester, 2019). Thus, future studies that include hitherto disregarded yet potentially influential motivation constructs could extend current understanding of sourcing within multiple document literacy. Personality factors have also been very sparsely examined in relation to sourcing, and they are much less frequently assessed in educational psychology research, presumably because they are considered individual differences that are difficult to change through educational interventions. In our review, the only personality factors analyzed at all were need for cognition and growth/fixed mindset, which might be conceived of as more modifiable, at least to some extent, in appropriate learning environments. These constructs have also been investigated in other research areas, for example, within conceptual change learning in science (Taasoobshirazy & Sinatra, 2011) and academic performance in mathematics (Bostwick et al., 2017).

The results showed that the individual differences that were not highlighted in the conceptualizations of multiple-document representation and use, yet investigated in the reviewed studies, spanned the already identified categories of cognitive and motivational factors. Moreover, demographic variables were investigated, such as gender, age, and parental educational level. In addition, some of the studies used educational level as an individual difference variable. Examined individual differences also included experiences with the Internet in terms of usage of the Web, social media, and a specific platform. Few studies concerned the role of emotions in sourcing, which is consistent with the fact that the interplay of cognition and emotion in reading processes is a relatively new area of research (e.g., Trevors et al., 2017; Zaccoletti et al., 2020). Notably, in all four studies in this area that were included in our review, affective states were measured physiologically in the school context by recording skin conductance or cardiac activity and using related indices of arousal and self-regulation.

Relationships between Individual Differences and Sourcing

Our second research question asked to what extent the proposed relationships between individual differences and sourcing were supported by empirical research. Overall, the results showed that such relationships emerged in half or slightly more than half of the reviewed studies, indicating that the empirical backing is not strong or, in some cases, clear. The reviewed studies varied in the kinds of analyses performed to investigate the links between individual differences and sourcing. For example, some of the investigations reported bivariate correlations, while others did not, and when individual differences were considered together with control variables in regression analyses, this analytic approach impacted the relationship between an examined individual difference construct and sourcing. However, variety is not only a negative aspect that may make a synthesis of results more difficult and less clear; it can also be considered a positive aspect to some extent, for example because it might indicate whether a range of sourcing tasks share an underlying cognitive mechanism.

Specifically, when considering both word-level reading skills and higher-level reading skills related to comprehension, proposed relationships emerged from approximately half of the studies across age groups and educational levels. These findings show that additional studies are needed to address the role of these reading-related factors in sourcing. Still, the results seem to rule out the possibility that word-level skills and reading comprehension play a role only when sourcing is considered in relatively young readers. With respect to reading comprehension, in particular, our in-depth analysis also suggested that relationships with sourcing may depend on the extent to which reading comprehension measures require mental representations, preferably at the level of situation(s) models (Kintsch, 1988; Perfetti et al., 1999). Further, the contribution of comprehension strategies to sourcing seemed to be more consistent. However, in general, complex comprehension strategies were considered, and a statistically significant relationship failed to emerge only in studies involving readers younger than college and university students. This outcome might suggest that comprehension strategies need to be mastered well to play a role in students’ sourcing.

Among the cognitive constructs, more than half of the studies documented a contribution of prior knowledge to sourcing. The relationship between these two variables seemed to depend on the measurement of prior knowledge. More specifically, our in-depth analysis of prior knowledge indicated that a certain level of prior knowledge may be needed for this construct to gain any functional value in relation to sourcing (List & Alexander, 2019), with quite a few studies including participants with very low prior knowledge resulting in little score variance, sometimes just to demonstrate that participants were novices with respect to the topic discussed across documents. Not surprisingly, working memory was found to be related to sourcing when readers were tasked with rank-ordering a number of increasingly reliable texts.

Among the motivational constructs, task value and attitudes were found to be most consistently related to sourcing, whereas the proposed role of interest (Britt et al., 2018; List & Alexander, 2019) was generally not confirmed. Interest can be measured at three levels of specificity and stability (Renninger & Hidi, 2011). At the broadest level, individual interest concerns a domain (e.g., science) and can be considered stable and long term. At the narrowest level, situational interest concerns a particular context or task that attracts students’ attention and, thus, can be considered unstable and short term. Topic interest is situated between these levels as it concerns individual interest in a specific topic within a domain (e.g., climate change) but, still, can be considered more stable and long term than situational interest. Although most studies measured interest at the topic level, which may seem appropriate for the purpose, the empirical backing of a relationship between this variable and sourcing was weak. This result might be due, at least partially, to low prior knowledge of the topic in question, leading participants to rate their level of interest inaccurately. In addition to an affective component related to emotional engagement in an event or activity, interest has a cognitive component related to the knowledge a person brings to the activity and other aspects of cognitive functions (O’Keefe & Harackiewicz, 2017). However, another possibility suggested by our in-depth analysis of interest in relation to sourcing is that this construct needs to be measured at the situational level or that behavioral indicators need to be used. This is because the challenging, complex tasks that participants typically are presented with in this area of research may require an active involvement that is not captured by self-reports of topic interest independent of the reading task context (Bråten, Brante, & Strømsø, 2018).

For the personality constructs, not much support for the proposed relationships was found. Very few studies examined these constructs in relation to sourcing, and only growth versus fixed mindset was found to be linked to sourcing. Further research is obviously needed to shed light on the role that personality may play in sourcing.

Among the belief constructs, about half of the investigations we reviewed provided empirical support for proposed relationships between sourcing and beliefs about the topic of the texts or about knowledge and knowing (i.e., epistemic beliefs). Regarding epistemic beliefs, the results seemed to suggest that scores on topic- or task-specific measures were more consistently related to sourcing than were scores on measures targeting less specific beliefs. Further, our in-depth analysis indicated that beliefs in justification of knowledge claims by attending to evidence, competence, and consistency across multiple sources were positively related to adaptive sourcing, which is consistent with theoretical accounts (Bråten, Britt, et al., 2011; List & Alexander, 2019). Finally, some support for the relationship between individual differences and sourcing was provided by research on the role of expertise, broadly conceived.

The Potential Role of Sourcing Measurement

Our third research question addressed whether relationships between individual differences and sourcing might vary according to the way sourcing is measured. The results showed that a large number of measures were used in the reviewed studies, some more frequently than others. Although substantial variation in this regard may make it more difficult to compare findings across studies, it also highlights the multiple ways an advanced literacy skill such as sourcing can be represented among students across educational levels.

A particularly relevant issue seems to be the distinction between unprompted (i.e., spontaneous) and prompted sourcing. Spontaneous sourcing was targeted, for example, in studies where participants referred to source features when trying to explain conflicting positions on the same issue in an essay task, without receiving any specific instructions to refer to sources (Anmarkrud et al., 2014; Bråten, Brante, & Strømsø, 2018; Kammerer, Kalbfell, & Gerjets, 2016). In contrast, prompted sourcing was targeted, for example, in studies where participants were explicitly asked to rank-order a set of documents according to their credibility (e.g., Mason, Scrimin, Tornatora, et al., 2018) or to indicate from which sources specific content within a document set originated (i.e., memory for source-content links; Strømsø et al., 2010). On the one hand, participants seemed to display rather poor spontaneous sourcing, and the resulting low variance in the adopted measures makes it difficult to identify a statistically significant relationship with individual difference factors. On the other hand, in the studies in which participants were prompted to source, they differed more in sourcing skills, and a significant relationship was thus more likely to emerge.

The Potential Role of Domain or Topic

Our fourth research question addressed whether relationships between individual differences and sourcing might vary according to the domain or topic addressed in the reading materials. Although a clear conclusion could not be drawn, an interesting variation concerned the contribution of some individual differences, such as prior knowledge and epistemic beliefs, to sourcing when considering the two most investigated domains, health and science. Unlike materials about health issues, those concerning science referred to an academic subject or were closely related to it. It is therefore plausible that what students already know about the subject content makes a difference in sourcing. It is also likely that epistemic beliefs may contribute more consistently to sourcing when participants read science than health documents. Through various activities and tasks involved in learning science as an academic subject, students have many opportunities to reflect on the nature of knowledge and the process of knowing concerning scientific topics, leading to epistemic understanding at various levels (e.g., Kuhn et al., 2000; Yang et al., 2016).

Limitations and Future Directions

As with every review, this one is not without limitations. We first note that the outcomes are based on samples in which females outnumber males by almost two to one. This imbalance is not surprising considering that females are overrepresented among university students of psychology and education, who more frequently participate in the types of studies selected for this review. The inclusion of more gender-balanced samples would allow more solid and generalizable results.

A related limitation is that the majority of the studies involved university students across the USA and Europe, whereas younger students, especially in elementary and middle school, were underrepresented. The underlying reasons are quite obvious, as sourcing is a complex process and can be examined only to a limited extent in younger students. Moreover, it can be practically more difficult to involve them in this type of research, as it is time demanding and often does not fit within school constraints. However, more investigations on younger students who start being confronted with sourcing issues will extend our scientific knowledge about the development of critical thinking and literacy skills and also have implications for educational practice.

Although the 72 studies we reviewed were conducted in nine different countries on three continents, a substantial portion of this research was conducted by a limited number of research groups. We have no reason to believe that this has biased our results in any way. Still, research from an even broader research community would have been desirable. Given the steep increase in interest in this area of research in the last decade, a larger number of research groups can be expected to contribute in the coming years.

With respect to the presentation of documents, findings from studies in which participants search for and select documents themselves are essentially lacking. To what extent certain individual differences (e.g., executive function and self-regulation) become more pertinent in less constrained reading contexts is therefore an issue wide open for further research.

Some limitations more directly concern the variables of interest in this review. In some cases, the labels used to describe the examined variables were unclear. Although this is an issue that does not concern only research on sourcing and multiple text comprehension, further studies in this area would benefit from terminological clarification, such that variables related to prior knowledge, reading skills, and motivation have clear and unique referents. Moreover, some individual differences need to be investigated more deeply. For example, future research on personal relevance might shed light on the potential role of this variable as a moderator or mediator in sourcing and decision-making processes (Kobayashi, 2014). Some topics appear to be very important from a disciplinary perspective but are hardly perceived as personally relevant by students, especially at younger ages. If the topic of a set of documents has high personal relevance for readers, they may be more likely to invest effort not only in processing document content deeply, which is demanding in itself, but also to pay attention to sources and construct source-content links. Thus, studies comparing the reading of multiple documents that vary with respect to the personal relevance of the topic may extend our understanding of constructs that contribute to sourcing.

Further, in-depth investigations of the role of emotions in sourcing are highly needed. Emotions are not explicitly included in the models described in the background analysis, although both the CAEM (List & Alexander, 2017) and the IF-MT (List & Alexander, 2019) refer to affective engagement, which signals attention to motivational and emotional factors. According to the theory of achievement emotions by Pekrun (2006), motivational aspects such as control and value are antecedents of emotions experienced in relation to specific learning activities and tasks. Readers’ emotional reactivity can also be considered an individual tendency to react more or less intensely to emotional stimuli. A set of documents can be more or less emotionally charged depending on content and language style, which means that textual characteristics can interact with individual reactivity to emotional materials and potentially influence the process of sourcing (Bohn-Gettler, 2019).

Finally, we restricted the current review to the role of individual differences in the subprocess of sourcing, leaving aside another important component of multiple document comprehension, that is, the integration of content across documents. Although sourcing can be considered pivotal to multiple document comprehension, and although this selectiveness allowed us to provide an in-depth analysis of the role of several individual difference factors, our approach leaves open the question of whether individual differences might operate differently for content integration than for sourcing. Given theory and extant research within a single text paradigm, there is reason to believe, for example, that prior knowledge may play a more consistent role in content integration than in sourcing (McNamara & Magliano, 2009), and it is difficult to imagine that single text comprehension would not be strongly involved in understanding the content of multiple documents (Florit et al., 2019; Mahlow et al., 2020). To address this question beyond speculation, future systematic reviews of the role of individual differences in content integration within multiple document literacy are needed. In future reviews, it may also be pertinent to focus on one particular individual difference factor at a time, including how the role of that factor in multiple document literacy has been grounded theoretically and how it has been linked empirically to a broader array of multiple document literacy tasks.

Conclusion

In the current review, we provide a catalogue of individual differences figuring within models of multiple document literacy as a basis for scrutinizing to what extent the assumptions of those models have been empirically supported. At the same time, we did not want to be restricted by the individual differences included in those models and therefore also explored more exhaustively the individual differences that have actually been researched in relation to sourcing, thereby providing a catalogue of the individual differences falling outside the scope of the extant models as well. Hopefully, this approach will encourage theorists in this area to carefully consider the empirical evidence when creating their models. Further, it may suggest individual differences that have not been included in the models but still seem worthy of further investigation.

Despite some limitations, our review provides new insights into the role of individual differences in sourcing. The results have scientific significance because they elucidate to what extent individual differences included in theoretical models of multiple document literacy have empirical support. They also have practical significance in suggesting which individual differences may facilitate or constrain sourcing when students work with multiple documents. Based on our review, we encourage future researchers in multiple document literacy to contribute to further clarification of the role of individual differences in sourcing, as well as to evidence-based knowledge of how the teaching of sourcing skills can be adapted to individual differences among learners. In particular, we call for future research that is specifically designed to investigate the effects of individual differences on sourcing and includes measures and measurements appropriate for discovering such effects.