1 Introduction

One essential aspect of the democratization of our current societies is access to a disparate body of information through the Web. Although digital inequalities still exist even amongst young people, as the Covid-19 pandemic has revealed [1], in the digital era it is not necessary to possess books or to visit libraries to learn more about unfamiliar topics. The Internet has become ‘the’ source of information, easily accessed by students who are almost always connected through a smartphone, tablet, or laptop. From lower-secondary school onwards, students are daily ‘consumers’ of online information and use it to carry out school assignments. Even though the ‘Google generation’ of students has been raised in the Internet era, their information literacy has not improved as a result of the wide access to technology [2]. Even older and ‘Internet-savvy’ students have difficulty identifying fake news during online reasoning [3]. It is, therefore, vital that we teach beginner seekers of online information how to distinguish between reliable and unreliable sources. The huge array of documents that they can access with a simple click poses new challenges. Even if they are able to locate and select relevant webpages − which may not always be the case − students must then be able to evaluate their credibility or authoritativeness according to appropriate criteria [4,5,6].

The term ‘sourcing’ is used to refer to readers’ use of information about the author, genre, and date of publication [7]. Digital literacy necessarily includes sourcing skills. What should we believe? Why? How can this information be reconciled with other information? These are crucial questions when reading multiple online pages on the same content, not only to discern which sources deserve to be followed up, but, more importantly, to understand source content better [8]. Sourcing is associated with deep multiple-document comprehension, and this link is particularly important. Students who are better able to process a webpage for both its content and the ‘metadata’ regarding the authoritativeness of the source are also better able to comprehend conflicting information [9, 10].

Research has shown that primary [e.g., 11] and secondary school students [e.g., 12, 13], as well as undergraduates [14], may not evaluate sources at all, or may appeal to naïve criteria when judging them and their content. Recent studies with adolescents have indicated that they use source information very little [15], or do not discern documents on the basis of author competence even when prompted with specific questions [16].

It is, therefore, not surprising that interventions on sourcing skills have been implemented in primary and secondary schools, as a recent review illustrates [7]. Some are long-term interventions that have not examined multiple-text comprehension as one of their possible effects [17, 18].

Our investigation focused on short-term interventions, as they can be more easily implemented in the natural context of the classroom and embedded in curriculum units of a subject like science or history. The study aimed to compare the effectiveness of two short-term instructional interventions implemented in lower-secondary school for promoting both the skills needed to evaluate digital information sources and multiple-text comprehension. One intervention provided students with essential declarative knowledge on how to evaluate the reliability of online sources and the accuracy of the information [10, 19]. The other provided two contrasting cases of source evaluation strategies, in which students were asked to judge which strategy was better and should be followed [20].

2 Evaluation of Source Credibility and Multiple-document Comprehension

In framing our study, we took into consideration the Discrepancy-Induced Source Comprehension (D-ISC) model [21, 22]. According to this model, readers pay more attention to “who says what” when they are faced with conflicting information on the same issue or question presented by different sources. The perception of discrepancies between texts acts as a potential mechanism that prompts readers to process and evaluate source information more deeply, encoding the links between sources and the related information. When this greater encoding occurs, the mental representation of the texts includes the links between sources and their content, and conflicting information is more likely to be integrated into a coherent overall mental model [23]. Not surprisingly, single-text comprehension is a widely investigated research area, given the relevance of the skill of constructing meaning from text [e.g., 24, 25]. Much research is based on Kintsch’s well-known model [26]. Compared to single-text comprehension in this model, multiple-text comprehension includes the additional layer of the intertext model [27]. When reading conflicting documents, readers may achieve overall coherence through the intertext model, despite the contrasting information.

The link between sourcing and multiple-document comprehension is not only theoretically justified, but also empirically documented [4, 9, 28, 29]. These studies reveal a positive relation between sourcing and multiple-text comprehension: readers’ attention to source information is associated with greater comprehension of multiple conflicting documents.

3 Types of Short-Term Intervention for Promoting Sourcing Skills and Multiple-document Comprehension

Two main types of short-term intervention can be identified in the current literature as effective in promoting both source evaluation and comprehension of multiple documents in secondary school students.

3.1 Intervention Based on Declarative Knowledge

One type of short-term intervention focuses on providing declarative knowledge about source evaluation. An example of this type of intervention is the study by Mason et al. [19]. Ninth graders were given three pages of information about how to evaluate the reliability of a website and the accuracy of its contents. The written material included the “SEEK” (Source, Evaluation, Explanation, Knowledge) instructional unit [10]. This material illustrated three main criteria for evaluating how reliable or credible websites are, in the form of three questions: Who is the author? How reliable is the information? How well does the site explain the information? Readers were instructed to evaluate: for the Source (author), whether the authors were knowledgeable about the topic and what their motivation was; for the Evaluation of information, whether it was based on scientific evidence and whether similar information was provided across credible sources; for the Explanation of the information, whether they understood what the site said about the topic and whether the explanation corresponded with the scientific knowledge that they might have had, or with information given by other credible sources [10].

Participants in this study had the opportunity to practice source evaluation in a basic inquiry task on a given topic. They were given a worksheet with questions prompting them to use the SEEK criteria in source evaluation. Next, they had to apply source evaluation in a transfer inquiry task on a different topic. Results revealed that participants who had received the instructional material with declarative knowledge about source evaluation outperformed non-instructed participants when rank-ordering the webpages and justifying their ranking in the transfer task. Moreover, SEEK-instructed participants were better at both surface and deeper comprehension of the conflicting information.

3.2 Intervention Based on Contrasting Cases

The other type of effective intervention on source evaluation focuses on providing contrasting cases. Contrasting cases is a classroom-based instructional practice used to promote the acquisition of declarative or procedural knowledge. Pertinent to our study is the investigation carried out by Braasch et al. [20], which focused on the effectiveness of contrasting cases involving protocols of source evaluation strategies. One was a protocol of expert strategies relying on advanced criteria for judging a source; the other was a protocol of non-expert strategies appealing to the low-level criteria often adopted by naïve readers. Participants in the last year of upper secondary school were asked to decide which were the best strategies for evaluating multiple documents retrieved from the Internet, and to explain why. They received instructional material including two to-be-contrasted student strategy protocols, attributed to student A and student B, for practice on a basic topic. In each pair of protocols, the more competent and more critical student expressed more sophisticated strategies regarding source evaluation, taking into account author, venue, type, and date of publication: “…I start with the authors to see whether they are knowledgeable about the topic” [20, p. 184]. In contrast, the poorer, less critical student exhibited low-level strategies that did not take source characteristics into consideration: “Since this title contains my key words, I know I can trust and use the information” (p. 184). Results confirmed the effectiveness of the intervention in supporting students’ rank-ordering and their justification of the ranking by appealing to essential features of the sources. Moreover, the intervention favored the inclusion of more correct scientific concepts from the more useful texts when writing essays about the topic.

4 The Present Study

Current research indicates that students need to be instructed on the important criteria to use for evaluating the sources and information accessed on the Web. These criteria refer to the qualities that make a source reliable (i.e., expertise and unbiasedness) and information accurate (i.e., based on scientific evidence and corroborated by information from other reliable sites). Research also shows that relatively short-term instructional interventions on source evaluation, like those reviewed in the previous sections, can be effectively implemented in the context of upper-secondary school, improving multiple-document comprehension as well.

To extend current research, we tested the effectiveness of the two aforementioned types of short-term instructional intervention, as we do not know whether the two types are equally effective or whether one is superior to the other, particularly for lower-secondary school students. We focused on a two-lesson intervention, instead of a long-lasting one, because the former is more easily embedded in a curriculum and implemented in the natural classroom context. We involved younger adolescents, who have started to be daily seekers of online information for various purposes, including school assignments. They need to be competent users of Internet-based information in order to follow up only reliable sources and construct knowledge from accurate information. As the participants were younger than those involved in the Braasch et al. [20] and Mason et al. [19] studies, the interventions took place in two sessions, rather than one, to give them more time to practice source evaluation on basic tasks.

The following research questions (RQ) guided the study:

(RQ1) Are two interventions − one based on declarative knowledge on source evaluation and the other on contrasting cases of source evaluation − equally effective in promoting students’ reliability judgments of various online documents?

(RQ2) Are the two interventions also equally effective in supporting multiple-document comprehension?

Based on the extant literature [19, 20], for RQ1 we hypothesized that students in both intervention conditions would show greater source evaluation skills than students in the control condition. Specifically, we expected that both intervention conditions would promote more reliability judgments based on source characteristics than the control condition, for both high- and low-reliability documents (Hypothesis 1a). In addition, we expected a difference between the two intervention conditions in favor of contrasting cases (Hypothesis 1b). The reason is that contrasting cases acknowledge and target the inappropriate source evaluation strategies often adopted by students, and make more salient what should and should not be considered in source reliability judgments. In other words, the comparison/contrast process gives learners the opportunity to form a more differentiated knowledge base, which promotes the identification and interpretation of salient features in novel contexts [20]. On the other hand, compared to the control condition, participants in the declarative knowledge condition would benefit from explicit instruction on what to consider when evaluating sources for reliability. However, this advantage would be smaller than that promoted by the contrasting cases, as these students would not be confronted with two concrete reliability judgments.

For RQ2, we considered that the link between source evaluation and multiple-document comprehension is not only theoretically legitimate [21], but also empirically documented [4, 9, 28]. Our hypothesis, therefore, was that students in both intervention conditions would also outperform those in the control condition in multiple-text comprehension (Hypothesis 2a), with a greater advantage for those provided with contrasting cases of source evaluation (Hypothesis 2b). This is because they would scrutinize source features, making it easier for them to reconcile contrasting information into an integrated representation of multiple documents on the same topic [20].

5 Method

5.1 Participants and Design

One hundred sixty-one 8th graders (Mage = 13.44, SD = 0.7; 79 girls) from two lower-secondary schools participated in the study and were assigned to one of three conditions: intervention on declarative knowledge (DK, n = 51), intervention on contrasting cases (CC, n = 58), and no intervention or control (C, n = 52). Each condition included three classrooms in total (from both schools). All participants were native-born Italians and shared a middle-class socio-economic status. They participated voluntarily, with written parental consent. Owing to practical constraints relating to school organization and classroom management, we used a quasi-experimental design with random assignment to conditions at the level of intact classrooms.

5.2 Interventions on Source Evaluation

Both interventions took place in two sessions, the first about genetically modified (GM) food and the second about dinosaur extinction. At the beginning of the first session of both interventions, the instructor spent 5 min introducing the issue that the Internet provides many types of documents and that readers must consider their reliability in order to understand the contents well. To reduce researcher bias, a scripted session plan was prepared for each of the intervention conditions and for the control condition.

DK Intervention Condition.

After the 5-min introduction, in this intervention condition students were informed that they would read material regarding source reliability. They were then provided with three pages of declarative information about how to evaluate the reliability of a website and the veracity of its content. The material was taken from Wiley et al.’s [10] instructional unit on Source, Evaluation, Explanation, Knowledge (SEEK). The declarative material explained that three main criteria are to be used when evaluating the reliability of a website, specifically: (1) Who is the author? (2) How reliable is the information? (3) How well does the site explain the information? (pp. 1098–1099).

In the first session, the documents were taken from two websites about GM food, one more reliable than the other. One document was written by a professor of agrarian microbiology and published in an online journal (414 words); the other was taken from a gourmet food and wellbeing site (414 words). In the second session, the documents were about dinosaur extinction, again with one more reliable than the other. One was taken from an information site on dinosaurs and written by its webmaster (365 words), while the other was taken from the online bulletin of the national institute of astrophysics and written by a scientific journalist (365 words).

For both topics, the order of appearance of the most and least reliable document was counterbalanced. In both intervention sessions, students were given a worksheet for each site, with questions about the author, information, and explanation. The worksheets were intended to support students’ practice in the use of the provided declarative information on what to consider when evaluating an information source for reliability.

CC Intervention Condition.

After the 5-min introduction based on Braasch et al. [20, p. 186], in this intervention condition students were informed that they would read about the different strategies that two students from another lower-secondary school used to evaluate the source reliability of two documents on a specific topic. Participants were asked to compare and contrast the two students’ strategies to distinguish what good readers do differently from poor readers when learning from information on the Internet. Participants were also instructed to take into account that one of the two students, who made comments on the documents, used better strategies than the other. The aim was to guide students to understand why some strategies are better than others.

In the first and second sessions, students were presented with the same two documents used in the DK intervention. For each topic, participants read the two documents and a pair of fictional student protocols reporting the strategies used by student A and student B when evaluating the documents. Participants did not receive any clues regarding which student adopted the better strategies. To exemplify, in the first session, on the topic of GM food, the more critical student started by saying: “I have seen that the first is written by an agriculture professor who teaches at the university, while the second is written by gourmets. Therefore, only the first is written by a competent person, who is a real expert in the topic”.

In the second session, on the topic of dinosaur extinction, the less critical reader started by saying: “I have noticed that the first text is from the online bulletin of the institute of astrophysics, while the second is a site about dinosaurs. I am very interested in dinosaurs, and I have always liked to read about them and watch cartoons on these animals. The site on dinosaurs is easier and more interesting…I like the idea that a big asteroid fell on the earth and caused a big crater. I do not like the idea that volcanic eruptions also matter”. The ‘better’ and ‘poorer’ students’ protocols were of the same length (208 words each).

Participants were also given a worksheet asking them to identify which student used the better strategies for evaluating the two documents and selecting the more reliable one, and to explain as clearly as possible why these strategies were better. The worksheets were intended to support students’ practice in source evaluation through the identification of the features to be considered when evaluating an information source for reliability. In the last ten minutes of the session, the instructor wrote student-generated suggestions about good and poor strategies in document evaluation on the blackboard.

Control Condition.

Participants were provided with the same 5-min introduction as in the two intervention conditions. They also read the same documents on GM food and dinosaur extinction during the same class periods, but no intervention on source evaluation was implemented. In other words, participants read exactly the same documents as those in the intervention conditions and were exposed to the same controversial topics, so they were not disadvantaged in terms of experience with different points of view on the same topic. However, the importance of source evaluation was not emphasized to them, nor were they provided with any specific information about source evaluation.

5.3 Application-Task Materials and Procedure

The material used in the application-task session, which was the same in all conditions, consisted of four documents read on a computer screen in the school computer lab. Used in a previous study [30], the documents were taken from real sites (stored locally) and looked like the originals; only the language was modified, in some cases, to make it simpler. They were balanced for reliability (low/high) and for position on the debated topic of the potential health risks associated with the use of mobile phones. The two higher-reliability documents were: (a) the report of an interview with a scientist, an expert in molecular biology, published in the science section of a newspaper, explaining the biological impact of radiation from mobile phones; and (b) the report of a pediatrician, published on the site of the national association of pediatrics, stating the inconclusive nature of current scientific evidence, especially for children, but recommending many precautions. The two lower-reliability documents were: (c) the report of a webmaster, published in an online magazine on mobile phones, describing inconclusive data and wondering who would give up using the mobile phone; and (d) the personal blog of an unknown supporter of natural life, describing various serious health problems caused by mobile phones.

At the beginning of each text, information about the author, credentials, and date of publication was provided. Each text included the same number of words (424 words). Participants were instructed to read the documents carefully at their own pace as they would then be asked to complete some tasks.

5.4 Pre-intervention Measures

Because of the quasi-experimental design of the study, with random assignment to conditions at the level of intact classrooms, some individual differences were assessed to ensure that participants did not differ across conditions on a number of potentially interfering variables.

Topic Prior Knowledge.

Prior knowledge of the application topic was measured using five open-ended questions about electromagnetism and the potential health risks associated with the use of mobile phones, taken from Bråten et al. [9]. Answers to these questions were analyzed for content and scored 1 point for each correct information unit mentioned (range: 0–3). A random selection of 80 students’ responses was scored by the second and third authors, resulting in 89% agreement across all answers. Disagreements were resolved through discussion. The third author scored the remaining responses.
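
To make the reported inter-rater reliability concrete, the sketch below shows how percent agreement between two raters can be computed (this is not the authors’ code; the score vectors and names are hypothetical, for illustration only).

```python
# Minimal sketch: percent agreement between two raters over the same
# set of scored responses. Scores here are dummy, illustrative values.
rater_2 = [2, 0, 1, 3, 1, 2]  # second author's scores (hypothetical)
rater_3 = [2, 0, 1, 2, 1, 2]  # third author's scores (hypothetical)

matches = sum(a == b for a, b in zip(rater_2, rater_3))
agreement = matches / len(rater_2)
print(f"Percent agreement: {agreement:.0%}")  # 83% for these dummy data
```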

Topic Interest.

It was measured using a 10-item self-report scale about the value and importance of knowing more about the potential health risks associated with the use of mobile phones (Cronbach’s alpha = .84). A sample item is: “I like to be updated on the health consequences of the continuous use of mobile phones”.

Reading Comprehension.

It was measured using the Italian standardized test for the appropriate grade [31]. Participants read an informational text and answered 15 multiple-choice questions. One point was assigned to each correct answer.

Working Memory.

It was measured using the Italian version of the well-known Daneman and Carpenter [32] Reading Span Test, which evaluates the simultaneous processing and storage of unrelated information and is, therefore, considered a complex span task [33].

Perceived Competence in Online Information Search and Evaluation.

It was measured using a six-item self-report scale (Cronbach’s alpha reliability = .78). A sample item is: “I always do well when I look for useful online information for a school assignment”.

5.5 Post-intervention Measures

Multiple-document Comprehension.

It was measured with an essay task as in many previous studies [e.g., 9, 34, 35]. Participants were asked to write a short essay to judge the health risks of mobile phone use, based on the texts read. Following Bråten et al. [9] and Mason et al. [34], the essays were scored for sourcing and argumentation. For sourcing, we considered (a) the total number of explicit references to the four source documents and (b) the total number of source-content links, that is, explicit and implicit references to the four source documents that also mentioned content from those sources. A composite score was computed for sourcing.

For argumentation, we considered whether both perspectives were reported and justified, and whether the unresolved nature of the debated issue was acknowledged. The essays were scored 1–3: 1 point was assigned to essays that reported only one position on the debated topic, with no reference to the controversy; 2 points were assigned when the negative and the more ‘neutral’ positions on the topic were reported, with no reference to the ‘openness’ of the issue; 3 points were assigned when both positions were reported and the need for more scientific information was also acknowledged. A random selection of 80 essays was scored by the second and third authors, resulting in an overall 90% agreement. Disagreements were resolved through discussion. The third author scored the remaining responses using the same coding system.

Rank-Ordering.

Participants were asked to rank-order the four documents from the most to the least reliable [4], assigning the value 1 to the website judged most reliable and the value 4 to the website judged least reliable. Participants were awarded 1 point for ranking the two most reliable sites as either no. 1 or no. 2, and 1 point for ranking the two least reliable sites as either no. 3 or no. 4. When rank-ordering the documents, readers could not look back at them but were provided with a randomized list of the URL names.
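
As a concrete illustration, the sketch below encodes one literal reading of this scoring rule (not the authors’ code; the document labels are hypothetical and the exact rubric may have differed in detail).

```python
# Sketch of the rank-ordering score: 1 point if both higher-reliability
# documents are ranked no. 1 or no. 2, and 1 point if both lower-reliability
# documents are ranked no. 3 or no. 4 (one literal reading of the rule).
HIGH = {"scientist_interview", "pediatrician_report"}  # hypothetical labels
LOW = {"webmaster_report", "personal_blog"}

def rank_order_score(ranking):
    """ranking maps a document label to its assigned rank (1 = most reliable)."""
    score = 0
    if all(ranking[d] in (1, 2) for d in HIGH):
        score += 1  # both reliable sites placed in the top two
    if all(ranking[d] in (3, 4) for d in LOW):
        score += 1  # both unreliable sites placed in the bottom two
    return score

print(rank_order_score({"scientist_interview": 1, "pediatrician_report": 2,
                        "webmaster_report": 4, "personal_blog": 3}))  # -> 2
```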

Justification for Rank-Ordering.

Participants were asked to justify their rank-ordering, providing one or more justifications. These were analyzed for content using a coding system inspired by the categories identified in previous studies [e.g., 36]. The categories were the following:

  • source characteristics (authors’ credentials based on expertise and authoritativeness), for example: “This is credible and authoritative for knowledge”;

  • personal opinion (information corresponds to the reader’s opinion), for example: “I disagree with what the site says”;

  • reference to other sources (information is, or is not, corroborated by other sites), for example: “It is the only one to say that”;

  • reference to the content of the document read (information is easier to understand or interesting and appealing), for example: “This is more interesting and easier”.

For each category, the scoring was dichotomous: 1 point was assigned when a justification explicitly mentioned the specific aspect of the category, and 0 points when it was not mentioned. A random selection of 80 students’ responses was scored by the second and third authors, resulting in 87% agreement across all answers. Disagreements were resolved through discussion. The third author scored the remaining responses using the same coding system.

6 Results

6.1 Preliminary Analyses

Data were first screened for normality. Descriptive statistics showed that they did not substantially deviate from normality in terms of skewness and kurtosis. A MANOVA was then performed to ensure that participants across conditions did not significantly differ in prior topic knowledge, topic interest, reading comprehension, working memory, and perceived competence in online information search and evaluation. No effect of condition emerged, Hotelling’s trace = .10, F(10, 306) = 1.70, p = .080. This outcome indicated that the participants were comparable across conditions on all these potentially interfering variables. To proceed in the most parsimonious way, we did not include these variables in the subsequent analyses. Descriptive statistics for individual differences as a function of condition are reported in Table 1.
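
For readers who wish to reproduce this kind of baseline-equivalence check, a minimal sketch using statsmodels is shown below (the data file and column names are assumptions for illustration, not the authors’ materials).

```python
# Sketch: baseline-equivalence MANOVA across conditions (hypothetical data).
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

df = pd.read_csv("pre_intervention.csv")  # one row per student (hypothetical file)
# Assumed columns: condition (DK/CC/C) plus the five pre-intervention measures.
manova = MANOVA.from_formula(
    "knowledge + interest + reading_comprehension + working_memory"
    " + perceived_competence ~ condition",
    data=df,
)
# mv_test() reports, among other statistics, the Hotelling-Lawley trace
# with its F and p values for the 'condition' effect.
print(manova.mv_test())
```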

Table 1. Descriptive statistics of all pre-intervention variables by condition

6.2 RQ1: Effectiveness of the Intervention for Source Evaluation

Rank-Ordering.

We first conducted an ANOVA with rank-ordering scores as the dependent variable. The effect of condition was not significant, F < 1. Scores were substantially similar across the DK (declarative knowledge, M = 1.37; SD = .66), CC (contrasting cases, M = 1.38; SD = .67), and control (no intervention, M = 1.29; SD = .57) conditions.

Justifications for Rank-Ordering.

We also qualitatively examined the justifications provided by participants for appropriately rank-ordering each of the four documents. Table 2 reports frequencies and percentages of justifications for accurately rank-ordering the four documents as a function of condition. Non-parametric tests, specifically Kruskal-Wallis and Mann-Whitney tests, were then carried out to examine whether statistically significant differences emerged across categories as a function of condition. As a measure of effect size, we used epsilon squared (ε2) for the Kruskal-Wallis tests and r for the Mann-Whitney tests [37, 38].
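
A sketch of these tests and effect sizes, applied to dichotomous justification codes, might look as follows (illustrative dummy data, not the authors’ code; the ε2 formula follows the convention in [37, 38], and the Mann-Whitney z uses the normal approximation without tie correction).

```python
# Sketch: Kruskal-Wallis with epsilon-squared, and Mann-Whitney U with r,
# on dummy 0/1 justification codes.
import numpy as np
from scipy.stats import kruskal, mannwhitneyu

def epsilon_squared(h, n):
    # epsilon^2 = H / ((n^2 - 1) / (n + 1)), with n the total sample size
    return h / ((n ** 2 - 1) / (n + 1))

def mann_whitney_r(x, y):
    # r = |z| / sqrt(N); z from the normal approximation to U (no tie correction)
    u, p = mannwhitneyu(x, y, alternative="two-sided")
    n1, n2 = len(x), len(y)
    z = (u - n1 * n2 / 2) / np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, p, abs(z) / np.sqrt(n1 + n2)

rng = np.random.default_rng(0)
dk, cc, ctrl = (rng.binomial(1, p, 50) for p in (0.6, 0.6, 0.2))  # dummy codes
h, p = kruskal(dk, cc, ctrl)
print("Kruskal-Wallis:", p, epsilon_squared(h, 150))
print("DK vs control:", mann_whitney_r(dk, ctrl))  # alpha = .05 / 3 (Bonferroni)
```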

Justifications for Rank-Ordering Document No. 1. We examined whether there were statistically significant differences across conditions in the criteria used by participants to support their evaluation of a document as the most reliable. First, a Kruskal-Wallis test was carried out with Bonferroni correction applied, resulting in a significance level set at p < .012, given the four comparisons (response categories). This test revealed significant differences for judgments based on source characteristics [χ2 (2) = 22.39, p < .001, ε2 = .20] and on document content [χ2 (2) = 10.08, p = .006, ε2 = .09]. Mann-Whitney U tests were then performed with Bonferroni correction applied, resulting in a significance level of .016, given the three pairwise comparisons (conditions). Concerning the use of source characteristics in reliability judgments, participants in both the DK (U = 345.00, p < .001, r = .42) and CC (U = 293.00, p = .009, r = .39) conditions outperformed those in the control condition, while the two intervention conditions did not differ from one another. As regards reference to source content, students in the control condition used this criterion more than those in the DK (U = 394.00, p = .002, r = .42) and CC conditions (U = 484.00, p = .009, r = .25), while the two intervention conditions did not differ from one another.

Justifications for Rank-Ordering Document No. 2. A Kruskal-Wallis test did not reveal significant differences across conditions in the justifications provided by participants for judging a document as the second most reliable among the four webpages.

Justifications for Rank-Ordering Document No. 3. A Kruskal-Wallis test revealed significant differences among participants for judgments based on source characteristics [χ2 (2) = 8.82, p = .010, ε2 = .09] and on the content provided [χ2 (2) = 18.95, p < .001, ε2 = .20]. Mann-Whitney U tests showed that only participants in the CC condition used the crucial criterion of source characteristics more frequently than participants in the control condition (U = 275.00, p = .002, r = .29). Reference to document content in the reliability judgments was more frequent in the control condition than in both the DK (U = 261.00, p = .002, r = .30) and CC conditions (U = 200.00, p < .001, r = .41), while the intervention conditions did not differ from one another.

Justifications for Rank-Ordering Document No. 4. A Kruskal-Wallis test showed significant differences across conditions for participants’ judgments based on source characteristics [χ2 (2) = 14.72, p = .001, ε2 = .12]. Mann-Whitney U tests revealed that students in both the DK (U = 544.00, p < .001, r = .35) and CC conditions (U = 648.00, p = .001, r = .31) took this crucial criterion into consideration more often than their counterparts in the control condition.

Table 2. Frequencies and percentages of justifications for rank-ordering the four online documents

6.3 RQ2: Effectiveness of the Intervention for Multiple-document Comprehension

We performed a MANOVA with the essay scores for sourcing and argumentation as dependent variables; overall, the two were positively correlated (r = .20, p = .009). The MANOVA revealed a small but significant overall effect of condition, Hotelling’s trace = .06, F(4, 312) = 2.51, p = .041, η2p = .03. Follow-up univariate analyses of variance indicated that differences emerged for argumentation, F(2, 158) = 4.80, p = .009, η2p = .06, but not for sourcing, F(2, 158) = .75, p = .471. Pairwise comparisons with Bonferroni correction revealed that students in both the DK (declarative knowledge, p = .017, 95% CI [−.65, −.04]) and CC (contrasting cases, p = .033, 95% CI [−.61, −.01]) conditions outperformed students in the control condition. Participants who received an intervention on source evaluation, regardless of its type, produced better arguments than their counterparts who were not exposed to any intervention. Table 3 reports the scores for sourcing and argumentation as a function of condition.
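
As an illustration of this follow-up step, a sketch of a univariate ANOVA with Bonferroni-corrected pairwise comparisons is given below (the scores are dummy values for illustration; this is not the authors’ code).

```python
# Sketch: follow-up ANOVA on argumentation scores (1-3) and Bonferroni
# pairwise comparisons, using dummy data.
from itertools import combinations
from scipy.stats import f_oneway, ttest_ind

groups = {  # hypothetical essay argumentation scores per condition
    "DK": [2, 3, 2, 2, 3, 1, 2, 3],
    "CC": [3, 2, 2, 3, 2, 2, 3, 2],
    "C":  [1, 2, 1, 2, 1, 2, 1, 1],
}
f, p = f_oneway(*groups.values())
print(f"ANOVA: F = {f:.2f}, p = {p:.3f}")

pairs = list(combinations(groups, 2))
for a, b in pairs:
    _, p_unc = ttest_ind(groups[a], groups[b])
    p_bonf = min(1.0, p_unc * len(pairs))  # Bonferroni: multiply by no. of tests
    print(f"{a} vs {b}: corrected p = {p_bonf:.3f}")
```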

Table 3. Descriptive statistics for post-reading variables of sourcing and argumentation by condition

7 Discussion

Our first research question (RQ1) asked whether the two classroom interventions were equally effective in promoting students’ reliability judgments of various online documents. In accordance with Hypothesis 1a, the analysis of the justifications provided for the rank-ordering of the four sites revealed that students in both intervention conditions were more able than students in the control condition to appeal to the characteristics of the source (expertise and authoritativeness) when appropriately judging the most (no. 1) and least (no. 4) reliable sites, while the two intervention conditions did not differ. For the second least reliable site (no. 3), only students in the CC intervention condition considered the quality of the source more than those who did not receive any intervention.

When differences emerged in appeals to personal opinion, they indicated that control students used this criterion more than their counterparts in the intervention conditions. This criterion can lead to myside or confirmation bias, that is, to judging a document as reliable or unreliable regardless of the real authoritativeness of the web source, only because it is aligned (or not) with the reader’s own opinion on a controversial topic [39]. It is worth noting that very few students in any condition appealed to corroboration across sources to justify their rank-ordering. A plausible explanation is that few documents were available, and the same position on the topic was shared by only two online resources, one more reliable and the other less so. Thus, this important aspect of epistemic source evaluation may not have been salient enough in the current study.

The two types of short-term intervention did not differ in rank-ordering, either from each other or from the control condition. This outcome is not aligned with the Mason et al. [19] intervention study, which provided upper-secondary school students with declarative knowledge about source evaluation. However, in that study 9th graders were asked to rank-order nine documents, while in the current study there were only four documents to be rank-ordered by 8th graders. A possible explanation for the lack of significant differences across conditions in the current study is that the task may have been too easy due to the lower number of documents. An alternative explanation is that the documents were easy to discriminate, even by 8th graders, who could perceive their different levels of reliability. Recent research on source evaluation has documented low rank-ordering skills in 7th graders who were provided with six online documents [33]. Given the mixed results in the literature, future research should take into account both the number of documents and their degree of differentiation in source reliability when selecting a more challenging set of materials for adolescents to evaluate.

The lack of differences in rank-ordering also contrasts with the results of the Braasch et al. [20] study involving 13th graders, who were provided with two contrasting cases of source evaluation. However, in that study, students exposed to the intervention were better able than control students to discriminate six documents for usefulness, not reliability. Judgments of the utility of a document for learning about a topic are easier to make than judgments of the reliability of a source, as the former are strictly related to the content provided. In contrast, the latter refer to the ‘metadata’ regarding author credentials, which may not receive particular attention during document processing, as indicated by studies with older students [40], even when they are explicitly asked to evaluate information about sources [41], and with middle school students, even when they are specifically prompted to consider source competence [16].

Our second research question (RQ2) asked whether the two interventions were also equally effective in supporting multiple-document comprehension. Results confirmed Hypothesis 2a, as both intervention conditions led to greater argumentation about the controversial topic than the control condition. This finding is aligned with previous research on high-school students’ multiple-text comprehension after short-term [19] and long-term source interventions [17]. Readers’ essays did not differ as a function of condition for sourcing (number of sources and source-content links). This outcome may be related to the previous outcome on rank-ordering. As students were equally able across conditions to discriminate the online sources, although not always by appealing to advanced criteria, their written essays did not differ in citing sources and relating them to their contents, but they did differ in forming a coherent and integrated representation of the contrasting information.

For both research questions, contrary to Hypotheses 1b and 2b, no advantage of providing contrasting cases over declarative knowledge emerged. The mechanisms underlying the two interventions are different. In the direct, more ‘traditional’, prescriptive intervention, students bear in mind what to do and try to apply it. In the contrasting cases intervention, students are stimulated to solve an information problem, scrutinizing two strategies and identifying the better one without any direct prompt. Both mechanisms have the potential to sustain, at least to some extent, sourcing and comprehension of conflicting information in eighth graders.

To sum up, the findings of this study do not allow us to determine whether one of the two short-term interventions on source evaluation should be preferred over the other. Whether by letting students know directly what to look at, or by stimulating reflection on different evaluation strategies, both encouraged a consideration of source characteristics and a comprehension of multiple documents. What we can conclude from the current data, therefore, is that the two types of short-term intervention can be implemented in the real context of lower-secondary school classrooms and are effective, at least to some extent and within a short time frame.

8 Limitations

Some limitations should be taken into account when interpreting the results of the current study. First, although both interventions favored reliability judgments based on digital source characteristics, as well as multiple-text comprehension, compared with the control condition, the effect sizes were small. Future research with more participants in each condition may provide more solid results. Second, we did not use process data to understand the underlying mechanisms that could explain the benefits of each type of intervention. Studies based on eye-tracking methodology, for example, could reveal whether and where the interventions induce greater allocation of attention while reading the various materials provided to support source evaluation. The analysis of process and outcome findings would allow an in-depth understanding of the link between reading behavior and offline sourcing and argumentation. Third, it remains open to future studies whether, and to what extent, what students learned during the intervention is long-lasting, and therefore used over time when they spontaneously retrieve, evaluate, and comprehend online documents for inquiry purposes.

9 Conclusions

Despite these limitations, the study has significance as it expands the evidence that two types of short-term intervention on source evaluation, which can be easily implemented in the classroom, are to some extent effective. The effectiveness of the interventions emerges in supporting students to appeal to the epistemic characteristics of information sources when judging the reliability of digital documents, as well as in constructing an integrated and comprehensive representation of multiple documents. Overall, the study indirectly contributes to the debate on the degree of guidance that promotes better learning, supporting the idea that either direct or guided instruction works [42].

Practical implications can also be drawn from the study, as the critical use of contemporary digital media requires supporting students in becoming thinkers who are epistemically vigilant [43] and able to reason well in an information-saturated world steeped in socio-scientific issues that matter greatly to our society [44]. If students are equipped with source evaluation skills and become critical consumers of information, they will be more likely, as future citizens, to act as rational thinkers in decision-making processes that impact individual and societal life. The results highlight the importance and feasibility of equipping students with source evaluation skills from the early years of secondary education. Moreover, alternating the two kinds of intervention to sustain sourcing when dealing with complex socio-scientific issues may represent a potentially effective pedagogical strategy: it combines direct and indirect instruction, depending on the situational context, in the service of students’ involvement and cognitive performance in a crucial task of the Google era.