Questionnaire scoring
The comprehension questionnaire responses were coded following procedures adapted from the methods described in detail by Ecker et al. (2017; see Footnote 7). In each testing location, two scorers applied the same systematic set of scoring criteria and consulted on ambiguous cases to code the five dependent variables described below for each condition. Scores for the two control NR scenarios presented to each participant were averaged, yielding three scores on each dependent variable: NR, RNR, and RER.
The general memory score was calculated for each scenario based on (i) the number of correct idea units (including themes and details from Ecker et al.’s, 2017, criteria; maximum of four) included in the participant’s open-ended response, and (ii) the number of correctly answered multiple-choice questions (maximum of three). These two components were summed and averaged to form a score ranging from 0 to 1, where 1 indicated perfect recall. The idea units coded for this score did not mention the critical cause or its alternative.
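The scoring rule for general memory can be illustrated with a minimal sketch. It assumes that the two components were pooled and divided by the maximum possible total of seven; the text does not state whether the components were instead normalised separately before averaging. The function and constant names are illustrative and are not taken from the original scoring materials.

```python
MAX_IDEA_UNITS = 4       # maximum idea units per scenario (stated in the text)
MAX_MULTIPLE_CHOICE = 3  # maximum multiple-choice items per scenario (stated in the text)


def general_memory_score(idea_units: int, multiple_choice_correct: int) -> float:
    """Return a 0-1 general memory score for one scenario.

    Assumption: the two components are summed and divided by the
    maximum possible total (4 + 3 = 7).
    """
    if not 0 <= idea_units <= MAX_IDEA_UNITS:
        raise ValueError("idea_units must lie between 0 and 4")
    if not 0 <= multiple_choice_correct <= MAX_MULTIPLE_CHOICE:
        raise ValueError("multiple_choice_correct must lie between 0 and 3")
    total = idea_units + multiple_choice_correct
    return total / (MAX_IDEA_UNITS + MAX_MULTIPLE_CHOICE)


def nr_score(first_scenario: float, second_scenario: float) -> float:
    """Average the two control NR scenarios into a single NR score."""
    return (first_scenario + second_scenario) / 2
```

Under this assumption, a participant who recalled two idea units and answered one multiple-choice question correctly would receive a score of 3/7 for that scenario.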
Memory for the critical information that distinguished the retraction conditions was assessed by three scores calculated using the procedures followed by Ecker et al. (2017). Separate scores of 0 or 1 were assigned according to whether the participant’s initial open-ended response referred to (i) the original critical cause from the first passage (e.g., the fire was caused by arson); (ii) the alternative cause presented in the second passage of the RNR and RER conditions (e.g., the fire was caused by lightning strike); and (iii) the retraction or change of causal information (e.g., it was initially thought the fire was caused by arson, but investigation revealed it was actually caused by lightning strike). The two latter scores were not calculated for NR passages because they did not include an alternative cause.
The inference score was calculated from the three rating-scale questions that assessed judgements related to the critical information. These ratings were summed and converted to a score ranging from 0 to 1 where a high score indexed a stronger influence of the original misinformation on inferences about the implications of the scenarios (e.g., a high rating to “How mistrustful would local residents be after the bushfire?”).
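The conversion of the summed ratings to a 0 to 1 score can be sketched as a min-max rescaling. The end points of the rating scale are not stated in this section, so they are passed in as parameters; the function name and signature are hypothetical.

```python
from typing import Sequence


def inference_score(ratings: Sequence[float], scale_min: float, scale_max: float) -> float:
    """Sum the ratings and rescale the total to the 0-1 range.

    scale_min and scale_max are the end points of the rating scale.
    They are parameters because the text does not state them.
    """
    if len(ratings) == 0:
        raise ValueError("at least one rating is required")
    if scale_max <= scale_min:
        raise ValueError("scale_max must exceed scale_min")
    if any(r < scale_min or r > scale_max for r in ratings):
        raise ValueError("every rating must lie on the scale")
    lowest_total = len(ratings) * scale_min
    highest_total = len(ratings) * scale_max
    return (sum(ratings) - lowest_total) / (highest_total - lowest_total)
```

With three ratings at the scale midpoint the function returns 0.5, whatever end points are supplied.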
Main analyses
As summarized in Table 1, the Australian and Chinese samples were demographically similar and the random allocation to different reading formats resulted in groups that did not differ significantly on age, gender, and, for the Australian sample, reading proficiency (all ps > .05). Mean scores for both language groups on each dependent variable in each condition are presented in Table 2.
Table 2. Mean (and standard error) scores on general memory, critical memory, and inference for Australian and Chinese samples across retraction and format conditions
Omnibus analyses of variance (ANOVAs) were conducted on each of the dependent variables treating retraction condition as a within-subjects factor, and format and group as between-subjects factors. Follow-up contrasts were carried out to decompose significant main effects and interactions. For the retraction factor, two contrasts were tested: (i) the retraction effect, a comparison of the control NR condition with the average of the two retraction conditions (RNR and RER); and (ii) the reminder effect, the difference between the RNR and RER conditions. Interactions involving group were followed up by separate analyses of each language group.
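The two planned contrasts on the retraction factor correspond to the weight vectors sketched below. This is an illustration only: the participant scores are invented, the sign convention (NR minus the retraction conditions, RNR minus RER) is an assumption, and the sketch computes per-participant contrast scores rather than the full mixed-design ANOVA reported here.

```python
from statistics import mean

# Contrast weights over the conditions in the order (NR, RNR, RER).
RETRACTION_WEIGHTS = (1.0, -0.5, -0.5)  # NR vs. the average of RNR and RER
REMINDER_WEIGHTS = (0.0, 1.0, -1.0)     # RNR vs. RER


def contrast_score(scores, weights):
    """Weighted sum of one participant's (NR, RNR, RER) scores."""
    return sum(w * s for w, s in zip(weights, scores))


# Hypothetical (NR, RNR, RER) scores for three participants.
participants = [
    (0.60, 0.50, 0.55),
    (0.70, 0.65, 0.60),
    (0.50, 0.45, 0.50),
]

# Mean contrast score across participants for each contrast.
retraction_effect = mean(contrast_score(p, RETRACTION_WEIGHTS) for p in participants)
reminder_effect = mean(contrast_score(p, REMINDER_WEIGHTS) for p in participants)
```

Each weight vector sums to zero and the two vectors are orthogonal, so the contrasts partition the two degrees of freedom of the three-level retraction factor.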
General memory scores
There was a significant main effect of retraction condition on general memory scores, F(2, 472) = 15.98, p < .001, ƞ2 = .033. Follow-up contrasts showed a significant retraction effect—higher general memory scores for the NR than for the average of the RNR and RER conditions, F(1, 472) = 32.69, p < .001, ƞ2 = .065—but no significant difference between the latter two conditions (p = .30)—that is, no significant reminder effect. The main effect of Format was not significant (p > .05). However, there was a significant main effect of group because Chinese participants showed significantly higher general memory scores than did Australian participants, F(1, 472) = 22.66, p < .001, ƞ2 = .046. There was also a significant interaction between format and group, F(1, 472) = 6.78, p = .009, ƞ2 = .014. A separate analysis of the Chinese group alone showed that general memory scores were significantly higher in the paper than in the mobile condition (M = 0.603 vs. M = 0.556, respectively), F(1, 472) = 6.44, p = .012, ƞ2 = .028. In contrast, the small difference between memory scores for the paper and mobile conditions (M = 0.513 vs. M = 0.526, respectively) observed in the Australian group was not significant (F < 1).
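The effect sizes reported for single-degree-of-freedom effects can be cross-checked against their F ratios. The sketch below assumes that ƞ2 denotes partial eta squared, which the text does not state explicitly; under that assumption ƞ2 = (F × df_effect) / (F × df_effect + df_error). The function name is illustrative.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Single-df effects from the general memory analysis above.
print(round(partial_eta_squared(32.69, 1, 472), 3))  # retraction contrast: 0.065
print(round(partial_eta_squared(22.66, 1, 472), 3))  # main effect of group: 0.046
print(round(partial_eta_squared(6.78, 1, 472), 3))   # format x group: 0.014
```

The three values agree with the effect sizes reported for these effects.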
Critical memory scores
Critical information
The critical information scores showed a significant overall effect of retraction condition, F(2, 472) = 12.06, p < .001, ƞ2 = .025, that reflected both a significant retraction effect, F(1, 472) = 6.31, p = .012, ƞ2 = .013, due to poorer memory for the critical cause from the NR scenarios than for the average of the RNR and RER conditions; and a significant reminder effect, F(1, 472) = 16.75, p < .001, ƞ2 = .034, due to superior memory in the RER than in the RNR condition. The main effect of Format was not significant (F < 1), and format did not significantly interact with retraction condition, F(2, 472) = 2.34, p = .096. Paralleling the general memory scores, there was a significant main effect of group, F(1, 472) = 22.66, p < .001, ƞ2 = .046, which reflected significantly higher recall of the critical information in Chinese than Australian participants (M = 0.677 vs. M = 0.488, respectively). Group did not significantly interact with either retraction or format condition (both Fs < 1).
Alternative cause and retraction
The alternative cause was not presented in the NR passages, so these analyses were restricted to the RNR and RER conditions. Recall of the alternative cause was not significantly affected by retraction condition, F(1, 472) = 2.90, p = .089; group, F(1, 472) = 1.79, p = .181; or format (F < 1). However, memory that the original critical cause was retracted or changed was significantly affected by retraction condition, F(1, 472) = 61.87, p < .001, ƞ2 = .116, because retraction recall rate was higher in the RER condition. Format significantly interacted with retraction condition because the higher recall of the retraction in the RER than RNR condition was significantly larger in the mobile condition, F(1, 472) = 6.30, p = .012, ƞ2 = .013. There was also a significant main effect of Group, F(1, 472) = 4.06, p = .044, ƞ2 = .009, which, unlike the general and critical information scores, reflected higher retraction recall in the Australian than in the Chinese sample (M = 0.469 vs. M = 0.404, respectively).
In summary, memory for general details was better for the control NR passages than for the two retraction conditions, but memory for the critical causal information was poorer for the NR passages than for the retraction conditions, and poorer in the RNR than in the RER condition. Although memory for the alternative cause did not differ between the RNR and RER conditions, participants were more likely to report the retraction/change in causal information for the RER condition, which included an explicit reminder of the original information. There were no significant main effects of format on any memory measure, but the retraction memory scores showed a significantly larger reminder effect in the mobile than in the print condition. Finally, there were significant effects of language group on all memory measures except the alternative cause: Chinese participants showed better memory for both general information and the original critical cause than did the Australian participants, but were less likely to report the retraction/change in causal information.
Inference scores
Analysis of the inference scores yielded a significant main effect of retraction condition, F(2, 472) = 159.57, p < .001, ƞ2 = .253. The follow-up contrasts showed that this reflected both a significant retraction effect, F(1, 472) = 391.67, p < .001, ƞ2 = .453, and a significant reminder effect, F(1, 472) = 10.47, p = .001, ƞ2 = .022, due to lower inference scores in the retraction conditions, particularly RER. The main effect of Format was also significant, F(1, 472) = 14.46, p < .001, ƞ2 = .030, because inference scores were significantly lower for the paper than for the mobile condition. There was also a significant interaction between retraction and format condition, F(2, 472) = 4.33, p = .013, ƞ2 = .009. The follow-up contrasts indicated that this reflected a larger retraction effect in the paper than in the mobile condition, F(1, 472) = 9.72, p = .002, ƞ2 = .020, while the reminder effect did not significantly differ between formats (F < 1).
The main effect of Group was significant, F(1, 472) = 70.94, p < .001, ƞ2 = .131, due to lower inference scores for the Australian than for the Chinese participants. Group also participated in two significant interactions on inference scores. Firstly, group significantly interacted with retraction condition, F(2, 472) = 14.36, p < .001, ƞ2 = .030, with follow-up contrasts indicating that the retraction effect was smaller in the Chinese than in the Australian group, F(1, 472) = 36.44, p < .001, ƞ2 = .072. Secondly, group interacted significantly with format, F(1, 472) = 3.96, p = .047, ƞ2 = .008, because the difference in inference scores between paper and mobile format was greater in Chinese than in Australian participants. A separate analysis of the Chinese sample alone confirmed that they showed a significant interaction between retraction and format condition, F(2, 227) = 5.01, p = .007, ƞ2 = .022, which was due to a stronger retraction effect in the print than in the mobile condition, F(1, 227) = 13.49, p < .001, ƞ2 = .056. Neither of these interactions was significant in the Australian sample (both Fs < 1.47).
In summary, inference scores were significantly reduced by retraction of the original information, demonstrating that these judgements were sensitive to participants’ perceptions of the cause of the events described in the passages. Inference scores were also significantly lower in the paper than in the mobile condition, particularly in the two retraction conditions, demonstrating a stronger CIE in the mobile condition. Inference scores were also modulated by language group: Chinese participants showed higher average inference scores than did Australian participants, particularly in the retraction conditions, indicating that the Chinese sample showed a stronger CIE that was most marked when they read the passages on a mobile phone.
Supplementary analyses
Two supplementary sets of analyses of covariance (ANCOVA) were conducted. The first assessed whether general memory performance modulated the critical memory and inference scores, while the second evaluated the contribution of proficiency to performance of the Australian sample.
General memory
The Chinese participants showed significantly better general memory for the passages. To determine whether differences in general memory ability contributed to the stronger CIE evident in critical memory and inference scores for the Chinese sample, ANCOVA analyses were conducted on these dependent measures including (centred) general memory score as a covariate (see Figs. 1 and 2).
The ANCOVA analyses of memory for the critical information showed that general memory significantly predicted scores for the critical cause, the alternative cause, and retraction (all ps < .001). There was no change in the significant effects of retraction condition and group on memory for the critical cause or retraction when general memory was controlled. However, including it as a covariate in the analysis of alternative cause revealed a significant main effect of group, F(1, 470) = 12.50, p < .001, ƞ2 = .026, that was not observed in the main analysis. This occurred because, when general memory performance was controlled, recall of the alternative information was significantly higher in the Australian than in the Chinese sample (M = 0.61 vs. M = 0.49, respectively).
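The way in which a group difference can emerge only after a covariate is controlled can be illustrated with covariate-adjusted means. The sketch below is a simplified one-way ANCOVA with a single pooled within-group slope, not the mixed design analysed here; the data and group labels are invented for illustration.

```python
from statistics import mean


def adjusted_means(groups):
    """Covariate-adjusted outcome means for a one-way ANCOVA.

    groups maps a group label to a list of (covariate, outcome) pairs.
    A common within-group slope is assumed for all groups.
    """
    grand_x = mean(x for pairs in groups.values() for x, _ in pairs)
    sxy = 0.0  # pooled within-group cross-products
    sxx = 0.0  # pooled within-group covariate sum of squares
    for pairs in groups.values():
        mx = mean(x for x, _ in pairs)
        my = mean(y for _, y in pairs)
        sxy += sum((x - mx) * (y - my) for x, y in pairs)
        sxx += sum((x - mx) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    # Shift each group's raw mean to the value expected at the grand covariate mean.
    return {
        label: mean(y for _, y in pairs) - slope * (mean(x for x, _ in pairs) - grand_x)
        for label, pairs in groups.items()
    }


# Hypothetical data: identical raw outcome means (0.5), different covariate means.
data = {
    "higher_general_memory": [(0.5, 0.4), (0.6, 0.5), (0.7, 0.6)],
    "lower_general_memory": [(0.3, 0.4), (0.4, 0.5), (0.5, 0.6)],
}
adjusted = adjusted_means(data)
```

In this invented example the two groups have the same raw mean, but the group with the higher covariate scores has the lower adjusted mean (approximately 0.4 vs. 0.6), which is the kind of pattern that produces a group effect in an ANCOVA that is absent from the unadjusted analysis.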
A parallel ANCOVA analysis of inference scores showed that general memory scores did not significantly moderate inference scores (F < 1). All main effects and interactions as well as follow-up contrasts remained significant when general memory scores were partialed out (ps < .05). However, the interaction between group and format was only marginally significant in the ANCOVA, F(1, 471) = 3.67, p = .056, ƞ2 = .008.
Thus, the ANCOVA analyses suggest that Chinese participants’ generally better memory for the passages extended to the original critical cause, but that their memory for the alternative cause was poorer than expected from their general memory performance. However, general memory did not significantly modulate the effects of language group on inference scores.
Reading proficiency
The second set of ANCOVAs, which were limited to the Australian sample, included participants’ standardized overall proficiency scores as a continuous covariate in analyses of each dependent variable to determine whether the memory or inference scores were modulated by reading proficiency as assessed by the combined vocabulary and ART scores.
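One common way to form a standardized composite from two measures is to convert each to z scores and average them. The sketch below assumes that procedure; this section does not specify how the vocabulary and ART scores were combined, and the function names and example values are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values to mean 0 and (sample) standard deviation 1."""
    m = mean(values)
    sd = stdev(values)
    return [(v - m) / sd for v in values]


def proficiency_composite(vocabulary, art):
    """Average of the z-scored vocabulary and ART scores, one value per participant.

    Assumption: equal weighting of the two measures.
    """
    return [(v + a) / 2 for v, a in zip(z_scores(vocabulary), z_scores(art))]


# Hypothetical raw scores for three participants.
composite = proficiency_composite([10, 20, 30], [5, 10, 15])
```

Because each component is centred before averaging, the composite has a mean of zero, which keeps the other effects in the ANCOVA interpretable at the average level of proficiency.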
Proficiency predicted significant variance in all measures (all ps < .001), reflecting higher average memory for both general and critical information, and lower inference scores, in higher proficiency participants. However, controlling proficiency did not significantly modulate the pattern of effects of format or retraction condition, and the only significant interaction involving proficiency was a significantly larger increase in memory for the retraction in the RER relative to the RNR condition in higher proficiency participants, F(1, 244) = 5.16, p = .024, ƞ2 = .021.