Introduction

Selective attention is the ability to focus on goal-relevant information while ignoring or suppressing task-irrelevant information (Murphy et al., 2016). In everyday activity, selective attention is essential for cognitive function, especially in complex or noisy environments where there is potential for distraction. In a variety of real-world situations, a common source of distraction is background noise. As one might expect, background noise disrupts attention and impairs overall cognitive performance (Klatte et al., 2013). This is rather intuitive, which is why most people avoid noisy environments when engaged in cognitive tasks that require attention. However, our own personal real-life experiences may sometimes be inconsistent with this finding. This is because the extent to which performance is impaired depends on the task, the type of background noise, and the individual. For example, when reading an interesting novel or playing a video game, a person can become so engaged that they become oblivious to their surroundings. In contrast, when listening to a boring lecture, a person may be easily distracted by rather mundane irrelevant features of their environment. Taken together, this suggests that different task features, different kinds of background noise, and individual differences in cognitive ability may all have an impact on selective attention and, in turn, task performance.

The load theory of attention and cognitive control

According to the load theory of attention and cognitive control (Lavie, 2001; Lavie, 2010; Lavie et al., 2004), selective attention is influenced by the level and type of processing demands of the current task. Specifically, the extent to which people can focus attention depends on the level (high or low) and type (perceptual or cognitive) of load involved in the task. The theory, which combines traditional early-filter (Lamb, 1991; Treisman, 1969) and late-filter (Duncan, 1980; Norman, 1968) approaches to selective attention, may explain the aforementioned combination of factors that influence cognitive task performance. For instance, according to the theory, “early filter” performance is typically observed in tasks that involve high perceptual load. This is because perceptual processes are fully engaged by task-relevant information; therefore, irrelevant information is less likely to enter the information-processing system. By contrast, “late filter” performance is typically observed in tasks with low perceptual load, especially when combined with high cognitive load. In this case, task-irrelevant information is more likely to enter the system and subsequently more likely to cause interference because blocking or suppressing irrelevant information is effortful and requires executive attention processes involved in working memory (Lavie & De Fockert, 2005). Thus, the combination of low perceptual load and high cognitive load leads to “late filter” results in which people perceive but then attempt to inhibit task-irrelevant information (Deutsch & Deutsch, 1963).

Furthermore, the extent to which task-irrelevant information disrupts performance varies by individual. Previous research suggests that working memory capacity (WMC) is correlated with selective attention, such that individuals with low WMC are more susceptible to distraction than individuals with high WMC (Kane et al., 2001). This finding is consistent with load theory, especially when the primary task has low perceptual load (Konstantinou & Lavie, 2013). Load theory specifically predicts an interaction between perceptual load and individual differences in WMC, such that attention failures should be more dependent on WMC in tasks with low perceptual load. Indeed, Forster and Lavie (2007) found that attention failures were associated with individual differences in distractibility, but only when perceptual load was low.

Perceptual disfluency and the shield effect against distraction

A recent line of research indicates that a perceptual disfluency manipulation, such as a hard-to-read font in a verbal task, can help people focus on a task and enhance performance. It was originally argued that such a manipulation may improve performance by increasing an individual’s metacognition, which activates more analytic processing and reduces attention-control failures (Alter et al., 2007). This counterintuitive disfluency effect has been demonstrated in multiple studies (e.g., Alter, 2013; Diemand-Yauman et al., 2011) but has not been consistently replicated under different settings and tasks (Meyer et al., 2015; Xie et al., 2018).

These discrepant results are postulated to stem from moderators of performance arising from task materials or due to individual differences in cognitive ability (Eitel et al., 2014; Lehmann et al., 2016; Oppenheimer & Alter, 2014). Indeed, a few studies have extended the original “task engagement” hypothesis by testing for an effect of perceptual disfluency on selective attention against distraction, while also considering the role of individual differences in cognitive ability. For example, Halin et al. (2014) tested whether a hard-to-read font could shield attention from background noise during a reading task, and, if so, whether this “shield effect” would be moderated by individual differences in WMC. They found that reading performance was impaired by the presentation of background noise, but only when the materials were in an easy-to-read font. Furthermore, in the easy-to-read font condition, the distracting effect of background noise (speech vs. silence) was negatively correlated with WMC. However, in the hard-to-read font condition, the correlation with WMC was not significant. Researchers have interpreted the disfluency manipulation as facilitating selective attention and shielding against auditory distraction produced by background noise (Hughes et al., 2013).

The effect of disfluency on selective attention and task performance is consistent with load theory. Disfluent task conditions, relative to fluent conditions, are considered to involve higher perceptual load. As discussed above, high perceptual load is associated with “early filter” performance; perceptual processes are fully engaged by task-relevant information, and therefore, irrelevant information is less likely to enter the information-processing system. For individuals with higher WMC, this beneficial effect of perceptual disfluency may not be significant because they already have the ability to focus attention on the target task. In contrast, individuals with lower WMC should benefit from the high perceptual load.

Unresolved issues of the shield effect related to load theory

However, some unresolved issues remain to be addressed in the load theory account of the shield effect of perceptual disfluency against distraction. First, although described as “perceptual disfluency,” it is uncertain whether all kinds of disfluency manipulations on text fonts introduce greater perceptual load in a reading task. Lavie and de Fockert (2003) indicated that perceptual load manipulations differed from general target-stimulus degradation. According to their results, general target-stimulus degradation, such as texts with reduced contrast, introduces sensory load but not perceptual load, and therefore would not engender early filter results but instead would generally increase distraction. According to this distinction, some common disfluency manipulations such as text contrast, size, or color may only introduce sensory load. On the other hand, a hard-to-read font in a reading task may indeed increase perceptual load because additional perceptual operations need to be carried out when reading text with a hard-to-read font type. Some studies have shown that a hard-to-read font in a reading task may require more perceptual processing of the target text, which leads to fewer attention-control failures and better performance (Forster & Lavie, 2009; Sörqvist & Marsh, 2015). For example, Faber et al. (2017) found that mind-wandering was less frequent when subjects read text in a hard-to-read font, which suggests that a hard-to-read font may increase perceptual load, and, in turn, reduce attention-control failures during the task.

A second issue to be addressed is that different types of background noise may introduce different levels of cognitive load, and, in turn, cause more or less distraction. Recent research on cross-modal attention indicates that background noise with different acoustic and semantic properties may interfere with the target task differently. Specifically, the extent to which auditory stimuli distract from a visual task depends on the extent to which the cognitive processes involved in the visual task overlap with the processes involved in interpreting the auditory stimuli (Hughes, 2014). For instance, during a memory task that involved free recall of word lists, irrelevant speech that was semantically related to the to-be-recalled words was found to impair recall more than irrelevant speech that was semantically unrelated (Marsh et al., 2008). According to the duplex-mechanism account of auditory distraction (Hughes, 2014; Marsh et al., 2020), auditory distraction interferes with a focal task through two distinct mechanisms: the interference-by-process mechanism, in which auditory distraction interferes because of a conflict between the cognitive processes engaged in the focal task and the processing of the distractor, and the attention-diversion mechanism, in which auditory distraction interferes because it violates predictions about the auditory scene (Hughes & Marsh, 2019). Thus, semantically meaningful background noise would interfere with a reading task more than semantically meaningless noise and would introduce a higher level of cognitive load. For example, Halin (2016) had subjects complete four reading comprehension tests in either a hard-to-read font or an easy-to-read font, with four different conditions of background noise as auditory distraction (Speech, Road traffic, Aircraft, or Silence). Consistent with Halin et al. (2014), an interaction was found between perceptual difficulty and auditory distraction. In the easy-to-read font condition, content-related speech significantly impaired reading comprehension and led to significantly lower scores relative to the other three background noise conditions.

Aside from the two unresolved issues related to the load theory account, the findings from Halin et al. (2014) also raise some follow-up questions about methodology. First, Halin et al. measured WMC with only one task, so it is not clear if distraction is associated with a domain-general WMC or a more task-dependent cognitive ability. Although it is still an ongoing debate whether the processing, storage, and recall requirements are common or specific across different working memory tasks, a balanced procedure that samples from tasks with both verbal and spatial modalities is preferred (Oswald et al., 2015). Second, in the Halin et al. study, the two correlation coefficients (between WMC and reading comprehension) reported in the easy-to-read font condition and the hard-to-read font condition (r = -.35 and r = -.05, respectively) were not significantly different. Therefore, it is not clear from the results whether the relationship between the distraction effect and WMC was indeed moderated by the font manipulation. Third, perceptual disfluency was manipulated within-groups, which means that subjects read texts in both font conditions and were aware of the font changes across the two conditions. Thus, the effect of perceptual disfluency in the study might have resulted from a mixture of perceptual load and subjects’ awareness of the manipulation. A within-groups design is not recommended “when juxtaposition of treatments enhances perception of treatment variations if such perceptions can interfere with the processes the researcher desires to study” (Greenwald, 1976, p. 317). Finally, according to Halin (2016), different types of background noise can lead to different distraction effects; but in Halin et al. (2014), without a content-unrelated distraction condition (i.e., a meaningless speech condition in which background noise does not share any semantic property with the reading task), it was impossible to infer whether the distraction effect was caused by the mere presentation of noise (acoustic distraction) or by the semantic content of the noise (semantic distraction).

The current study

The current study replicates and extends Halin et al. (2014) by investigating the effect of auditory distraction, perceptual disfluency, and individual differences in WMC on reading comprehension. This study improves upon previous work by using a more comprehensive experimental design in which perceptual disfluency is manipulated between groups, a meaningless background noise condition is included, and two complex span tasks (verbal and spatial) are used to measure WMC.

Based on the load theory of attention and cognitive control, our first hypothesis is that perceptual disfluency introduces high perceptual load to the reading comprehension task and therefore interacts with auditory distraction. That is, reading comprehension will be impaired by background noise in the easy-to-read font condition but not in the hard-to-read font condition. This result would provide evidence for the “shield effect” (Halin, 2016; Halin et al., 2014). No specific prediction is made regarding the difference between the two background noise conditions, but a difference would indicate that semantic meaning and acoustic features of auditory distractions are perceived and processed differently. If the acoustic distraction effect exists, reading performance in the no-noise condition would be different from that in the two noise conditions; and if the semantic distraction effect exists, reading performance in the content-related noise condition would be different from that in the content-unrelated noise condition.

The second hypothesis is that WMC moderates the shield effect, such that individuals with low capacity will exhibit a stronger shield effect than those with high capacity. This hypothesis would be supported by a three-way interaction, namely the two-way interaction of perceptual disfluency and auditory distraction moderated by WMC. Specifically, it is hypothesized that the difference in performance between the content-related speech condition and the no-noise condition will be negatively correlated with WMC in the easy-to-read font group but will not be significantly correlated with WMC in the hard-to-read font group.

Method

Subjects

A series of power analyses were conducted to determine sample size. In order to detect a significant difference of .30 between two standardized regression slopes (as was reported in Halin et al., 2014), it was estimated that N = 90 would yield power = .80, while N = 120 would yield power = .90. Taking the results of the power analyses into consideration, a total of 126 subjects were recruited. Subjects were fluent English speakers who all self-reported a reading ability above grade 11. Subjects were also required, based on self-report, to have normal or corrected-to-normal vision and no hearing loss. All subjects were between the ages of 18 and 45 years (M = 23.48, SD = 5.25); 80 were female and two specified other genders or did not report gender.

Design and procedure

The experiment used reading comprehension for prose passages as the primary task and human speech as auditory distractors. The design was a 2 (Perceptual Disfluency: hard-to-read font or easy-to-read font) × 3 (Auditory Distraction/Background Noise: content-related speech, meaningless speech, and no noise) mixed factorial. Perceptual disfluency was manipulated between groups and auditory distraction was manipulated within groups. Subjects were randomly assigned to one of two perceptual disfluency groups.

The study consisted of three phases: (1) Assessment of working memory capacity, (2) administration of a reading speed test, and (3) completion of the reading comprehension task. In phase 1, subjects completed two working memory span tasks (reading span and rotation span). In phase 2, subjects completed a reading speed test in one of two fonts. Subjects in the easy-to-read group received an easy-to-read font (Times New Roman, 18 pt) while subjects in the hard-to-read group received a hard-to-read font (Haettenschweiler, 20 pt). All subjects read in silence and pressed the “end” button on the screen immediately after completing the passage. The time spent reading the passage was recorded. In phase 3, subjects completed a series of three reading comprehension tests. Each of the three tests was randomly combined with one of the three auditory distraction conditions. The reading comprehension tests and the auditory distraction conditions (no noise, meaningless speech, and content-related speech) were both presented in random orders. Each passage was presented for 5 min and was followed by ten multiple-choice questions. The questions were presented in groups of five on the computer screen without the corresponding passage. Each question group was presented for 2 min, so subjects had about 24 s to read and answer each question. Finally, after each of the three tests, subjects were asked to rate how difficult they perceived the test to be. Ratings were on a 7-point scale, with 1 being extremely easy and 7 being extremely hard.
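The phase 3 randomization described above can be sketched as follows. This is a minimal illustration, not the experiment program itself: the passage labels, the function name, and the seed are hypothetical, and the sketch simply assumes that the passage order and the noise order were shuffled independently and then paired.

```python
import random

# Hypothetical labels; the actual passage names are not given in the text.
TESTS = ["passage_A", "passage_B", "passage_C"]
NOISE = ["no_noise", "meaningless_speech", "content_related_speech"]


def assign_conditions(rng):
    """Return one subject's schedule as ordered (test, noise) pairs.

    Both lists are shuffled independently, so every test is paired with
    exactly one noise condition and both orders are random.
    """
    tests = TESTS[:]
    noise = NOISE[:]
    rng.shuffle(tests)
    rng.shuffle(noise)
    return list(zip(tests, noise))


# A seeded generator keeps the illustration reproducible (seed is arbitrary).
schedule = assign_conditions(random.Random(2024))

# Time budget per question: 2 min for each group of five questions.
seconds_per_question = 2 * 60 / 5
```

Under these assumptions each subject sees every passage and every noise condition exactly once, and the time budget works out to the 24 s per question stated above.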

Materials

Working memory tasks

Two complex span tasks, reading span (RSPAN) and rotation span (ROTSPAN), were used to measure WMC. In general, each task consists of multiple items. The difficulty of the items varies as a function of the number of displays per item. Each display consists of a processing component followed by a storage component. Subjects made judgments on the processing component and memorized the stimuli on the storage component. Subjects’ recall of storage stimuli was tested after several displays and was recorded as one item. A partial-credit load scoring procedure was used (PCL; see Conway et al., 2005), in which item scores are weighted based on item size (longer items contribute more to the total score than shorter ones), and partially correct items are credited based on the proportion of correctly responded elements in each item. To ensure that subjects were attending to the processing component rather than strategically ignoring it, accuracy rate of the processing component was presented on the screen throughout the task. Subjects were instructed to keep the rate above 85%.
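The partial-credit load idea can be sketched in a few lines. The function name and the example responses are hypothetical, and expressing the score as a proportion of the maximum is an assumption about scaling; the sketch only shows that weighting each item's proportion correct by item size is equivalent to counting correctly recalled elements across all items.

```python
def partial_credit_load(items):
    """Partial-credit load score as a proportion of the maximum.

    items: list of (n_correct, n_elements) tuples, one per item.
    Each item contributes its proportion correct weighted by its size,
    which reduces to total correct elements over total elements.
    """
    weighted = sum(n * (c / n) for c, n in items)  # size-weighted proportions
    total = sum(n for _, n in items)
    return weighted / total


# Hypothetical responses: a perfect 3-element item, a half-correct
# 4-element item, and a perfect 5-element item.
example_items = [(3, 3), (2, 4), (5, 5)]
score = partial_credit_load(example_items)  # 10 of 12 elements recalled
```

In this hypothetical case the half-correct item still earns credit for its two recalled elements, and the five-element item moves the score more than the three-element item does.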

Reading span

For each display, a sentence was shown on the computer screen as the processing component and was followed by a random letter as the storage component. Subjects judged the logical accuracy of the sentences and then memorized the letters. The RSPAN task consisted of 15 items, with three to seven displays in each item.

Rotation span

For each display, an image of a rotated letter was shown on the computer screen as the processing component and was followed by a single-direction arrow as the storage component. The arrows had eight possible directions and two possible sizes. Subjects judged whether the rotated letter was presented normally as opposed to horizontally flipped (“mirrored”), and then memorized the direction and size of the arrows. The ROTSPAN task consisted of 12 items, with two to five displays in each item.

Reading materials

All reading materials were presented by a Python program on desktop computers under the same setting (models of hardware, operating system, audio and video settings, etc.). Four prose passages were used in the experiment: one short prose passage was used to assess reading speed; three longer prose passages were used for reading comprehension. All passages were presented in either the 18-pt Times New Roman (easy to read) or 20-pt Haettenschweiler (hard to read) font depending on the perceptual disfluency group the subject was assigned to. The two font sizes were selected so that the upper and lower margins of the passages in the two font types were similar on the same screen. The layouts and margins of the experiment program were adjusted so that each passage was presented on one screen. The short prose passage to assess reading speed was 300 words. The three long prose passages to assess reading comprehension were each five paragraphs with approximately 550 words in total and described four different fictional races in fantasy literature. All prose passages were adapted from Pathfinder Roleplaying Game: Monster Codex (Bulmahn, 2014). As mentioned, for each of the three reading comprehension prose passages, reading comprehension was assessed by ten multiple-choice questions with four options per question. To ensure the uniqueness of both fonts for perceptual disfluency manipulation, all the instructions and questions were in 20-pt Arial font. The experiment program with all materials is available online at: https://github.com/hanhao23/ReadingComprehensionProgram.

Auditory stimuli

Two auditory distractors were used in the experiment: content-related speech and meaningless speech. The content-related speech consisted of a male voice that described another fictional race from the same source as the three prose passages. It was recorded at a moderate speech rate (125 words/min) with no background noise. The meaningless speech was the backward-played soundtrack of the same speech. Both soundtracks were played through a binaural headset connected to the same computer presenting the experiment program, with the volume mixer fixed to 20%. Subjects were required to wear their headsets throughout the reading comprehension tasks in all three noise conditions. Subjects were not able to adjust video or audio settings at any time during the experiment.

Results

Data cleaning

Four subjects were excluded due to extreme reading speed (< 60 words/min) or failure to complete all the tasks. The final sample size for all analyses was 122, with 61 subjects in each group. Reading comprehension scores in the three background noise conditions were all negatively skewed, but all skewness values were in an acceptable range (-0.73, -1.24, and -0.84 for content-related speech, meaningless noise, and no noise condition, respectively). All data and R scripts for data analysis are available at https://osf.io/et74x. All significance tests in the following analyses used an alpha level of .05; p-values of multiple significance tests, such as those related to post hoc comparisons, were adjusted accordingly.

Manipulation checks

Reading speed in the two font groups was compared by conducting an independent t-test, and perceived task difficulty (7-point scale) within and across groups was compared by conducting a mixed factorial ANOVA. Similar to Halin et al. (2014), reading speed in the hard-to-read font group (M = 204.41 words/min, SD = 71.02) was slightly slower, but not significantly different from that in the easy-to-read font group (M = 206.09 words/min, SD = 98.67), t(109.01) = -0.11, p = .914; Cohen’s d = .02. However, unlike Halin et al. (2014), subjective ratings of task difficulty for reading in the hard-to-read font group (M = 3.83, SD = 1.17) were slightly lower but not significantly different from those in the easy-to-read font group (M = 4.21, SD = 1.10), F(1, 120) = 3.47, p = .065; ηp2 = .03. This suggests that without a within-individual comparison, subjects were not subjectively aware of the disfluency manipulation. Significant differences were found among subjective ratings of difficulty for different background noise conditions, F(2, 240) = 68.06, p < .001; ηp2 = .36. Reading with speech as background noise (M = 4.75) was perceived to be 0.52 points more difficult than reading with meaningless noise (M = 4.23; p = .001 with Bonferroni correction); reading with meaningless noise was perceived to be 1.16 points more difficult than reading with no noise (M = 3.07; p < .001); and reading with speech as background noise was perceived to be 1.69 points more difficult than reading with no noise (p < .001). Finally, the interaction between perceptual disfluency and background noise was not significant, F(2, 240) = 0.91, p = .402; ηp2 = .01.
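The reading-speed comparison can be checked from the reported summary statistics alone. The sketch below assumes that the test was Welch's unequal-variance version (the fractional degrees of freedom suggest this) and that each group contained 61 subjects, as in the final sample; the function names are illustrative.

```python
import math


def welch_from_summary(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic and Welch-Satterthwaite df from group summaries."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2  # squared standard errors
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def cohens_d_equal_n(m1, s1, m2, s2):
    """Absolute Cohen's d with a pooled SD, valid for equal group sizes."""
    pooled_sd = math.sqrt((s1 ** 2 + s2 ** 2) / 2)
    return abs(m1 - m2) / pooled_sd


# Hard-to-read group first, easy-to-read group second (values from the text).
t_stat, df = welch_from_summary(204.41, 71.02, 61, 206.09, 98.67, 61)
d = cohens_d_equal_n(204.41, 71.02, 206.09, 98.67)
```

With these inputs the statistic rounds to t = -0.11 on roughly 109 degrees of freedom, and d rounds to .02, which agrees with the values reported above.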

Overall, the results of the manipulation checks indicate that the hard-to-read font did not take longer to read than the easy-to-read font, nor was it perceived to be more difficult than the easy-to-read font between groups. Hence, a difference between the two font groups in reading comprehension cannot be attributed to different processing speeds when reading in the two fonts, or to different levels of perceived task difficulty. However, reading with background noise (either speech or meaningless noise) was perceived to be significantly more difficult than reading with no noise, and reading with speech as background noise was perceived to be significantly more difficult than reading with meaningless noise. Thus, subjects were subjectively aware that the two noise conditions existed; and they generally considered the two noise conditions to be distracting compared to the no noise condition, with the semantically meaningful noise being more distracting than the meaningless noise. In conclusion, the perceptual load manipulation did not change overall reading speed or subjective perception of task difficulty, while the within-group auditory distraction manipulation was subjectively perceived.

Group-level analyses

Reading comprehension scores were analyzed with a 2 (perceptual disfluency) × 3 (auditory distraction) mixed factorial ANOVA (see Fig. 1). Contrary to Halin et al. (2014), the main effect of perceptual disfluency was significant, F(1, 120) = 5.42, p = .022; ηp2 = .04; and the main effect of background noise was significant, F(2, 240) = 3.54, p = .031; ηp2 = .03; but the interaction between perceptual disfluency and background noise was not significant, F(2, 240) = 0.29, p = .747; ηp2 = .002.

Fig. 1 Group means with error bars (SE) of reading comprehension scores across auditory distraction conditions and perceptual disfluency groups

Levene’s test on the between-group variable indicated that the homogeneity of variance assumption was violated, F(1, 364) = 12.22, p < .001. The main effect of perceptual disfluency was therefore re-tested with Welch’s t-test, which again indicated a significant difference between the two font groups, t(347.72) = 3.48, p < .001, Cohen’s d = 0.36. Comprehension performance was significantly worse in the easy-to-read font group (Measy = 6.99) than in the hard-to-read font group (Mhard = 7.76).

The main effect of background noise was followed up with post hoc tests (pairwise t-tests with Bonferroni correction). The pairwise comparisons indicated a significant difference between the meaningful speech condition and the no-noise condition, M1 - M3 = -0.41, t(242) = -2.47, p = .043, Cohen’s d = -0.23; but not a significant difference between the meaningful speech condition and the meaningless noise condition, M1 - M2 = -0.36, t(242) = -2.12, p = .105, Cohen’s d = -0.19; or between the meaningless noise condition and the no-noise condition, M2 - M3 = -0.06, t(242) = -0.35, p = 1.000, Cohen’s d = -0.03.
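For readers unfamiliar with the adjustment, a Bonferroni correction over a family of pairwise tests can be sketched as below. The unadjusted p-values are hypothetical and were chosen only for illustration; the raw values behind the comparisons above are not reported in the text.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests and cap the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical unadjusted p-values for three pairwise comparisons.
raw_p = [0.0143, 0.035, 0.73]
adjusted_p = bonferroni(raw_p)
```

The cap at 1 explains why an adjusted value can be reported as p = 1.000: any unadjusted p-value above one third reaches the ceiling when three comparisons are made.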

Individual-level analyses

Confirmatory analysis

To test predictions about individual differences, a regression analysis similar to the correlation analysis of Halin et al. (2014) was conducted. Tables 1 and 2 present the correlations among the two WMC measures, standardized WMC composite, sum of the three reading comprehension test scores, standard score of the distraction effect, and individual reading speed. The standardized WMC composite for each subject was calculated by averaging the standardized scores from the two complex span tasks. The distraction effect for each subject was calculated by subtracting the reading comprehension score in the content-related speech condition from the score in the no-noise condition (S3 - S1). Thus, larger distraction effects indicate more attention failures and therefore lower distraction inhibition performance (i.e., the subject’s ability to ignore the impulse to attend to stimuli that may induce interference).
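The two derived scores can be sketched as follows. The data are hypothetical and the function names are illustrative; the sketch assumes that standardization uses the sample standard deviation, which the text does not specify.

```python
from statistics import mean, stdev


def zscores(values):
    """Standardize scores using the sample mean and sample SD (assumed)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def wmc_composite(rspan, rotspan):
    """Average the standardized RSPAN and ROTSPAN scores per subject."""
    return [(a + b) / 2 for a, b in zip(zscores(rspan), zscores(rotspan))]


def distraction_effect(no_noise, speech):
    """S3 - S1: no-noise score minus content-related speech score."""
    return [s3 - s1 for s3, s1 in zip(no_noise, speech)]


# Hypothetical scores for three subjects.
composite = wmc_composite([0.60, 0.75, 0.90], [0.50, 0.70, 0.60])
effect = distraction_effect([8, 7, 9], [6, 7, 8])
```

A positive effect means the subject scored lower with content-related speech than in silence, so larger values indicate more distraction.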

Table 1 Correlation coefficients for working memory measures, distraction effect score, and reading speed for all subjects
Table 2 Correlation coefficients for working memory measures, distraction effect score, and reading speed by perceptual disfluency groups

A regression model was constructed in which the outcome variable was the distraction effect and the predictor variables were WMC composite, perceptual disfluency (font group), and their interaction. The regression results indicated that none of the three predictors (WMC, perceptual disfluency, or the interaction term) significantly predicted the distraction effect. The overall model had R2 < .01, and Rp2 statistics for all three predictors were smaller than .001. WMC was not a significant predictor of the distraction effect in the easy-to-read group, βe = .03, p = .857; it was also not a significant predictor of the distraction effect in the hard-to-read group, βh = -.01, p = .934; and there was not a significant difference between the two coefficients: β = -.04, p = .849. In summary, WMC was not significantly correlated with the distraction effect in either perceptual disfluency group, and WMC did not moderate the relationship between perceptual disfluency and auditory distraction (see Fig. 2). These results fail to support our second hypothesis of a three-way interaction among perceptual disfluency, auditory distraction, and WMC on reading comprehension, and they fail to replicate the results of Halin et al. (2014).

Fig. 2 Linear regressions with scatterplots of standardized working memory capacity (WMC) on standardized distraction effect by perceptual disfluency groups

Exploratory analyses

The regression analysis above was conducted to capture the interaction between perceptual disfluency, background noise, and WMC observed by Halin et al. (2014). However, this analysis captured only the three-way interaction defined on the contrast between the speech and no-noise conditions; it did not explore all possible two-way and three-way interactions among perceptual disfluency, background noise, and WMC. Also, unlike Halin et al. (2014), three background noise conditions were used in the current study, and therefore the distraction effect could not be fully investigated by only calculating the difference between performance in the speech condition (S1) and the no-noise condition (S3). For these reasons, an exploratory mixed effects analysis was conducted to investigate the potential interaction effects of perceptual disfluency, background noise, and WMC on reading comprehension performance. A series of three generalized linear mixed effects models were constructed and compared, in which one model included all potential interactions, while the others were simplified based on the significance of the initial predictors. The packages lme4 (Bates et al., 2015) and r2glmm (Jaeger, 2017) in the software R (R Core Team, 2019) were used to fit and compare the models.

Model 1 was constructed as “the interaction model.” This model investigated the fixed effects of WMC, perceptual disfluency, background noise, all two-way interactions, and the three-way interaction. Random intercepts were included by subjects, and no random slope was specified. Perceptual disfluency was a dummy-coded between-group variable (easy-to-read being the reference group), background noise was dummy-coded into two within-group variables (the no-noise condition being the reference condition), and WMC was a continuous variable. Results of Model 1 (estimates presented in Table 3) indicated that WMC was a significant predictor of reading comprehension, F(1, 224.97) = 14.78, p < .001, and perceptual disfluency had a significant effect on reading comprehension, F(1, 224.97) = 6.56, p = .011. However, there was no significant effect of background noise on reading comprehension, F(2, 244) = 2.05, p = .130. None of the two-way interactions or the three-way interaction was significant, except the interaction between WMC and perceptual disfluency, F(1, 224.97) = 4.11, p = .044. All fixed effects were tested by type III analysis of variance with Satterthwaite’s method. Figure 3 visualizes the linear relationship between WMC (x-axis) and reading comprehension performance (y-axis) in the six unique experimental conditions (2 perceptual disfluency groups × 3 background noise conditions).
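The dummy coding described for Model 1 can be made concrete by writing out the fixed-effect terms for a single observation. This is an illustrative sketch only: the column names and the function are hypothetical, and the actual models were fitted in R with lme4 rather than with this code.

```python
def fixed_effect_row(wmc, font, noise):
    """Fixed-effect design row for one observation under dummy coding.

    Reference levels follow the text: the easy-to-read font and the
    no-noise condition. Column names are illustrative.
    """
    hard = 1.0 if font == "hard" else 0.0
    meaningless = 1.0 if noise == "meaningless_speech" else 0.0
    speech = 1.0 if noise == "content_related_speech" else 0.0
    return {
        "intercept": 1.0,
        "wmc": wmc,
        "hard": hard,
        "meaningless": meaningless,
        "speech": speech,
        "wmc:hard": wmc * hard,
        "wmc:meaningless": wmc * meaningless,
        "wmc:speech": wmc * speech,
        "hard:meaningless": hard * meaningless,
        "hard:speech": hard * speech,
        "wmc:hard:meaningless": wmc * hard * meaningless,
        "wmc:hard:speech": wmc * hard * speech,
    }


# A subject one SD above the mean WMC, hard-to-read font, content-related speech.
row = fixed_effect_row(1.0, "hard", "content_related_speech")
# In the reference cell only the intercept and the WMC column are non-zero.
reference_row = fixed_effect_row(0.5, "easy", "no_noise")
```

Under this coding the full interaction model has twelve fixed-effect coefficients, and the coefficient on WMC alone is the WMC slope in the reference cell (easy-to-read font, no noise).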

Table 3 Summaries of the generalized mixed effects models in the individual level analysis
Fig. 3 Generalized linear mixed effects model visualizations with scatterplots of standardized working memory capacity on reading comprehension performance in different perceptual disfluency groups and background noise conditions

Based on these results, Model 2 was constructed as “the simplified model,” in which only the fixed effects of WMC, perceptual disfluency, and their interaction were included as predictors, with random intercepts by subjects still included. Another model (Model 3) without the WMC/perceptual disfluency interaction was also constructed. All predictors in Models 2 and 3 were significant (see Table 3). A comparison of model fit statistics indicated that Model 1 did not fit significantly better than Model 2 (Model 1: AIC = 1443.52, BIC = 1498.16, R²β = .17; Model 2: AIC = 1436.64, BIC = 1460.06, R²β = .16; Δχ²(8) = 9.12, p = .331, ΔR²β = .01), whereas Model 2 fit significantly better than Model 3 (Model 3: AIC = 1440.48, BIC = 1459.99, R²β = .13; Δχ²(1) = 5.84, p = .016, ΔR²β = .03; see Table 3). Thus, Model 2 was retained as the selected model, in which perceptual disfluency was a significant moderator of the predictive positive association between WMC and reading comprehension performance after within-subject variance was taken into account (randomly varying intercepts across individuals). When the texts were in the easy-to-read font, WMC had a significant positive linear association with reading comprehension performance (Be = 1.10, SE = 0.23, t(122) = 4.81, p < .001); however, the association was not significant when the texts were in the hard-to-read font (Bh = 0.36, SE = 0.20, t(122) = 1.85, p = .067).
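The model comparison statistics above can be checked by hand. The Python sketch below assumes that the models are nested, were fitted by maximum likelihood to the same data, and that AIC = −2 log-likelihood + 2k, where k counts all estimated parameters; under those assumptions the likelihood-ratio statistic follows directly from the reported AIC values and the reported degrees of freedom. The function names are introduced here for illustration, and the chi-square tail probability uses closed forms that hold only for 1 or an even number of degrees of freedom.

```python
import math


def lrt_from_aic(aic_small, aic_large, extra_params):
    """Likelihood-ratio chi-square for two nested models, recovered from AIC.

    Because AIC = -2*logLik + 2*k, the difference in -2*logLik equals
    (AIC_small - AIC_large) + 2 * (number of extra parameters).
    """
    return (aic_small - aic_large) + 2 * extra_params


def chi2_sf(x, df):
    """Upper-tail chi-square probability for df == 1 or even df."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df > 0 and df % 2 == 0:
        half = x / 2
        term, total = 1.0, 1.0
        for k in range(1, df // 2):   # sum of half**k / k! for k < df/2
            term *= half / k
            total += term
        return math.exp(-half) * total
    raise ValueError("closed form implemented only for df = 1 or even df")


if __name__ == "__main__":
    # Model 2 (simplified) vs. Model 1 (interaction): 8 extra parameters
    chi_12 = lrt_from_aic(1436.64, 1443.52, 8)
    # Model 3 (no interaction) vs. Model 2: 1 extra parameter
    chi_23 = lrt_from_aic(1440.48, 1436.64, 1)
    print(f"Model 1 vs 2: chi2(8) = {chi_12:.2f}, p = {chi2_sf(chi_12, 8):.3f}")
    print(f"Model 2 vs 3: chi2(1) = {chi_23:.2f}, p = {chi2_sf(chi_23, 1):.3f}")
```

Under these assumptions the sketch recovers the reported chi-square values of 9.12 and 5.84, with tail probabilities that agree with the reported p values up to rounding of the inputs.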

Discussion

The current study examined the effect of auditory distraction, perceptual disfluency, and individual differences in WMC on reading comprehension. At the group level, the current results suggest that reading comprehension was worse in the easy-to-read font group than in the hard-to-read font group regardless of auditory distraction conditions. Comprehension was also worse when reading with background noise consisting of content-related meaningful speech than when reading with no noise. Both main effects were significant, with both effect sizes being small to medium (Cohen, 1988). However, the distracting effect of background noise did not vary as a function of perceptual disfluency (font group) as was predicted. Overall, reading comprehension was significantly impaired by content-related background noise and was significantly enhanced by perceptual disfluency, but contrary to Halin et al. (2014), there was not a significant interaction between the two manipulations. Furthermore, post hoc comparisons indicated that background noises, at least in this experiment, were semantically but not acoustically disruptive to reading comprehension performance.

The significant positive effect of a hard-to-read font on reading comprehension suggests that, as a disfluency manipulation, the font-type manipulation may not merely introduce sensory load. According to Lavie and de Fockert (2003), degraded stimuli that introduce only sensory load would increase the difficulty of a cognitive task and generally impair performance. However, although the effect size was small to medium, the font-type manipulation in the current study actually improved reading comprehension performance in general. Therefore, it is unlikely that a hard-to-read font introduces only sensory load to the reading task, as other degraded stimuli do.

The significant disruptive effect of content-related speech suggests that auditory distractions that are semantically related to the visual materials introduce a high cognitive load to the visual task and disrupt performance on it, whereas auditory distractions that are semantically unrelated to the visual materials may not. This result is consistent with previous studies of auditory distraction (e.g., Marsh et al., 2009), in which auditory distractions were found to be more distracting when they engaged cognitive processes similar to those required in the primary task, and therefore were more cognitively demanding.

However, the overall group-level results only partially supported the load theory’s prediction. According to the group-level prediction, in a hard-to-read font, reading comprehension should be impaired less by distraction because less spare capacity is left for distraction processing. Therefore, the extent to which reading comprehension performance is impaired by distraction should depend on the font-type condition. We did not observe this in the current results, as the interaction effect between perceptual disfluency and background noise was not significant. These results indicate that the distracting impact of background noise was not moderated by perceptual disfluency. Thus, there was no “shield” effect of perceptual disfluency observed in the current results at the group level.

The overall group-level results also provide partial support for the duplex-mechanism account of auditory distraction (Marsh et al., 2020), as auditory distraction with semantic meaning interfered with the focal task more than distraction without semantic meaning. However, the results failed to support the top-down control argument of the duplex-mechanism account (Hughes et al., 2013), which holds that a higher level of task engagement, introduced by an increase in the difficulty of encoding the focal visual items (e.g., higher perceptual load of the visual task), eliminates the impact of an attention-deviation mechanism, but not an interference-by-process mechanism. In other words, the top-down control argument predicts an interaction effect between type of background noise (meaningful speech vs. meaningless speech) and level of perceptual load (hard-to-read font vs. easy-to-read font). However, in the current study, the interaction between background noise and disfluency was not observed; reading comprehension under different types of background noise was not found to depend on the disfluency manipulation.

At the individual level, contrary to the potential interaction observed by Halin et al. (2014), there was not a significant three-way interaction among auditory distraction, perceptual disfluency, and WMC. However, our exploratory analyses revealed that WMC predicted reading performance regardless of distraction condition, and this effect was moderated by perceptual disfluency; WMC was less predictive of reading comprehension when the texts were disfluent. In other words, the high perceptual load in the current task interacted with individual differences in WMC, not on distraction inhibition performance, as was predicted by load theory (Lavie, 2010; Lavie & De Fockert, 2005), but rather on reading comprehension performance itself. This finding is partially aligned with the metacognition claim of disfluency theory (Alter et al., 2007), which states that perceptual disfluency can activate subjects’ metacognition of task difficulty. This triggers more effortful analytic processing for all subjects, and thus individual differences in cognitive ability will be reflected less in task performance on a perceptually disfluent task. According to Alter et al., a metacognition of higher task difficulty would consequently involve more System 2 processing of the task materials and lead to better performance in general. However, in the current study, perceptual disfluency was manipulated between groups, and the perceptually disfluent condition was not subjectively perceived to be more difficult by subjects. Therefore, the beneficial effect of perceptual disfluency in the current study could not be attributed to explicit awareness of difficulty.

Overall, the current findings provide partial evidence for the effect of perceptual disfluency on reading comprehension, but do not support the existence of a shield effect of disfluency on selective attention. According to Sörqvist and Marsh (2015), perceptual disfluency increases the level of concentration on a task, which “shields” attention from distraction by reducing undesired processing of irrelevant information and stabilizing the locus of attention during a task. However, according to the current results, the influence of perceptual disfluency on distraction may be either unstable across different settings or smaller than initially expected and may be due to some unknown moderators other than WMC. Hence, our results question the generalizability and applicability of the shield effect of perceptual disfluency against auditory distraction during reading.

In conclusion, a great deal of uncertainty remains regarding the so-called “shield effect” of perceptual disfluency against distraction. More research is needed before introducing applications of disfluency to cognitive tasks in real-world settings. Future research can investigate why the current study failed to replicate Halin et al. (2014). One major methodological difference between the two studies was the manipulation of perceptual disfluency between or within groups. When perceptual disfluency is manipulated within groups, subjects may become aware of the manipulation and adopt different strategies to compensate for the change in perceptual load. This type of change lacks mundane realism, which is why we prefer the between-groups approach. Future research may also adopt a full between-groups design in which subjects are exposed to only one level of perceptual disfluency and only one distraction condition to control for potential subjective awareness of changes in perceptual or cognitive load.

Furthermore, although the perceptual disfluency manipulation in the current study was similar to that used by Halin et al., it was not identical. For example, in Halin et al. disfluency was manipulated by changing the font type while keeping the same font size between the two reading difficulty conditions, whereas in the current study, font size was also adjusted so that the texts had similar numbers of lines and similar top/bottom margins in the two reading difficulty conditions. Although this font size change was minor (from 18-pt Times New Roman to 20-pt Haettenschweiler), it is possible that the adjustment facilitated reading performance for subjects in the hard-to-read group and therefore alleviated the “difficulty.” One possible way to address this issue is to use font types that are less “compact” than Haettenschweiler but still difficult to read, such as Mistral (Pieger, Mengelkamp, & Bannert, 2016) or Brush Script (Eitel & Kühl, 2016), to manipulate perceptual disfluency in such reading comprehension tasks. Other details in the administration of the reading test also differed, such as the language of the materials and the time management for questions. These differences in manipulation and administration could have led to the null result for the predicted interaction effect and also raise questions about the generalizability of the shield effect of perceptual difficulty. Further conceptual replications with different materials and administration procedures are needed to investigate potential boundary conditions and moderators in both laboratory and real-world settings.

Another possible direction for future research is to use visual cognitive tasks other than reading comprehension, because reading comprehension may be too complex as an outcome if selective attention and distraction inhibition are the main research interests. Although a reading comprehension task may be ideal for the manipulation of perceptual difficulty, it may involve more complex cognitive processing than a typical selective attention task. Many factors, such as working memory (McVay & Kane, 2012) and knowledge/vocabulary (Cain et al., 2001; Ouellette, 2006), may have unique impacts on reading performance. Future studies may also test not only primary task performance but also memory for the irrelevant distractors. Doing so might provide information on the amount of task-irrelevant information that is “filtered” across different experimental conditions.