Introduction

In daily life, people experience moments of inattention, where their focus drifts from a current task to something irrelevant to the task. For routine activities, there are minor consequences associated with such attentional lapses because those activities can be performed automatically. However, attentional lapses may have greater consequences for novel activities that require new learning. To illustrate, suppose that someone was repeatedly told about the positive effects of a drug. Later, they were told that the drug also has negative side effects, but they were distracted by other thoughts when told this. Their divided attention may have impaired their encoding of the negative side effects, resulting in memory for only the positive effects. This impairment in memory updating could have negative consequences if this person decides to either take or recommend the drug. Such updating failures can be avoided by detecting and later recollecting information changes (e.g., Wahlheim & Jacoby, 2013; Jacoby, Wahlheim, & Kelley, 2015), but little is known about the role of attention to changes in these effects. We addressed this here by examining how memory updating is associated with attention to changes during study and recollection of changes at test.

Episodic memory updating and memory for change

As illustrated by the example above, proactive interference effects are likely to occur when two stimuli have both shared and distinctive features. Proactive interference for individual items has often been examined using the A-B, A-D paradigm. In this paradigm, participants study two lists of paired associates and are later given a cued recall test for responses from the second list (for a review, see Anderson & Neely, 1996). The study lists sometimes contain a mixture of pairs that either repeat across lists (A-B, A-B), appear only in the second list (C-D), or have the same cue paired with different responses in each list (A-B, A-D). Proactive interference occurs for A-B, A-D items when the two responses compete at retrieval (e.g., Postman & Underwood, 1973). Proactive interference is observed as lower recall for recent responses (D) relative to control (C-D) items and higher intrusions of earlier responses (B) relative to baseline.

A recent theory of episodic memory updating proposes that recollecting integrated memory representations that include both responses can counteract proactive interference. According to the Memory-for-Change (MFC) framework (e.g., Jacoby et al., 2015; Wahlheim & Jacoby, 2013), recall performance for A-B, A-D items reflects a combination of both interference and facilitation effects that depend on how often changes are initially detected and later recollected. The MFC framework builds on Hintzman’s (2004, 2010, 2011) recursive reminding hypothesis by proposing that when two stimuli have overlapping features, the current stimulus can trigger a reminding of the prior stimulus. This reminding enables change detection and encoding of configural representations that include both stimuli together with the cognitive operation (i.e., the reminding) that co-activated them in working memory. Configural representations are assumed to preserve the temporal order of the stimuli, since it can be inferred at retrieval that the reminder stimulus occurred more recently than the reminded stimulus. Critically, access to those representations is assumed to require recollection-based retrieval, which has recently been operationalized in paired associate paradigms as correct classification of changed test items as such and recall of the List 1 response (e.g., Garlitch & Wahlheim, 2020; Wahlheim, Delaney, & Smith, 2019; Wahlheim & Zacks, 2019). Support for these predictions has been shown by proactive facilitation when changes are recollected, and proactive interference when changes are not remembered as such (Jacoby & Wahlheim, 2013; Jacoby, Wahlheim, & Yonelinas, 2013; Jacoby et al., 2015; Putnam, Wahlheim, & Jacoby, 2014; Wahlheim, 2014, 2015; Wahlheim & Jacoby, 2013; Wahlheim & Zacks, 2019).

The MFC framework assumes that attention influences detection and recollection of change, but only a few studies have investigated this. These studies have focused on the role of controlled attention in detecting change and its consequences for later recall. For example, Jacoby et al. (2015) examined how varying instructions about the breadth of retrieval during encoding influenced change detection rates and associated differences in change recollection and recall of recent responses (also see Jacoby, 1974; Jacoby & Wahlheim, 2013). The main assumption was that the instructions given to participants guided their use of controlled attention to look back across various temporal distances to determine whether a currently perceived stimulus had changed from one presented earlier. Their variants of the A-B, A-D paradigm (Experiments 2 and 3) included pairs that changed at long lags (between List 1 and List 2) and pairs that changed at short lags (only within List 2), followed by cued recall of recent List 2 responses. One group of participants, who were instructed to only identify changes that originated from List 2, were assumed to direct their attention narrowly to items presented earlier in that list. In contrast, the other group of participants, who were instructed to identify changes originating from either List 2 or List 1, were assumed to direct their attention broadly back across both lists. Participants who looked back across both lists recollected more changes originating from List 1 than participants who looked back within List 2 only. Importantly, the group that looked back over both lists showed proactive facilitation in recall of List 2 responses for pairs that changed from List 1 to List 2, whereas the group that looked back within List 2 did not. These recall differences suggested that participants were able to differentially guide their attention to past events in order to detect changes in the present.

Results of this sort provide clear evidence that attention influences how often changes are detected from the past. Although compelling based on the causal inferences that can be drawn, the characterization of the role of attention in change detection from Jacoby et al. (2015) is limited. For example, the between-subjects manipulation reduces both intra- and interindividual variability in participant-selected attention strategies for detecting change, and the procedure does not allow for direct assessment of memory differences associated with these sources of variability. Also, the conclusion about group differences in attention allocation was based on a combination of indirect measures during study and test, and data that were collapsed across participants within conditions. It is unclear from these experiments how momentary differences in attention during encoding are associated with change detection and performance on downstream memory measures, including change recollection.

The most novel contribution of the present study is that we addressed these limitations by directly measuring momentary fluctuations of attention in a variant of the A-B, A-D memory updating paradigm using self-reports. This allowed us to characterize intra- and interindividual variation in attention to changed stimuli and associations with change recollection and other memory measures at test. Based on Jacoby et al. (2015), we assumed that when participants in the present experiment report attending to changed pairs during study, they should be more likely to retrieve related stimuli, thereby enabling change detection, and overtly recollect those detected changes at test.

Self-reported attention, mind wandering, and episodic memory

As stated above, previous work has examined how task-controlled attention influences episodic memory updating, but no studies to our knowledge have examined the association between momentary fluctuations in attention during encoding and change recollection at test. To develop a more comprehensive understanding of the role of attention in episodic memory updating, we sought inspiration from studies of self-reported lapses in attention, referred to as mind wandering. Mind wandering occurs when one’s thoughts drift from the current task to one’s internal state (for a review, see Smallwood & Schooler, 2015). Mind wandering episodes can be captured by inserting thought probes throughout a task that ask participants to report on their current thoughts (e.g., Smallwood & Schooler, 2006). Mind wandering typically increases during less demanding tasks (e.g., Smallwood, Nind, & O’Connor, 2009) and as time on task increases (e.g., McVay & Kane, 2012a; Metcalfe & Xu, 2016; Teasdale et al., 1995, Experiment 3; Thomson, Seli, Besner, & Smilek, 2014b). Mind wandering can also vary across people, as shown by consistency in mind wandering rates within people across tasks (e.g., McVay & Kane, 2012b) and by associations between mind wandering and executive control abilities (e.g., Kane et al., 2007, 2016; Kane & McVay, 2012; McVay & Kane, 2009).

The literature on the association between mind wandering and episodic memory has shown that mind wandering is associated with impaired memory when deep or elaborate processing is required during encoding (e.g., Maillet & Rajah, 2013; Thomson, Smilek, & Besner, 2014a). For example, Thomson et al. (2014a) examined mind wandering in deep and shallow encoding conditions and associated differences in recognition memory between those conditions. Mind wandering reports were associated with poorer recognition memory only in the deep-encoding condition (also see Maillet & Rajah, 2013). However, this correlation was not present when controlling for the accuracy of the deep-encoding judgments. This suggested that mind wandering interfered with participants’ ability to make correct encoding judgments, which reduced the effectiveness of deep encoding and impaired recognition memory.

Mind wandering has also been shown to disrupt the encoding that facilitates inductive reasoning and inferences. For example, mind wandering during encoding of artwork exemplars is negatively associated with classification of unstudied artwork from studied artists (Metcalfe & Xu, 2016). Mind wandering is also negatively associated with situation model updating in narrative comprehension (Smallwood, McSpadden, & Schooler, 2008). Smallwood et al. reasoned that mind wandering while reading critical passages prevented participants from retrieving and integrating information necessary to later make inferences. Finally, mind wandering is associated with poorer learning in both the classroom (Risko, Anderson, Sarwal, Engelhardt, & Kingstone, 2012; Wammes, Seli, Cheyne, Boucher, & Smilek, 2016) and the laboratory (Farley, Risko, & Kingstone, 2013; Kane et al., 2017; Loh, Tan, & Lim, 2016; Risko, Buchanan, Medimorec, & Kingstone, 2013). Greater mind wandering during lectures was associated with poorer learning, presumably because the ability to retrieve knowledge and integrate it with new information was reduced when attention was not focused on the lecture.

Collectively, these findings suggest that when attention is off-task, particularly during a mind wandering episode, memory performance suffers. This relationship is most robust when encoding requires elaborative processing, such as during deep encoding (e.g., Thomson et al., 2014a) or when information must be integrated (e.g., Smallwood et al., 2008). These findings inform predictions in the present study as change recollection is assumed to reflect retrieval of integrated representations formed using elaborative encoding processes. Based on these findings, we predict that when participants are off-task, they should be less likely to detect change and form the integrative representations that support change recollection at test.

The present study

The primary aim of the present study was to extend prior work examining the relationship between attention during encoding and associated memory performance, particularly the ability to update memory for changed information. The MFC framework assumes that attention is required to encode original and changed pairs during study. When attention is not engaged during either presentation, due to mind wandering or external distractions, changed pairs should trigger fewer retrievals of original pairs, thus precluding integrative encoding and diminishing recall of changed pairs. To our knowledge, this is the first study to directly test this idea by measuring the covariation in attention during study, change recollection at test, and recall performance. Here, we used a single-list variant of the A-B, A-D paradigm that included thought probes periodically during the study phase. The study phase consisted of word pairs that repeated four times, appeared once as control items, or repeated three times and included a changed response on the fourth appearance. The cued recall test assessed memory for the most recent responses paired with cues and recollection of changes between responses.

To foreshadow, we established that the single-list variant of the task replicated earlier findings showing proactive facilitation when change was recollected and proactive interference when change was not recollected (e.g., Jacoby et al., 2015; Wahlheim & Jacoby, 2013). Based on earlier studies showing a relationship between self-reported attention and memory (e.g., Thomson et al., 2014a), we expected recall of recent responses and change recollection to be greater for participants who indicated being on-task more often and for items that are followed by on-task reports. We also expected these associations to be greater for items that required new learning than for repeated items because repetitions allowed for more encoding opportunities. Related to fluctuations of attention, we expected to replicate earlier findings showing that on-task reports decrease as time on task increases (e.g., McVay & Kane, 2012a). We also explored the possibility that new features of changed responses that did not appear in earlier repetitions may capture attention towards the end of the study phase, thus leading to more on-task reports.

Method

In what follows, we report how we determined sample size, all data exclusions, all manipulations, and all measures in this study (Simmons, Nelson, & Simonsohn, 2012). The data and analysis scripts are available on the Open Science Framework: https://osf.io/56t9k/.

Participants

The final sample (see Footnote 1) consisted of 132 undergraduates (95 female), aged 18–29 years (M = 19.02, SD = 1.70) from the University of North Carolina at Greensboro. Participants were recruited from the Psychology Department participant pool. The sample size was based on the number of participants needed to examine the within-subjects interaction between task reports and item type on recall performance. Prior experiments manipulating external variables to influence change recollection and recall performance have found small to medium effect sizes ranging from ηp² = .06 to .09 (Negley et al., 2018; Wahlheim, 2015). According to G*Power Version 3.1.9.2 (Faul, Erdfelder, Buchner, & Lang, 2009), with power = .80 and α = .05 (two-tailed), a sample size of 128 is sufficient to detect a small to medium interaction effect (ηp² = .06) and a small to medium between-subjects correlation of r = .25. We included 132 participants to ensure that an equal number completed each of the 12 experimental formats (described in the next section). Participants received partial course credit as compensation.
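As a rough cross-check of the sample-size reasoning above, the following sketch approximates the sample needed to detect r = .25 using the Fisher z transformation, and then rounds the planned sample up to a multiple of the 12 formats. It is an approximation under stated assumptions, not G*Power's exact procedure, so its result need not match G*Power's output. The function name `n_for_correlation` is hypothetical and introduced only for illustration.

```python
import math
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate N needed to detect correlation r (two-tailed).

    Assumption: Fisher z approximation, not G*Power's exact method.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-tailed
    z_power = NormalDist().inv_cdf(power)          # quantile for desired power
    return math.ceil(((z_alpha + z_power) / math.atanh(r)) ** 2 + 3)


n_approx = n_for_correlation(0.25)

# Round the planned sample (128, from the text) up to a multiple of 12 formats.
planned_n = 128
n_formats = 12
per_format = math.ceil(planned_n / n_formats)
final_n = per_format * n_formats

print(n_approx, per_format, final_n)
```

Rounding 128 up to the next multiple of 12 yields 11 participants per format, which gives the 132 participants tested.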

Design and materials

The current experiment used a within-subjects design, with Item Type (A-B4 [repeated] vs. C-D [control] vs. A-B3, A-D [changed]) as the independent variable. The materials consisted of 156 word sets (144 critical and 12 buffers) taken from Jacoby (1996) and Nelson, McEvoy, and Schreiber (1998). Each set contained a cue (e.g., throat) and two responses (e.g., tonsil, tongue). The two responses had overlapping orthographic features because they were originally created so that each response could complete the same word fragment (e.g., ton_ _ _). We did not use the fragments. For counterbalancing, the critical word sets were divided into six groups of 24. Each group appeared as each item type equally often across participants. For the first six formats, the response arbitrarily labeled as Response 1 was the target word (e.g., tonsil appeared as the second or only response) while the response labeled as Response 2 was the target word for the other six formats (e.g., tongue appeared as the second or only response). The non-target response from each set appeared as the response in the first three blocks for A-B3, A-D items.
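One way to picture the counterbalancing is the sketch below. It assumes a simple rotation of the six word-set groups through six roles (one A-B4 group, one A-B3, A-D group, and one C-D group for each of the four study blocks), crossed with the two target-response assignments. The rotation scheme, the role labels, and the function name `build_formats` are illustrative assumptions; the text specifies only that there were six groups and 12 formats.

```python
from collections import Counter

N_GROUPS = 6
# Assumed roles: four C-D roles because control pairs are new in every block.
ROLES = [
    "A-B4",
    "A-B3, A-D",
    "C-D block 1",
    "C-D block 2",
    "C-D block 3",
    "C-D block 4",
]


def build_formats():
    """Return 12 hypothetical formats: 6 role rotations x 2 target responses."""
    formats = []
    for target in ("Response 1", "Response 2"):
        for shift in range(N_GROUPS):
            # Rotate the roles across the six word-set groups.
            assignment = {
                group: ROLES[(group + shift) % N_GROUPS]
                for group in range(N_GROUPS)
            }
            formats.append({"target": target, "assignment": assignment})
    return formats


formats = build_formats()

# Tally how often group 0 serves in each role across all formats.
role_counts_group0 = Counter(f["assignment"][0] for f in formats)
print(len(formats), dict(role_counts_group0))
```

Under this rotation, every group serves in every role equally often across the 12 formats, so no word set is tied to a particular item type.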

The average lengths of cues (M = 5.26, SD = 1.60, range = 2–9) and responses (M = 4.76, SD = 1.08, range = 3–8) were matched across groups. The average word frequency, assessed using the Hyperspace Analog to Language method (HAL; Lund & Burgess, 1996), and catalogued by the English Lexicon Project (Balota et al., 2007), was matched across groups for the cues (M = 9.44, SD = 1.45, range = 6–14) and the responses (M = 9.34, SD = 1.60, range = 5–14). The associative strength between words in each set was indexed by the Nelson, McEvoy, and Schreiber (1998) free-association norms. The average associative strength between cues and responses was low (forward: M = .06, SD = .08, range = .03–.10; backward: M = .08, SD = .14, range = .03–.15). The average forward and backward associative strengths between responses within sets were comparably low (M = .02, SD = .06, range = .001–.07).

A schematic for the study phase is shown in Fig. 1. The study list comprised four seamless blocks with 72 word pairs in each block. One set of word pairs (24 in each block) repeated in all four blocks (A-B4). Another set of word pairs (24 in each block; 96 total) that served as control items were new in each block and had no overlapping terms with pairs from previous blocks (C-D). The last set of word pairs (24 in each block) repeated across the first three blocks and then had the same cue word paired with a changed response in the fourth block (A-B3, A-D). For example, the pair throat-tonsil could appear in the first, second, and third blocks and then the pair throat-tongue could appear in the fourth block. Buffer items appeared at the beginning and end of the study phase, with four buffer items from each of the three item types (12 total). Word pairs appeared in a fixed random order in each block of the study phase, with the stipulation that no item type appeared more than three times consecutively. The average serial position for each item type was equated within blocks to control for serial position effects.
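The ordering constraint described above (no item type more than three times in a row) can be sketched with simple rejection sampling. This is a minimal illustration, not the procedure used to build the actual lists: it does not equate average serial position across item types, and the function names and seed are hypothetical.

```python
import random

ITEM_TYPES = ["A-B4", "C-D", "A-B3, A-D"]


def max_run(seq):
    """Length of the longest run of identical consecutive elements."""
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


def make_block_order(per_type=24, max_consecutive=3, seed=0):
    """Shuffle item types until no type repeats more than max_consecutive times."""
    rng = random.Random(seed)
    order = [t for t in ITEM_TYPES for _ in range(per_type)]
    while True:
        rng.shuffle(order)
        if max_run(order) <= max_consecutive:
            return order


block = make_block_order()
print(len(block), max_run(block))
```

Rejection sampling is adequate here because a sizeable share of random shuffles of 72 items already satisfies the constraint; a stricter constraint might call for a constructive method instead.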

Fig. 1. Schematic of the study procedure. Participants studied a list that contained four seamless blocks. Each block contained word pairs that repeated across each block (A-B4), repeated in the first three blocks and then had the same cue with a changed response in the fourth block (A-B3, A-D), or were unique to each block (C-D). Thought probes were inserted pseudo-randomly such that three probes came after each Item Type in each block, and the probes appeared six to ten word pairs apart. Each probe appeared immediately after the previous word pair and asked participants to indicate if they were on-task or off-task.

Nine thought probes appeared between word pairs in each of the four study blocks (36 total). We inserted the probes pseudo-randomly with the stipulation that an equal number appeared following each item type (i.e., three probes after each item type in each block). Probes were assigned to the same item type condition as the pair they followed. To make their timing less predictable, successive probes were separated by six to ten word pairs, corresponding to intervals of 46, 54, 62, 70, or 78 s. The average duration between probes was 62 s (SD = 12.09 s). Each probe consisted of a discrete on-task or off-task judgment.
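The probe intervals listed above follow from the presentation timing given in the Procedure (6 s per pair, 2 s interstimulus interval). The sketch below assumes each interval spans n study durations plus the n − 1 interstimulus intervals between them; that counting rule is an inference that reproduces the reported values, and the function name `probe_interval` is hypothetical.

```python
def probe_interval(n_pairs, study_s=6, isi_s=2):
    """Seconds between probes separated by n_pairs word pairs.

    Assumption: n_pairs study durations plus (n_pairs - 1) intervening ISIs.
    """
    return n_pairs * study_s + (n_pairs - 1) * isi_s


intervals = [probe_interval(n) for n in range(6, 11)]
print(intervals)  # [46, 54, 62, 70, 78]
```

The reported mean and SD of the intervals depend on how often each spacing was used, which is not recoverable from this calculation alone.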

The test phase was self-paced and included cues from all 72 pairs that appeared in the fourth study block. The cues for the cued recall test appeared in a fixed random order for each of the 12 formats, with the stipulations that cues from the same item type condition did not appear more than three times consecutively and that the serial position was equated across item types.

Procedure

All participants were tested individually. All experimental stimuli were administered using E-prime software (Version 3, Psychology Software Tools, Inc). Word pairs and test cues appeared in white Arial size 24 pt font on a black background. Participants were told that their first task would be to study a list of word pairs for an upcoming memory test. Word pairs appeared for 6 s each with a 2 s interstimulus interval (ISI) between each presentation. Participants were told that they would periodically be asked about their attention to the task and were given an explanation about the meaning of “On-task” and “Off-task” reports (see Supplemental Materials for instructions). Each probe screen appeared immediately following the 6 s study duration for the previous word pair (before the ISI). We did this to ensure that participants made their probe judgments based on their attentional state while studying the prior word pair. Participants were told to indicate that they were “On task” or “Off task” by clicking on the corresponding button on the left or right, respectively. These responses were self-paced. The experimenter left the room after monitoring performance on the primacy buffers to allow for natural fluctuations in attention.

After the study phase, the experimenter returned and remained in the room for the test phase. Participants were told that their tasks would be to recall the most recent responses from the study phase and indicate when they remembered that responses had changed (see Supplemental Materials for instructions). To begin, six of the buffer items appeared as practice items for the test phase. In both the practice and actual test phases, a cue appeared with a question mark (e.g., throat-?), and participants were asked to type the most recent response paired with each cue (e.g., tongue). After entering their response, a question appeared asking if the right word paired with the cue changed during the study phase. Participants indicated that responses had changed by pressing the “1” key and that responses had not changed by pressing the “0” key. When participants indicated that a pair had changed, they were asked to type the response that was paired with that cue earlier in the study phase (e.g., tonsil). When participants indicated that a pair had not changed, they moved on to the next trial. After completing the test phase, participants completed a final exploratory task (see Footnote 2). Each session lasted approximately 1.5 h.

Results

All analyses were conducted using R software (R Core Team, 2019). All models in the analyses below include subjects and items as random intercept effects and experimental manipulations as fixed effects unless otherwise noted. We fitted logistic mixed effects models using the glmer function from the lme4 package (Bates, Maechler, Bolker, & Walker, 2015). We conducted hypothesis tests using the Anova function from the car package (Fox & Weisberg, 2011) and pairwise comparisons using the emmeans package (Lenth, 2018) with the Tukey method to control for the family-wise error rate. For the interested reader, we also report results from ANOVAs and t-tests along with their corresponding standardized effect size estimates in the Supplemental Materials. The level for significance was set at α = .05. In what follows, we report analyses for each measure in approximately the order that they appeared during the experiment.
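For readers less familiar with this model class, a logistic mixed effects model with random intercepts for subjects and items can be written in the general form below. The notation is introduced here for illustration only and is not taken from the analysis scripts: y is a binary outcome (e.g., an on-task report or a correct recall) for subject s and item i, x holds the fixed-effect predictors, and u and w are the subject and item intercepts.

```latex
\operatorname{logit}\,\Pr(y_{si} = 1)
  = \beta_0 + \mathbf{x}_{si}^{\top}\boldsymbol{\beta} + u_s + w_i,
\qquad
u_s \sim \mathcal{N}(0, \sigma_u^2), \quad
w_i \sim \mathcal{N}(0, \sigma_w^2)
```

The fixed effects capture the experimental manipulations, while the two variance terms absorb stable differences among participants and among word sets.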

Study

On- and off-task reports

In our first set of analyses, we tested the hypothesis that self-reported attention would decrease across the task and examined whether attention was captured by the characteristics of changed pairs. To assess self-reported attention across the study phase, we calculated the probability of on-task reports as a function of Block (1–4) and Item Type (see Fig. 2). A model including Block and Item Type as fixed effects indicated a significant effect of Block, χ² (3) = 40.94, p < .001, no significant effect of Item Type, χ² (2) = .83, p = .66, and a significant Block × Item Type interaction, χ² (6) = 26.00, p < .001.

Fig. 2. Probability of on-task reports as a function of Item Type and Block. Error bars are bootstrap 95% confidence intervals.

To investigate the Block × Item Type interaction, pairwise comparisons were conducted to examine the on-task reports across blocks for each item type. For A-B4 items, the on-task probability did not differ between Block 1 and the other three blocks, largest z ratio = 2.50, p = .06. The on-task probability was higher in Block 2 than in Blocks 3 and 4, smallest z ratio = 3.48, p = .003, and did not differ between Blocks 3 and 4, z ratio = 1.17, p = .64. For C-D items, the on-task probability in Block 1 did not differ from Block 2, z ratio = 0.62, p = .93, but was significantly higher in Blocks 1 and 2 than in Blocks 3 and 4, smallest z ratio = 2.80, p = .03. The on-task probability did not differ between Blocks 3 and 4, z ratio = .18, p = 1.00. For A-B3, A-D items, the on-task probability was significantly higher in Block 1 than in Blocks 2 and 3, smallest z ratio = 2.79, p = .03, but did not differ from the on-task probability in Block 4, z ratio = .53, p = .95. The on-task probability in Block 2 did not differ from Blocks 3 and 4, largest z ratio = 2.27, p = .11. Notably, the on-task probability was significantly higher in Block 4 than in Block 3, z ratio = 3.53, p = .002.

To examine how this increase in on-task reports for A-B3, A-D items compared to on-task reports for the other item types, we examined the pairwise comparisons for Block 4 across item types. On-task reports in Block 4 were higher for A-B3, A-D than for C-D items, z ratio = 2.81, p = .01. The difference in on-task reports in Block 4 between A-B3, A-D and A-B4 items was not significant, z ratio = 1.81, p = .17, but it was in the expected direction, with higher on-task reports for A-B3, A-D than for A-B4 items (A-B3, A-D: M = .71, 95% CI [.66, .76], A-B4: M = .66, 95% CI [.61, .71]). Collectively, these results suggest that attention decreased across the study phase, which is consistent with earlier findings. However, attention to changed items also appeared to increase in Block 4. Importantly, this was not a novelty effect because on-task reports did not follow this pattern for C-D items and were significantly lower in Block 4 for C-D compared to A-B3, A-D items.

Test

Recall performance

Here, we examined the effect of Item Type on correct recall and intrusions. We expected to replicate earlier findings showing better recall for repetitions (A-B4) than single presentations (C-D). It was unclear whether changed pairs (A-B3, A-D) would lead to overall proactive facilitation or interference, and the extent to which intrusions would be output, because that cell should comprise a mixture of facilitation and interference effects that depend on the extent to which change is recollected (Wahlheim & Jacoby, 2013).

Figure 3 (left panel, black points) displays correct recall of Block 4 responses. A model with Item Type as a fixed effect indicated a significant effect, χ² (2) = 1167.60, p < .001, showing that recall for A-B4 items was higher than for the other two item types, smallest z ratio = 28.81, p < .001. Recall for A-B3, A-D items was also higher than for C-D items, z ratio = 2.67, p = .02. These results show that spaced repetitions of A-B pairs improved memory for those items above once-presented items. In addition, spaced repetitions of A-B pairs prior to changed A-D pairs led to proactive facilitation in overall recall. Later, we verify that this facilitation effect was associated with the extent to which change was recollected.

Fig. 3. Probabilities of correct recall (left panel) and prior-block intrusions (right panel) as a function of Item Type. Black points represent overall performance on each measure for each Item Type. The green point represents conditionalized performance for A-B3, A-D items given that participants indicated change and were able to recall the earlier response (Change Recollected). The blue point represents conditionalized performance for A-B3, A-D items given that participants indicated change and did not correctly recall the earlier response (Change Remembered). The red point indicates conditionalized performance for A-B3, A-D items given that participants did not indicate change (Change Not Remembered). The size of the colored points indicates the relative frequencies of responses in each cell. Error bars are bootstrap 95% confidence intervals. Confidence intervals that could not be seen around their respective points are displayed to the left of those points.

Figure 3 (right panel, black points) displays intrusions of responses from Blocks 1–3 (for A-B3, A-D items) and baseline intrusion rates (for A-B4 and C-D items). The baseline intrusion rates are estimates of how often participants produced what would have been the earlier response for items in the A-B3, A-D condition. A model with Item Type as a fixed effect indicated a significant effect, χ² (2) = 982.26, p < .001, showing that intrusions were higher for A-B3, A-D items than the baseline estimates for the other two item types, smallest z ratio = 23.18. The two baseline estimates were not significantly different, z ratio = .11, p = .99. These results show that participants experienced proactive interference on A-B3, A-D items that led to intrusion errors.

Change classifications

Next, we assessed change classification rates to contextualize later analyses of cued recall conditionalized on those classifications. The probability of correct classifications for A-B3, A-D items was .39 (95% CI = [.37, .41]). False alarms to A-B4 and C-D items were rare, but did occur slightly more often for A-B4 (M = .06, 95% CI = [.05, .07]) than for C-D items (M = .05, 95% CI = [.04, .06]), z ratio = 2.73, p = .02. As described in the Introduction, the MFC framework proposes that change recollection allows one access to the configural representation that contains both responses and their relative order. Most recently, Change Recollected responses have been operationally defined as instances when changed items are classified as such and participants can recall the earlier response (e.g., Garlitch & Wahlheim, 2020; Wahlheim et al., 2019; Wahlheim & Zacks, 2019). We followed that definition here. When participants classified changed items correctly but could not recall the earlier response, we categorized those instances as Change Remembered (Not Recollected). Theoretical work is still needed to explain the processes leading to different patterns for those instances, so we interpret them cautiously. Finally, when participants did not classify changed items as such, we categorized those instances as Change Not Remembered. The probabilities for each change classification type were the following: Change Recollected (M = .28, 95% CI = [.26, .30]); Change Remembered (Not Recollected) (M = .11, 95% CI = [.10, .13]); and Change Not Remembered (M = .61, 95% CI = [.59, .63]).
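The three-way classification defined above can be expressed as a small decision rule. The sketch below applies it to a handful of made-up trials for changed (A-B3, A-D) items; the trial data, variable names, and the function name `classify_change` are hypothetical and do not come from the study's data or scripts.

```python
from collections import Counter


def classify_change(said_changed, recalled_earlier_response):
    """Map a test trial for a changed item onto the three categories."""
    if not said_changed:
        return "Change Not Remembered"
    if recalled_earlier_response:
        return "Change Recollected"
    return "Change Remembered (Not Recollected)"


# Hypothetical trials: (said_changed, recalled_earlier_response)
trials = [
    (True, True),
    (True, False),
    (False, False),
    (False, False),
    (True, True),
]

counts = Counter(classify_change(*trial) for trial in trials)
proportions = {label: n / len(trials) for label, n in counts.items()}
print(proportions)
```

Because the categories are mutually exclusive and exhaustive, their probabilities sum to 1. In the reported data, the two categories involving a correct change classification (.28 and .11) sum to the overall correct classification rate of .39.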

Recall performance conditionalized on change classifications

In our next set of analyses, we conditionalized recall performance on the change classifications described above to verify that the associations between these measures shown in earlier studies replicated here in our single-list variant of the A-B, A-D paradigm. We conditionalized correct recall and intrusions for A-B3, A-D items on the three instances of change classification outlined above (Fig. 3, green, blue, and red points). We fit separate models with a fixed effect of Change Classification to the conditionalized recall and intrusion data. The model for correct recall also included C-D items to assess proactive effects of memory of earlier responses on recall of the most recent response for A-B3, A-D items.

Based on earlier studies, we expected change recollection to be associated with higher correct recall. The model for correct recall indicated a significant effect of Change Classification, χ2 (3) = 669.37, p < .001. Recall performance was significantly higher for Change Recollected responses compared to the other two classification types, smallest z ratio = 15.62, p < .001, and did not differ between those other classifications, z ratio = 2.01, p = .18. Proactive facilitation was observed when change was recollected, as recall for A-B3, A-D items was significantly higher than recall for C-D items, z ratio = 20.42, p < .001, whereas proactive interference was observed in the other cells in which participants did not recollect change, as recall for A-B3, A-D items was significantly lower than recall for C-D items, smallest z ratio = 4.65, p < .001. These results replicate prior results showing a strong association between change recollection and correct recall of recent responses (e.g., Jacoby et al., 2015; Wahlheim & Jacoby, 2013).

For intrusions, we expected that when participants recollected change, which we defined here as correct recall of the earlier response following a change classification, they would rarely, if ever, produce an intrusion. We expected this because responses of that kind would only occur when participants output the earlier response twice: once as the most recent response and once as the earlier response. We considered these instances to reflect guessing, but we plotted those data to visualize the proportion of observations for change recollection relative to the other cells and to distinguish between intrusion rates associated with the two classifications that included correct change classifications. The model indicated a significant effect of Change Classification, χ2 (2) = 205.20, p < .001, showing significantly fewer intrusions in the Change Recollected cell than in the other two cells, smallest z ratio = 12.96, p < .001. Unexpectedly, intrusions were also significantly lower for Change Not Remembered responses compared to Change Remembered (Not Recollected) responses, z ratio = 5.22, p < .001. From the perspective of the MFC framework, the higher intrusion rate for Change Remembered (Not Recollected) responses may have reflected memory for change without recollection, which could render participants unable to oppose the high accessibility of A-B responses established through repeated presentations. However, we interpret these differences cautiously and document them primarily for comparison with other studies and to inspire future theorizing.

Relationships between attention during study and memory at test

The analyses above established that self-reported attention generally decreased across the study phase, but attention increased when changed items appeared in Block 4. The analyses above also established that change recollection was associated with proactive facilitation and that the absence of change recollection was associated with proactive interference. Having established these patterns, we next examined associations between self-reported attention during the study phase and both recall of recent responses and change recollection at test.

We first tested the prediction that correct recall of recent responses should be greater for participants who indicate being on-task more often than those who indicate being on-task less often. Since participants were only tested on items from Block 4, we correlated recall performance with on-task reports in Block 4 only. To do this, separate between-subjects Pearson product-moment correlations were computed for each Item Type between on-task report probabilities in Block 4 and correct recall of Block 4 responses. Figure 4 shows that there were positive correlations between on-task reports and correct recall with medium to large effect sizes for each item type (A-B4: r(130) = .34, p < .001; C-D: r(130) = .41, p < .001; A-B3, A-D: r(130) = .45, p < .001). Next, we computed correlations between Block 4 on-task reports and intrusions for A-B3, A-D items to examine how attention during encoding of changed items, which only appeared in Block 4, would influence intrusions. We treated this analysis as exploratory because we reasoned that being on-task more often during Block 4 could indicate that more attention was also allocated during encoding of responses from Blocks 1–3. Indeed, there was a strong positive correlation between on-task reports collapsed across Blocks 1–3 and on-task reports in Block 4, r(130) = .67, p < .001. This increased attention in Blocks 1–3 could facilitate rejection of intrusions post retrieval, make intrusions more accessible and likely to be misattributed as accurate, or some combination of both. Figure 5 (left panel) shows that on-task reports and intrusions were negatively correlated with a small effect size that was not significant, r(130) = -.12, p = .16. Finally, to test the hypothesis that change recollection would be higher for participants who were on-task more often, we computed correlations between Block 4 on-task reports and change recollection for A-B3, A-D items. Figure 5 (right panel) shows that on-task reports and change recollection were positively correlated with a medium effect size, r(130) = .39, p < .001. Together, these results show that participants who reported being on-task more in Block 4 had higher correct recall, numerically fewer prior-block intrusions, and higher rates of change recollection than participants who reported being on-task less often. We interpret the negative correlation between on-task reports and intrusions cautiously due to the exploratory nature of the analyses and the small, non-significant effect.
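For readers less familiar with the statistics reported above, the following sketch computes a between-subjects Pearson correlation, its degrees of freedom (df = N − 2, the value in parentheses in r(df)), and a percentile bootstrap interval. The data are made up, and bootstrapping r itself by resampling participants is an assumption made for illustration; the figures report bootstrap intervals, but the exact procedure is not specified in this section.

```python
import math
import random


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_ci(x, y, n_boot=2000, seed=0):
    """Percentile bootstrap 95% interval for r, resampling participants with replacement."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        xs, ys = [x[i] for i in idx], [y[i] for i in idx]
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue  # skip resamples with zero variance, where r is undefined
        estimates.append(pearson_r(xs, ys))
    estimates.sort()
    m = len(estimates)
    return estimates[int(0.025 * (m - 1))], estimates[int(0.975 * (m - 1))]


# Made-up data: with three probes per participant, on-task proportions are multiples of 1/3.
on_task = [0.0, 1 / 3, 1 / 3, 2 / 3, 2 / 3, 1.0, 1.0, 1.0]
recall = [0.2, 0.3, 0.5, 0.4, 0.7, 0.6, 0.8, 0.9]

r = pearson_r(on_task, recall)
df = len(on_task) - 2
ci_low, ci_high = bootstrap_ci(on_task, recall)
```

Because each participant contributes only three probes per Item Type in Block 4, the on-task proportion can take only four values, which limits the precision of these correlations.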

Fig. 4

Between-subjects correlations between Block 4 probability on-task and correct recall for each Item Type. Given that this analysis was only for Block 4, the on-task probabilities were calculated based on three probes per participant for each Item Type. The shaded regions show bootstrap 95% confidence intervals. The effect size and degrees of freedom for each correlation are displayed in the upper left corner of each panel

Fig. 5

Between-subjects correlations between Block 4 probability on-task and intrusions (left panel) and change recollection (right panel) for A-B3, A-D items. Given that this analysis was only for Block 4, the probability on-task was calculated based on three probes per participant. The shaded regions show bootstrap 95% confidence intervals. The effect size and degrees of freedom for each correlation are displayed in the upper left corner of each panel

We conducted another exploratory analysis to more generally characterize the association between individual variation in attention during study and episodic memory at test in our sample. We computed the between-subject correlation between on-task reports collapsed across all study blocks and recall performance for C-D items. Figure 6 shows that these variables were positively correlated with a large effect size, r(130) = .50, p < .001, showing that participants who paid more attention during encoding also retrieved episodic memories more accurately.

Fig. 6

Between-subjects correlation between the probability on-task during study and correct recall for C-D items. The proportion on-task was calculated based on all 36 probes. The shaded region shows the bootstrap 95% confidence interval. The effect size and degrees of freedom are displayed in the upper left corner of the figure

Next, we tested the prediction that the associations between on-task reports and memory measures should be stronger for items that require new learning than for repeated items by examining recall performance conditionalized on thought probe responses during study. We assumed that if self-reported attention during study improves the ability to correctly recall recent responses, then participants should recall more responses for study items that were followed by on- than off-task reports. Further, we expected this difference to be greater for pairs that appeared for the first time in Block 4 (i.e., in the C-D and A-B3, A-D conditions) than for items that repeated throughout the study phase (i.e., A-B4 condition) because repeated items would have more opportunities to be encoded with full attention. We first conditionalized correct recall for each Item Type on whether participants gave an on- or off-task report during Block 4 (Fig. 7, left panel). This analysis only included participants with at least one of each task report in Block 4. As a result, different combinations of participants were included in each Item Type condition (for the sample sizes, see Fig. 7), and comparisons of recall differences between Task Reports were made within participants.
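The inclusion rule and the within-participant conditionalization described above can be sketched as follows. This is a minimal illustration with made-up trials; the function name `on_off_recall` and the trial format are hypothetical and do not reflect the actual analysis code.

```python
def on_off_recall(trials):
    """Compute recall after on-task vs. off-task reports for each eligible participant.

    trials: iterable of (participant_id, on_task, correct) tuples for probed
    Block 4 items of a single Item Type.
    Returns {participant_id: (recall after on-task, recall after off-task)}.
    """
    per_participant = {}
    for pid, on_task, correct in trials:
        cells = per_participant.setdefault(pid, {True: [], False: []})
        cells[on_task].append(int(correct))

    included = {}
    for pid, cells in per_participant.items():
        # Keep only participants with at least one on-task and one off-task report.
        if cells[True] and cells[False]:
            included[pid] = (
                sum(cells[True]) / len(cells[True]),
                sum(cells[False]) / len(cells[False]),
            )
    return included


# Made-up trials: (participant, on-task report?, recent response recalled?)
trials = [
    ("p1", True, True), ("p1", False, False), ("p1", True, False),
    ("p2", True, True), ("p2", True, True), ("p2", True, False),
    ("p3", False, True), ("p3", True, True), ("p3", False, False),
]
included = on_off_recall(trials)
```

In this example, participant p2 gave only on-task reports and is therefore dropped, which shows how the rule shrinks the sample and why the included participants can differ across Item Types.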

Fig. 7

Probability of correct recall (left panel) and intrusions (right panel) as a function of probe reports in Block 4. The number of participants that contributed to each on- and off-task comparison is displayed below the recall probabilities for each Item Type. Error bars are bootstrap 95% confidence intervals

We fitted a model to the conditionalized correct recall data that included fixed effects of Item Type and Task Report. The model indicated a significant effect of Task Report, χ2 (1) = 10.26, p = .001, showing that correct recall was higher when participants reported being on- than off-task. The interaction between Item Type and Task Report was not significant, χ2 (2) = 2.77, p = .25, but visual inspection suggested that, consistent with our hypothesis, the recall advantage for on-task reports was greater for novel Block 4 items. Pairwise comparisons were consistent with this observation: there was no significant recall difference between task reports for A-B4 items, z ratio = .69, p = .49, but recall was significantly higher for on- than off-task reports for both C-D items, z ratio = 2.58, p = .01, and A-B3, A-D items, z ratio = 2.44, p = .01. These preliminary results suggest that the relationship between attention during Block 4 study and correct recall was stronger for new and changed items than for repeated items. We also conducted an exploratory analysis of intrusions with a model fitted to only A-B3, A-D items (Fig. 7, right panel). Consistent with the comparable between-subject correlation above, the model indicated no significant effect of Task Report, χ2 (1) = .23, p = .63, showing little, if any, association between task reports and intrusions.

In our final set of analyses, we tested the prediction that change recollection would occur more often when participants reported being on- than off-task during Block 4 study. We also performed an exploratory analysis of the association between task reports and remembering but not recollecting change, for which we had no a priori prediction. We assessed differences in the rates of each change classification type conditionalized on task reports (see Fig. 8) by fitting separate models with a fixed effect of Task Report to each classification. The model for Change Recollected indicated a significant effect, χ2 (1) = 5.98, p = .01. The model for Change Remembered (Not Recollected) indicated no significant effect, χ2 (1) = 2.76, p = .10. Finally, the model for Change Not Remembered indicated a significant effect, χ2 (1) = 29.75, p < .001. Together, these results show that when participants reported being on-task while studying changed pairs during Block 4, they recollected changes more often at test.

Fig. 8

Probability of change classifications as a function of Task Reports in Block 4 for A-B3, A-D items. The number of participants that contributed to the on- and off-task comparison is displayed in parentheses in the figure title next to the Item Type. Error bars are bootstrap 95% confidence intervals

Discussion

The present experiment examined how natural fluctuations in self-reported attention were associated with change recollection and memory performance under conditions that could lead to proactive interference effects. The results showed that attention generally decreased across the study phase, except when changed items appeared in the last block. In addition, cued recall for changed items replicated prior findings showing that overall performance comprised a mixture of proactive facilitation and proactive interference effects, depending on whether change was recollected or not. Analyses examining the relationship between self-reported attention during study and memory measures at test showed positive associations between on-task reports and correct recall of recent responses in both between- and within-subject comparisons. For the latter, there was suggestive evidence that this association was greater for items that were novel during the last study block than for items that repeated across study blocks. Critically, both between- and within-subjects comparisons also showed that on-task reports were positively associated with change recollection. In what follows, we discuss the implications of these findings for the MFC framework perspective on memory updating and the literature reporting associations between on-task reports and episodic memory performance.

Attentional fluctuation and memory for changes

As described in the Introduction, the MFC framework proposes that overall recall performance for changed items in an A-B, A-D paired associate learning paradigm comprises both proactive facilitation and interference effects. When change is recollected, proactive facilitation is observed, and when change is not recollected, proactive interference is observed (Jacoby et al., 2015; Wahlheim & Jacoby, 2013). We replicated these effects, which are typically observed in dual-list paradigms, using a single-list variant with changes occurring towards the end of the list. We also observed overall proactive facilitation for changed items, suggesting that the current design and materials led to frequencies of change detection and recollection that were sufficient to produce proactive facilitation in overall recall.

The most novel contribution of the present study to the episodic memory-updating literature was the examination of the association between self-reported attention during encoding and change recollection at test. This allowed us to evaluate an untested assumption of the MFC framework about the role of attention in change processing and the associated benefits for memory updating. Based on previous work showing that elaborative encoding is more effective for later memory performance when mind wandering does not occur (e.g., Thomson et al., 2014a), we predicted that on-task reports would be positively associated with change recollection, which would be associated with higher memory accuracy on the cued recall test. Consistent with this hypothesis, we found that people who were on-task more often were more likely to show higher recall for all item types and higher change recollection than people who were on-task less often. Furthermore, when participants indicated being on-task in the last block, change recollection rates and correct recall for A-B3, A-D items were both higher than when participants were off-task. Taken with the finding that change recollection is associated with proactive facilitation in recall of recent responses, the positive association between on-task reports and change recollection provides correlational evidence supporting the causal assumption of the MFC framework that attention to changed stimuli during encoding can trigger retrieval of related stimuli that appeared earlier and enable encoding of configural representations that preserve memory for temporal order.

As described above, we also expected that recall differences based on task reports would be greatest for novel items that appeared in Block 4 because that was the only opportunity to encode such items. Consistent with this prediction, on-task reports during the final study block were associated with higher recall performance for both C-D and A-B3, A-D items, but not A-B4 items. However, we interpret these findings with caution because they emerged from pairwise comparisons that followed up a non-significant interaction. Note that we were underpowered to detect this interaction after excluding participants from the analysis if they did not make at least one on-task and one off-task report in Block 4.

The associations between attention during study and change processing reported here suggest that more theoretical work is needed for the MFC framework to account for the role of variations in attention in the memory benefits observed when changes are detected and recollected. The present results suggest that conscious attention to the changed response may be required to stimulate retrievals, either spontaneously or with controlled processes, that enable integration of both the original and changed response into configural memory representations. One fruitful direction for development of the MFC framework would be to conduct empirical studies aimed at characterizing how self-reported attention to both original and changed information is associated with later memory performance. This would provide a more complete view of how attentional processes give rise to the formation of configural representations. Another direction would be to manipulate how participants allocate attention to changed items, perhaps using incentives (cf. Friedman & Castel, 2013), to establish a causal link between controlled attention during encoding and the memorial benefits associated with detecting and recollecting change.

Mind wandering and episodic memory

In the current experiment, we used thought probes as a tool to measure attentional fluctuation during study. As a result, the present findings can further contribute to the limited literature reporting associations between mind wandering and episodic memory in standard memory paradigms. Research has shown that the type of processing used during encoding can influence how likely participants are to pay attention. When participants are asked to engage in self-referential encoding (Maillet & Rajah, 2013), or if the word is too easy or too difficult for them to study (Xu & Metcalfe, 2016), they are more likely to mind wander. The present results add to these findings by showing that participants are less likely to mind wander when changed items appear after several repetitions. We interpret our findings as showing that changed pairs captured participants’ attention more than did repetitions or even completely novel items. This could reflect a type of memory-based prediction error that occurs when repeated cues lead participants to expect responses that they remembered from prior repetitions (for a similar suggestion in the context of event comprehension, see Wahlheim & Zacks, 2019). It is also possible that the increase in attention to changed items could reflect an increase in task difficulty, as this has also been shown to reduce mind wandering (e.g., Ju & Lien, 2018; Rummel & Boywitt, 2014).

This finding of increased attention to changes is somewhat consistent with other work showing that different kinds of stimulus changes are associated with less mind wandering. For example, Faber, Radvansky, and D’Mello (2018) examined the number of self-caught mind wandering episodes while participants watched a narrative film that included a range of situational changes. They found that more situational changes in the narrative and a higher likelihood of an event boundary (which is another type of change) were associated with less mind wandering. Related to this, Metcalfe and Xu (2016) found that interleaving artwork from different artists during study led to less mind wandering than did presenting the artwork from the same artist in a massed fashion (for additional evidence of differential allocation of attention during blocked and intermixed study, see Wahlheim, Dunlosky, & Jacoby, 2011). Together, these findings suggest that changes at either the situation model or the item level may help people sustain their attention during a task. This may have also occurred in the present experiment when changed responses appeared in the last block of the study phase.

Another possibility is that retrieving the earlier response when changed responses appeared (which was assumed to occur during change detection) acted as a type of test. Prior work has shown that inserting tests during study can reduce the rates of mind wandering (Szpunar, Khan, & Schacter, 2013). According to the MFC framework, the presentation of a changed A-D pair may have stimulated retrieval of earlier A-B pairs, suggesting that A-D pairs sometimes acted as test cues. It could be argued that the presentation of repeated A-B pairs may also stimulate the retrieval of earlier A-B pairs (Wahlheim, Maddox, & Jacoby, 2014), but the experience associated with such retrievals may differ. The retrievals triggered by A-D pairs will likely stimulate a qualitatively different subjective experience and subsequent representations because of the additional response (i.e., the D term) compared to retrievals triggered by the re-presentation of A-B pairs.

Limitations and future directions

Although the results of the current study support the proposed relationship between attention and the ability to recollect changes, there are several limitations that should be acknowledged. First, the current study included thought probes that were inserted pseudo-randomly throughout the blocks in order to capture attention lapses more naturally, but this meant that probes did not appear after the same items in each block. Consequently, the data reported here do not allow us to draw conclusions about attention allocation for the original and changed presentation of specific items during the study phase. In order to more accurately capture the associations of attention with change detection and recollection for items, we plan to compare attention for the presentations of both the A-B item and associated A-D items in the study phase and then examine the associations between task reports and later memory measures. In addition, we plan to increase the number of changes in the study list to increase observations. One concern with the current experiment is that the primary analyses involved conditionalization and, as noted earlier in the Discussion, many participants had to be removed from the analyses because they did not have both an on- and an off-task report in Block 4, thereby reducing power. Consequently, future work should increase the power to detect the experimental and correlational effects of interest.

As with earlier studies relying on self-reported mind wandering episodes, the accuracy of self-reported attention to the task is difficult to verify. Furthermore, it is possible that variations in the experimental design could influence the results. For example, asking participants to make a discrete on- or off-task judgment in the current study deviates from other mind wandering work that uses several categorized thought options (e.g., Kane et al., 2007; Kane et al., 2017) or explicitly gives participants the option to indicate that they are “mind wandering” (e.g., Metcalfe & Xu, 2016; Xu & Metcalfe, 2016). Prior work has found that mind wandering rates can vary as a function of probe framing (e.g., Weinstein, De Lima, & van der Zee, 2018), and this could influence the rates at which participants reported being on-task in the present experiment. Furthermore, due to the constraints of the present design, probes appeared 62 s apart on average. Choices about the distance between probes could also affect on-task reports because mind wandering rates increase with the time between probes (Seli, Carriere, Levene, & Smilek, 2013). Given these considerations, future work should examine how thought-probe framing and timing moderate the relationship between self-reported attention and change processing.

Conclusions

The current experiment was the first to characterize the associations between attention fluctuation, change processing, and episodic retrieval in order to test the assumption from the MFC framework about the role of attention in episodic memory updating. Results showed that recall performance and change recollection were higher when participants reported being on- than off-task in both between- and within-participant comparisons. These correlational results are consistent with the MFC framework, which posits that attention to changed stimuli during encoding is necessary to later recollect changes, which in turn is associated with higher memory performance for more recent responses. Future work should examine the causal role of attention during encoding in memory for changes, examine how combinations of attention to both original and changed information can influence the processes posited by the MFC framework, and test the boundary conditions of the present findings using various thought probe methods from the mind wandering literature.