The ability to accurately assess the quality or extent of encoding in memory is necessary for the efficient control of learning (Finley, Tullis, & Benjamin, 2010). Knowing how well a stimulus is stored allows people to predict whether they will be able to remember that stimulus in the future and provides them with an informed basis for deciding whether to continue or terminate study (Son & Metcalfe, 2000; Thiede & Dunlosky, 1999) or how to schedule upcoming study events (Benjamin & Bird, 2006; Son, 2004). Consequently, it is of importance to understand the extent to which people are able to accurately assess their future memory performance, as well as to elucidate what factors are used to make those assessments.

Judgments of learning (JOLs) are subjective assessments of future recall performance (for a review, see Dunlosky & Metcalfe, 2009). Such judgments generally reveal a modest level of accuracy that can be enhanced quite dramatically by delaying the JOL until some time after study (Nelson & Dunlosky, 1991, 1992; Dunlosky & Nelson, 1992; Koriat & Ma’ayan, 2005). Since JOLs have frequently been found to be dissociable from memory performance, many theorists have suggested that JOLs are inferences based on cues such as the experience of processing the stimulus or retrieving it from memory (Begg, Duft, Lalonde, Melnick, & Sanvito, 1989; Benjamin & Bjork, 1996; Benjamin, Bjork, & Hirshman, 1998a; Koriat, 1997; Mazzoni & Nelson, 1995; Schwartz, Benjamin, & Bjork, 1997). Often, these cues are predictive of future memory performance, but they can be misleading and lead to inaccurate JOLs under atypical conditions. For example, a reliance on retrieval fluency—the rapidity with which an item is retrieved from memory—leads participants to predict high future levels of recall for currently highly accessible material. This heuristic is generally useful because rapid recall now often reflects those same factors—degree of learning and efficient retrieval routes—that predict probable recall later (Benjamin & Bjork, 1996; Benjamin, Bjork, & Schwartz, 1998b; Blake, 1973). However, certain circumstances, such as the recall of items in the recency portion of a short list, violate that relationship. Those items are typically remembered quickly and well at immediate test but poorly after a delay (Craik, 1970), thus leading the retrieval fluency heuristic to fail under those conditions (Benjamin et al., 1998b).

A currently popular distinction segregates the world of metacognitive cues into intrinsic, extrinsic, and mnemonic factors (Koriat, 1997). Mnemonic factors are experiences that give the participant the subjective feeling that the stimulus has been stored well in memory, such as fluent encoding (Begg et al., 1989; Hertzog, Dunlosky, Robinson, & Kidder, 2003) or fluent retrieval (Benjamin & Bjork, 1996; Koriat & Ma’ayan, 2005). In contrast, intrinsic and extrinsic factors have to do with the stimulus materials and conditions of learning, respectively. Intrinsic factors are the properties of a stimulus that have an effect on memory (or that participants think have an effect on memory), such as the relatedness of a pair of words (see Rhodes & Castel, 2008). Extrinsic factors are the experimental circumstances that influence memory, such as study list length and study duration. Intrinsic and extrinsic factors can affect JOLs indirectly (mediated by their effect on mnemonic variables such as fluency) or directly, through a more analytic inference. Such analytic bases for judgment are typically inferior to experiential ones (cf. Kelley & Jacoby, 1996), but the demands of everyday metacognition do not always provide an obvious experiential alternative.

In the present experiments, we used a variant of the classic proactive interference (PI) paradigm, in which the cues, but not the targets, from cue–target pairs are repeated across multiple study and test episodes. Re-pairing the same cues with new targets increases the amount of cue-specific PI (or cue overload; Watkins & Watkins, 1975) and reduces cued-recall performance for those new targets (Postman, 1962). It is unclear whether participants will reveal metacognitive sensitivity to the effects of PI and what aspects of the encoding experience will underlie that sensitivity or lack thereof.

Previous research has investigated people’s metacognitive ability to predict interference. Metcalfe, Schwartz, and Joaquim (1993) investigated how PI affects feeling-of-knowing (FOK) judgments. They found that people did not appropriately predict the negative effects of PI. Instead, participants gave higher FOKs to items for which the cues (but not the targets) had been repeated. This result suggests that the familiarity of the cue drove people’s metacognitive judgments but that PI did not play a role.

In contrast, Maki (1999) found that this cue familiarity explanation did not generalize to JOLs elicited during a retroactive interference (RI) task. Participants went through two study–test cycles of number–noun pairs. Some numbers were repeated across the two lists. After both lists, they were given the cues from the first test and were asked to give JOLs. Participants correctly predicted that cues that were associated with more than one target would lead to poorer cued-recall performance. Although Maki’s study differs from Metcalfe et al. (1993) in that Maki used JOLs, JOLs are nonetheless sensitive to cue familiarity under other conditions (Benjamin, 2005), and thus it is noteworthy that they correctly predict decreases in performance with the repetition of cues. Maki interpreted her results as suggesting that increasing the set of possible responses tends to increase target competition (Schreiber, 1998; Schreiber & Nelson, 1998) and that JOLs reflect that competition.

It is unclear whether JOLs elicited during a PI task would be based on target competition, cue familiarity, or an entirely separate basis for judgment. The A–B A–C paradigm used here provides an opportunity to assess the metacognitive response to PI across a series of trials. If participants base their JOLs on target competition or some other metacognitive cue that is sensitive to the cue overload (Watkins & Watkins, 1975) that builds up with the re-pairing of cues with new targets, JOLs should decrease across trials (as in Maki, 1999). On the other hand, if participants are sensitive to cue familiarity, JOLs will increase across trials, despite decreases in performance (as in Metcalfe et al., 1993).

Alternatively, participants’ predictions of PI might be an analytic inference; that is, they may be based on a naïve theory of the effects of the experimental manipulation on their performance. If that is the case, it is possible that participants would predict the effects of interference, but it is not necessary that they do so. The experiments presented here also included a release-from-PI trial, in which a new list of cues was paired with a new list of targets. The cue overload that accrues over lists should be “released” by this manipulation, and consequently, memory should improve (Wickens, 1970). If participants are relying on an analytic notion of interference, there is no reason to think that they would appreciate the cue-specific nature of PI in this paradigm; their notion of PI may be a general one. Indeed, there are reasons to believe that much interference that leads to forgetting in the real world is of a general sort and is not related to the overloading of specific cues (Wixted, 2004). If participants’ understanding of PI derives from their interactions with the world and with their own experience of forgetting, JOLs might reflect global PI (something that accrues across lists regardless of the circumstances of cue pairing) and not cue-specific PI. Two hallmarks would be evident in that case: a failure of JOLs to appreciate the difference between a condition that promotes cue-specific interference (A–B A–C) and one that does not (A–B D–C), and a failure to appreciate the effects of release from (cue-specific) PI.

In summary, if participants are relying on a heuristic cue that is sensitive to cue overload, they should be able to correctly predict decreases in performance during the PI trials. Moreover, they should also be sensitive to the absence of cue overloading during the release-from-PI trials, and JOLs should increase on those trials. On the other hand, if participants are basing their JOLs on cue familiarity, JOLs should increase during the PI trials as recall performance goes down, and they should decrease on the release-from-PI trial as recall performance rises. Finally, if they are basing their predictions not on any mnemonic cue, but rather on a naïve theory of interference, they may predict decreases during the PI trials but may not necessarily predict the increase that occurs during release from PI.

Experiment 1

Method

Participants

Forty-one undergraduate students from the University of Illinois, Urbana-Champaign participated for partial credit in an introductory psychology course.

Materials

Stimuli were 520 medium- to high-frequency words (mean Kučera–Francis frequency = 86 uses per million, range = 40–200) selected from the MRC database (http://www.psy.uwa.edu.au/mrcdatabase/uwa_mrc.htm). Two lists of cues and four lists of targets were created for each participant by randomly sampling from these words. Each list consisted of 20 words. The first three study–test lists generated PI by using the same set of cues, each time re-paired with a new list of targets. The fourth and final study–test list consisted of the other list of cues paired with the fourth list of targets. None of the targets were repeated.
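The list construction described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the software used in the experiment: the function name `build_experiment1_lists`, the `seed` parameter, and the rule that the i-th cue is paired with the i-th target of each list are all assumptions made for the example.

```python
import random

def build_experiment1_lists(word_pool, list_length=20, seed=None):
    """Return four study lists of (cue, target) pairs for one participant.

    Hypothetical helper: two cue lists and four target lists are sampled
    without replacement from the pool, so no word appears in two lists.
    """
    rng = random.Random(seed)
    n_cue_lists, n_target_lists = 2, 4
    needed = (n_cue_lists + n_target_lists) * list_length
    words = rng.sample(word_pool, needed)  # raises ValueError if pool is too small
    chunks = [words[i:i + list_length] for i in range(0, needed, list_length)]
    cue_lists, target_lists = chunks[:n_cue_lists], chunks[n_cue_lists:]

    # Trials 1-3 reuse cue list 0 (building PI); trial 4 uses cue list 1 (release).
    cue_list_for_trial = [0, 0, 0, 1]
    study_lists = []
    for trial, targets in enumerate(target_lists):
        cues = cue_lists[cue_list_for_trial[trial]]
        study_lists.append(list(zip(cues, targets)))
    return study_lists
```

Sampling all 120 words in one call guarantees that cues and targets never overlap; presentation order would be randomized separately on each trial.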

Procedure

Participants were instructed that they were going to study a list of 20 word pairs and that they would later be asked to recall each target when presented with its cue. They were also instructed that they were to provide JOLs for each pair during the study phase. They entered the JOLs by typing in a number between 0 and 100 (inclusive) on the number pad that indicated “how likely they were to later recall the word on the right given the word on the left.” At the end of the study phase, participants were also asked to make an aggregate JOL (aJOL) that indicated “the proportion of all the pairs they think they will be able to recall.” They were not told that there would be additional study–test phases, nor did the instructions for subsequent study phases say how many additional study–test phases there would be.

Participants went through four study–test phases. The first three study–test phases used the same list of cues but a different list of targets each time. The fourth study–test phase used the other list of cues and the last list of targets. The cues were presented in a different random order each time. During the study phase, pairs were presented on the screen for 2 sec, with a 1-sec blank screen prior to the JOL prompt. The test was a cued-recall test. Cues were presented on the screen along with a prompt for the participants to type the target. The cue stayed on the screen until participants typed a response and hit Enter, at which time the next cue was presented.

Results

All inferential statistics reported are reliable at the α < .05 level using two-tailed tests unless otherwise noted. Figure 1 depicts mean recall performance, mean JOLs, and aJOLs as a function of study–test trial. Answers were counted as correct only if participants spelled the target exactly as it was originally presented.
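The strict scoring rule can be illustrated with a short Python sketch. The function name `score_cued_recall` and the data layout (a list of cue–target pairs and a dictionary of typed responses keyed by cue) are assumptions for the example. The source does not say whether letter case or surrounding whitespace was normalized, so the sketch uses plain string equality.

```python
def score_cued_recall(study_pairs, responses):
    """Proportion of studied pairs whose typed response exactly matches the target.

    study_pairs: list of (cue, target) tuples for one study list.
    responses:   dict mapping each cue to the string the participant typed.
    Assumption: exact string equality, with no case or whitespace normalization.
    """
    n_correct = sum(
        1 for cue, target in study_pairs if responses.get(cue) == target
    )
    return n_correct / len(study_pairs)
```

Under this rule a misspelled response such as "rivver" for "river" is scored as incorrect, as is a cue left without a response.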

Fig. 1 Mean cued recall, JOLs, and aJOLs as a function of study–test trial in Experiment 1. Trial 4 is the release-from-PI trial

As was expected, there was a reliable decrease in recall over the PI trials (trials 1–3), F(2, 80) = 5.82, MSE = .07. JOLs and aJOLs also reliably decreased across these trials, F(2, 80) = 26.11, MSE = .16, and F(2, 80) = 8.90, MSE = .11, respectively.

Recall increased reliably between trials 3 and 4, t(40) = 2.53, SD = .14, but, in contrast, JOLs on trial 4 failed to predict release from PI and were reliably lower than those on trial 3, t(40) = 3.36, SD = .09. Also, aJOLs did not increase from trial 3 to 4. If anything, they exhibited a small numerical decrease, but this difference was not reliable, t(40) = 0.98, n.s., SD = .12.

Discussion

Judgments decreased with increasing PI across the first three trials of the experiment. This result appears at first blush to be consistent with the target competition, rather than with the cue familiarity, view, but neither view can comfortably accommodate the finding that judgments continued to decrease during the release-from-PI trial. This combination of findings suggests that JOLs likely do not have a mnemonic basis in this task: If participants were sensitive to cues at study that accurately reflected impending interference at recall, those same cues should have been diminished on the release trial. It thus seems likely that participants used an analytic basis to predict the effects of PI but, because they did not have a particularly sophisticated mental model of interference effects, they did not successfully predict the effects of the release trial.

In this experiment, judgments might have decreased across trials because participants had a sense of how cue competition, as elicited by the repetition of cue terms, induced PI. Alternatively, the decreasing JOLs might reflect a theory of global PI, whereby they recognized that memory performance should, in general, decrease as the number of things to be remembered increased (Strong, 1912; Wixted, 2004). Experiment 2 evaluated these possibilities by comparing a cue PI condition replicating Experiment 1 with a condition in which cue terms were not repeated and, thus, only global PI was present.

Experiment 2

Experiment 1 showed that participants fail to predict release from PI even after having accurately predicted PI. This result suggests that participants do not appreciate the cue-specific nature of the PI. The fact that participants correctly predict decreasing performance across PI trials appears to contradict that claim, but an alternative must be considered. Predictions of decreasing performance may simply reflect a belief about list-wise—rather than cue-specific—interference. Experiment 2 evaluated this claim by examining how the repetition of cues influences JOLs. In this experiment, one group of participants (repeated-cues group) experienced a PI procedure equivalent to the paradigm reported in Experiment 1. A second group (novel-pairs group) went through a similar procedure, with the exception that the cues did not repeat across trials: All pairs were composed of both novel cues and novel targets. Comparisons between these groups will indicate the extent to which decreases in JOLs across trials reflect an appreciation for cue-specific PI. A third group (repeated-pairs group) studied lists in which both the cue and targets were repeated across study–test trials. This group should replicate previous studies of the effects of repetition on JOLs (Koriat, Sheffer, & Ma’ayan, 2002) and thus permit clearer attribution of the decrease in predictions to either cue-specific or global PI. It will also provide a benchmark by which to evaluate the magnitude of decreases evident in the first two groups.

Method

Participants

One hundred sixty-two participants from the University of Illinois, Urbana-Champaign participated in this experiment. Fifty-five participated in the repeated-cues condition, 52 in the novel-pairs condition, and 55 in the repeated-pairs condition.

Materials

All conditions consisted of four study–test trials. Lists of a randomly selected 20 words were constructed, using the same word pool as that in the previous experiment. The materials and procedure for participants in the repeated-cues group were identical to those outlined in the Method section of Experiment 1. Four lists of cues and four lists of targets were generated for each participant in the novel-pairs condition. Each of these lists of cues was used for one of the study lists, with no cue (or target) being used twice. Finally, one cue list and one target list were generated for each participant in the repeated-pairs condition. These two lists made a single study list that was repeated (in a different random order) for each study–test phase.

Procedure

All three groups studied word pairs in four study–test phases. For the repeated-cues group, the first three trials shared the same cues, and the fourth had a different set of cues. For the novel-pairs group, no word was repeated in any of the study lists. Finally, for the repeated-pairs group, the same list of cues and targets was repeated for all four study–test phases. All other presentation parameters were the same as those in Experiment 1.
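The three designs differ only in which cue list and which target list are used on each trial, which can be written down as a small lookup table. The following Python sketch is illustrative: the names `CUE_LIST_FOR_TRIAL`, `TARGET_LIST_FOR_TRIAL`, and `build_experiment2_lists` are hypothetical, and pairing the i-th cue with the i-th target of each list is an assumption.

```python
import random

# Index of the cue list and the target list used on each of the four trials.
CUE_LIST_FOR_TRIAL = {
    "repeated-cues": [0, 0, 0, 1],   # same cues on trials 1-3, new cues on trial 4
    "novel-pairs": [0, 1, 2, 3],     # new cues on every trial
    "repeated-pairs": [0, 0, 0, 0],  # same cues on every trial
}
TARGET_LIST_FOR_TRIAL = {
    "repeated-cues": [0, 1, 2, 3],   # new targets on every trial
    "novel-pairs": [0, 1, 2, 3],
    "repeated-pairs": [0, 0, 0, 0],  # same targets on every trial
}

def build_experiment2_lists(word_pool, condition, list_length=20, seed=None):
    """Return four study lists of (cue, target) pairs for one participant."""
    rng = random.Random(seed)
    cue_idx = CUE_LIST_FOR_TRIAL[condition]
    target_idx = TARGET_LIST_FOR_TRIAL[condition]
    n_cue_lists = max(cue_idx) + 1
    n_target_lists = max(target_idx) + 1
    needed = (n_cue_lists + n_target_lists) * list_length
    words = rng.sample(word_pool, needed)  # sampling without replacement
    chunks = [words[i:i + list_length] for i in range(0, needed, list_length)]
    cue_lists, target_lists = chunks[:n_cue_lists], chunks[n_cue_lists:]

    study_lists = []
    for c, t in zip(cue_idx, target_idx):
        pairs = list(zip(cue_lists[c], target_lists[t]))
        rng.shuffle(pairs)  # a different random presentation order on each trial
        study_lists.append(pairs)
    return study_lists
```

Expressing the conditions as index tables makes the single point of contrast explicit: the repeated-cues and novel-pairs groups share the same target schedule and differ only in the cue schedule.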

Results and discussion

Mean performance on the cued-recall task, as well as mean JOLs and mean aJOLs, are presented in Fig. 2 for the repeated-cues, novel-pairs, and repeated-pairs groups.

Fig. 2 Mean cued recall, JOLs, and aJOLs for the repeated-cues group (left panel), the novel-pairs group (middle panel), and the repeated-pairs group (right panel) as a function of study–test trial in Experiment 2. Trial 4 is the release-from-PI trial for the repeated-cues group

An ANOVA was conducted on recall for the repeated-cues and novel-pairs groups during the first three trials, with test phase and item type as factors. This analysis revealed a reliable interaction, F(2, 210) = 5.64, MSE = .02. Planned comparisons revealed a reliable decrease in recall for the repeated-cues group, F(1, 105) = 12.83, MSE = .02, indicating PI, but no change in recall for the novel-pairs group, F(1, 105) < 1, MSE = .02. In contrast, an ANOVA conducted over the first three trials for the repeated-pairs group showed a reliable increase during these trials, F(2, 108) = 221.05, MSE = .01.
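The degrees of freedom reported for these analyses follow from the group sizes given in the Participants section. The sketch below uses the standard formulas for a mixed design with one between-participants factor and one within-participants factor; the function name `mixed_anova_df` is hypothetical.

```python
def mixed_anova_df(n_participants, n_groups, n_trials):
    """Degrees of freedom (effect, error) for a groups x trials mixed ANOVA.

    With n_groups=1 the "trial" entry reduces to a one-way
    repeated measures ANOVA.
    """
    between_error = n_participants - n_groups
    within_error = between_error * (n_trials - 1)
    return {
        "group": (n_groups - 1, between_error),
        "trial": (n_trials - 1, within_error),
        "group_x_trial": ((n_groups - 1) * (n_trials - 1), within_error),
    }

# Repeated-cues (55) plus novel-pairs (52) participants over three trials.
two_groups = mixed_anova_df(n_participants=55 + 52, n_groups=2, n_trials=3)
# Repeated-pairs group (55) analyzed alone over three trials.
one_group = mixed_anova_df(n_participants=55, n_groups=1, n_trials=3)
```

Here `two_groups["group_x_trial"]` is (2, 210) and `one_group["trial"]` is (2, 108), matching the reported F(2, 210) and F(2, 108).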

Of interest is how participants’ JOLs changed over trials for each of the groups. JOLs for both the repeated-cues and the novel-pairs groups decreased from trial 1 to 3, F(2, 210) = 60.35, MSE = .01. This decrease did not differ between the two groups, as indicated by the lack of an interaction, F(2, 210) < 1, MSE = .01, suggesting that participants’ predictions of decreases across trials were not the result of their sensitivity to the repetition of the cues. Rather, participants appeared to employ a more general strategy of predicting poorer performance with each subsequent list that they studied. This interpretation must be qualified by the results from the repeated-pairs group, whose mean JOLs increased as expected from trial 1 to 3, F(2, 108) = 64.03, MSE = .02. It appears to be the case, then, that participants predict poorer performance (i.e., interference) only when they are acquiring new information. This pattern is consistent with what would be expected if naïve theories of forgetting have a role but predict only global PI, and not cue-specific PI.

The aJOLs followed the same pattern across trials as did the JOLs. They reliably decreased for both the repeated-cues and the novel-pairs groups from trial 1 to 3, F(2, 210) = 26.86, MSE = .01, and this decrease did not differ between the two groups, F(2, 210) = 0.38, MSE = .01, indicating that it was not the result of the cue-specific effects of PI. Also, aJOLs reliably increased for the repeated-pairs group from trial 1 to 3, F(2, 108) = 67.33, MSE = .02.

For the release-from-PI trial, the change in recall from trial 3 to 4 was reliably different between the repeated-cues group and the novel-pairs group, F(1, 105) = 10.01, MSE = .02. Planned comparisons revealed that this was because recall increased for the repeated-cues group, F(1, 105) = 15.38, MSE = .02, but not for the novel-pairs group, F(1, 105) = 0.36, MSE = .02. JOLs and aJOLs for the repeated-cues group did not predict this increase but, rather, continued to decrease, t(54) = 5.60, SD = .08, and t(54) = 2.83, SD = .12, respectively. JOLs and aJOLs for the novel-pairs group did not change from trial 3 to 4, t(51) = 0.95, n.s., SD = .08, and t(51) = 0.50, n.s., SD = .13, respectively. Not surprisingly, recall for the repeated-pairs group increased from trial 3 to 4, as did JOLs and aJOLs, t(54) = 5.75, SD = .07; t(54) = 7.28, SD = .11; and t(54) = 6.67, SD = .10, respectively.

These results suggest that participants predict interference from the learning of additional information regardless of whether that interference is cue specific or not. There was no difference in the rate of decrease of either JOLs or aJOLs between the repeated-cues group, who actually experienced PI, and the novel-pairs group, whose performance did not change across trials. Participants must have been using a general strategy to predict poorer performance with each subsequent study list. This strategy was not used, however, in the repeated-pairs group, who correctly predicted that performance would increase. Rather, participants seemed to predict decreases in performance when they were learning novel information. It is worth noting, however, that the increase in JOLs for the repeated-pairs group was smaller than the increase in recall, a phenomenon typically known as the underconfidence-with-practice effect (Koriat et al., 2002). Results from the other conditions raise the possibility that participants’ increasing underconfidence across trials could, in part, reflect learners’ theories about general interference.

Taken together, Experiments 1 and 2 reveal that participants predict decreases in performance over trials and over lists because of a general belief that the acquisition of new material is increasingly difficult. This leads JOLs to decrease over trials in either the presence or the absence of cue-specific PI and additionally explains why those judgments are not sensitive to release from PI. Because novel cue–target pairings elicit the same metamnemonic response as pairs with repeated cues and novel targets, it is not surprising that participants fail to appreciate the mnemonic benefits of the release manipulation.

In Experiment 3, we examined the role of experience in the failure to appreciate cue-specific release from PI. Numerous studies have shown that experience with a memory task, and especially experience engaging in explicit metamnemonic judgments during that task, can ameliorate such errors (Benjamin, 2003; Dunlosky & Hertzog, 2000). However, if participants are insensitive to the role that cue repetition plays in the promotion of interference and are not monitoring that aspect of the learning materials, experience with the procedure may not promote an increase in metamemory accuracy.

Experiment 3

In Experiment 3, we investigated whether experience with PI and release from PI could help participants become sensitive to the cue change that predicts release. Participants proceeded through three study–test phases with word pair lists that had the same cues. The fourth study–test phase again served as the release phase by introducing a new set of cues. This new list of cues served as the cues for additional fifth and sixth study–test phases, thereby building up PI for a second time. A seventh and final phase was a second release trial, including a new (third) set of cues. The goal of this experiment was to examine whether participants would learn that the PI that builds up in this paradigm is critically tied to the repetition of cues.

Method

Participants

Sixty-nine participants from the University of Illinois, Urbana-Champaign participated for partial credit in an introductory psychology course.

Materials

The materials were the same as those in Experiment 1, with two exceptions. One was that the list length was reduced to ten words in order to compensate for the increased number of study–test phases. Also, there were now three lists of cues and seven lists of targets created individually for each participant. Two of the lists of cues were each paired with three different lists of targets. The last list of cues was paired with the last list of targets.

Procedure

The procedure was the same as that in Experiment 1, except for the addition of fifth, sixth, and seventh study–test phases. The fifth and sixth trials shared the same list of cues as the fourth trial. The seventh trial had a new list of cues and thus served as a second release-from-PI phase. All display, study, and test parameters were the same as those in Experiment 1.
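The seven-trial schedule can be summarized by recording which cue list is used on each trial. The Python sketch below is illustrative only; the names `CUE_LIST_FOR_TRIAL`, `release_trials`, and `prior_pairings` are hypothetical.

```python
# Cue list used on each of the seven study-test trials (0-based list indices).
CUE_LIST_FOR_TRIAL = [0, 0, 0, 1, 1, 1, 2]

def release_trials(cue_list_for_trial):
    """1-based trial numbers on which the cue list changes from the previous trial."""
    return [
        i + 1
        for i in range(1, len(cue_list_for_trial))
        if cue_list_for_trial[i] != cue_list_for_trial[i - 1]
    ]

def prior_pairings(cue_list_for_trial):
    """For each trial, how many earlier trials used the same cue list."""
    return [
        cue_list_for_trial[:i].count(cue_list)
        for i, cue_list in enumerate(cue_list_for_trial)
    ]
```

For this schedule, `release_trials(CUE_LIST_FOR_TRIAL)` returns `[4, 7]`, and `prior_pairings(CUE_LIST_FOR_TRIAL)` returns `[0, 1, 2, 0, 1, 2, 0]`, which tracks the buildup and reset of cue overload across the session.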

Results and discussion

Mean performance on the cued-recall task, as well as mean JOLs and mean aJOLs, are presented in Fig. 3. Recall performance reliably decreased during the first three trials, F(2, 136) = 8.77, MSE = .02, and during trials 4–6, F(2, 136) = 13.30, MSE = .02.

Fig. 3 Mean cued recall, JOLs, and aJOLs as a function of study–test trial in Experiment 3. Trials 4 and 7 are release-from-PI trials

JOLs and aJOLs decreased from trial 1 to 3, F(2, 136) = 17.64, MSE = .01, and F(2, 136) = 20.39, MSE = .01, respectively, replicating the results of Experiments 1 and 2. JOLs also decreased during the second set of PI trials (trials 4–6), F(2, 136) = 3.69, MSE = .01. The decrease in aJOLs during these trials, however, was not reliable, F(2, 136) = 0.90, n.s., MSE = .01.

Recall increased between trials 3 and 4, t(68) = 2.11, SD = .11, and again between trials 6 and 7, t(68) = 5.51, SD = .20, indicating release from PI. Once again, JOLs and aJOLs failed to predict this increase. During the first release-from-PI trial, JOLs decreased marginally, t(68) = 1.90, p = .06, SD = .12. Also, aJOLs decreased, but not reliably, t(68) = 0.56, n.s., SD = .16. During the second release-from-PI trial, neither the increase of .003 for JOLs nor the decrease of .004 for aJOLs was statistically reliable, t(68) = 0.25, n.s., SD = .09, and t(68) = 0.30, n.s., SD = .12, respectively. Thus, neither measure predicted the increases that occurred in recall performance, suggesting that experience with buildup and release from PI did not promote accurate prediction of such effects on a second opportunity. This result suggests that participants were not at all sensitive to manipulations of cue repetition throughout the procedure and, thus, were not able to attribute changes in performance across trials (to the degree that they were aware of them) to such manipulations.

General discussion

In this research, we investigated people’s metacognitive judgments of PI and release from PI. In three experiments, JOLs and aJOLs decreased across study–test phases as much as or more so than actual cued-recall performance during PI trials. This result indicates that participants were metacognitively sensitive to the effects of interference. However, participants did not appreciate the cue-specific nature of the PI induced in these experiments, as evidenced by their failure to predict release from PI (in all experiments) and the lack of differential predictions for repeated-cues and novel-pairs groups (Experiment 2).

These results differ from those of Maki (1999), who found that JOLs do differentiate between repeated cues and novel pairs in an RI task. Participants in that experiment may have been sensitive to target competition, since the JOLs were delayed until some time after both of the competing targets were associated to the cue. In the experiments presented here, JOLs were made at the moment of association, and this may have minimized any mnemonic sensitivity participants may have had to target competition or cue overloading.

Our results also differ from those in Metcalfe et al. (1993), in that JOLs did not appear to be driven by cue familiarity. This difference may reflect the fact that cue familiarity affects metacognitive judgments only when those judgments are made during a later test. Making those judgments at a later time opens the door for mnemonic factors to play a role, just as in the Maki (1999) experiment.

In our experiments, in which judgments were made at study, there is little evidence that participants used a mnemonic index to predict decreases across trials. If participants had used increased competition across trials as a basis for JOLs, for example, they would have predicted the decreases across PI trials (which they did) and an appropriate increase on the release trial (which they did not). Also, those indices would not have predicted decreases in performance when the cues were novel, as they were in the novel-pairs group in Experiment 2. Likewise, cue familiarity cannot account for the pattern of results, since it would predict increasing JOLs across PI trials and decreasing JOLs on the release trial.

Rather, it appears that participants apply a naïve theory of memory to the prediction task. That theory fails to incorporate a cue-specific basis for interference. Participants choose to provide lower judgments across the experiment on the basis of a more general belief about list-wise PI. They base their judgments on this general belief even when there is no discernible interference present (Experiment 2). In contrast, participants do not predict interference when they are asked to relearn the same word pairs. This suggests that their predictions of interference are tied not to the repetition of study trials but, rather, to their beliefs about learning additional information. The participants’ failure in our experiments, however, might indicate ecologically tuned metacognition, given the claim that forgetting in the real world owes more to global than to cue-specific competition (Wixted, 2004). From that perspective, participants’ shortcomings in this task might reflect the general ecological invalidity of cue-specific PI paradigms, rather than a meaningful metacognitive shortcoming.

In Experiment 3, we examined whether participants would learn to predict release from PI after having had prior experience with the phenomenon. If participants had learned about the effects of changing cues in the cue–target pairs during their first experience with release from PI, they might have been able to infer the increase on the second release-from-PI trial. This would suggest that participants were updating their mental models of PI to include release from PI. There was little indication that this took place. They did not learn to predict the large increases in performance that accompany a new set of cues, suggesting that they did not monitor that aspect of the task carefully.