Predicting what we will remember and what we will forget is crucial for daily functioning. For example, when meeting someone, knowing that you will remember their face but not their name can help you decide where to focus your effort. If you know that you are likely to forget someone’s name, you can prioritize memory for this information to avoid the potential consequences (e.g., social awkwardness) of forgetting. Likewise, if you know what is likely to be forgotten, you can prioritize memory for other information that is more likely to be remembered. Thus, evaluating what we know and what we do not know may be critical in optimizing memory, but it is unclear how these processes affect memory for information that is likely to be remembered and information that people feel will soon be forgotten.

Metacognition reflects one’s knowledge about cognition (Nelson & Narens, 1990; see also Dunlosky et al., 2016; Nelson, 1996; Rhodes, 2019) and metamemory refers to how we monitor and control learning and memory processes. When predicting whether some to-be-remembered information will be remembered or forgotten, learners are engaged in metacognitive monitoring. Specifically, metacognitive monitoring often involves evaluating future memory performance—both what will be remembered and what will be forgotten. In contrast, metacognitive control involves self-regulating learning and is often informed by metacognitive monitoring (see Dunlosky & Tauber 2016, for a review of metamemory).

Researchers typically operationalize monitoring of future learning in terms of judgments of learning (JOLs; see Rhodes 2016 for a review) by asking participants to evaluate the likelihood that information will be remembered in the future. These judgments of learning are informed by the cues available to the learner during encoding. According to Koriat’s (1997) cue-utilization framework, three types of cues inform monitoring judgments. The characteristics of to-be-remembered information that influence or are believed to influence memory (e.g., word frequency, or word-pair relatedness) exemplify intrinsic cues. A learner’s encoding operations as well as the studying and testing conditions such as study time, retention interval, or the type of test (i.e., recall versus recognition) illustrate extrinsic cues. A learner’s experience with to-be-remembered information, such as how easily the information comes to mind in response to a cue, exemplifies mnemonic cues. This cue-utilization framework has been frequently supported (e.g., Bröder & Undorf 2019; Koriat, 2015; Rhodes, 2016) such that predictions of memory performance are generally accurate when based on the cues that affect recall (see Dunlosky & Matvey 2001; Nelson & Dunlosky, 1991; Rhodes & Tauber, 2011; Tiede & Leboe, 2009), but important metacognitive biases have been observed when learners rely on cues that are not always indicative of later recall (e.g., font size, loudness; see Rhodes & Castel 2008, 2009; see also Castel & Rhodes 2020).

Metamemory judgments often take the form of judgments of the likelihood of remembering (but see Finn 2008; Li et al., 2021; for judgments of the likelihood of forgetting; Tauber & Rhodes 2012, for estimates of the duration of retention), and making metamemory judgments can influence memory, an effect known as reactivity (Arbuckle & Cuddy, 1969; Double & Birney, 2019; Double et al., 2018; Mitchum et al., 2016; Rivers et al., 2021; Soderstrom et al., 2015; Spellman & Bjork, 1992; Tekin & Roediger, 2020; Witherby & Tauber, 2017). Specifically, reactivity occurs when making metacognitive judgments while studying to-be-remembered information influences which or how much information is remembered (but this effect may be small and differ based on how memory is tested, see Myers et al., 2020). Thus, evaluating the memorability of information may change what is remembered, although it is unclear if learners are aware of how metacognitive judgments influence remembering and forgetting.

Reactivity is largely considered in terms of positive and negative reactivity whereby more or less information is remembered due to making metacognitive judgments, relative to a comparison condition that did not make a metacognitive judgment. For example, when studying word pairs, making metacognitive judgments during encoding often leads to positive reactivity when the word pairs are related (e.g., Janes et al., 2018; Soderstrom et al., 2015; Witherby & Tauber, 2017) whereas negative reactivity may occur when the words are unrelated. Previous work has demonstrated that both JOLs (e.g., Double et al., 2018) and judgments of forgetting (e.g., Li et al., 2021) can affect subsequent memory and several theoretical mechanisms have been proposed to account for this reactivity. However, rather than a single process, many mechanisms likely contribute to reactivity (see Janes et al., 2018; see also Myers et al., 2020). Specifically, reactivity likely occurs because making a judgment about information alters how we process information (i.e., metacognition modifying attention, see Castel et al., 2012), changes the learner’s goals of what or how much to remember (Mitchum et al., 2016), strengthens the cues used as the basis for predictions (Soderstrom et al., 2015), directs attention to information that would have been processed less or would not have been processed at all (Halamish & Undorf, 2022), increases the availability of item-specific information (Senkova & Otani, 2021), and/or engages deeper levels of processing (Craik & Lockhart, 1972; Tekin & Roediger, 2020) when learners evaluate the cues indicative of an item’s memorability (i.e., study time, location in the study phase, word frequency, etc.; see Soderstrom et al., 2015).

Predicting remembering and forgetting may require a similar evaluation of the qualities contributing to an item’s memorability and this behavior could enhance encoding by engaging deeper levels of processing. Accordingly, reactivity accounts predicated on metacognition modifying attention (Castel et al., 2012) or strengthening cues (Soderstrom et al., 2015; see also Halamish & Undorf 2022) predict that identifying information that is likely to be remembered may trigger effective encoding processes that enhance memory for information expected to be remembered. It also invites the intriguing prediction that identifying information likely to be forgotten results in unexpected remembering of this information. Specifically, information that is judged to be forgotten may be later remembered because of the act of evaluating it as likely to be forgotten (i.e., because of the metamemory judgment). In contrast, the changed goal hypothesis (Mitchum et al., 2016) would predict that lower JOLs signal a diminished goal of remembering the item, suggesting that identifying information as likely to be forgotten will diminish memory for this information.

The current study

We were interested in whether evaluating information as likely to be remembered or likely to be forgotten enhances memory for this information relative to information not subject to memory predictions. We presented participants with lists of words to remember for a later test and, on each list, participants identified words that they were confident that they would remember and words that they believed they were most likely to forget on a test. Since prior work illustrates that positive reactivity occurs for easier, related pairs but negative reactivity can sometimes occur for more difficult pairs (but sometimes reversed; see Ericsson & Simon 1980; Fox et al., 2011), in the present study, if participants’ predictions were accurate, then prior evidence suggests that the words predicted to be forgotten (the harder words, presumably) would be more poorly remembered (if there is no reactivity) but may also potentially show negative reactivity (Double et al., 2018). However, if memory for words predicted to be forgotten violates this prediction (i.e., these words are remembered similarly or better than words not given a prediction), this would illustrate a positive reactivity effect. As such, we expected words that participants indicated they were most likely to forget would be better recalled than words participants did not indicate as likely to be remembered or forgotten. This memory advantage for words strongly anticipated to be forgotten would suggest, paradoxically, that the act of judging information as likely to be forgotten may unexpectedly influence later memory for this information. Additionally, we expected participants to demonstrate elevated recall of words they indicated that they were most likely to remember relative to words they indicated that they were most likely to forget as well as words not given memory predictions.

In Experiment 1, participants studied (fixed study time in Experiment 1a and self-paced study time in Experiment 1b) lists of 16 words and selected two words that they were most likely to remember and two words they were most likely to forget. In Experiment 2, participants studied lists of 18 words and selected six words that they were most likely to remember and six words they were most likely to forget, leaving six words with no prediction (thus equating the number of words for each prediction type). In Experiment 3, participants studied lists of 16 words and were asked to select four words but were not given any instructions regarding how to make their selections. In Experiment 4, participants studied lists of words presented sequentially (in contrast to the simultaneous presentation in all other experiments) and made Likert scale judgments of remembering or forgetting for all words. In Experiment 5, participants studied a list of 20 words and were asked either to select 10 words that they would like to be tested on and would remember or to select 10 words that they would not like to be tested on (meaning that unselected words were the words they expected to remember). Finally, in Experiment 6, participants studied a list of 20 words and were asked to select 10 words that they expected to remember and 10 words that they expected to forget. In each experiment, we examined whether the act of identifying words as likely to be forgotten had the paradoxical effect of improving memory for those items.

Experiment 1a

In Experiment 1a, participants studied four lists of 16 words to remember for a later test. During the study phase, participants were asked to identify two words that were likely to be remembered and two words that were most likely to be forgotten. Following the 30-second study phase, participants completed an immediate free recall test for the studied words. We expected enhanced recall for words participants expected to remember as well as for the words participants predicted that they would forget (although the benefit for these words may be smaller) relative to the remaining words not selected as likely to be remembered or forgotten.

Method

Participants

After exclusions, participants were 48 undergraduate students (Mage = 19.92, SDage = 1.38) recruited from the University of California, Los Angeles (UCLA) Human Subjects Pool. Participants were tested online and received course credit for their participation. Participants were excluded from analysis if they admitted to cheating (e.g., writing down answers) in a post-task questionnaire (they were told they would still receive credit if they cheated). This exclusion process resulted in one exclusion. Participants were also excluded if they selected more than two “remember” or more than two “forget” words. This exclusion process resulted in 12 exclusions. In each experiment, we aimed to collect approximately 50 participants per condition. The sample size was selected based on prior exploratory research and the expectation of detecting a medium effect size. Additionally, each participant was only allowed to participate in one experiment (i.e., all participants in each study were naïve).

Materials

The words used in this experiment were between 4 and 7 letters (M = 4.99, SD = 0.98) and on the log-transformed Hyperspace Analogue to Language (log-HAL) frequency scale (with lower values indicating lower frequency in the English language and higher values indicating higher frequency), ranged from 5.48 to 12.65 and averaged a score of 8.81 (SD = 1.57). Words were classified according to the English Lexicon Project website (Balota et al., 2007).

Procedure

Participants were presented with lists of words to remember for a later test with each list containing 16 words. Words were presented simultaneously in two columns with eight words in each column. On each list, participants studied the words for 30 s and were asked to underline (by clicking on the word once) two words that they were confident that they would remember and circle (by clicking on the word twice) two words that they thought that they were most likely to forget on the test (with underlining and circling counterbalanced between-subjects); if participants clicked on a word a third time, it was no longer underlined or circled. Following the study phase, participants completed a 1-minute immediate free recall test whereby they typed all the words that they could remember from the just-studied list into an on-screen text box. This was repeated for four study-test cycles.
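The click-based marking described above can be pictured as a three-state cycle per word. The following is a minimal sketch of that cycle, not the software used in the experiment; the names `Mark`, `next_mark`, and `mark_after_clicks` are hypothetical. Which mark stood for "remember" and which for "forget" was counterbalanced, so the sketch tracks only the visual state.

```python
from enum import Enum


class Mark(Enum):
    """Visual state of a word on the study screen (hypothetical names)."""
    NONE = "none"
    UNDERLINED = "underlined"
    CIRCLED = "circled"


# Order of states visited by successive clicks on the same word:
# first click underlines, second click circles, third click clears.
_CYCLE = [Mark.NONE, Mark.UNDERLINED, Mark.CIRCLED]


def next_mark(current: Mark) -> Mark:
    """Return the state of a word after one more click."""
    return _CYCLE[(_CYCLE.index(current) + 1) % len(_CYCLE)]


def mark_after_clicks(n_clicks: int) -> Mark:
    """Return the state of an initially unmarked word after n_clicks clicks."""
    state = Mark.NONE
    for _ in range(n_clicks):
        state = next_mark(state)
    return state
```

Under this sketch, a word clicked three times returns to the unmarked state, matching the description that a third click removes the underline or circle.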

Results

Recall as a function of participants’ predictions is shown in Fig. 1. A repeated-measures ANOVA with 3 levels (predictions: forget, neither, remember) revealed an effect of predictions [Mauchly’s W = 0.72, p < .002; Huynh-Feldt corrected results: F(1.61, 72.64) = 52.02, p < .001, ηp2 = 0.54] such that words participants said they would remember (M = 0.79, SD = 0.24) were better recalled than words predicted to be forgotten (M = 0.52, SD = 0.27), [pholm < 0.001, d = 0.80] and words not predicted to be remembered or forgotten (M = 0.29, SD = 0.14), [pholm < 0.001, d = 1.50]. Critically, recall was better for words expected to be forgotten than words not selected to be remembered or forgotten [pholm < 0.001, d = 0.70]. Thus, people better remembered words that they judged as likely to be forgotten than not-selected words.
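To make the dependent measure concrete, the sketch below shows one way the proportion of words recalled per prediction type could be scored for a single list. It is an illustration under stated assumptions rather than the scoring procedure actually used: the function name `recall_by_prediction` is hypothetical, and responses are assumed to count as correct only on an exact match after trimming and lowercasing.

```python
def recall_by_prediction(predictions, recalled):
    """Score one study-test cycle.

    predictions: dict mapping each studied word to 'remember',
                 'forget', or 'neither' (hypothetical labels).
    recalled:    iterable of the responses typed at test.
    Returns a dict mapping each category to the proportion recalled.
    """
    # Assumption: exact-match scoring after trimming and lowercasing.
    recalled_set = {response.strip().lower() for response in recalled}
    totals = {}
    hits = {}
    for word, category in predictions.items():
        totals[category] = totals.get(category, 0) + 1
        if word.lower() in recalled_set:
            hits[category] = hits.get(category, 0) + 1
    return {category: hits.get(category, 0) / totals[category]
            for category in totals}


# Hypothetical 16-word list: 2 'remember', 2 'forget', 12 'neither'.
words = [f"word{i:02d}" for i in range(1, 17)]
predictions = {w: "neither" for w in words}
predictions["word01"] = predictions["word02"] = "remember"
predictions["word03"] = predictions["word04"] = "forget"
responses = ["word01", "word02", "word03", "word05", "word06", "word07"]
scores = recall_by_prediction(predictions, responses)
```

Note that with two selected words per category, each list can yield only 0, 0.5, or 1 for the "remember" and "forget" proportions, whereas the "neither" proportion is based on 12 words.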

Fig. 1

The proportion of words recalled as a function of participants’ predictions about remembering and forgetting in Experiment 1a. Error bars reflect the standard error of the mean

Next, we investigated the potential bases for participants’ judgments by examining whether participants used word frequency (which is associated with the likelihood of recalling an item, see Popov & Reder, in press, for a review) to inform their predictions (cf. Benjamin, 2003). Specifically, we used the log-HAL frequency score for words predicted to be remembered, words predicted to be forgotten, and words not predicted to be remembered or forgotten. A repeated-measures ANOVA with 3 levels (predictions: forget, neither, remember) revealed an effect of predictions [F(2, 90) = 7.07, p = .001, ηp2 = 0.14] such that the words participants said they would forget were less frequent (M = 8.40, SD = 0.75) than words not predicted to be remembered or forgotten (M = 8.86, SD = 0.17), [pholm = 0.002, d = − 0.51] and the words predicted to be remembered (M = 8.79, SD = 0.61), [pholm = 0.007, d = − 0.44]. However, the frequency of words predicted to be remembered was similar to that of words not given a prediction [pholm = 0.627, d = 0.07]. Thus, participants may have incorporated word frequency in their memorability decisions.
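The pairwise comparisons above report Holm-adjusted p values (pholm). As a minimal sketch of how such an adjustment is commonly computed (the Holm step-down procedure), the following function takes a family of raw p values and returns the adjusted values in the original order. The function name `holm_adjust` and the example inputs are hypothetical; the raw p values are illustrative only and are not taken from the present data or from the analysis software used here.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p values, in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on,
        # capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted p never falls below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p values for three pairwise comparisons.
example = holm_adjust([0.01, 0.04, 0.03])
```

With three comparisons, the smallest raw p value is multiplied by 3, the middle one by 2, and the largest by 1, and each adjusted value is at least as large as the one before it in the sorted order.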

Finally, we also examined participants’ output by calculating the average output position of words participants indicated that they would remember, forget, or did not select as to be remembered or forgotten. A repeated-measures ANOVA with 3 levels (predictions: forget, neither, remember) revealed an effect of predictions [F(2, 82) = 14.14, p < .001, ηp2 = 0.26] such that the words participants said they would remember (M = 2.82, SD = 1.10) were recalled earlier than words predicted to be forgotten (M = 4.02, SD = 1.86), [pholm < 0.001, d = 0.69] and words not given a prediction (M = 4.08, SD = 0.96), [pholm < 0.001, d = 0.73]; however, the average output position for words predicted to be forgotten was similar to that of words not given a prediction [pholm = 0.810, d = − 0.04]. Thus, participants generally recalled to-be-remembered words before other words.
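The output-position measure can be illustrated with a short sketch. It assumes, hypothetically, that output position is the 1-based serial position of a response within everything typed on that test, that unstudied responses (intrusions) occupy a position but are not scored, and that the function name `mean_output_position` and the example words are illustrative rather than taken from the materials.

```python
def mean_output_position(recall_order, predictions):
    """Average 1-based output position per prediction category.

    recall_order: list of responses in the order they were typed.
    predictions:  dict mapping each studied word to its category.
    Categories with no recalled words are omitted from the result.
    """
    positions = {}
    for position, word in enumerate(recall_order, start=1):
        category = predictions.get(word)
        if category is None:
            continue  # assumption: intrusions hold a position but are not scored
        positions.setdefault(category, []).append(position)
    return {category: sum(values) / len(values)
            for category, values in positions.items()}


# Hypothetical example list and recall protocol.
example_predictions = {
    "apple": "remember", "river": "remember",
    "stone": "forget", "cloud": "neither", "chair": "neither",
}
example_output = mean_output_position(
    ["apple", "river", "stone", "cloud"], example_predictions
)
```

In this sketch a lower value means earlier recall, so the pattern reported above corresponds to a smaller mean position for "remember" words than for the other two categories.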

Discussion

In Experiment 1a, participants demonstrated superior memory performance for words they predicted would be well remembered but also often remembered words they predicted as likely to be forgotten. Additionally, participants may have incorporated word frequency into their predictions. Highly frequent words (e.g., apple) are typically better recalled than less frequent words (e.g., aardvark; see Hall 1954; McDaniel & Bugg, 2008), and participants often predicted that low-frequency words would be forgotten (Mendes & Undorf, 2021; Tullis & Benjamin, 2012); nevertheless, words predicted to be forgotten were recalled more often than words not predicted to be remembered or forgotten. Thus, Experiment 1a suggests that predicting that something is most likely to be forgotten can, somewhat paradoxically, enhance its memorability.

Experiment 1b

In Experiment 1a, study time was fixed and participants may not have had sufficient time to study each word and evaluate its later memorability. As such, in Experiment 1b, we allowed participants to self-pace during the study phase to determine whether these findings replicated if study time was self-paced. We again expected words selected as most likely to be forgotten to be better remembered than words not given a prediction, suggesting that this observation is also present under self-paced learning conditions.

Method

Participants

After exclusions, participants were 62 undergraduate students (Mage = 20.84, SDage = 4.39) recruited from the UCLA Human Subjects Pool. Participants were tested online and received course credit for their participation. Participants were excluded from analysis if they admitted to cheating (e.g., writing down answers) in a post-task questionnaire (they were told they would still receive credit if they cheated). This exclusion process resulted in two exclusions. Because the study phase was self-paced, we constrained the task so that participants could not advance to the recall test until exactly two words had been selected as to be remembered and two words had been selected as to be forgotten, removing this as an exclusion criterion in Experiment 1b.

Materials and Procedure

The procedure used in Experiment 1b was identical to that used in Experiment 1a except that study time on each list was self-paced rather than fixed.

Results

On average, participants spent 97.16 s studying each list (SD = 80.13). Recall as a function of participants’ predictions is shown in Fig. 2. A repeated-measures ANOVA with 3 levels (predictions: forget, neither, remember) revealed a main effect of predictions [F(2, 122) = 54.54, p < .001, ηp2 = 0.47] such that words participants said they would remember (M = 0.90, SD = 0.19) were better recalled than words predicted as to be forgotten (M = 0.63, SD = 0.26), [pholm < 0.001, d = 0.85] and words not predicted to be remembered or forgotten (M = 0.49, SD = 0.28), [pholm < 0.001, d = 1.31]. Critically, recall was better for the words expected to be forgotten than words not selected as to be remembered or forgotten [pholm < 0.001, d = 0.45]. Thus, predicting that a word would be forgotten made it more memorable.
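As a quick consistency check on the reported effect sizes, partial eta squared can be recovered from an F statistic and its degrees of freedom using the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error). The sketch below applies that identity to the values reported above; the function name `partial_eta_squared` is hypothetical, and this is an illustration rather than the computation performed by the authors' software.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from F and its degrees of freedom."""
    effect_term = f_value * df_effect
    return effect_term / (effect_term + df_error)


# Experiment 1b recall: F(2, 122) = 54.54, reported effect size 0.47.
exp1b = round(partial_eta_squared(54.54, 2, 122), 2)

# Experiment 1a recall (Huynh-Feldt corrected): F(1.61, 72.64) = 52.02,
# reported effect size 0.54.
exp1a = round(partial_eta_squared(52.02, 1.61, 72.64), 2)
```

Both values round to the effect sizes reported in the text (0.47 and 0.54, respectively).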

Fig. 2

The proportion of words recalled as a function of participants’ predictions about remembering and forgetting in Experiment 1b. Error bars reflect the standard error of the mean

As in Experiment 1a, we examined word frequency for remember words, forget words, and words not predicted to be remembered or forgotten as an exploratory analysis. A repeated-measures ANOVA with 3 levels (predictions: forget, neither, remember) revealed a main effect of predictions [F(2, 122) = 3.95, p = .022, ηp2 = 0.06] such that the words participants predicted that they would forget (M = 8.52, SD = 0.75) were less frequent than words not predicted to be remembered or forgotten (M = 8.84, SD = 0.22), [pholm = 0.025, d = − 0.34], but not less frequent than the words predicted to be remembered (M = 8.77, SD = 0.66), [pholm = 0.082, d = − 0.26]. Additionally, frequency for words predicted to be remembered was similar to that of words not given a prediction [pholm = 0.539, d = 0.08]. Thus, to some extent, participants may have incorporated word frequency into their decisions.

Finally, we examined participants’ output by calculating the average output position of words participants indicated they would remember, forget, or did not select as to be remembered or forgotten. A repeated-measures ANOVA with 3 levels (predictions: forget, neither, remember) revealed a main effect of predictions [F(2, 118) = 54.38, p < .001, ηp2 = 0.48] such that the words participants said they would remember (M = 3.35, SD = 1.87) were recalled earlier than words predicted to be forgotten (M = 6.01, SD = 2.39), [pholm < 0.001, d = 1.22] and words not predicted to be remembered or forgotten (M = 5.76, SD = 1.73), [pholm < 0.001, d = 1.11]. However, the average output position for words predicted to be forgotten was similar to that of words not given a prediction [pholm = 0.388, d = 0.11]. Thus, participants typically recalled to-be-remembered words before other words.

Discussion

The trends observed in Experiment 1b were generally consistent with Experiment 1a. Specifically, participants better recalled words they predicted would be remembered but also demonstrated enhanced recall for words they predicted would be forgotten. However, in Experiment 1, participants selected far fewer words as to be remembered (two) and to be forgotten (two) than they left unselected (12). Experiment 2 thus equated the number of words for each type of prediction.

Experiment 2

We note that Experiment 2 was suggested by a reviewer and was conducted after the other experiments reported in the present manuscript.

In Experiment 1, because only two words were selected as to be remembered and two words were selected as to be forgotten, this left 12 words as the comparison (i.e., words with no prediction). Accordingly, in Experiment 2, we presented participants with 18 words on each list and asked them to select six words they were likely to remember and six words they were likely to forget, leaving six unselected words. Thus, we could compare recall under circumstances when words were allocated equally among to-be-remembered, to-be-forgotten, and unselected words, although this modified procedure also reduces the potential distinctiveness that comes from selecting only a few words as most likely to be remembered and most likely to be forgotten.

Method

Participants

After exclusions, participants were 50 undergraduate students (Mage = 21.00, SDage = 3.33) recruited from the UCLA Human Subjects Pool. Participants were tested online and received course credit for their participation. Participants were excluded from analysis if they admitted to cheating (e.g., writing down answers) in a post-task questionnaire (they were told they would still receive credit). This exclusion process resulted in one exclusion.

Materials and Procedure

The materials and procedures for Experiment 2 were similar to those of Experiment 1 with the following exceptions: (1) rather than 16 words, participants were presented with 18 words separated into two columns; (2) the recall test was self-paced, but participants were required to spend at least one minute on the test; (3) there was no recognition test following the fourth study-test cycle; and (4) rather than selecting the two words most likely to be remembered and the two words most likely to be forgotten, participants were required to select the six words most likely to be remembered and the six words most likely to be forgotten. As in Experiment 1b, participants were allowed to self-pace the study phase.

Results

On average, participants spent 72.03 s studying each list (SD = 13.37). Recall as a function of participants’ predictions is shown in Fig. 3. A repeated-measures ANOVA with 3 levels (predictions: forget, neither, remember) revealed a main effect of predictions [F(2, 98) = 83.38, p < .001, ηp2 = 0.63] such that words participants said they would remember (M = 0.77, SD = 0.21) were better recalled than words predicted as to be forgotten (M = 0.38, SD = 0.25), [pholm < 0.001, d = 1.68] and words not predicted to be remembered or forgotten (M = 0.36, SD = 0.25), [pholm < 0.001, d = 1.75]. However, recall was similar for the words expected to be forgotten and words not selected as to be remembered or forgotten [pholm = 0.679, d = 0.06].

Fig. 3

The proportion of words recalled as a function of participants’ predictions about remembering and forgetting in Experiment 2. Error bars reflect the standard error of the mean

To examine word frequency for remember words, forget words, and words not predicted as to be remembered or forgotten, we conducted a repeated-measures ANOVA with 3 levels (predictions: forget, neither, remember). Results revealed a main effect of predictions [F(2, 98) = 6.92, p = .002, ηp2 = 0.12] such that the words participants predicted that they would forget (M = 8.63, SD = 0.35) were less frequent than words not predicted to be remembered or forgotten (M = 8.84, SD = 0.26), [pholm = 0.012, d = 0.63] and the words predicted to be remembered (M = 8.90, SD = 0.41), [pholm = 0.002, d = 0.78]. However, frequency for words predicted to be remembered was similar to those words not given a prediction [pholm = 0.492, d = 0.15]. A one-sample t-test indicated that words predicted as to-be-forgotten were recalled more frequently than 0 [t(49) = 10.76, p < .001, d = 1.52]; if predictions for these words had been accurate, recall should not be significantly different from 0.

Finally, we again examined the average output position of words participants indicated they would remember, forget, or did not select as to be remembered or forgotten. A repeated-measures ANOVA with 3 levels (predictions: forget, neither, remember) revealed a main effect of predictions [F(2, 92) = 44.08, p < .001, ηp2 = 0.49] such that the words participants said they would remember (M = 4.29, SD = 1.48) were recalled earlier than words predicted as to be forgotten (M = 6.60, SD = 2.20), [pholm < 0.001, d = 1.31] and words not predicted to be remembered or forgotten (M = 6.71, SD = 2.13), [pholm < 0.001, d = 1.24]. However, the average output position for words predicted to be forgotten was similar to that of words not given a prediction [pholm = 0.664, d = 0.07].

Discussion

Results from Experiment 2 indicated that participants were somewhat metacognitively accurate—the words they predicted would be remembered were recalled better than words they expected to forget as well as words not given a prediction. However, recall was similar for words participants expected to forget and words they did not make a prediction for—accurate metacognition would have been exemplified by recalling words they did not make a prediction for better than words they expected to forget. Because words predicted to be forgotten were similarly recalled as words not given a prediction, there may be some reactive benefit to making predictions of forgetting. Guided by word frequency, participants may attend to words that they decide are both memorable and likely to be forgotten. Moreover, the ensuing act of circling or underlining words may thus draw attention to those words, subsequently benefiting recall. We examined this account in Experiment 3.

Experiment 3

Participants in Experiment 1 demonstrated elevated memory for words they selected as most likely to be forgotten. In Experiment 2, participants demonstrated similar levels of recall for words predicted to be forgotten and those words not accorded any prediction. In both cases, participants’ level of recall belied what would be expected from a prediction that items would be largely forgotten. One possible explanation for the potential reactive benefit of selecting words as likely to be forgotten is that the act of word selection itself underlies the memory benefit. Prior research has shown that when participants make choices about when and what to learn, memory for the chosen information is often enhanced, leading to a “choice effect” (Coverdale & Nairne, 2019; DuBrow et al., 2019; Gureckis & Markant, 2012; Markant et al., 2014; Markant & Gureckis, 2014; Rotem-Turchinski et al., 2019). For example, letting participants select cues or targets during paired-associate learning can improve cued recall (e.g., Monty & Perlmuter 1975; Perlmuter et al., 1971; see also Watanabe & Soraci 2004) as does honoring participants’ choices about what to restudy (Kornell & Metcalfe, 2006). Additionally, allowing participants to make decisions regarding aspects of learning such as presentation order or duration can benefit memory (Markant et al., 2014; Murty et al., 2015, 2019; Voss et al., 2011). Accordingly, we investigated whether the choice effect may have contributed to memory benefits for “forget” items.

In Experiment 3, we again presented participants with lists of words to remember for later tests, with participants randomly assigned to study words at a fixed pace or for a duration of their choosing. In each condition, participants were asked to circle two of the words and underline two of the words (via mouse clicks). However, participants were not provided instructions regarding why or how to select which words to underline and which words to circle. It may be that circling and underlining words does not impact memory for those words since participants were not asked to consider memorability. However, if selecting words (i.e., the act of choosing words without instructions) elevates attention and enhances encoding, then these processes may impact memory, suggesting that the choice effect may explain the memory benefits observed in Experiments 1 and 2. Thus, Experiment 3 permitted us to determine whether choice alone conferred memory benefits in the absence of any direct instructions regarding predictions about what words would be later remembered or forgotten.

Method

Participants

After exclusions, participants were 102 undergraduate students (Mage = 20.58, SDage = 3.16) recruited from the UCLA Human Subjects Pool. Participants were tested online and received course credit for their participation. Participants were excluded from analysis if they admitted to cheating (e.g., writing down answers) in a post-task questionnaire (they were told they would still receive credit). This exclusion process resulted in four exclusions. For participants self-pacing the study phase, participants were excluded if they circled more than two words or underlined more than two words. This process resulted in 10 exclusions.

Materials and Procedure

The task used in Experiment 3 was similar to that used in Experiment 1. However, rather than circling and underlining words to indicate whether they would be remembered or forgotten, participants were asked to circle two words and underline two words; they were not given any instructions regarding the criteria for which words to circle or underline. Additionally, study time on each list was either fixed (30 s; n = 47) or self-paced (n = 55).

Results

On average, participants self-pacing their study time spent 85.18 s studying each list (SD = 60.07). Recall for words participants clicked on (either circled or underlined) as a function of whether study time was fixed or self-paced is shown in Fig. 4. A 2 (action: clicked, not clicked) x 2 (study condition: fixed, self-paced) mixed-factor ANOVA revealed that words that were either circled or underlined (M = 0.78, SD = 0.20) were better recalled than words that were not clicked on (M = 0.42, SD = 0.24), [F(1, 99) = 252.66, p < .001, ηp2 = 0.72]. Additionally, participants self-pacing their study time (M = 0.57, SD = 0.24) recalled more words than participants with fixed study time (M = 0.41, SD = 0.11), [F(1, 99) = 13.13, p < .001, ηp2 = 0.12]. However, clicking did not interact with study condition [F(1, 99) = 2.20, p = .142, ηp2 = 0.02].

Fig. 4

The proportion of words recalled for words participants clicked on (either circled or underlined) as a function of whether study time was fixed or self-paced in Experiment 3. Error bars reflect the standard error of the mean

A further analysis of memory for clicked words revealed that circled words (M = 0.81, SD = 0.23) were better recalled than underlined words (M = 0.75, SD = 0.25), [t(97) = 2.16, p = .033, d = 0.22], perhaps because participants clicked an item twice to circle it but only once to underline it. Future work with a counterbalanced design would be necessary to test this conjecture.

We again examined word frequency for words participants either circled or underlined compared with words participants did not click on. A 2 (action: clicked, not clicked) x 2 (study condition: fixed, self-paced) mixed-factor ANOVA revealed that words that were either circled or underlined (M = 8.78, SD = 0.55) were similarly frequent as words that were not clicked on (M = 8.84, SD = 0.24), [F(1, 99) = 0.62, p = .434, ηp2 = 0.01]. Additionally, word frequencies were similar for participants self-pacing their study time and participants with fixed study time, [F(1, 99) = 1.42, p = .237, ηp2 = 0.01]. Moreover, clicking did not interact with study condition [F(1, 99) = 0.44, p = .509, ηp2 < 0.01].

Finally, we examined participants’ output by calculating the average output position of words participants either circled or underlined or did not click on. A 2 (action: clicked, not clicked) x 2 (study condition: fixed, self-paced) mixed-factor ANOVA revealed that words that were either circled or underlined (M = 4.25, SD = 1.80) were recalled earlier than words that were not clicked on (M = 5.22, SD = 1.78), [F(1, 97) = 21.88, p < .001, ηp2 = 0.18]. Additionally, participants self-pacing their study time (M = 5.37, SD = 1.79) had a larger average output position (because they recalled more words) than participants with fixed study time (M = 4.06, SD = 0.88), [F(1, 97) = 19.92, p < .001, ηp2 = 0.17]. However, clicking did not interact with study condition [F(1, 97) = 3.33, p = .071, ηp2 = 0.03].
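The output-position measure can be illustrated with a short Python sketch. It assumes that output position is the 1-indexed serial position of each word in a participant's recall protocol, averaged separately for clicked and unclicked words. The function name and the example words are hypothetical, and the original scoring script may differ.

```python
from statistics import mean


def mean_output_positions(recall_order, clicked):
    """Return (mean position of clicked words, mean position of unclicked words).

    Positions are 1-indexed. A category with no recalled words yields None,
    because an average output position is undefined for it.
    """
    clicked_pos = [i for i, w in enumerate(recall_order, start=1) if w in clicked]
    other_pos = [i for i, w in enumerate(recall_order, start=1) if w not in clicked]
    return (
        mean(clicked_pos) if clicked_pos else None,
        mean(other_pos) if other_pos else None,
    )


# Hypothetical recall protocol for one participant on one list.
recall_order = ["river", "candle", "planet", "marble", "garden", "anchor"]
clicked = {"river", "candle", "marble", "tunnel"}  # "tunnel" was clicked but not recalled

clicked_mean, other_mean = mean_output_positions(recall_order, clicked)
print(clicked_mean, other_mean)
```

Under this definition, a participant who recalls no words from one category contributes no value for that category. This is one plausible reason (an assumption on our part) why the error degrees of freedom are smaller for this analysis than for the recall analysis.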

Discussion

In Experiment 3, participants better recalled words they selected (circled or underlined via mouse clicks) relative to the words that were not selected, consistent with the choice effect found in self-regulated learning contexts (Gureckis & Markant, 2012; Markant et al., 2014; Markant & Gureckis, 2014). Specifically, by selecting a word, participants’ attention was likely drawn toward that word, enhancing memorability. Thus, drawing attention to a subset of words during encoding via the selection process can confer a memory benefit. Accordingly, the selection process may influence memory in the absence of any direct instructions regarding predictions about what words would be later remembered or forgotten.

Cross-experiment comparison

An informal cross-experiment comparison revealed that words judged as likely to be forgotten in Experiment 1 were less likely to be remembered than words chosen somewhat randomly in Experiment 3. Specifically, comparisons of memory for the most likely to be remembered and forgotten words in Experiment 1 relative to words chosen somewhat randomly in Experiment 3 showed that, regardless of study schedule (fixed or self-paced), there was a benefit for the to-be-remembered items relative to those chosen without a metacognitive judgment [t(209) = 2.33, p = .021, d = 0.32] and a cost for the to-be-forgotten words relative to those chosen without a metacognitive judgment [t(207) = 6.06, p < .001, d = 0.84]. Thus, the metacognitive judgment regarding remembering provides a memory boost above a random choice, whereas the metacognitive judgment regarding forgetting is associated with a memory cost relative to random choice. This suggests that people have some sense of what is or is not memorable but making predictions about forgetting may be a countervailing force. For instance, people can pick out what is difficult to remember (diminishing memory) but the act of choosing is helpful (improving memory).

Experiment 4

In keeping with open science practices, we note that Experiment 4 was suggested by the editor and was conducted after all of the original experiments were completed.

Experiment 3 suggested that making a choice about a word was positively associated with later memory of that word. However, we note that items not selected at all may have received no attention, making it difficult to isolate the act of choosing as a key causal factor in subsequent memory. Accordingly, in Experiment 4, we presented words one at a time with a fixed study time. After studying each word, two groups of participants were asked to rate on a Likert scale how likely they were to remember the item. For one group, there were seven options and the scale ranged from −3 (most likely to forget) to 3 (most likely to remember); 0 was “not sure”. For another group, there were three options and the scale ranged from −1 (most likely to forget) to 1 (most likely to remember); 0 was “not sure”. As a comparison, another group of participants did not provide any ratings but had an inter-stimulus interval between each word to equate study time for each item. Relative to the no-judgment group, we expected participants making memorability ratings to demonstrate better recall of both items they rated as likely to be remembered and items they rated as likely to be forgotten (i.e., positive reactivity).

Method

Participants

After exclusions, participants were 161 undergraduate students (Mage = 20.94, SDage = 3.78) recruited from the UCLA Human Subjects Pool. Participants were tested online and received course credit for their participation. Participants were excluded from analysis if they admitted to cheating (e.g., writing down answers) in a post-task questionnaire (they were told they would still receive credit). This exclusion process resulted in four exclusions.

Materials and Procedure

Participants were presented with lists containing 16 words to remember for a later test. Each word was presented for 5 s. Following the presentation of each word, some participants (n = 49) were asked how likely they were to remember the word on a 7-point Likert scale ranging from −3 (most likely to forget) to 3 (most likely to remember); 0 was “not sure”. Another group of participants (n = 62) were asked how likely they were to remember the word on a 3-point Likert scale ranging from −1 (most likely to forget) to 1 (most likely to remember); 0 was “not sure”. Participants were given 5 s to respond. Rather than making Likert ratings, another group of participants (n = 50) had a 5-second inter-stimulus interval to match the length of the study phase for each group. After studying all 16 words, participants completed a 30-second distraction task requiring them to rearrange the digits of several three-digit numbers in descending order (e.g., 123 would be rearranged to 321). Participants were given 3 s to view each of the 10 three-digit numbers and subsequently rearrange the digits. Following the distractor task, participants completed a self-paced free recall test. This was repeated for a total of four study-test cycles.
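The timing of one study-test cycle can be worked through with a short Python sketch. It assumes that the full 5 s response window elapsed on every trial in the rating groups, so that it matches the 5 s inter-stimulus interval in the control group. The constant and function names are illustrative and do not come from the original experiment code.

```python
# Values taken from the procedure described above.
STUDY_SECONDS_PER_WORD = 5
RESPONSE_OR_ISI_SECONDS = 5  # assumption: the whole response window is always used
WORDS_PER_LIST = 16
DISTRACTOR_ITEMS = 10
SECONDS_PER_DISTRACTOR_ITEM = 3


def descending_digits(number: int) -> int:
    """Correct answer for one distractor item: the digits in descending order."""
    return int("".join(sorted(str(number), reverse=True)))


# Duration of the study phase and the distractor task within one cycle.
study_phase_seconds = WORDS_PER_LIST * (STUDY_SECONDS_PER_WORD + RESPONSE_OR_ISI_SECONDS)
distractor_seconds = DISTRACTOR_ITEMS * SECONDS_PER_DISTRACTOR_ITEM

print(study_phase_seconds, distractor_seconds, descending_digits(123))
```

Under these assumptions each list takes 160 s to study, and the 10 distractor items at 3 s each account for the stated 30 s distraction task.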

Results

We were interested in whether predicting words as likely to be forgotten would result in a memory boost for these words when presented sequentially. Thus, given the trends observed in Fig. 5 (which provides recall as a function of Likert predictions) we compare the ends of each scale (most likely to forget and most likely to remember) with each other and 0 (unsure). Because the two Likert conditions used different scales, we examine them separately.

Fig. 5

The proportion of words recalled as a function of Likert ratings for the group that made judgments with seven response options ranging from −3 to +3 (Likert-7) and the group that made judgments with three response options ranging from −1 to +1 (Likert-3). The dashed horizontal line reflects the average proportion of words recalled for the control group not making any Likert ratings in Experiment 4. Error bars reflect the standard error of the mean

For the participants responding using a 7-point Likert scale, a repeated-measures ANOVA with 3 levels (most likely to forget, unsure, most likely to remember) revealed differences in recall between the predictions [F(2, 44) = 23.68, p < .001, ηp2 = 0.52] such that words given a -3 were less likely to be recalled than words given a 0 [pholm = 0.007, d = 0.80] and words given a +3 [pholm < 0.001, d = 1.95]; additionally, words given a +3 were recalled better than words given a 0 [pholm < 0.001, d = 1.15]. Compared to average recall in the control group (M = 0.54, SD = 0.28), a one-sample t-test indicates that words given a -3 were more poorly recalled (M = 0.28, SD = 0.29), [t(31) = -5.14, p < .001, d = −0.91] and words given a +3 were better recalled (M = 0.73, SD = 0.32), [t(31) = 3.37, p = .002, d = 0.60].

For the participants responding using a 3-point Likert scale, a repeated-measures ANOVA with 3 levels (most likely to forget, unsure, most likely to remember) revealed differences in recall between the predictions [F(2, 118) = 36.59, p < .001, ηp2 = 0.38] such that words given a -1 were less likely to be recalled than words given a +1 [pholm < 0.001, d = 1.04] but not words given a 0 [pholm = 0.178, d = 0.18]; additionally, words given a +1 were recalled better than words given a 0 [pholm < 0.001, d = 0.86]. Compared to average recall in the control group, a one-sample t-test indicates that words given a -1 were more poorly recalled (M = 0.39, SD = 0.31), [t(59) = -3.77, p < .001, d = −0.49] and words given a +1 were better recalled (M = 0.66, SD = 0.26), [t(62) = 3.89, p < .002, d = 0.49].
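For a one-sample t-test, Cohen's d can be recovered from the test statistic as d = t / √n, where n = df + 1. The Python sketch below applies this identity to the four control-group comparisons reported above. The function name and labels are illustrative, and the sketch assumes the reported d values were computed in this standard way.

```python
import math


def cohens_d_from_one_sample_t(t_value: float, df: int) -> float:
    """Cohen's d for a one-sample t-test: d = t / sqrt(n), with n = df + 1."""
    return t_value / math.sqrt(df + 1)


# (label, t, df) for each comparison against mean recall in the control group.
comparisons = [
    ("Likert-7, rated -3", -5.14, 31),
    ("Likert-7, rated +3", 3.37, 31),
    ("Likert-3, rated -1", -3.77, 59),
    ("Likert-3, rated +1", 3.89, 62),
]

for label, t_value, df in comparisons:
    print(f"{label}: d = {cohens_d_from_one_sample_t(t_value, df):.2f}")
```

Rounded to two decimals, the recovered values are −0.91, 0.60, −0.49, and 0.49, in line with the effect sizes reported for these tests.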

Discussion

In Experiment 4, we did not observe a memory benefit for items predicted to be forgotten, indicating that the effects observed in Experiment 1 with simultaneous presentation may not translate to sequential study conditions. Rather, participants appear to have used the Likert scale as a continuous predictor of memorability and generally used it relatively accurately. Specifically, words given negative ratings (thus predicted to be forgotten) were least likely to be remembered and words given positive ratings (thus predicted to be remembered) were best recalled. Thus, the present results are consistent with prior metamemory research such that people can anticipate, to some degree, what they will or will not forget (see Rhodes 2016 for a review).

In Experiment 4, we did not find evidence for reactivity for items judged as likely to be forgotten, but this may be due to participants now having to judge all items as opposed to selecting only the most/least likely to be remembered/forgotten. Additionally, the use of this type of design could be difficult to interpret as a different number of observations are present for each rating creating between-participants variability in the tendency to use the extreme values. It may be that if only a smaller number of items are selected and judged on a Likert scale in terms of memorability (as opposed to all items in a sequential manner as we did in the present experiment), this type of procedure may induce more distinctness that could lead to reactivity and memory benefits for items judged as most likely to be forgotten.

Experiment 5

In Experiment 5, rather than making selections for words most likely to be remembered and forgotten, we asked participants to make selections concerning only to-be-remembered or to-be-forgotten words. Specifically, we presented participants with 20 words. One group of participants was asked to circle 10 words that they would like to be tested on (i.e., to-be-remembered or RRRR words); they could forget (FFFF) non-circled words. In contrast, another group of participants was asked to circle 10 words that they would not like to be tested on (to-be-forgotten or FFFF words); they needed to remember (RRRR) the non-circled words. That is, some participants selected words to remember and unselected words were subsequently categorized as to-be-forgotten words; other participants selected words to forget and unselected words were subsequently categorized as to-be-remembered words. Participants were tested on all words regardless of their selections. Thus, Experiment 5 employed a variant of a directed forgetting task (see Bäuml et al., 2020; Johnson, 1994; MacLeod, 1998; Sahakyan et al., 2013 for reviews) whereby half of the words were to-be-remembered and half were to-be-forgotten. However, rather than the experimenter dictating the cue to remember or forget each word, the learner made those decisions (i.e., participants self-cued directed forgetting rather than participants making subjective memorability predictions; some prior work has told participants that they should remember all words regardless of their predictions, see Li et al., 2021). We expected participants to best recall words regarded as to-be-remembered, and we expected this effect to be enhanced when those words were circled rather than unselected (and thus dubbed to-be-remembered by default). However, when participants circled the words that they would not like to be tested on, we expected enhanced memory for those words compared to words rendered as to-be-forgotten because they were not selected.

Method

Participants

Participants were 124 undergraduate students (Mage = 20.75, SDage = 3.80) recruited from the UCLA Human Subjects Pool. Participants were tested online and received course credit for their participation. Participants were excluded from analysis if they admitted to cheating (e.g., writing down answers) in a post-task questionnaire (they were told they would still receive credit). This exclusion process resulted in one exclusion.

Materials and Procedure

The task used in Experiment 5 was similar to that used in Experiment 1b. However, participants were presented with a single list of 20 words and, rather than circling two words and underlining two words, were either required to circle 10 words that they wanted to be tested on (RRRR; participants were told they can/should try to forget (FFFF) non-circled words) or to circle 10 words that they did not want to be tested on and should forget (FFFF; they needed to remember (RRRR) the non-circled words). Participants were given as much time as needed for this portion of the task. Participants then completed a 30-second distraction task that required them to rearrange the digits of several three-digit numbers in descending order (e.g., 123 would be rearranged to 321). Participants were given 3 s to view each of the 10 three-digit numbers and subsequently rearrange the digits. Following the distractor task, participants were given 2 min to recall all words, regardless of whether they had been identified as to-be-remembered or to-be-forgotten.

Results

On average, participants spent 83.53 s studying the words (SD = 50.41). In our analyses, we considered words participants did not select as the opposite cue as their circling judgments (i.e., if a participant was circling 10 words that they would like to be tested on, we scored the other 10 words as to-be-forgotten words; if a participant was circling 10 words that they would not like to be tested on, we scored the other 10 words as to-be-remembered words). Figure 6 shows recall performance as a function of whether participants selected to-be-remembered or to-be-forgotten words as well as the prediction (FFFF or RRRR).
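The coding rule described above can be sketched in Python as follows. The sketch assumes each participant's data consist of the studied word list, the set of circled words, and the selection instruction that participant received. The function name, the argument names, and the shortened four-word list are hypothetical.

```python
def code_cues(word_list, circled, instruction):
    """Assign a cue ("remember" or "forget") to every studied word.

    instruction is "remember" if the participant circled words they wanted to be
    tested on, or "forget" if they circled words they did not want to be tested on.
    Unselected words receive the opposite cue.
    """
    opposite = {"remember": "forget", "forget": "remember"}
    if instruction not in opposite:
        raise ValueError("instruction must be 'remember' or 'forget'")
    return {
        word: instruction if word in circled else opposite[instruction]
        for word in word_list
    }


# Hypothetical, shortened list (the experiment used 20 words with 10 circled).
words = ["river", "candle", "planet", "marble"]
coded = code_cues(words, circled={"river", "marble"}, instruction="forget")
print(coded)
```

In this example the participant circled the words to forget, so "river" and "marble" are coded as to-be-forgotten and the two unselected words are coded as to-be-remembered.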

Fig. 6

The proportion of words recalled for words participants selected as to-be-remembered (with not selected words considered to-be-forgotten) or to-be-forgotten (with not selected words considered to-be-remembered) in Experiment 5. Error bars reflect the standard error of the mean

A 2 (selected words: remember, forget) x 2 (coded cue: remember, forget) mixed-factor ANOVA revealed that participants correctly recalled a greater proportion of words dubbed to-be-remembered (M = 0.72, SD = 0.24) than to-be-forgotten (M = 0.18, SD = 0.18), [F(1, 122) = 468.84, p < .001, ηp2 = 0.79]. However, there were no group differences such that those words participants were asked to select did not significantly influence recall [F(1, 122) = 0.43, p = .513, ηp2 < 0.01]; participants selecting the words to remember recalled a similar proportion of words (M = 0.44, SD = 0.13) as participants selecting the words to forget (M = 0.46, SD = 0.17). Critically, those words that were selected interacted with the coded cue [F(1, 122) = 28.47, p < .001, ηp2 = 0.19] such that more to-be-remembered words were recalled when they were selected [pholm = 0.002, d = 0.56] and more to-be-forgotten words were recalled when they were selected [pholm < 0.001, d = 0.74].

We again examined word frequency for words participants either circled as to-be-remembered or underlined as to-be-forgotten. A paired samples t-test revealed that words participants dubbed to-be-remembered were more frequent (M = 8.95, SD = 0.62) than words they dubbed to-be-forgotten (M = 8.65, SD = 0.55), [t(123) = 3.95, p < .001, d = 0.35].

Finally, we also examined the average output position of words participants either selected or rendered as to-be-remembered or to-be-forgotten. A paired samples t-test revealed that words participants dubbed to-be-remembered were recalled earlier (M = 4.61, SD = 1.17) than words designated as to-be-forgotten (M = 7.79, SD = 2.99), [t(84) = 12.37, p < .001, d = 1.34].

Discussion

In Experiment 5, to-be-remembered words were better recalled than to-be-forgotten words, but this effect was enhanced when participants circled the to-be-remembered words (rather than selecting the words to forget). Additionally, when circling the words to forget, rendering unselected words as to-be-remembered, memory for to-be-forgotten words was enhanced compared with a condition in which the to-be-forgotten words were not selected. Thus, selecting words benefits memory, even when selecting information to forget, consistent with prior work suggesting that selected items may be more memorable than unselected items (Coverdale et al., 2019; Cunningham et al., 2011). Accordingly, findings from the present experiments indicate that the act of selecting an item, even when that selection pertains to forgetting, may promote further processing of the selected items and enhance memory. However, we note that these experiments have only asked participants to identify a subset of items for selection in contrast to work in much of the metamemory literature that solicits judgments for every item. In Experiment 6, we examined whether the selection effect is also observed when each item was judged.

Experiment 6

In Experiment 5, participants selected a subset of the studied words as to-be-remembered or to-be-forgotten. In Experiment 6, we asked participants to make selections concerning all presented words to determine whether the current outcomes generalize to methods in which participants judge every item. Specifically, we presented participants with 20 words and asked them to circle 10 words that they would like to be tested on and underline 10 words that they did not want to be tested on. As a comparison, another group of participants circled or underlined all 20 words but were not given any further instructions.

Method

Participants

Participants in the experimental group were 64 undergraduate students (Mage = 20.31, SDage = 2.64) recruited from the UCLA Human Subjects Pool. We also recruited another group of participants from our university’s Human Subjects Pool (n = 66; Mage = 20.92, SDage = 2.55) to serve as a comparison. Participants were tested online and received course credit for their participation. Participants were excluded from analysis if they admitted to cheating (e.g., writing down answers) in a post-task questionnaire (they were told they would still receive credit if they cheated). This exclusion process resulted in no exclusions from the experimental group and two exclusions from the control group.

Materials and Procedure

The task used in Experiment 6 was similar to that used in Experiment 5. However, participants were presented with 20 words and, rather than selecting 10 words (as either to-be-remembered or to-be-forgotten), participants were required to circle 10 words and underline 10 words. Specifically, participants were asked to circle 10 words that they would like to be tested on (RRRR) and underline 10 words that they did not want to be tested on (FFFF). As a comparison, another group of participants was asked to circle 10 words and underline 10 words but was given no further instructions. Participants were given as much time as needed for this task. Participants then completed the same 30 s distraction task as in Experiment 5. Following the distractor task, participants were given 2 min to recall all words, regardless of whether they had circled or underlined them, followed by another 30 s of the distractor task. Finally, participants completed the same surprise recognition test as in Experiment 5.

Results

On average, participants in the experimental group spent 125.65 s studying the list of words (SD = 169.61). To examine recall as a function of participants’ selections, we conducted a paired samples t-test. Results revealed that participants better recalled words that they circled as to-be-remembered (M = 0.71, SD = 0.25) than words they underlined as to-be-forgotten (M = 0.22, SD = 0.18), [t(63) = 12.91, p < .001, d = 1.61], (see Fig. 7). Moreover, compared to recall of the control group (who circled and underlined randomly; M = 0.49, SD = 0.30), one-sample t-tests revealed that recall for to-be-remembered words was enhanced [t(63) = 6.90, p < .001, d = 0.86] whereas recall for to-be-forgotten words was impaired [t(63) = -12.56, p < .001, d = -1.57].

Fig. 7

The proportion of words recalled for words participants selected as to-be-remembered or to-be-forgotten in Experiment 6. The dashed horizontal line represents the average recall of participants in the control group (circling and underlining words randomly). Error bars reflect the standard error of the mean

Further analyses of word frequency revealed that words participants circled as to-be-remembered were more frequent (M = 9.01, SD = 0.56) than words they underlined as to-be-forgotten (M = 8.67, SD = 0.51), [t(63) = 3.12, p = .003, d = 0.39]. In addition, analyses of output position indicated that words participants circled as to-be-remembered were recalled earlier (M = 4.71, SD = 1.53) than words they underlined as to-be-forgotten (M = 7.46, SD = 3.11), [t(51) = 6.45, p < .001, d = 0.90].

Discussion

In Experiment 6, participants selected both words to remember and words to forget. Compared to a control group required to select all words with no instructions, participants better recalled words they circled as to-be-remembered than words they underlined as to-be-forgotten. However, participants still sometimes recalled words that they underlined as to-be-forgotten, albeit less often than if selected as to-be-remembered or not selected at all, indicating that both clicking on and processing words influence memorability. Additionally, words circled as to-be-remembered were more frequent than words selected as to-be-forgotten and were also recalled earlier during the retrieval phase. Overall, Experiment 6 replicates our prior findings with participants making selections involving all presented words rather than just a subset of words.

General discussion

The current study examined whether explicitly identifying information that is likely to be forgotten can make that information more memorable. In Experiment 1, we presented participants with lists of 16 words and asked them to remember the words for a later test. We also asked participants to select two words they thought they would remember and two words they thought they would forget (by circling or underlining them by clicking on the word). Results revealed that items predicted to be remembered were better recalled than all other items. However, items identified as most likely to be forgotten were better recalled than items not identified in any form. Such elevated recall for words that participants indicated were likely to be forgotten illustrates a reactivity effect, as people often remembered information that they predicted they would forget.

In Experiment 2, we presented learners with lists of 18 words, and participants selected six words that they were most likely to remember and six words that they were most likely to forget (leaving six words unselected, thus equating the number of items in each prediction category). Results revealed that words selected as most likely to be remembered were best recalled and words selected as most likely to be forgotten were recalled at a similar rate as words not given a prediction. This may demonstrate some memory benefit for the words expected to be forgotten as accurate metacognition should result in the worst memory performance for words the learner expects to forget, although it may be that words not given any prediction are recalled at a lower rate than participants expected. Thus, because the recall benefit of predicting forgetting was reduced with a larger number of items being selected, this beneficial form of reactivity may depend on the number of items the learner evaluates.

In Experiment 1, when identifying just two words as most likely to be forgotten, these words were better recalled, but this benefit was not seen in Experiment 2 when six words were selected as most likely to be forgotten. As such, the memory benefits of predicting forgetting likely only occur when a learner looks for a very small number of items that they think they will forget, potentially making these items more distinctive. These findings suggest that the benefits of making predictions about a small subset of to-be-remembered information may result from the relative distinctiveness principle (see Surprenant & Neath 2009) or the von Restorff effect (Hunt, 1995; Wallace, 1965) whereby information that is distinct from competing information is better remembered. Thus, there may be a metacognitively induced distinctiveness/von Restorff effect when selecting what is least likely to be remembered.

Even if the material is not particularly distinctive, once a learner identifies the few items that are likely to be forgotten, these items may become distinctive, leading to a von Restorff-type effect. In contrast, when a learner is determining which one-third (a relatively large portion) of a set of to-be-remembered information is least likely to be remembered, this may not engage the same type of processing that promotes distinctiveness and better recall. Rather, it may be that when a learner identifies a smaller number of items, this process leads to some distinctiveness or unique processing for these items, which then enhances recall for the items expected to be remembered but also unintentionally enhances memory for the items expected to be forgotten. Thus, there may be a boundary condition on the benefits of predicting forgetting such that increasing the number of items to be judged makes those items less distinctive and diminishes memory performance.

In Experiment 3, we examined a potential mechanism of the enhanced memory for selected words based on the choice effect found in self-regulated learning environments (Gureckis & Markant, 2012; Markant et al., 2014; Markant & Gureckis, 2014; Ruggeri et al., 2019). Specifically, we again presented participants with lists of words and asked them to circle and underline some of the words but did not give them any instructions regarding how to select words. Results revealed enhanced memory for words participants selected without any instructions regarding memorability. Thus, simply selecting a word (by circling or underlining it), even for no apparent reason, was sufficient to increase recall despite this being a form of shallow processing (see Craik & Lockhart 1972; see also Tekin & Roediger 2020). This could indicate that the increase in recall for selected words is the result of these words receiving more attention or that merely selecting items as memorable or not memorable in Experiments 1 and 2 may have encouraged further processing that enhanced memory.

We note that not-selected items may have received no attention, making it difficult to isolate the act of choosing as a key causal factor in subsequent reactivity. Accordingly, in Experiment 4, we controlled for study time by presenting words one at a time. After studying each word, participants were asked to make metamemory predictions on a Likert scale with one end of the scale indicating that the word was likely to be forgotten, the middle of the scale representing being unsure of whether the word would be remembered or forgotten, and the other end of the scale indicating that the word was likely to be remembered. We did not observe a memory benefit for items predicted to be forgotten, revealing another boundary condition to the memory benefits of predicting forgetting: when making predictions using a continuous predictor of memorability, people are generally accurate (consistent with prior metamemory research demonstrating that people can anticipate, to some degree, what they will or will not forget, see Rhodes 2016 for a review).

As further evidence against differences in attention/study time for each of the words accounting for the reactivity observed in the present studies, results demonstrated that participants incorporated intrinsic qualities of the words (i.e., frequency) into their metamemory decisions. Specifically, words that participants predicted would be remembered were higher frequency words while words that participants predicted would be forgotten were lower frequency words (consistent with the effect of frequency on memory, see Popov & Reder, in press). For participants to incorporate frequency into their predictions, they would need to evaluate all words in the set (if some words were ignored it would be unlikely to have observed differences in frequency as a function of participants’ predictions). Thus, the word frequency effect in participants’ predictions offers some evidence that the reactivity observed in the present study was not simply the result of differences in attention or study time.

In Experiment 5, rather than making selections regarding both remembering and forgetting, learners either selected the words to remember or the words to forget (and unselected words were considered to-be-remembered or to-be-forgotten based on which words participants were asked to select). Results again demonstrated enhanced memory for selected words regardless of whether they were to-be-remembered or to-be-forgotten. Finally, in Experiment 6, rather than selecting a subset of the words, participants selected all words. Compared to participants selecting all words without instructions, words selected as to-be-remembered were better recalled while words selected as to-be-forgotten were more poorly recalled, illustrating relatively accurate metacognition. However, comparing memory across Experiments 1 and 3, selecting words as likely to be remembered provided a memory advantage compared to selecting words without the metacognitive component, but selecting words as likely to be forgotten resulted in a memory cost compared to selecting words without the metacognitive component. Thus, the framing of the metacognitive component to the selection effect via remembering versus forgetting influenced the magnitude of this benefit (but see Li et al., 2021 who found similar reactivity effects for judgments of forgetting and judgments of learning).

Without instructions regarding memorability, choices made in Experiment 3 appeared unrelated to the characteristics of the items. However, results suggested that choices in the present experiments were generally associated with word frequency (see Benjamin 2003; Mendes & Undorf, 2021; Tullis & Benjamin, 2012). Specifically, participants were most likely to select lower-frequency words as candidates to be forgotten relative to words they selected as likely to be remembered or did not select, a decision that aligns with the comparatively lesser chance of producing that word on a free recall test relative to higher-frequency words. Surprisingly, selecting these low-frequency words that participants deemed likely to be forgotten was associated with elevated levels of recall compared with words that were not selected, likely because of the distinctiveness or additional processing that was directed to these words due to their choice. One possibility is that the choice effect manifested in the present study and boosted “remember” words, which were already more likely to be recalled than “forget” words (in terms of frequency). However, the choice effect also boosted memory for the “forget” items, but the lower frequency of these words may explain why the boost did not result in recall exceeding performance for the “remember” items.

Collectively, these data are consistent with prior work suggesting that making memory judgments may influence memory (reactivity) but provide several novel insights. Most notably, making either remembering or forgetting predictions rendered the selected word memorable, consistent with the richness of encoding account (Craik & Tulving, 1975; Hunt, 2003; Hunt & Smith, 1996; Hunt & Worthen, 2006; Moscovitch & Craik, 1976; Watkins, 1978; Watkins & Watkins, 1975) whereby improved memory results from people generating ideas about a given word which may increase the available retrieval cues. The present results are also consistent with accounts of reactivity contingent on “metacognition modifying attention” (Castel et al., 2012) or increasing attention to cues (Soderstrom et al., 2015). Specifically, the process of selecting certain items may have modified participants’ attention and what cues they attended to (see Halamish & Undorf 2022). Here, clicking on a word likely resulted in additional processing, leading to better recall of these words, although this benefit was greatest when predicting remembering compared with forgetting. However, in contrast with the changed goal hypothesis (Mitchum et al., 2016), although participants incorporated item difficulty by using word frequency to guide their selections, this did not yield negative reactivity for the more difficult words. This may further indicate that the reactivity observed in the present study is largely driven by the selection of words more than the act of making metacognitive predictions.

Again, the present study demonstrated that the act of selecting a word and not selecting others enhances the recall of selected items. As such, it may be that the act of selection, rather than metacognitive monitoring, is causing reactivity. This account is consistent with the results of Experiment 4: when participants were asked to monitor their learning using a Likert scale, greater confidence that a given item would be remembered generally corresponded to better recall for those items. However, when participants indicated that they expected to forget a word, memory for these items was not enhanced; these items were poorly recalled. Thus, the present study indicates that learners are generally accurate when monitoring their learning, and monitoring learning can enhance memory (positive reactivity) for items expected to be remembered but not for items expected to be forgotten. Moreover, when participants are asked to select certain items, this process can result in positive reactivity regardless of the reason for selecting the item.

The present study is also consistent with prior work demonstrating that people remember words better if they say “Yes” to an orienting question than if they say “No” (see Roediger et al., 2002; see also Roediger & Gallo 2002) or make other decisions regarding presented information such as the choice effect found in self-regulating learning contexts (Gureckis & Markant, 2012; Markant et al., 2014; Markant & Gureckis, 2014; Ruggeri et al., 2019). Thus, how participants answer a question about a word can influence the processing of the word, and in the current study, selecting a word as likely to be remembered or forgotten enhanced memory (similar to ironic effects in memory, see Wegner 1994). This was present even when participants were not asked to predict memorability, suggesting that the act of selecting leads to a choice effect, enhancing later memory for the selected information. As such, any additional processing given to a word, whether occurring from metamemory predictions or simply clicking on it, may enhance memory, but only if this processing is a result of identifying information that people feel will be remembered. Future work may benefit from using eye-tracking (i.e., measures of fixation time, the number of eye fixations for each word, and/or participants’ pattern of saccadic eye movements between the two columns of words) and other more precise measures to better examine how participants’ attention is allocated and which words are processed.

The enhanced memory we observed for items learners selected as likely to be remembered but also as likely to be forgotten is consistent with Koriat’s (1997) cue-utilization framework. Specifically, participants used intrinsic cues (e.g., word frequency) to make metacognitive judgments, but may not have considered how processing selected words (a potential extrinsic cue) can influence memory, sometimes in unintentional ways. In the present work, participants may have focused on the intrinsic properties of the items they were processing, such as word frequency, but may not have incorporated other aspects (e.g., how memory will be tested, the consequences of processing items that are thought to be not well remembered). Future research is needed to better determine how extrinsic and mnemonic cues could influence how people remember items that they deem unlikely to be recalled.

In sum, memory was enhanced for words participants selected as likely to be remembered but also for words participants indicated were most likely to be forgotten, relative to words not given a prediction. Indeed, simply selecting a word enhanced memorability, indicating that the observed reactivity occurred as a result of these words being selected and becoming more distinct and/or receiving additional processing. We found novel evidence that when people identify a small amount of information that they think they will later forget, this process enhances memory relative to information that is not selected as likely to be forgotten. Thus, behaviors that draw attention to words or result in additional processing can enhance memory, even if the metacognitive behavior involved identifying those words as likely to be forgotten.