Are judgments of learning made after correct responses during retrieval practice sensitive to lag and criterion level effects?

Pyc, Mary A.; Rawson, Katherine A.

doi:10.3758/s13421-012-0200-x

Published: 08 March 2012

Volume 40, pages 976–988 (2012)

Memory & Cognition


Mary A. Pyc¹ &
Katherine A. Rawson²


Abstract

Although successful retrieval practice is beneficial for memory, various factors (e.g., lag and criterion level) moderate this benefit. Accordingly, the efficacy of retrieval practice depends on how students use retrieval practice during learning, which in turn depends on accurate metacognitive monitoring. The present experiments evaluated the extent to which judgments of learning (JOLs) made after correct responses are sensitive to factors (i.e., lag and criterion level) that moderate retrieval practice effects, as well as which cues influence JOLs under these conditions. Participants completed retrieval practice for word pairs with either short or long lags between practice trials until items were correctly recalled 1, 3, 6, or 9 times. After the criterion trial for an item, participants judged the likelihood of recalling that item on the final test 1 week later. JOLs showed correct directional sensitivity to criterion level, with both final test performance and JOLs increasing as criterion level increased. However, JOLs showed incorrect directional sensitivity to lag, with greater performance but lower JOLs for longer versus shorter lags. Additionally, results indicated that retrieval fluency and metacognitive beliefs about criterion level—but not lag—influenced JOLs.


A wealth of research has shown that practice involving retrieval of target information from memory (i.e., retrieval practice) is beneficial for subsequent retention (for reviews, see Rawson & Dunlosky, 2011; Roediger & Butler, 2011). Of course, the effectiveness of retrieval practice depends on a number of factors. For example, although failed retrieval attempts may show modest memorial benefits (e.g., Kornell, Hays, & Bjork, 2009), retrieval practice is particularly efficacious when retrieval attempts during encoding are successful (e.g., Karpicke & Roediger, 2007; Pyc & Rawson, 2007, 2011). Furthermore, the memorial benefits of successful retrievals depend critically on the quantity and timing of those successful retrievals (Pyc & Rawson, 2009).

Although retrieval practice has been shown to yield large improvements in memory under appropriate experimentally devised conditions, in many learning situations (e.g., a student studying for an exam), the scheduling of retrieval practice is largely in the hands of the learner. Thus, the efficacy of retrieval practice for enhancing learning can only be as good as individuals’ self-regulated use of retrieval practice. Therefore, it is important to understand the extent to which individuals’ judgments of learning are sensitive to factors that influence the efficacy of retrieval practice. Accordingly, the present research examined the extent to which individuals’ judgments are sensitive to the quantity and timing of successful retrievals during practice.

Below, we provide a brief review of the particular retrieval practice effects that are relevant for the present experiments. We then describe components of self-regulated learning, with particular emphasis on metacognitive monitoring, the component of greatest interest here. Finally, we report two experiments evaluating the sensitivity of judgments of learning to factors that influence the efficacy of successful retrieval practice.

Efficacy of retrieval practice

Many studies have established that retrieval practice is beneficial for memory. Retrieving information from memory during practice promotes memory to a greater extent than do other strategies, such as restudying (e.g., Cull, 2000; Karpicke & Roediger, 2007, 2008). Important for present purposes, previous research has shown that the quantity and timing of practice influence the memorial benefits of retrieval practice.

Concerning the quantity of practice, research has shown greater memorial benefits when individuals engage in more versus less retrieval during practice (e.g., Allen, Mahler, & Estes, 1969; Wheeler & Roediger, 1992). Concerning the timing of practice, a wealth of previous research has demonstrated greater memorial benefits when items are practiced with a longer versus shorter lag between practice trials with items (e.g., Cepeda, Pashler, Vul, Wixted, & Rohrer, 2006; Cull, 2000; Landauer & Bjork, 1978; Pashler, Zarrow, & Triplett, 2003; Pyc & Rawson, 2009). However, almost all of this previous research has manipulated the quantity and timing of trials during practice. In contrast, the present research involved manipulating the quantity and timing of correct retrievals during practice. When students self-regulate their own learning using retrieval practice, they presumably do not (and should not) simply engage in a fixed number of practice trials for each item. Rather, students should self-test until they can correctly recall items multiple times during encoding (e.g., Pyc & Rawson, 2009).

What influence do the quantity and timing of correct retrievals have on final test performance? Recent research has shown greater memorial benefits for items correctly retrieved more versus fewer times during practice and for items that are correctly retrieved after longer versus shorter lags during retrieval practice (Pyc & Rawson, 2009). Pyc and Rawson (2009) presented participants with foreign language paired associates for an initial study trial and then test–restudy practice trials until items reached a preassigned criterion level of performance (1, 3, 5, 6, 7, 8, or 10 correct retrievals) during practice. Items were practiced with either a short or a long lag between practice trials. After a delay, participants completed a final cued recall test for all items. Across two experiments, performance increased as the number of correct retrievals during practice increased (see also Nelson, Leonesio, Shimamura, Landwehr, & Narens, 1982; Vaughn & Rawson, 2011). Additionally, performance was higher for items with a longer lag versus shorter lag between correct retrievals during practice. Thus, the benefits of successful retrievals depend critically on the quantity and timing of those successful retrievals.

Theories of self-regulated learning and metacognitive monitoring

Although researchers have identified various retrieval practice schedules that are particularly beneficial for memory (i.e., schedules with multiple correct retrievals that take place after long lags), the impact of successful retrieval practice for promoting learning hinges critically on individuals’ using the most effective retrieval practice schedules when self-regulating their study. Self-regulated learning includes two central components, monitoring and control (e.g., Greene & Azevedo, 2007; Nelson & Narens, 1990; Winne & Hadwin, 1998). Monitoring involves evaluating how well information has been learned and/or the likelihood that information will be remembered in the future. Control involves decisions about what to study, when to study, and how to study. The primary assumption of models of self-regulated learning is that monitoring informs control decisions, which in turn influence learning (e.g., Ariel, Dunlosky, & Bailey, 2009; Dunlosky & Metcalfe, 2009; Nelson & Narens, 1990; Winne & Hadwin, 1998). Consistent with this basic assumption, research has shown that more accurate versus less accurate monitoring during study leads to higher levels of test performance (e.g., Dunlosky & Rawson, in press; Rawson, O’Neil, & Dunlosky, 2011; Thiede, 1999; Thiede, Anderson, & Therriault, 2003).

Because monitoring accuracy is critically important for effective control and later test performance, we focus on this aspect of self-regulated learning in the present experiments. To examine the extent to which individuals accurately monitor their learning during retrieval practice, we evaluated the extent to which judgments of learning (JOLs) made after correct retrievals are sensitive to factors (i.e., lag and criterion level) that moderate the effects of successful retrieval.

What factors influence JOLs? Koriat’s (1997) cue-utilization framework states that JOLs are inferential, in that individuals do not have direct access to their own memory states and, thus, must use heuristics to assess the likelihood of being able to later recall information. That is, JOLs are not based on an evaluation of the memory strength of an item but, instead, are based on one or more cues that individuals use to infer the state of their memory.

What types of cues are used to make JOLs? According to the cue-utilization framework, three classes of cues can influence JOLs: intrinsic, extrinsic, and mnemonic. Intrinsic cues are based on characteristics inherent to items, which may make them easier or more difficult to learn (e.g., abstract vs. concrete). Extrinsic cues are based on learning conditions (e.g., number of trials) or the encoding task an individual engages in (e.g., interactive imagery). Mnemonic cues are based on aspects of an individual’s own subjective experiences during task performance (e.g., retrieval fluency), which may provide the individual with information that is predictive of how well an item has been learned, as well as the likelihood that the item will be recalled at a later time. To foreshadow, extrinsic and mnemonic cues are of greatest interest here.

Sensitivity of JOLs to effects of correct retrievals

With the goal of the present research in mind (i.e., to evaluate the sensitivity of JOLs to the quantity and timing of successful retrievals during practice), to what extent can previous research provide information about the kinds of cues that learners use to make JOLs after correct retrievals?

A wealth of previous research has evaluated the sensitivity of JOLs to the quantity and timing of practice, but these previous studies are different in important ways from the present research. For example, previous research has shown greater JOL accuracy as the quantity of practice increases (e.g., Mazzoni, Cornoldi, & Marchitelli, 1990; Meeter & Nelson, 2003; Zechmeister & Shaughnessy, 1980). However, much of this previous research has involved study trials only, rather than retrieval practice. Furthermore, prior research involving retrieval practice has manipulated the number of practice trials, rather than manipulating the number of correct retrievals.

Likewise, previous research has examined JOL accuracy as a function of timing of practice. JOL magnitudes are often greater with less versus more time between practice trials, whereas performance is usually lower with less versus more time between practice trials (e.g., Kornell, 2009; Zechmeister & Shaughnessy, 1980). However, the available research either has again involved only study trials or has manipulated the timing of practice trials rather than the timing of correct retrievals. Furthermore, much of the work showing JOL magnitude differences as a function of timing has compared massed versus spaced practice (i.e., no spacing vs. some spacing between practice trials with items), rather than short versus long lags.

Why are these differences important? First, given that previous research has shown differences in JOL accuracy for study versus retrieval practice (e.g., Karpicke, 2009; Kornell & Son, 2009; Mazzoni & Nelson, 1995, Experiment 2; Shaughnessy & Zechmeister, 1992), the sensitivity of JOLs to effects of the quantity and timing of practice in previous studies involving study trials only may differ from the sensitivity of JOLs to these factors under conditions involving retrieval practice (e.g., because the mnemonic cue of retrieval fluency is available under conditions of retrieval practice, but not under conditions of study only). Second, implementing a fixed number of practice trials for each item yields differences in learning status for various items. That is, some items may be correctly recalled during practice, whereas others may not be correctly recalled. Because retrieval status (i.e., correct vs. incorrect) is a powerful cue for making judgments (Nelson & Dunlosky, 1991), differences in retrieval status for individual items exert a strong influence on JOLs made during practice with a fixed number of trials. In contrast, when all items are learned to a given criterion, individuals cannot use retrieval status as a cue for making judgments. Third, a similar logic applies to studies manipulating the lag between trials, rather than the lag between correct retrievals, in that retrieval status will differ as a function of lag in the former case, but not in the latter. In sum, the sensitivity of JOLs to the quantity and timing of correct retrievals may differ from patterns observed in previous research to the extent that the available cues differ for conditions of criterion versus noncriterion learning.

Importantly, here we are interested in students’ judgments of learning when all items are successfully retrieved, for reasons described above. However, to our knowledge, only one prior study has examined JOLs during criterion learning (i.e., when all items are practiced until correctly recalled). Karpicke (2009) reported that JOLs were greater for items that were correctly recalled three times versus one time during practice.^{Footnote 1} No prior research has evaluated the relationship between lag and JOLs when items are learned to a criterion level of performance, nor has prior research examined JOLs when both lag and criterion level are manipulated.

However, on the basis of the kinds of cues that Koriat’s (1997) cue-utilization framework assumes people use when making JOLs, we outline a number of possible outcomes. On the basis of the definition provided by the cue-utilization framework, criterion level is an extrinsic cue. If individuals have accurate beliefs regarding criterion level, JOLs will increase as criterion level increases. Of course, even if participants have accurate beliefs, it is possible that they may not use these beliefs when making JOLs (e.g., Koriat, Bjork, Sheffer, & Bar, 2004), so one might not see a relationship between criterion level and JOL. It could also be the case that individuals do not have any beliefs about criterion level, in which case JOLs will not differ for various criterion levels. (We do not consider the highly implausible possibility that individuals would believe that an increase in criterion level would lead to a decrease in memory.)

Although criterion level is an extrinsic cue, it also influences the mnemonic cue of retrieval fluency. For example, metacognitive research has shown that in various tasks, JOLs increase as response latencies decrease (e.g., Benjamin, Bjork, & Schwartz, 1998). Importantly, previous research on retrieval practice has shown that retrieval latencies decrease as the number of correct retrievals during practice increases (e.g., Pyc & Rawson, 2009). Therefore, if JOLs during retrieval practice are based on the mnemonic cue of retrieval fluency, JOLs are predicted to increase as criterion level increases.

Lag is also an extrinsic cue by definition. If individuals have accurate beliefs about lag, JOLs will be higher for items that are correctly retrieved after longer versus shorter lags. Again, even if participants have accurate beliefs, it does not ensure that they will use these beliefs when making JOLs (Koriat et al., 2004), in which case JOLs may not differ for longer versus shorter lags. JOLs also may not be related to lag if individuals do not have any beliefs about the effects of lag. Finally, if individuals have inaccurate beliefs about lag (and incorporate those beliefs when making JOLs), JOLs will be higher for shorter versus longer lags.

The extrinsic cue of lag also influences the mnemonic cue of retrieval fluency. Previous research has shown that retrieval latencies during retrieval practice are lower for items retrieved after shorter versus longer lags (e.g., Pyc & Rawson, 2009). If JOLs during retrieval practice are based on the mnemonic cue of retrieval fluency, JOLs will be higher for items that are correctly retrieved after shorter versus longer lags.

The present experiments were designed to evaluate two questions. First, are JOLs sensitive to the effects of criterion level and/or the lag between correct retrievals on final test performance? Second, what cues are used to make JOLs for criterion level and lag? In two experiments, participants learned foreign language paired associates via retrieval practice with restudy until items reached an assigned criterion level of performance (one, three, six, or nine correct retrievals). Items were practiced with either a short lag or a long lag between trials. After the last correct retrieval for each item, participants predicted the likelihood of retrieving that item on the final test. If JOLs are based on the extrinsic cue of criterion level and/or on the mnemonic cue of retrieval fluency, JOLs will increase as criterion level increases. For lag, several outcomes are plausible, depending on the extent to which the extrinsic cue of lag complements or competes with the mnemonic cue of retrieval fluency.

Experiment 1

Method

Participants and design

Forty-one Kent State University undergraduates participated in return for course credit. Criterion level (one, three, six, or nine correct retrievals during practice) was a within-participants manipulation. Lag (short vs. long) was a between-participants manipulation, with 22 and 19 participants in each group, respectively.

Materials

Items included 48 Swahili–English translation word pairs previously normed for item difficulty (Nelson & Dunlosky, 1994). Twelve word pairs were assigned to each of four lists, with an equivalent range of item difficulty in each list. Within each list, three items were randomly assigned to each criterion level (randomized anew for each participant).
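The assignment of items to criterion levels can be illustrated with a minimal Python sketch. This is not the authors' software: the item labels, the function name assign_criteria, and the fixed random seed are hypothetical, and the sketch assumes only what is stated above (four lists of 12 items, three items per criterion level within each list, shuffled separately for each participant).

```python
import random
from collections import Counter

CRITERION_LEVELS = (1, 3, 6, 9)  # correct retrievals required during practice

def assign_criteria(lists, rng):
    """Assign an equal number of items per list to each criterion level."""
    assignment = {}
    for items in lists:
        per_level = len(items) // len(CRITERION_LEVELS)  # 12 // 4 = 3
        levels = [lvl for lvl in CRITERION_LEVELS for _ in range(per_level)]
        rng.shuffle(levels)  # randomized anew for each participant
        for item, level in zip(items, levels):
            assignment[item] = level
    return assignment

# Hypothetical placeholder labels standing in for the 48 word pairs.
lists = [[f"list{l}_item{i}" for i in range(12)] for l in range(4)]
assignment = assign_criteria(lists, random.Random(0))

# Twelve items end up at each criterion level across the four lists.
level_counts = Counter(assignment.values())
```

Shuffling the levels within each list, rather than across all 48 items, keeps criterion level balanced within every list.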

Procedure

All task instructions and items were presented via computer. All items first received an initial study trial, followed by blocks of test–restudy practice trials until items reached their assigned criterion level of performance. For initial study trials, the cue (Swahili word) and target (English translation) appeared on the computer screen for 10 s. For test trials, the cue appeared on the computer screen, and participants had 8 s to type the correct target answer. If an item was retrieved before 8 s had elapsed, participants could press a key to submit their response. Items that were not correctly retrieved received a 4-s restudy trial with the cue and target before participants moved on to the next to-be-learned item. Items that were correctly retrieved did not receive a restudy trial before participants moved on to the next item.

The computer tracked the number of times each item was correctly retrieved during practice. Items continued to receive test–restudy practice trials until they reached their assigned criterion level of performance (one, three, six, or nine correct retrievals). After items reached their criterion level of performance, they were dropped from further test–restudy practice. If an item had not reached its criterion level of performance on a given trial, it was placed at the end of the list of to-be-learned items. Participants were not aware of the specific criterion level for each item but were told that items would be practiced until they reached an “acceptable level of performance.”

For the short-lag group, the 12 items from one list were each presented for an initial study trial. After all items in the list had an initial study trial, items received test–restudy practice trials until they were correctly retrieved to their predetermined criterion level. When all items in one list had been practiced to criterion, items from a second list were presented for initial study and test–restudy practice trials, and so on until items from each of the four lists had been learned. Order of list presentation was counterbalanced across participants.

For the long-lag group, the four lists of 12 items were combined into one list. All items were presented for an initial study trial. After initial study, items received test–restudy practice trials until items were correctly retrieved to their predetermined criterion level.
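To make the scheduling concrete, the dropout procedure can be sketched as a simple queue in Python. This is an illustrative simulation under stated assumptions, not the experimental program: the function name practice_to_criterion and the constant recall probability p_correct are hypothetical, and restudy trials and timing are not modeled. The sketch shows only how list length (12 vs. 48 items) determines the spacing between successive trials with an item.

```python
import random
from collections import deque

def practice_to_criterion(criteria, p_correct, rng):
    """Simulate test trials with dropout once an item reaches criterion.

    criteria maps each item to its required number of correct retrievals.
    Returns, per item, the trial numbers of its correct retrievals.
    """
    queue = deque(criteria)  # items in presentation order
    correct_trials = {item: [] for item in criteria}
    trial = 0
    while queue:
        item = queue.popleft()
        trial += 1
        if rng.random() < p_correct:  # hypothetical recall probability
            correct_trials[item].append(trial)
        if len(correct_trials[item]) < criteria[item]:
            queue.append(item)  # below criterion: back to the end of the list
        # otherwise the item is dropped; the JOL would be collected here
    return correct_trials

rng = random.Random(0)
# With perfect recall and a uniform criterion of 3, spacing equals list length.
short = practice_to_criterion({i: 3 for i in range(12)}, 1.0, rng)
long_ = practice_to_criterion({i: 3 for i in range(48)}, 1.0, rng)
short_first = short[0]  # [1, 13, 25]
long_first = long_[0]   # [1, 49, 97]
```

In the actual design the spacing varies across practice, because items assigned to lower criterion levels drop out earlier and shorten the remaining list.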

Immediately after a given item was correctly recalled to its criterion level of performance (i.e., one, three, six, or nine correct retrievals), participants made a JOL for that item. For the JOL trial, participants were asked the following: “For the item you just saw, how likely do you think it is that you will be able to correctly recall the ENGLISH translation when you are shown the SWAHILI word on the final test 7 days from now?” Participants were asked to type in a response, using any number from 0 to 100 (in which 0 = 0% likelihood of recalling in 7 days and 100 = 100% likelihood of correctly recalling in 7 days). Thus, participants made 48 JOLs, one for each item immediately after the item reached its criterion level of performance during practice.

During the second session 1 week later, participants completed a computer-administered self-paced cued recall final test for all 48 word pairs.

Results and discussion

Final test performance

The mean percentage of items correctly recalled on the final test as a function of criterion level and lag is presented in Fig. 1. Results of a 2 (lag) × 4 (criterion level) mixed factor analysis of variance (ANOVA) showed a significant main effect of criterion level, with final test performance significantly increasing as the number of correct retrievals during practice increased, F(3, 117) = 41.67, MSE = .02, p < .001. The main effect of lag was also significant, with final test performance significantly higher in the long-lag group than in the short-lag group, F(1, 39) = 37.74, MSE = .07, p < .001. The interaction was also significant, indicating a greater difference in performance for the lag groups as criterion level increased, F(3, 117) = 6.47, MSE = .02, p < .001.
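As a consistency check on the reported degrees of freedom, the standard formulas for a mixed design with one between-participants and one within-participants factor can be written out in Python. The function name mixed_anova_df is hypothetical, and the sketch assumes complete data from all 41 participants.

```python
def mixed_anova_df(n_participants, n_groups, n_within_levels):
    """Degrees of freedom for a mixed (split-plot) ANOVA with complete data."""
    df_between = n_groups - 1
    df_between_error = n_participants - n_groups
    df_within = n_within_levels - 1
    df_within_error = df_within * df_between_error
    return {
        "between": (df_between, df_between_error),
        "within": (df_within, df_within_error),
        "interaction": (df_between * df_within, df_within_error),
    }

# 41 participants, 2 lag groups, 4 criterion levels
dfs = mixed_anova_df(41, 2, 4)
# dfs["between"] == (1, 39); dfs["within"] == (3, 117);
# dfs["interaction"] == (3, 117)
```

These values match the F(1, 39) and F(3, 117) reported above. Calling the same function with nine within-participants levels gives 8 and 312, which matches the latency analysis reported below.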

Judgments of learning

As was expected on the basis of findings from prior research, higher criterion levels and longer lags between correct retrievals improved final test performance. More important for present purposes, to what extent were JOLs sensitive to the effects of criterion level and lag on final test performance? Mean JOL values at each criterion level for each lag group are presented in Fig. 2. Results of a 2 (lag) × 4 (criterion level) mixed factor ANOVA showed a significant main effect of criterion level, with mean JOL values increasing as the number of correct retrievals during practice increased, F(3, 117) = 44.16, MSE = 162.78, p < .001. Thus, JOLs show correct directional sensitivity to the effects of criterion level on final test performance.

In contrast, the main effect of lag was not significant, F(1, 39) = 2.26, MSE = 2,942.23, p = .141. JOLs did not accurately reflect the effects of lag on final test performance. In fact, the numerical trend was in the opposite direction (t-tests showed a significant difference between short-lag and long-lag JOLs for criterion level 1, t(39) = 2.61, p = .01, as well as a trend for criterion level 3, t(39) = 1.81, p = .08). The interaction term was not significant, F(3, 117) = 2.12, MSE = 162.78, p = .102. Thus, although performance differences between lag groups increased as criterion level increased, JOL differences did not show this same pattern.

In sum, JOLs showed correct directional sensitivity to the effects of criterion level but did not show correct directional sensitivity to the effects of lag between correct retrievals. To what extent did the mnemonic cue of retrieval fluency influence JOLs? To measure retrieval fluency, we examined first keypress latency for all correct retrieval trials in session 1. First keypress latency was defined as the amount of time between onset of the Swahili cue and a participant’s first keypress in the response box. For each participant, we calculated the mean first keypress latency for the nth correct retrieval during practice, with n = 1–9 correct retrievals across criterion level conditions. To provide the most stable estimates of first keypress latency, we collapsed across criterion level for this analysis (e.g., all 48 items were correctly recalled once and thus contributed to this mean, the 36 items assigned to criterion levels 3–9 were each correctly recalled a second and third time and thus contributed to these means, and so on; outcomes were highly similar when analyses were conducted only on the basis of items assigned to criterion 9). Figure 3 shows mean first keypress latency (in seconds) as a function of the nth correct retrieval during practice. Results of a 2 (lag) × 9 (nth correct retrieval) mixed factor ANOVA revealed a significant main effect of lag, with shorter latencies for the short-lag group than for the long-lag group, F(1, 39) = 8.56, MSE = .50, p = .006. The main effect of nth correct retrieval during practice was also significant, with latencies decreasing as the number of correct retrievals during practice increased, F(8, 312) = 213.57, MSE = .04, p < .001. The interaction was also significant, F(8, 312) = 7.83, MSE = .04, p < .001.
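The collapsing across criterion level described above can be sketched for a single participant in Python. The function names and the example latencies are hypothetical and serve only to illustrate the pooling; they are not data from the experiment.

```python
from collections import defaultdict
from statistics import mean

CRITERION_LEVELS = (1, 3, 6, 9)

def items_contributing(n, items_per_level=12):
    """Number of items with an nth correct retrieval, given 12 items per level."""
    return items_per_level * sum(1 for lvl in CRITERION_LEVELS if lvl >= n)

def mean_latency_by_nth_correct(latencies_by_item):
    """Pool first-keypress latencies by ordinal position of the correct retrieval.

    latencies_by_item maps each item to its latencies (in seconds), ordered
    from the first to the last correct retrieval.
    """
    pooled = defaultdict(list)
    for latencies in latencies_by_item.values():
        for n, latency in enumerate(latencies, start=1):
            pooled[n].append(latency)
    return {n: mean(values) for n, values in sorted(pooled.items())}

contributing = [items_contributing(n) for n in range(1, 10)]
# contributing == [48, 36, 36, 24, 24, 24, 12, 12, 12]

# Hypothetical latencies: one criterion-1 item and one criterion-3 item.
example = mean_latency_by_nth_correct({"a": [2.0], "b": [3.0, 1.0, 1.0]})
# example == {1: 2.5, 2: 1.0, 3: 1.0}
```

Because fewer items contribute at higher values of n, the means for the seventh through ninth correct retrievals rest on 12 items per participant, whereas the mean for the first rests on all 48.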

These results support the possibility that the mnemonic cue of retrieval fluency influenced JOLs during criterion learning. However, at least for criterion level, the extrinsic cue may also have influenced JOLs. Given that both mnemonic and extrinsic cues may influence JOLs, we examined the extent to which criterion level and retrieval fluency uniquely influence JOLs by conducting a series of hierarchical linear models (HLMs).^{Footnote 2} We also examined the extent to which two other cues may have influenced JOLs. Specifically, we included the intrinsic cue of normative item difficulty (from Nelson & Dunlosky, 1994) and the mnemonic cue of number of trials involving retrieval failure prior to the first correct recall during practice for each item. The first model assessed the relationship between criterion level and JOLs. Results showed that JOLs significantly increased as criterion level increased, t(1926) = 7.20, p < .001. The second model assessed the relationship between retrieval fluency (first keypress latency) and JOLs. Results showed that JOLs significantly increased as first keypress latencies decreased, t(1926) = 7.03, p < .001. The third and fourth models assessed the relationship between normative item difficulty and JOLs and between number of retrieval failures and JOLs, respectively. Results showed no significant relationship between either of these variables and JOLs, ps > .05.

Given the significant relationships between criterion level and JOLs and retrieval fluency and JOLs, the fifth model examined the extent to which each of these variables influenced JOLs when the other variable was controlled for. Results showed that both criterion level and first keypress latency were significantly related to JOLs, t(1925) = 6.52, p < .001, and t(1925) = 2.16, p = .03, respectively. Taken together, these analyses suggest that both the factors of criterion level and retrieval fluency influenced JOLs during retrieval practice.

Experiment 2

Results demonstrated that JOLs show correct directional sensitivity to the effects of criterion level on final test performance: Both final test performance and JOLs increased as criterion level increased. Furthermore, HLM analyses indicated a relationship between the extrinsic cue of criterion level and JOLs above and beyond the influence of criterion level on the mnemonic cue of retrieval fluency. Presumably, the extrinsic cue reflects a metacognitive belief about the effects of criterion level on final test performance. However, Karpicke (2009) reported results suggesting that learners may not have appropriate metacognitive beliefs regarding criterion level. Of interest here, after items were learned to criterion during practice, participants were asked to make aggregate judgments, in which they judged the number of items they would remember on a final test 1 week later. Results showed that aggregate judgments did not differ for a group of participants who terminated practice after one correct recall versus participants who completed two additional practice trials, suggesting that participants may not understand the memorial benefits of increasing criterion levels. Thus, one goal of Experiment 2 was to provide further evidence that participants have correct metacognitive beliefs about the effects of criterion level on final test performance.

In contrast to the criterion level results, JOLs did not show correct directional sensitivity to the effects of lag between correct retrievals on final test performance. Final test performance was higher for the long-lag versus short-lag group, whereas JOLs did not statistically differ (and were even numerically lower) for the long-lag versus short-lag group. The design of Experiment 1 precluded us from examining the relationship between lag and JOLs using HLM analyses, as we did for criterion level, because lag was a between-participants manipulation. Therefore, in Experiment 2, lag was manipulated within subjects. Additionally, to further diagnose why JOLs did not show correct directional sensitivity to the effects of lag on final test performance, Experiment 2 evaluated metacognitive beliefs about the effects of lag. One possibility is that participants have correct metacognitive beliefs about the effects of lag on final test performance, but the salient mnemonic cue of retrieval fluency overrides the extrinsic cue of lag. Another possibility is that participants do not have beliefs or have incorrect beliefs about the effects of lag. To measure participants’ metacognitive beliefs about the effects of criterion level and lag on final test performance, in addition to making item-specific JOLs, participants in Experiment 2 also made aggregate judgments. In contrast to item-specific JOLs, aggregate judgments are global predictions about performance, in which participants make overall judgments about the number of items within each level of lag and criterion they believe they will later recall.

The results of Experiment 1 are consistent with the idea that participants have correct beliefs about the effects of criterion level on final test performance, and thus we predicted that aggregate judgments would be greater for higher versus lower criterion levels. In contrast, the pattern of results for lag will be more revealing because a number of outcomes are plausible. If participants have correct beliefs about the effects of lag on final test performance, aggregate judgments will be greater for longer versus shorter lags. If participants do not have beliefs about the effects of lag on final test performance, aggregate judgments will not differ for longer versus shorter lags. Finally, if participants have incorrect beliefs about the effects of lag on final test performance, aggregate judgments will be greater for shorter versus longer lags.