Retrieval practice enhances the accessibility but not the quality of memory

Sutterer, David W.; Awh, Edward

doi:10.3758/s13423-015-0937-x

Brief Report
Published: 24 September 2015

Volume 23, pages 831–841 (2016)

Psychonomic Bulletin & Review

David W. Sutterer^1,2 &
Edward Awh^1,2,3

Abstract

Numerous studies have demonstrated that retrieval from long-term memory (LTM) can enhance subsequent memory performance, a phenomenon labeled the retrieval practice effect. However, the almost exclusive reliance on categorical stimuli in this literature leaves open a basic question about the nature of this improvement in memory performance. It has not yet been determined whether retrieval practice improves the probability of successful memory retrieval or the quality of the retrieved representation. To answer this question, we conducted three experiments using a mixture modeling approach (Zhang & Luck, 2008) that provides a measure of both the probability of recall and the quality of the recalled memories. Subjects attempted to memorize the color of 400 unique shapes. After every 10 images were presented, subjects either recalled the last 10 colors (the retrieval practice condition) by clicking on a color wheel with each shape as a retrieval cue or they participated in a control condition that involved no further presentations (Experiment 1) or restudy of the 10 shape/color associations (Experiments 2 and 3). Performance in a subsequent delayed recall test revealed a robust retrieval practice effect. Subjects recalled a significantly higher proportion of items that they had previously retrieved relative to items that were untested or that they had restudied. Interestingly, retrieval practice did not elicit any improvement in the precision of the retrieved memories. The same empirical pattern also was observed following delays of greater than 24 hours. Thus, retrieval practice increases the probability of successful memory retrieval but does not improve memory quality.

Introduction

Numerous studies have demonstrated that retrieval from long-term memory (LTM) can enhance subsequent memory performance, a phenomenon labeled the retrieval practice effect (Carrier & Pashler, 1992). The benefits of retrieval practice have been observed with a wide variety of memoranda (Roediger & Karpicke, 2006), including word pairs (Pyc & Rawson, 2009), pictures (Wheeler & Roediger, 1992), and spatial positions (Carpenter & Pashler, 2007; Rohrer, Taylor, & Sholar, 2010; Carpenter & Kelly, 2012).

Varying explanations have been offered for how retrieval practice enhances memory performance. Some have focused on increased elaborative retrieval during testing (Carpenter, 2009), whereas others have emphasized the narrowing of the retrieval search space via helpful contextual associations (Lehman, Smith, and Karpicke, 2014). One common assumption of these accounts is that retrieval practice enhances the probability of access to a memory rather than the quality of the memory. This focus on accessibility over fidelity may be attributable in part to the fact that past studies have typically used discrete word or picture stimuli (and all-or-none measures of accuracy) that do not allow clear measurements of memory fidelity. That said, some past findings may be consistent with a putative effect of retrieval practice on memory quality. For example, Chan and McDermott (2007) found that retrieval practice improved participants’ ability to avoid semantically similar lures during a recognition test and improved source memory. Likewise, Szpunar, McDermott, and Roediger (2008) found that testing improves list discrimination. However, while each of these findings could reflect a more precise memory (e.g., of specific semantic content, or of the temporal context associated with an item), the binary nature of the responses in these studies also allows for an interpretation based on retrieval probability.

An approach that may provide more traction for understanding the effect of retrieval practice on the quality of item-specific memory is to allow participants to report remembered information along a continuous response space. For example, Carpenter and Kelly (2012) used a continuous response space in a task where subjects recalled the precise positions of different objects. Retrieval practice resulted in a decrease in the average response error for retrieved locations relative to restudied locations. However, although a change in memory quality provides an intuitive explanation of these findings, a reduced guessing rate in the retrieval practice condition also would yield lower average response errors. Thus, the goal of the present work was to examine the retrieval practice effect using an analytic approach that can estimate both the probability of retrieval and the quality of the retrieved representations.
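The ambiguity in average response error can be illustrated with a short calculation. The sketch below is not from the original study: the function name and the two P_mem values (0.40 and 0.55) are hypothetical, and it assumes that memory-guided errors are roughly Gaussian with an SD of 20° while guesses are uniform over ±180° (so their mean absolute error is 90°).

```python
import math


def expected_abs_error(p_mem: float, sd_deg: float) -> float:
    """Expected absolute error (degrees) for a mix of memory responses and guesses.

    Assumptions (illustrative only):
    - memory errors are approximately Gaussian with standard deviation sd_deg,
      which is reasonable when sd_deg is small relative to 180 degrees;
    - guesses are uniform on [-180, 180), so their mean absolute error is 90.
    """
    mean_abs_memory = sd_deg * math.sqrt(2.0 / math.pi)  # mean |x| of a Gaussian
    mean_abs_guess = 90.0
    return p_mem * mean_abs_memory + (1.0 - p_mem) * mean_abs_guess


# Same precision (SD = 20 degrees) in both cases; only the guess rate differs.
low_access = expected_abs_error(p_mem=0.40, sd_deg=20.0)
high_access = expected_abs_error(p_mem=0.55, sd_deg=20.0)
print(f"mean |error| with P_mem = 0.40: {low_access:.1f} deg")
print(f"mean |error| with P_mem = 0.55: {high_access:.1f} deg")
```

Under these assumptions the mean absolute error drops by about 11° even though precision is held constant, which is why average error alone cannot separate accessibility from quality.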

We measured performance in a shape/color recall task in which the possible colors were drawn from a continuous 360-degree space, and we used a mixture-modeling approach (Zhang & Luck, 2008) that provided separate measures of the probability of recall and the quality of the retrieved memories. This analytic approach has been widely applied to the field of working memory (see Luck & Vogel, 2013 for review), and has recently been applied to the study of LTM (Brady et al., 2013). To anticipate our conclusions, retrieval practice elicited robust improvements in the probability of memory access, but absolutely no improvement in the fidelity of the retrieved memories.

Experiment 1: Test versus no test

Method

Participants

Twenty-two undergraduates at the University of Oregon completed the experiment for course credit. All participants gave informed consent according to procedures approved by the University of Oregon institutional review board.

Apparatus

Stimuli were generated in MATLAB using the Psychophysics Toolbox extension (Brainard, 1997; Pelli, 1997) and were presented on a 17-in. flat CRT computer screen (60-Hz refresh rate). The viewing distance was ~80 cm. Stimuli subtended 9.2° × 9.2° of visual angle.

Stimuli

Four hundred nameable pictures (e.g., animals, plants, shapes, countries, U.S. states, and symbols) were obtained via a web search for royalty free clip art. One of 360 continuous colors was assigned to each image, with different color/shape sets for each subject.

Task and procedure

The 400 stimuli were presented in two successive runs, each containing 200 distinct shape/color associations. Each run comprised two parts: a learning period and a delayed-retrieval period. During the learning period, images were presented serially in blocks of 10 items, followed either by retrieval practice, during which all 10 colors were recalled, or by the start of the next block of 10 items (Fig. 1); thus, subjects did not know during encoding whether or not they would be immediately tested. Images were tested in a random order without feedback.

After viewing all 200 images with retrieval practice for half of the items in the run (~20-30 minutes), subjects were asked to recall the color of each image by clicking on a color wheel that represented all of the presented colors. Images were tested in a random order relative to their initial presentation. Participants received feedback consisting of the presentation of the shape filled with the correct color and a number denoting the magnitude of the error.

During recall, a white shape cue was displayed for 1 second before the cursor and color wheel appeared (Fig. 1B). During response selection, the color of the shape cue shifted continuously to match the hue that was indicated by the mouse cursor on the color wheel. Participants indicated their color choice by clicking the mouse. Responses were unspeeded and accuracy was given highest priority; subjects were instructed to choose a response even if they felt they were guessing. When they thought they were guessing, they were instructed to click with the right mouse button rather than the left. The color wheel was randomly rotated across trials (so that position information was irrelevant to the color response). Following completion of the first run of 200 images, the remaining 200 images were presented and tested using the same procedure (i.e., a learning period and delayed-retrieval period) with 200 new images. One image was presented twice during the learning period of run one and was dropped from the delayed analyses.

Data analysis

Response error was measured as the number of degrees between the presented color and the reported color. Errors ranged from 0° (perfect response) to ±180° (a maximally imprecise response). Responses were centered on 0° but spanned the entire range of possible errors (for example, see Fig. 2A). These error histograms are well described as a mixture of two distributions that reflect guesses and correct responses (Zhang & Luck, 2008). On some trials, subjects do not remember the color associated with the shape cue and guess randomly, which results in a uniform distribution of responses with respect to the target color. On other trials, participants remember the color of the shape cue and provide responses centered on the correct color value but with some degree of error. Because the tested color space was circular, this distribution is well described by a von Mises distribution (the circular analogue of a Gaussian distribution) centered on the correct response. To obtain an estimate of these two distributions, response errors were fit using Markov Chain Monte Carlo (MCMC) as employed by the “memfit” function of MemToolbox (Suchow, Brady, Fougnie, and Alvarez, 2013). MCMC repeatedly samples parameter values in proportion to how well they describe the data and the prior (in this case an uninformative Jeffreys prior) to obtain a Maximum a Posteriori (MAP) estimate of three parameters. P_mem is the probability that subjects could retrieve nonzero target information, operationalized as the complement of the guess rate (i.e., 1 – proportion of guesses, where the proportion of guesses sets the height of the uniform distribution). SD is the standard deviation of the von Mises distribution (with larger values reflecting reduced precision). Mu (μ), the mean of the von Mises distribution, reflects systematic bias in the error distribution (a preference for clockwise or anti-clockwise responses on the color wheel).^{Footnote 1}
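As a minimal sketch of the quantities described above, the following Python functions compute a signed response error on the color wheel and evaluate the guess-plus-memory mixture density. This is an illustration, not the authors' MATLAB code: the function names are hypothetical, and the von Mises component is parameterized by a concentration `kappa` (the standard parameterization) rather than by SD.

```python
import numpy as np
from scipy.special import i0  # modified Bessel function of the first kind, order 0


def signed_error_deg(reported_deg, target_deg):
    """Signed angular difference (reported - target), wrapped to [-180, 180)."""
    diff = np.asarray(reported_deg, dtype=float) - np.asarray(target_deg, dtype=float)
    return (diff + 180.0) % 360.0 - 180.0


def mixture_pdf(error_deg, p_mem, kappa, mu_deg=0.0):
    """Density (per radian) of the mixture at each response error.

    p_mem  : probability that the response is guided by memory
    kappa  : von Mises concentration (assumed parameterization; larger = more precise)
    mu_deg : mean of the von Mises component, i.e. systematic bias
    """
    x = np.deg2rad(np.asarray(error_deg, dtype=float))
    mu = np.deg2rad(mu_deg)
    von_mises = np.exp(kappa * np.cos(x - mu)) / (2.0 * np.pi * i0(kappa))
    uniform = 1.0 / (2.0 * np.pi)  # guesses are flat over the whole circle
    return p_mem * von_mises + (1.0 - p_mem) * uniform


# A report of 350 degrees for a 10-degree target is a 20-degree anticlockwise error.
print(signed_error_deg(350, 10))
```

Wrapping the difference matters because the color space is circular: without it, a response just across the 0°/360° boundary would be scored as a very large error.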

These parameters are calculated using the distribution of all responses, which is a mixture of responses not guided by memory (guesses) and responses guided by memory. Thus, we can determine the proportion of remembered items and the precision of responses guided by memory, but it is not possible to determine if any individual response was guided by memory.
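The fitting step can be sketched in Python as follows. Note the substitution: the study obtained MAP estimates by MCMC with a Jeffreys prior through MemToolbox, whereas this sketch uses plain maximum likelihood with `scipy.optimize.minimize`, so it approximates the idea rather than reproducing the authors' procedure. All function names, starting values, and bounds are hypothetical choices, and the demonstration data are synthetic (generated with P_mem = 0.6 and a concentration of 8.7, which corresponds to a circular SD of about 20°).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import i0e, i1e  # exponentially scaled Bessel functions


def neg_log_likelihood(params, err_rad):
    """Negative log-likelihood of the uniform + von Mises mixture."""
    p_mem, kappa, mu = params
    # i0e(kappa) = exp(-kappa) * I0(kappa); the scaled form stays stable for large kappa.
    von_mises = np.exp(kappa * (np.cos(err_rad - mu) - 1.0)) / (2.0 * np.pi * i0e(kappa))
    density = p_mem * von_mises + (1.0 - p_mem) / (2.0 * np.pi)
    return -np.sum(np.log(density))


def kappa_to_sd_deg(kappa):
    """Circular standard deviation (degrees) of a von Mises with concentration kappa."""
    mean_resultant_length = i1e(kappa) / i0e(kappa)
    return np.rad2deg(np.sqrt(-2.0 * np.log(mean_resultant_length)))


def fit_mixture(error_deg):
    """Maximum-likelihood estimates of P_mem, SD (degrees), and mu (degrees)."""
    err_rad = np.deg2rad(np.asarray(error_deg, dtype=float))
    result = minimize(
        neg_log_likelihood,
        x0=[0.5, 5.0, 0.0],  # arbitrary starting point (assumption)
        args=(err_rad,),
        method="L-BFGS-B",
        bounds=[(1e-3, 1.0 - 1e-3), (0.1, 200.0), (-np.pi, np.pi)],
    )
    p_mem, kappa, mu = result.x
    return {"p_mem": p_mem, "sd_deg": kappa_to_sd_deg(kappa), "mu_deg": np.rad2deg(mu)}


# Synthetic demonstration data, not data from the study.
rng = np.random.default_rng(0)
n = 4000
remembered = rng.random(n) < 0.6
errors_rad = np.where(remembered, rng.vonmises(0.0, 8.7, n), rng.uniform(-np.pi, np.pi, n))
estimates = fit_mixture(np.rad2deg(errors_rad))
print(estimates)
```

The fit recovers parameters of the whole error distribution. As the text notes, it does not label any single trial as a guess or a memory-guided response.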

Results

Aggregate data

All participants’ responses were combined into an aggregate error histogram (Fig. 2A) and fit using the “memfit” function of MemToolbox (Suchow et al., 2013) to obtain parameter estimates and 95% credibility intervals (CrI); that is, there is a 95% probability that the true value of the parameter for the sample lies within the interval. We will refer to parameters with overlapping credibility intervals as “not significantly different” and parameters with nonoverlapping credibility intervals as “significantly different.” Unlike confidence intervals, Bayesian credibility intervals are not necessarily symmetrical.

The mixture modeling analysis revealed that 70.7% (CrI: −1.7%, +2.0%) of the items were recalled during the initial test. SD, our operational definition of mnemonic precision, was 21.4° (CrI: −0.8°, +1.1°). At delayed test, subjects recalled significantly more items that they had previously retrieved (53.8%, CrI: −1.9%, +2.3%) than items that were previously untested (37.9%, CrI: −2.2%, +2.8%; Fig. 2). Mnemonic precision was not significantly different between tested (22.9°, CrI: −1.0°, +1.5°) and untested (24.2°, CrI: −1.6°, +2.6°) items.
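The overlap criterion can be checked directly against the delayed-test values reported above. The sketch below assumes that each CrI is given as a pair of offsets from the point estimate (for example, 53.8% with −1.9%, +2.3% spans 51.9% to 56.1%); the helper names are hypothetical.

```python
def interval(estimate, minus, plus):
    """Convert an estimate with asymmetric offsets into (lower, upper) bounds."""
    return (estimate - minus, estimate + plus)


def overlaps(a, b):
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Delayed-test estimates reported in the text (aggregate fits).
p_mem_tested = interval(53.8, 1.9, 2.3)    # percent
p_mem_untested = interval(37.9, 2.2, 2.8)  # percent
sd_tested = interval(22.9, 1.0, 1.5)       # degrees
sd_untested = interval(24.2, 1.6, 2.6)     # degrees

print("P_mem intervals overlap:", overlaps(p_mem_tested, p_mem_untested))
print("SD intervals overlap:", overlaps(sd_tested, sd_untested))
```

The P_mem intervals do not overlap while the SD intervals do, matching the paper's description of a significant difference in retrieval probability and no significant difference in precision.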

Simulations

We were interested in examining the data at the individual subject level, but simulations showed that there would be consistent biases in the precision estimates if the probability of retrieval was too low. We determined this by generating artificial data that presumed varying P_mem values and an SD value approximately equal to that observed in our aggregate data (20°). Parameter estimates were obtained from these artificial datasets by sampling 100 times from each dataset and then fitting each sample with a mixture model. These simulations revealed that SD is systematically overestimated when the proportion of successfully retrieved items is less than 40% (Fig. 3A). By contrast, the P_mem parameter is relatively accurate even when the probability of retrieval is low. Thus, to avoid misleading estimates of SD, we compared individual parameter estimates of precision only for subjects who successfully retrieved at least 40% of the items in both the tested and untested conditions. Further simulations confirmed that estimates of the SD parameter would not be affected by high guess rates in the aggregate data because of the large number of trials run across all subjects (>4,000 trials per condition). Thus, in the aggregate analysis, accurate P_mem and SD estimates could be obtained even when the probability of retrieval was low (Fig. 3B).
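A parameter-recovery simulation of this kind might look like the sketch below. It simplifies the procedure described above in several ways, all of which are assumptions: it draws fresh synthetic datasets instead of resampling from a fixed artificial dataset, it fits by maximum likelihood instead of MCMC, it fixes the bias term at zero, and it uses 200 trials per simulated subject and condition (a figure inferred from the design, not stated in the text). The function names and the concentration value are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import i0e, i1e

TRUE_KAPPA = 8.7  # assumed concentration; corresponds to a circular SD of about 20 degrees


def sd_deg_from_kappa(kappa):
    """Circular standard deviation (degrees) for a von Mises concentration."""
    return np.rad2deg(np.sqrt(-2.0 * np.log(i1e(kappa) / i0e(kappa))))


def simulate_errors(rng, n_trials, p_mem, kappa=TRUE_KAPPA):
    """Synthetic response errors (radians): memory responses mixed with uniform guesses."""
    remembered = rng.random(n_trials) < p_mem
    memory = rng.vonmises(0.0, kappa, n_trials)
    guesses = rng.uniform(-np.pi, np.pi, n_trials)
    return np.where(remembered, memory, guesses)


def fit_p_mem_and_sd(err_rad):
    """Maximum-likelihood fit of the mixture with the bias term fixed at zero."""
    def nll(params):
        p_mem, kappa = params
        vm = np.exp(kappa * (np.cos(err_rad) - 1.0)) / (2.0 * np.pi * i0e(kappa))
        return -np.sum(np.log(p_mem * vm + (1.0 - p_mem) / (2.0 * np.pi)))

    res = minimize(nll, x0=[0.5, 5.0], method="L-BFGS-B",
                   bounds=[(1e-3, 1.0 - 1e-3), (0.1, 200.0)])
    return res.x[0], sd_deg_from_kappa(res.x[1])


def recovery_summary(p_mem_values, n_trials=200, n_reps=100, seed=0):
    """Median recovered (P_mem, SD in degrees) for each generating P_mem value."""
    rng = np.random.default_rng(seed)
    summary = {}
    for p_mem in p_mem_values:
        fits = [fit_p_mem_and_sd(simulate_errors(rng, n_trials, p_mem))
                for _ in range(n_reps)]
        p_hat, sd_hat = np.array(fits).T
        summary[p_mem] = (float(np.median(p_hat)), float(np.median(sd_hat)))
    return summary
```

Calling `recovery_summary([0.2, 0.4, 0.8])` returns the median recovered P_mem and SD for each generating value, which can then be compared against the generating SD to look for the kind of bias the simulations were designed to detect. Raising `n_trials` toward the aggregate trial count shows how the estimates behave with more data.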

Individual parameter comparisons (Delayed Test)

Analysis of the subset of subjects who successfully retrieved 40% or more items in both conditions (n = 12) also showed higher P_mem for retrieved items (M = 69.2%, SD = 13.4%) compared with untested items (M = 53.2%, SD = 9.9%; t(11) = −6.03, p < 0.001). Also in line with the aggregate data, subjects did not exhibit superior mnemonic precision for items that they had previously retrieved (M = 24.0°, SD = 5.3°) compared with items that were not retrieved (M = 23.7°, SD = 5.7°; t(11) = −0.22, p = 0.83).

Discussion

Experiment 1 suggests that retrieval practice increases the probability that an item can be retrieved in the future but does not improve the precision of that memory. In Experiment 2, we equated the number of times that participants saw and responded to each item by comparing the retrieval practice condition with a restudy condition (Carrier & Pashler, 1992).