Twenty-two undergraduates at the University of Oregon completed the experiment for course credit. All participants gave informed consent according to procedures approved by the University of Oregon institutional review board.
Stimuli were generated in MATLAB using the Psychophysics Toolbox extensions (Brainard, 1997; Pelli, 1997) and were presented on a 17-in. flat CRT monitor (60-Hz refresh rate) at a viewing distance of approximately 80 cm. Stimuli subtended 9.2° × 9.2° of visual angle.
Four hundred nameable pictures (e.g., animals, plants, shapes, countries, U.S. states, and symbols) were obtained via a web search for royalty-free clip art. Each image was assigned one of 360 continuous colors, with a different set of color/shape pairings for each subject.
Task and procedure
The 400 stimuli were presented in two successive runs, each containing 200 distinct shape/color associations. Each run comprised two parts: a learning period and a delayed-retrieval period. During the learning period, images were presented serially in blocks of 10 items, followed either by retrieval practice, during which all 10 colors were recalled, or by the start of the next block of 10 items (Fig. 1); thus, subjects did not know during encoding whether or not they would be immediately tested. Images were tested in a random order without feedback.
After viewing all 200 images, with retrieval practice for half of the items in the run (~20–30 min), subjects were asked to recall the color of each image by clicking on a color wheel that represented all of the presented colors. Images were tested in a random order relative to their initial presentation. Participants received feedback consisting of the presentation of the shape filled with the correct color and a number denoting the magnitude of the error.
During recall, a white shape cue was displayed for 1 s before the cursor and color wheel appeared (Fig. 1B). During response selection, the color of the shape cue shifted continuously to match the hue indicated by the mouse cursor on the color wheel. Participants indicated their color choice by clicking the mouse. Responses were unspeeded, and accuracy was given highest priority; subjects were instructed to choose a response even if they felt they were guessing. When they thought they were guessing, they were instructed to click with the right mouse button rather than the left. The color wheel was randomly rotated across trials, so that position information was irrelevant to the color response. Following completion of the first run, the remaining 200 images were presented and tested using the same procedure (i.e., a learning period followed by a delayed-retrieval period). One image was presented twice during the learning period of run one and was dropped from the delayed-test analyses.
Response error was measured as the number of degrees between the presented color and the reported color. Errors ranged from 0° (a perfect response) to ±180° (a maximally imprecise response). Responses were centered on 0° but spanned the entire range of possible errors (for an example, see Fig. 2A). These error histograms are well described as a mixture of two distributions that reflect guesses and correct responses (Zhang & Luck, 2008). On some trials, subjects do not remember the color associated with the shape cue and guess randomly with respect to the target color; this results in a uniform distribution of responses with respect to the target color. On other trials, participants remember the color of the shape cue and provide responses centered on the correct color value but with some degree of error; this distribution is well described by a von Mises distribution (the circular analogue of a Gaussian distribution, appropriate because the tested color space was circular) centered on the correct response. To obtain an estimate of these two distributions, response errors were fit using Markov chain Monte Carlo (MCMC) as implemented in the “memfit” function of MemToolbox (Suchow, Brady, Fougnie, & Alvarez, 2013). MCMC repeatedly samples parameter values in proportion to how well they describe the data and the prior (in this case, an uninformative Jeffreys prior) to obtain a maximum a posteriori (MAP) estimate of three parameters. Pmem is the probability that subjects could retrieve nonzero target information, operationalized as the inverse of the height of the uniform distribution (i.e., 1 − the proportion of guesses). SD is the standard deviation of the von Mises distribution (with larger values reflecting reduced precision). Mu (μ), the mean of the von Mises distribution, reflects systematic bias in the error distribution (preferred clockwise or counterclockwise responses on the color wheel).
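The structure of this two-component model can be sketched compactly. The original analysis used MemToolbox's "memfit" in MATLAB with MCMC and a Jeffreys prior; as an illustrative analogue only, the Python sketch below fits the same uniform-plus-von-Mises mixture by maximum likelihood with SciPy (the function names and the maximum-likelihood shortcut are our assumptions, not the authors' code):

```python
import numpy as np
from scipy import optimize, stats

def mixture_nll(params, errors_rad):
    """Negative log-likelihood of the uniform + von Mises mixture."""
    p_mem, mu, kappa = params
    von_mises = stats.vonmises.pdf(errors_rad, kappa, loc=mu)  # memory-driven responses
    uniform = 1.0 / (2 * np.pi)                                # random guesses
    likelihood = p_mem * von_mises + (1 - p_mem) * uniform
    return -np.sum(np.log(likelihood))

def fit_mixture(errors_deg):
    """Fit the mixture to response errors in degrees (-180..180).

    Returns (p_mem, mu in degrees, SD in degrees)."""
    errors_rad = np.deg2rad(np.asarray(errors_deg))
    result = optimize.minimize(
        mixture_nll,
        x0=[0.5, 0.0, 5.0],                       # p_mem, mu (rad), kappa
        args=(errors_rad,),
        bounds=[(1e-3, 1 - 1e-3), (-np.pi, np.pi), (1e-3, 200.0)])
    p_mem, mu, kappa = result.x
    # For concentrated distributions, circular SD is well approximated
    # by 1/sqrt(kappa); this mirrors the SD parameter reported in the text.
    sd_deg = np.rad2deg(np.sqrt(1.0 / kappa))
    return p_mem, np.rad2deg(mu), sd_deg
```

Fitting synthetic data generated with a known guess rate recovers Pmem and SD close to their true values, which is the logic behind the parameter-recovery simulations described below.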
These parameters are calculated using the distribution of all responses, which is a mixture of responses not guided by memory (guesses) and responses guided by memory. Thus, we can determine the proportion of remembered items and the precision of responses guided by memory, but it is not possible to determine if any individual response was guided by memory.
All participants’ responses were combined into an aggregate error histogram (Fig. 2A) and fit using the “memfit” function of MemToolbox (Suchow et al., 2013) to obtain parameter estimates and 95% credibility intervals (CrIs); there is a 95% chance that the true value of the parameter for the sample lies within the credibility interval. We will refer to parameters with overlapping credibility intervals as “not significantly different” and parameters with nonoverlapping credibility intervals as “significantly different.” Unlike confidence intervals, Bayesian credibility intervals are not necessarily symmetrical.
The mixture modeling analysis revealed that 70.7% (CrI: −1.7%, +2.0%) of the items were recalled during the initial test. SD—our operational definition of mnemonic precision—was 21.4° (CrI: −0.8°, +1.1°). At delayed test, subjects recalled significantly more items that they had previously retrieved (53.8%, CrI: −1.9%, +2.3%) than items that were previously untested (37.9%, CrI: −2.2%, +2.8%; Fig. 2). Mnemonic precision was not significantly different between tested (22.9°, CrI: −1.0°, +1.5°) and untested (24.2°, CrI: −1.6°, +2.6°) items.
We were interested in examining the data at the individual-subject level, but simulations showed that precision estimates would be consistently biased if the probability of retrieval was too low. We determined this by generating artificial data that assumed varying Pmem values and SD values equal to those observed in our aggregate data (20°). Parameter estimates were obtained from these artificial datasets by sampling 100 times from each dataset and then fitting each sample with the mixture model. These simulations revealed that SD is systematically overestimated when the proportion of successfully retrieved items is less than 40% (Fig. 3A). By contrast, the Pmem parameter is relatively accurate even when the probability of retrieval is low. To avoid misleading estimates of SD, we therefore compared individual parameter estimates of precision only for subjects who successfully retrieved at least 40% of the items in both the tested and untested conditions. Further simulations confirmed that estimates of the SD parameter would not be affected by high guess rates in the aggregate data because of the large number of trials across all subjects (>4,000 trials per condition); thus, in the aggregate analysis, accurate Pmem and SD estimates could be obtained even when the probability of retrieval was low (Fig. 3B).
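A minimal version of this parameter-recovery simulation can be sketched as follows. This is our own illustrative Python reconstruction, not the authors' MATLAB/MemToolbox code: artificial datasets are generated at a chosen true Pmem and SD, each dataset is fit (here by maximum likelihood rather than MCMC), and the average recovered parameters are compared against the generating values. Function names and the per-dataset trial count are assumptions:

```python
import numpy as np
from scipy import optimize, stats

def simulate_recovery(p_mem_true, sd_deg=20.0, n_trials=200, n_sims=30, seed=0):
    """Generate artificial error datasets at a given retrieval probability
    and return the mean recovered (p_mem, SD in degrees) across datasets."""
    rng = np.random.default_rng(seed)
    kappa_true = 1.0 / np.deg2rad(sd_deg) ** 2
    recovered = []
    for _ in range(n_sims):
        # Each trial is memory-driven with probability p_mem_true,
        # otherwise a random guess on the color wheel.
        n_mem = rng.binomial(n_trials, p_mem_true)
        errors = np.concatenate([
            rng.vonmises(0.0, kappa_true, size=n_mem),
            rng.uniform(-np.pi, np.pi, size=n_trials - n_mem)])

        def nll(params):
            p, kappa = params
            lik = p * stats.vonmises.pdf(errors, kappa) + (1 - p) / (2 * np.pi)
            return -np.sum(np.log(lik))

        res = optimize.minimize(nll, x0=[0.5, 5.0],
                                bounds=[(1e-3, 1 - 1e-3), (1e-3, 200.0)])
        p_hat, kappa_hat = res.x
        recovered.append((p_hat, np.rad2deg(1.0 / np.sqrt(kappa_hat))))
    return np.mean(recovered, axis=0)
```

Running this sketch across a grid of `p_mem_true` values (e.g., 0.1 to 0.9) reproduces the qualitative logic described above: recovered SD becomes unreliable when few trials are memory-driven, while recovered Pmem stays comparatively accurate.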
Individual parameter comparisons (Delayed Test)
Analysis of the subset of subjects who successfully retrieved 40% or more of the items in both conditions (n = 12) also showed higher Pmem for retrieved items (M = 69.2%, SD = 13.4%) compared with untested items (M = 53.2%, SD = 9.9%), t(11) = −6.03, p < 0.001. Also in line with the aggregate data, subjects did not exhibit superior mnemonic precision for items that they had previously retrieved (M = 24.0°, SD = 5.3°) compared with items that were not retrieved (M = 23.7°, SD = 5.7°), t(11) = −0.22, p = 0.83.
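The within-subject comparison reported here is a standard paired t-test on per-subject parameter estimates. As a hypothetical illustration only (the values below are synthetic, generated to roughly match the reported condition means; they are not the study's data), such a comparison could be run with SciPy:

```python
import numpy as np
from scipy import stats

# Synthetic per-subject Pmem values for n = 12 subjects; illustrative only.
rng = np.random.default_rng(1)
tested = np.clip(rng.normal(0.69, 0.13, size=12), 0.0, 1.0)
untested = np.clip(tested - rng.normal(0.16, 0.05, size=12), 0.0, 1.0)

# Paired (within-subject) t-test on retrieval probability.
t_stat, p_val = stats.ttest_rel(untested, tested)
```

A paired test is appropriate because each subject contributes an estimate to both conditions, so the condition difference is evaluated against within-subject variability.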
Experiment 1 suggests that retrieval practice increases the probability that an item can be retrieved in the future but does not improve the precision of that memory. In Experiment 2, we equated the number of times that participants saw and responded to each item by comparing the retrieval practice condition with a restudy condition (Carrier & Pashler, 1992).