Proactive interference (PI) occurs when memory processing at a given point in time disrupts memory processing at a future time (Baddeley, 1990; Wright, Urcuioli, & Sands, 1986). For example, processing a particular stimulus on a given trial of a memory task can interfere with processing on a subsequent trial that uses the same stimulus. Early studies in human verbal memory led investigators to conclude that PI is a predominant cause of mnemonic failure in laboratory experiments, as well as in everyday memory usage (Keppel & Underwood, 1962; Underwood, 1957).

PI appears to be pervasive in human memory, having been observed in a variety of visual (Badre & Wagner, 2005; Hartshorne, 2008; Makovski & Jiang, 2008; Mecklinger, Weber, Gunter, & Engle, 2003), motor (Burwitz, 1974; Cothros, Köhler, Dickie, Mirsattari, & Gribble, 2006; Herman & Bailey, 1970), verbal (Feredoes, Tononi, & Postle, 2006; Kane & Engle, 2000; Keppel & Underwood, 1962; Postle, Berger, Goldstein, Curtis, & D’Esposito, 2001), and auditory (Ruusuvirta, 2000; Ruusuvirta, Astikainen, & Wikgren, 2002; Ruusuvirta, Wikgren, & Astikainen, 2008; Visscher, Kahana, & Sekuler, 2009) memory tasks. Similarly, the influence of PI has been widely reported in studies of memory processing in animals, including pigeons (Grant, 1975; Hogan, Edwards, & Zentall, 1981; Wright, Katz, & Ma, 2012), rats (De Rosa & Hasselmo, 2000; Dunnett & Martel, 1990; Gleitman & Jung, 1963; Grant, 1981), monkeys (Mishkin & Delacour, 1975; Overman & Doty, 1980; Wright, 2006, 2007), chimpanzees (Hayes & Thompson, 1953), chickadees (Hampton, Shettleworth, & Westwood, 1998), and dolphins (Herman, 1975; Thompson & Herman, 1981). With the exception of the studies in dolphins, nearly all studies of PI in animals have focused on visual short-term memory (see the further discussion below).

One of the traditional approaches to assessing PI in nonhuman primates has been varying the degree to which stimuli are reused from trial to trial in a memory task. The typical finding is that memory capabilities improve when new stimuli are used for each trial, because confusion arising from stimulus repetitions between trials is reduced. For instance, Hayes and Thompson (1953) found that chimpanzees committed fewer errors on a delayed-response task if new stimuli were used for each trial than if a single pair of stimuli were alternately used as the sample and the comparison throughout the experiment. Similarly, Mishkin and Delacour (1975) observed that monkeys require relatively few sessions to learn visual memory tasks if trial-unique stimuli are used, whereas they require significantly more sessions, or fail to learn, if only two memoranda are repeatedly presented throughout a session. Using a similar visual task, Overman and Doty (1980) found that the maximum retention interval in monkeys increased from under 30 s when two stimuli were repeatedly reused to over 24 h when trial-unique stimuli were used. The positive relationship between performance and the number of stimuli used in a session (i.e., stimulus set size) has been consistently reported in several additional studies of visual memory in monkeys (Mason & Wilson, 1974; Medin, 1980; Sands & Wright, 1980; Worsham, 1975).

An additional, more direct method of examining the influence of PI is to evaluate performance on trial n as a function of the stimuli presented on previous trials (intertrial PI). A number of studies have demonstrated that performance on trial n can be significantly altered by memory processing on the immediately preceding trial n – 1 (Edhouse & White, 1988; Grant, 1975; Hogan et al., 1981; Makovski & Jiang, 2008; Moise, 1976; Reynolds & Medin, 1981; Thompson & Herman, 1981; Worsham, 1975). These studies are consistent with decreased overall performance being associated with smaller stimulus set sizes, because the frequency with which the same stimuli are used for both trials n and n – 1 increases when smaller stimulus sets are used.

Relatively few studies have examined the effect of stimulus repetition across nonadjacent trials (i.e., when the same stimuli are used for trials n and n – 2 or trials n and n – k). One such experiment reported that visuospatial memory in rats was influenced by PI produced by trial n – 1, but not by more distant, nonadjacent trials (Dunnett & Martel, 1990). By contrast, Hartshorne (2008) reported that human visual memory was susceptible to PI caused by stimulus repetitions across at least four trials (see also Monsell, 1978). Similarly, Wright et al. (2012) recently reported that significant PI in pigeon visual memory was produced by stimulus repetitions separated by as many as 16 trials, which was the longest distance tested. Finally, recent data from our laboratory have indicated that stimulus repetitions separated by up to ten trials can produce significant PI in auditory memory in monkeys (Bigelow & Poremba, 2012). These studies highlight the potentially perseverative nature of PI by showing that repeating a small number of stimuli throughout a session can negatively impact performance beyond the immediately subsequent trial.

Besides reducing the number of intertrial stimulus repetitions, PI can also be reduced by increasing the time that elapses between trials, or the intertrial interval (ITI). Increases in overall accuracy resulting from increasing the ITI have been demonstrated in a variety of memory tasks in humans and nonhuman animals (Cermak, 1970; Cohen, Reid, & Chew, 1994; Herman, 1975; Maki, Moe, & Bierley, 1977; Mason & Wilson, 1974; Roberts, 1980; Roberts & Kraemer, 1982). In some cases, the benefits of increasing the ITI have been relatively modest. For instance, Jarrard and Moise (1971) reported that monkeys’ visual short-term memory accuracy improved by approximately 5 % to 10 % (depending on the retention interval) after increasing the ITI from 5 to 15 s, but no significant advantage was gained by further extending the ITI to 30 or 60 s. In other cases, increasing the ITI has led to more substantial benefits, to the extent that the influence of intertrial PI has been reduced to zero. In Dunnett and Martel’s (1990) study of rat visuospatial memory, PI effects from trial n – 1 were eliminated by increasing the ITI from 5 to 15 s. Similarly, two studies of pigeon visual memory reported significant intertrial PI effects when the ITI was 2 s, but not when it was 10 s or greater (Grant, 1975; Hogan et al., 1981). These results suggest that increasing the time between trials allows greater decay of irrelevant memory traces, which might otherwise compete with memory demands of the current trial.

With few exceptions, studies of PI in animals, including monkeys, have been concerned with the visual sensory modality. Because auditory perception and memory are crucial for key aspects of nonhuman primate ethology such as predator evasion and conspecific communication (e.g., Ghazanfar & Hauser, 2001), the relatively sparse auditory memory literature, including that related to PI, constitutes a significant deficit in scientific understanding. One likely reason for the lack of experimental data in the auditory modality in monkeys is that, unlike with visual memory tasks, monkeys require extensive training to learn auditory memory tasks (Cohen, Russ, & Gifford, 2005; Colombo & D’Amato, 1986; D’Amato & Colombo, 1985; Fritz, Mishkin, & Saunders, 2005; Kojima, 1985; Scott, Mishkin, & Yin, 2012). For instance, Fritz et al. reported that monkeys required ~15,000 trials to learn an auditory memory task, whereas Mishkin and Delacour (1975) reported that only ~500 trials were needed to learn a comparable visual memory task. A related finding is that the maximum reported retention interval for auditory memory in monkeys ranges from 16 to 35 s (Colombo & D’Amato, 1986; Fritz et al., 2005; Kojima, 1985), whereas for visual memory it ranges from minutes to hours, or more (Murray & Mishkin, 1998; Overman & Doty, 1980). Monkeys are also capable of retaining tactile information for at least several minutes (Bauer & Steele, 1985; Buffalo et al., 1999; Suzuki, Zola-Morgan, Squire, & Amaral, 1993), suggesting that auditory memory tasks may be uniquely challenging.

Because an investigation of the basic parameters of auditory PI in monkeys is lacking, we conducted two experiments to address the effects of stimulus set size (Exp. 1) and of the duration of the ITI (Exp. 2) in an auditory short-term memory task in monkeys. The experiments were designed in such a way that the results would be roughly comparable to those of previous studies of memory in monkeys using the visual modality. As with visual PI, we hypothesized that the influence of auditory PI would diminish as a function of the stimulus set size and the duration of the ITI.

Experiment 1

Method

Subjects

Two adult male macaque monkeys (Macaca mulatta) served as subjects for this experiment (monkeys F.R. and S.E.). The monkeys had been trained to perform the auditory short-term memory task prior to the experiment. The animals were housed in individual cages with ad libitum access to water and controlled feeding schedules, under a 12:12-h light:dark cycle. Experimental sessions were conducted 3–5 days per week. The majority of food was given after the experimental session each day (Harlan Monkey Diet plus fresh fruit, vegetables, and treats), and each animal was maintained above 85 % of its free-feeding weight during the controlled feeding schedule. All procedures were carried out with approval from the Institutional Animal Care and Use Committee at the University of Iowa.

Apparatus

Subjects were placed in a sound-attenuated chamber for the duration of each experimental session. The animal was held in a custom-made primate chair that allowed free arm movement for behavioral responses. Acoustic stimuli were delivered through a single speaker located 15 cm in front of the primate chair at eye level. Behavioral responses were made via a single acrylic button positioned 3 cm below the speaker. Small food rewards were dispensed from a pellet dispenser (Med Associates, St. Albans, VT) through a copper tube into a dish located 3 cm below the response button. A dimmed 40-W house light provided illumination throughout the duration of the experiment, and a second light provided additional illumination during the ITI. Custom-designed software (LabView; National Instruments, Dallas, TX) controlled and recorded the stimulus presentations and other task events.

Procedure

Task

The short-term memory task used for this experiment was a variation of the delayed matching-to-sample (DMS) task, which is suitable for use with auditory stimuli. The typical DMS task begins with the presentation of a sample stimulus, which is followed by a retention interval, after which subjects are rewarded for identifying the sample from two test stimuli. In the same–different variation of the DMS task (Wright, 2006), a single test stimulus is presented following the retention interval, and the subject must indicate whether it is identical to (match trials) or different from (nonmatch trials) the sample. The traditional two-choice and same–different versions of the DMS task produce very similar outcomes in visual short-term memory performance in monkeys (D’Amato & Worsham, 1974). Thus, visual memory experiments using the traditional DMS task can be reasonably compared to the present experiment, as well as to previous studies that have used the auditory same–different DMS task (Colombo & D’Amato, 1986; Kojima, 1985; Wright, Shyan, & Jitsumori, 1990).

In the task for Experiment 1, we used a fixed retention interval, or interstimulus interval (ISI), of 5 s, and a variable ITI averaging 10 s (range: 8–12 s). Each session consisted of a total of 128 trials, with equal numbers of match and nonmatch trials presented in a pseudorandom order. Following the presentation of the test stimulus, the response button was illuminated for 1 s to indicate that a response could be made. If a buttonpress was made outside of the 1-s response window, the current trial was aborted and replaced with a new trial. For match trials, correct responses were defined by the presence of a buttonpress (“go” response) following the test stimulus, whereas for nonmatch trials, correct responses were defined by the absence of a buttonpress (“no-go” response). The task used an asymmetric reinforcement contingency in which correct “go” responses on match trials were rewarded with a small food pellet, and incorrect buttonpresses (false alarms) on nonmatch trials were occasionally punished by a brief, mild air puff presented indirectly from a distance of approximately 15 cm from the animal (approximately 1/10 of incorrect nonmatch trials were punished, on a variable schedule). Similar asymmetric reinforcement contingencies have been used in previous studies of auditory short-term memory in monkeys because they facilitate learning the match-versus-nonmatch rule (Colombo & D’Amato, 1986; Kojima, 1985; Stepien & Cordeau, 1960).
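The trial structure described above can be illustrated with a short sketch. This is not the authors' LabView implementation; the function names are hypothetical, the ITI is assumed to be drawn uniformly from the 8–12 s range, and "pseudorandom order" is assumed to mean a simple shuffle with no further constraints.

```python
import random

def make_schedule(n_trials=128, iti_range=(8.0, 12.0), seed=None):
    """Return a list of (trial_type, iti_seconds) tuples for one session.

    Assumptions (not stated in the text): trial order is a plain shuffle,
    and the ITI is uniform over iti_range.
    """
    rng = random.Random(seed)
    half = n_trials // 2
    types = ["match"] * half + ["nonmatch"] * half  # equal numbers of each
    rng.shuffle(types)
    return [(t, rng.uniform(*iti_range)) for t in types]

def score_trial(trial_type, pressed):
    """Go/no-go scoring: a buttonpress is correct on match trials,
    withholding the press is correct on nonmatch trials."""
    return pressed == (trial_type == "match")

schedule = make_schedule(seed=1)
print(len(schedule), schedule[0][0])
```

Under this scoring rule, a buttonpress on a nonmatch trial is a false alarm, and a withheld response on a match trial is a miss.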

Stimulus sets

Experiment 1 consisted of a systematic manipulation of the number of sounds that were recycled as the sample and test stimuli throughout each 128-trial session. For a given experimental session, one of the following stimulus set sizes was randomly selected: 2, 4, 8, 16, 32, 64, or trial-unique. For the trial-unique condition, 192 stimuli were used (64 sounds used for 64 match trials, and 128 sounds used for 64 nonmatch trials). After a stimulus set size was used for a session, it was not used again until the remaining six set sizes had been used. Each animal completed a total of 20 sessions with each stimulus set size.
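One way to realize the session ordering described above is to shuffle the seven set sizes within each cycle, so that no set size recurs until the other six have been used. The sketch below is illustrative only; the names `session_order` and `stimuli_needed` are hypothetical, and the exact randomization procedure used in the experiment is not specified in the text.

```python
import random

SET_SIZES = [2, 4, 8, 16, 32, 64, "TU"]  # "TU" = trial-unique

def session_order(n_cycles=20, seed=None):
    """Return set sizes for n_cycles * 7 sessions. Each consecutive block
    of seven sessions contains every set size exactly once, in random order."""
    rng = random.Random(seed)
    order = []
    for _ in range(n_cycles):
        cycle = SET_SIZES[:]
        rng.shuffle(cycle)
        order.extend(cycle)
    return order

def stimuli_needed(set_size, n_trials=128):
    """Number of distinct sounds required for one session."""
    if set_size == "TU":
        half = n_trials // 2
        # one sound per match trial, two per nonmatch trial
        return half + 2 * half
    return set_size

print(stimuli_needed("TU"))
```

With 128 trials per session, the trial-unique case works out to 64 + 128 = 192 sounds, which equals the size of the full sound collection.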

The stimuli used for each session were randomly selected from a collection of 192 sounds consisting of 32 exemplars from each of the following six sound classes: conspecific monkey vocalizations, human vocalizations, animal vocalizations, natural and environmental sounds, music clips, and synthetic/abstract sounds. The monkey vocalizations were recorded at a natural monkey reserve in South Carolina, USA (by author A.P.), and included a variety of coos, grunts, screams, shrill barks, and harmonic arches. The human vocalizations consisted of a variety of speech and nonspeech vocalizations from a variety of speakers, including members of each gender. Animal vocalizations came from a wide variety of birds and mammals other than monkeys and humans. The natural and environmental sounds included natural phenomena, such as thunder and breaking tree branches, as well as sounds that the animals might have been exposed to in the laboratory, such as a door closing or a broom falling on the floor. Music clips comprised multinote sequences extracted from a variety of sources, such as solo instrument performances, popular music recordings, and TV commercials. The synthetic and abstract sounds were artificial sounds generated by electronic synthesizers or downloaded from the abstract sound categories (e.g., “science fiction”) of a commercially available sound effect database (www.soundsnap.com). All sounds were trimmed to 500 ms, volume normalized, and presented at 75 ± 5 dB. Within a session, each sound had an equal chance of being presented on a given trial as the sample and/or the test stimulus, depending on whether it was a match or nonmatch trial, with the constraint that each sound was presented an equal number of times throughout the session.

We found no evidence that the effects of PI differed among the sound types used in our study, and thus the results presented below are collapsed across sounds. It should be noted, however, that our experiment was not specifically designed to test for differences in PI effects among the sound types. For instance, by randomly selecting the sounds to be used for each session, the different sound types were used in unequal numbers of sessions. Thus, the question of whether PI interacts with sound type remains to be addressed by future studies.

Analysis

Although the PI literature has traditionally focused on accuracy (percent correct) as the dependent measure, a relatively small number of publications have reported modulation of response latencies by PI (Hendrikx, 1986; Monsell, 1978; Wixted & Rohrer, 1993). For this reason, both accuracy and response latencies were evaluated as dependent variables in our study.

The animals occasionally stopped responding before the final trial of the experimental session. For sessions in which the subject did not make a single response during the last 20 trials, we considered the final response to be the end of the session. The remaining trials were rejected from accuracy and response latency analyses to ensure that any observed effects could be attributed to mnemonic rather than attentional or motivational factors. The statistical test used to evaluate all effects of PI was repeated measures analysis of variance (ANOVA) with an alpha level of .05, using the session means as individual data points. We found small but significant differences in mean overall accuracy between the monkeys in both experiments. In Experiment 1, monkey F.R. averaged 79.7 % correct, and monkey S.E. averaged 83.5 % [F(1, 19) = 7.57, p < .05, partial η² = .29]. In Experiment 2, monkey F.R. averaged 68.0 % correct, and monkey S.E. averaged 74.3 % [F(1, 19) = 27.13, p < .05, partial η² = .59]. However, the accuracy for both animals was affected by PI in similar ways: We replicated each of the analyses below with Subject (monkey F.R., monkey S.E.) as an additional factor and found no significant interactions. Thus, the results below are given as the combined average for both animals. Any data points that were missing for a given analysis were substituted with the series mean (Bigelow & Poremba, 2012; Roth, 1994). For example, in the unusual case that no incorrect buttonpresses were made on nonmatch trials for a given session, and therefore no incorrect response latency data were available, the missing data point was estimated for the repeated measures ANOVA as the mean incorrect nonmatch response latency of the remaining sessions within the same stimulus set size condition in which such responses occurred.
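The two preprocessing rules described above (ending a session at the final response when the last 20 trials contain no response, and replacing missing session values with the series mean) could be sketched as follows. The function names and data layout are hypothetical and are not taken from the authors' analysis code.

```python
def truncate_session(responded, window=20):
    """responded: list of bools in trial order, True if a buttonpress occurred.

    If no buttonpress occurred during the last `window` trials, return the
    number of trials up to and including the final response; otherwise
    return the full session length.
    """
    if any(responded[-window:]):
        return len(responded)
    last = max((i for i, r in enumerate(responded) if r), default=-1)
    return last + 1

def fill_with_series_mean(values):
    """Replace None entries with the mean of the observed entries
    (e.g., session means within one stimulus set size condition)."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to estimate from
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

print(truncate_session([True] * 100 + [False] * 28))
print(fill_with_series_mean([700.0, None, 720.0]))
```

In this sketch, trials after the returned index would be excluded from both the accuracy and the latency analyses.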

Results and discussion

An unanticipated but interesting initial observation in our data set was that the animals were more likely to quit responding before the final trial of the experimental sessions using the smallest stimulus sets. Specifically, the mean number of incomplete trials per session for the two-stimulus condition was 25.7, whereas for the trial-unique condition it was 12.7 (Fig. 1). A repeated measures ANOVA confirmed that the effect of stimulus set size on early quitting was significant: F(6, 234) = 3.82, p < .05, partial η² = .09. Post-hoc tests (Fisher’s LSD, alpha level .05) indicated that the number of incomplete trials was significantly greater for the two-stimulus condition than for the eight- or 64-stimulus or trial-unique conditions. This finding perhaps reflects the tendency of the animals to quit when the task became particularly difficult (see below).

Fig. 1

Average number of trials not completed (out of 128) as a function of stimulus set size. Early quitting was more frequently observed during sessions using the smallest stimulus set size (two). TU = trial-unique. Error bars indicate the standard errors of the means

As expected, overall accuracy was significantly affected by stimulus set size: F(6, 234) = 37.38, p < .05, partial η² = .49. A significant linear trend also emerged, indicating that accuracy increased as a function of stimulus set size: F(1, 39) = 141.82, p < .05, partial η² = .78. Accuracy was poorest (72 %) during sessions in which only two stimuli were used, and increased for the larger stimulus set sizes, reaching its maximum (88 %) during sessions with trial-unique stimuli (Fig. 2). By comparison, Mishkin and Delacour (1975) reported that visual DMS accuracy fell from 90 %, when trial-unique stimuli were used, to 65 %, when only two stimuli were used. The differences in accuracy between sessions using only two stimuli versus trial-unique stimuli for the visual (25 %) and auditory (16 %) DMS tasks suggest that the influences of PI are roughly comparable, if perhaps somewhat less severe, for auditory short-term memory. However, because this comparison was limited by differences in subjects and experimental designs, future research will be needed to substantiate any difference in PI between sensory modalities.
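For readers who wish to check the reported effect sizes, partial eta squared can be recovered from an F ratio and its degrees of freedom through the standard identity partial η² = (F × df_effect) / (F × df_effect + df_error). The helper below is a minimal sketch with a hypothetical function name; it is not part of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Values reported for the set size effect on overall accuracy
print(round(partial_eta_squared(37.38, 6, 234), 2))   # 0.49
print(round(partial_eta_squared(141.82, 1, 39), 2))   # 0.78
```

Both values agree with the effect sizes reported for the main effect and the linear trend.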

Fig. 2

Overall accuracy improves as a function of stimulus set size. Accuracy was significantly greater for trial-unique sessions than for sessions using stimulus set sizes of 32 or smaller. TU = trial-unique. Error bars indicate the standard errors of the means

To examine the influence of stimulus set size on response latency, it was first necessary to separate the data by trial type, since buttonpresses on match trials reflected correct responses, whereas buttonpresses on nonmatch trials reflected incorrect responses. A repeated measures ANOVA with Trial Type and Stimulus Set Size as factors resulted in significant main effects for both factors [trial type, F(1, 39) = 109.19, p < .05, partial η² = .74; stimulus set size, F(6, 234) = 5.38, p < .05, partial η² = .12], as well as a significant interaction [F(6, 234) = 5.73, p < .05, partial η² = .13]. We also observed significant trends in the nonmatch and match response latency data. For incorrect responses on nonmatch trials, we found a significant linear trend, indicating that the latencies of incorrect responses increased as a function of stimulus set size [F(1, 39) = 29.02, p < .05, partial η² = .43]. For correct responses on match trials, we found a significant quadratic trend [F(1, 39) = 7.65, p < .05, partial η² = .16]. This outcome suggests that correct responses are made more slowly under high- than under moderate-PI conditions. However, when PI is very low or absent, response times are greater than when a moderate amount of PI is present. Correct “match” responses were faster on average (562 ms) than incorrect responses (701 ms). Post-hoc analyses (Fisher’s LSD, alpha level .05) revealed that nonmatch errors were slower for the trial-unique condition than for all other stimulus set sizes (Fig. 3). By contrast, correct “match” responses were slower for the two-stimulus set than for several of the larger stimulus sets (four, eight, and 32). These results suggest that PI negatively impacts performance by increasing the speed with which nonmatch errors are made and by slowing correct “match” responses.

Fig. 3

Response latencies for match and nonmatch trials as a function of stimulus set size. (a) Response latencies for match trials were significantly slower during sessions using the smallest set size (two), suggesting increased processing time for correct “match” responses under relatively high-PI conditions. (b) By contrast, erroneous buttonpresses on nonmatch trials were significantly slower for trial-unique conditions, suggesting that errors are committed more quickly under high-PI conditions. TU = trial-unique. Error bars indicate the standard errors of the means

Evaluating the accuracy for match and nonmatch trials separately revealed that the majority of the deficit in overall accuracy associated with the smaller set sizes (Fig. 2) was due to increased errors or false alarms on nonmatch trials (Fig. 4). A two-factor ANOVA testing these differences resulted in significant main effects of trial type [F(1, 39) = 19.13, p < .05, partial η² = .33] and set size [F(6, 234) = 35.65, p < .05, partial η² = .48], as well as a significant interaction of these factors [F(6, 234) = 4.91, p < .05, partial η² = .11]. Post-hoc analyses (Fisher’s LSD, alpha level .05) indicated that nonmatch accuracy declined steadily from 89 % in trial-unique sessions to 67 % in sessions in which the same two stimuli were repeatedly reused. This outcome is consistent with the view that increasing the frequency with which stimuli are presented from trial to trial leads subjects to commit more false-positive errors, because they might have heard a sound on a recent trial that “matches” the test stimulus on the current trial.

Fig. 4

Accuracy for match and nonmatch trials as a function of stimulus set size. (a) Accuracy for match trials was relatively stable across stimulus set sizes, except that fewer correct “match” responses were made during sessions using the smallest sets (two and four). (b) PI associated with the smaller stimulus sets had a much larger impact on nonmatch accuracy, which rose steadily from 67 % in the two-stimulus set condition to 89 % in the trial-unique condition. TU = trial-unique. Error bars indicate the standard errors of the means

In contrast to the effects of PI on nonmatch trials, accuracy on match trials was only significantly reduced for the smallest set sizes (set size two, 77 %; set size four, 84 %), with no differences among set sizes eight through trial-unique (range: 86 %–87 %). One possible interpretation for why match accuracy might decrease under high-PI conditions involves the concept of feedback-related changes in the criterion of familiarity with which same–different judgments are made, as discussed by Wright (2006, 2007). According to Wright, subjects will make a “match” response only if the degree of familiarity evoked by a test stimulus exceeds the animal’s familiarity criterion; otherwise, a “nonmatch” response will be made. When trial-unique stimuli are used, a test stimulus will only be familiar if it matches the sample stimulus from the same trial. In this context, adopting any criterion level of familiarity will suffice in making accurate same–different judgments. However, when stimuli are recycled from trial to trial, a test stimulus might evoke a certain level of familiarity by virtue of having been presented on a recent (and now irrelevant) trial, and not because it matches the sample stimulus of the current trial. Under these conditions, adopting a relatively lax criterion of familiarity will result in a high rate of false matches. Thus, one strategy for coping with a high degree of PI is to rely on a more rigid familiarity criterion, such that only the most familiar test stimuli are accepted as matches. One of the predicted consequences of increasing the familiarity criterion is that, along with a decrease in false matches, the frequency with which true matches are rejected will also increase. Our data fit well with this prediction, inasmuch as more false-negative errors were observed for the sessions with the greatest amounts of PI.

To provide more direct evidence for a shift in familiarity criterion resulting from PI, we investigated whether the rates of false matches (nonmatch errors) and false rejections (match errors) changed as the experimental session progressed for each stimulus set size. The accuracy data were separated by trial types and averaged for the first, second, third, and fourth quarters of the session (i.e., successive blocks of 32 trials). A three-way repeated measures ANOVA produced a significant interaction among trial type, stimulus set size, and trial block [F(18, 702) = 3.12, p < .05, partial η² = .07]. As can be seen in Fig. 5, the higher-PI conditions led to a steady decrease in false matches throughout the session, as well as a corresponding increase in false rejections. The magnitude of these reciprocal trends diminished as a function of stimulus set size, to the extent that there was no significant Trial Type × Trial Block interaction for the trial-unique condition. The absence of a progressive change in accuracy during trial-unique sessions is helpful in interpreting the Trial Type × Trial Block interactions observed for the remaining conditions, because it argues against attentional or motivational explanations for the changes in error rates associated with the smaller set sizes. For example, it is unlikely that the observed decrease in “match” responses for the smaller set sizes reflects reward satiation, because these changes were not observed when new stimuli were used for each trial. Rather, it seems likely that the shift toward fewer “match” responses (for both trial types) reflects the number of negative outcomes associated with false “match” responses. This interpretation lends support to Wright’s (2006, 2007) suggestion that PI will gradually produce a proportional increase in the familiarity criterion that forms the basis for the match-versus-nonmatch decision.
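The within-session blocking used in this analysis (accuracy by trial type for each successive block of 32 trials) can be sketched as follows. The function name and the `(trial_type, correct)` data layout are assumptions made for illustration only.

```python
from collections import defaultdict

def accuracy_by_block(trials, block_size=32):
    """trials: list of (trial_type, correct) tuples in session order.

    Returns {(trial_type, block_index): proportion correct}, where
    block_index 0..3 corresponds to the four quarters of a 128-trial session.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [n_correct, n_trials]
    for i, (trial_type, correct) in enumerate(trials):
        key = (trial_type, i // block_size)
        totals[key][0] += int(correct)
        totals[key][1] += 1
    return {key: c / n for key, (c, n) in totals.items()}

# Toy session of 64 trials: nonmatch errors early, match errors late
toy = [("match", True), ("nonmatch", False)] * 16 \
    + [("match", False), ("nonmatch", True)] * 16
print(accuracy_by_block(toy))
```

In the toy data, nonmatch accuracy rises across blocks while match accuracy falls, which is the reciprocal pattern that a rising familiarity criterion would be expected to produce.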

Fig. 5

Progressive changes in accuracy by trial types for the first through the fourth quarters of the experimental session (successive blocks of 32 trials). Nonmatch errors became less frequent, whereas match errors became more common as the session progressed. The magnitude of this interaction diminished with increasing stimulus set size, such that no significant interaction was observed for trial-unique sessions. Error bars indicate the standard errors of the means

Curiously, the main effect of trial block and the interaction between trial block and stimulus set size were both nonsignificant [F(3, 117) = 1.36, p > .05, partial η² = .03, and F(18, 702) = 1.40, p > .05, partial η² = .04, respectively]. This indicates that, although the rate of false “match” responses decreased throughout a session, the rate of true “match” responses decreased at an approximately equal rate. Thus, the benefits of reducing nonmatch errors were offset completely by the costs of increasing match errors, leading to zero net improvement during the session. We also examined the overall accuracy for each stimulus set over the course of four successive blocks of five experimental sessions but found no evidence of significant improvement: A repeated measures ANOVA returned neither a main effect of the block of five experimental sessions [F(3, 27) = 0.87, p > .05, partial η² = .09] nor an interaction between stimulus set size and experimental block [F(18, 162) = 1.22, p > .05, partial η² = .12]. In several previous visual short-term memory experiments, pigeons and monkeys had shown improvements in overall accuracy, despite high-PI conditions, that gradually emerged with consistent, extended experience with small stimulus sets (D’Amato, 1973; Grant, 1975, 1976; Wright, 2007). It is possible that, since sessions with small and large stimulus sets were interleaved in our study, the animals lacked sufficient consistent experience with the high-PI conditions to result in an adaptive adjustment of their familiarity criterion for “match” responses. A design presenting a block of multiple, consecutive sessions using a given stimulus set size, followed by a subsequent block using a different set size, could be useful in determining whether this is the case. Alternatively, it is possible that the lack of improvement over the course of the experiment is related to the difficulty that monkeys have with establishing enduring memories in the auditory modality.

Two additional predictions are made by Wright’s (2006, 2007) suggestion that animals’ same–different decisions are under the influence of a familiarity criterion, which can be modulated by error-related feedback. The first is that intertrial PI should have a graded effect on accuracy, such that stimulus repetitions separated by a large number of trials should have a smaller effect than do stimulus repetitions separated by only a few trials or between adjacent trials. For example, fewer false “match” responses should occur if the test stimulus on trial n had most recently been presented on trial n – 20 than if it had been presented on trial n – 1 or n – 2. The second prediction is that the more rigid familiarity criterion that results from frequent exposure to PI should result in fewer nonmatch errors on trials in which the test stimulus has been presented on a relatively recent trial. Thus, even though training with smaller stimulus sets should yield relatively poor overall nonmatch accuracy, the frequency with which the nonmatching test stimuli are erroneously accepted as matches by virtue of having appeared on a recent trial, such as n – 1 or n – 2, should decrease (see Fig. 9.5 in Wright, 2006, for a hypothetical relationship between susceptibility to intertrial PI and the familiarity criterion).

To test these predictions, we evaluated nonmatch accuracy on trials in which the test stimulus on trial n had most recently been presented on trial n – 1, n – 2, or n – 3. We evaluated only the effects of PI from trials n – 1 through n – 3 because, for the two-stimulus set size, the number of nonmatch trials on which the test stimulus had most recently occurred on trial n – 4 or earlier was negligible (0.1 % of total trials). Similarly, we did not include sessions using stimulus set sizes of 16 or greater in this analysis because there were too few nonmatch trials in which the test stimulus had been presented on trials n – 1 through n – 3. Repeated measures ANOVA indicated significant main effects of stimulus set size [F(2, 78) = 9.42, p < .05, ηp² = .20] and PI location [F(2, 78) = 70.72, p < .05, ηp² = .65]; the interaction was not significant [F(4, 156) = 0.97, p > .05, ηp² = .02]. Consistent with similar analyses in several previous studies (Hartshorne, 2008; Wright et al., 2012), nonmatch accuracy increased steadily according to the distance of the most recent test-stimulus repetition (Fig. 6). Furthermore, in confirmation of Wright’s (2006, 2007) expectations, false “match” responses on trials with PI from trials n – 1, n – 2, and n – 3 were less likely for sessions in which only two stimuli were used, as compared to sessions involving four or eight stimuli. It should be noted that overall nonmatch accuracy was better for the larger set sizes (Fig. 4) because of the larger number of trials in those sessions that did not have PI from trial n – 1, n – 2, or n – 3. Nevertheless, for the two-stimulus set size condition, accuracy on trial n when the test stimulus had not been presented since trial n – 3 reached a level (91 %) similar to that observed for trial-unique sessions (89 %).
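The trial classification behind this analysis can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study: a session is assumed to be a list of (sample, test) pairs, a presentation as either the sample or the test is assumed to count as a prior exposure, and the function names pi_lags and nonmatch_accuracy_by_lag are hypothetical.

```python
from collections import defaultdict


def pi_lags(trials):
    """Return (trial index, lag) for every nonmatch trial in a session.

    `trials` is a list of (sample, test) pairs. The lag is the number of
    trials since the test stimulus was last presented (1 means trial n - 1);
    it is None if the stimulus has not appeared earlier in the session.
    """
    last_seen = {}  # stimulus -> index of the most recent trial presenting it
    lags = []
    for n, (sample, test) in enumerate(trials):
        if sample != test:  # nonmatch trial
            prev = last_seen.get(test)
            lags.append((n, None if prev is None else n - prev))
        # Assumption: both the sample and the test count as presentations.
        last_seen[sample] = n
        last_seen[test] = n
    return lags


def nonmatch_accuracy_by_lag(trials, correct, max_lag=3):
    """Proportion correct on nonmatch trials, grouped by PI lag (1..max_lag)."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for n, lag in pi_lags(trials):
        if lag is not None and lag <= max_lag:
            counts[lag] += 1
            hits[lag] += int(correct[n])
    return {lag: hits[lag] / counts[lag] for lag in sorted(counts)}
```

Restricting the summary to max_lag=3 mirrors the restriction to trials n – 1 through n – 3 described above; nonmatch trials whose test stimulus last appeared earlier, or not at all, are left out of the grouping.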

Fig. 6

Nonmatch accuracy on trials for which the test stimulus had been presented on trial n – 1, n – 2, or n – 3. Although overall nonmatch accuracy was lowest for the two-stimulus set condition (see the text), accuracy on the subset of trials with recent PI that were included in this analysis was greater for the two- than for the four- or the eight-stimulus set. Intertrial PI had a graded effect on nonmatch accuracy for all three conditions. Error bars indicate the standard errors of the means

To summarize the results of Experiment 1, PI produced by reusing a relatively small number of sounds from trial to trial decreased overall accuracy primarily by producing more false alarms on nonmatch trials. These nonmatch errors tended to be committed faster under high-PI conditions than when trial-unique stimuli were used. The smallest set sizes, and therefore those that produced the most pervasive PI, also reduced the number of correct “same” decisions on match trials and increased the amount of time before these decisions were made. These outcomes are consistent with Wright’s (2006, 2007) prediction that subjects will adopt a more stringent criterion of familiarity for “same” judgments when PI becomes highly saturated. This notion of a familiarity criterion is further supported by several additional results from our study. First, as the experimental session progressed, subjects committed fewer false alarms on nonmatch trials, but also fewer correct “same” responses on match trials. This effect was roughly proportional to the degree of PI caused by stimulus repetitions. Second, as in several previous studies (Bigelow & Poremba, 2012; Hartshorne, 2008; Wright et al., 2012), PI originating from progressively more distant trials produced a graded effect on nonmatch accuracy. Finally, a greater degree of PI throughout the session (resulting from a smaller stimulus set size) resulted in fewer nonmatch errors on trials with PI originating from one of the three most recent trials.

Experiment 2

Method

Subjects

The subjects were the same as in Experiment 1.

Apparatus

The apparatus was the same one used in Experiment 1.

Procedure

Task

The task was similar to the one used in Experiment 1, except that the stimulus set size was held constant while the ITI was manipulated between sessions. A set of four stimuli was used, in order to produce an intermediate amount of PI. The stimuli for each session were randomly drawn from the same stimulus population that had been used in Experiment 1.

Intertrial intervals

Experiment 2 consisted of a parametric manipulation of the ITI duration. A fixed ITI of 5, 10, or 20 s was randomly selected for each session. Note that these values corresponded to ISI:ITI ratios of 1:1, 1:2, and 1:4, respectively (see D’Amato, 1973). After a session was completed using one of the ITI values, this value was not used again until the monkeys had completed sessions using the remaining two ITIs. As in Experiment 1, each animal completed a total of 20 sessions using each ITI.
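The session ordering described above amounts to a block-randomized schedule, which can be sketched in Python as follows. This is an illustration under assumptions rather than the software used to run the experiment: the function name iti_schedule and the optional seed argument are hypothetical, and each block of three sessions is assumed to be an independent random permutation of the three ITIs.

```python
import random


def iti_schedule(itis=(5, 10, 20), sessions_per_iti=20, seed=None):
    """Return a list of ITI values (in seconds), one per session.

    Each consecutive block of len(itis) sessions contains every ITI exactly
    once in random order, so no ITI is reused until the others have been run.
    """
    rng = random.Random(seed)  # local generator; `seed` is for reproducibility
    schedule = []
    for _ in range(sessions_per_iti):
        block = list(itis)
        rng.shuffle(block)  # random order within the block
        schedule.extend(block)
    return schedule


if __name__ == "__main__":
    schedule = iti_schedule(seed=1)
    print(len(schedule), schedule[:6])
```

With the default arguments the schedule contains 60 sessions per animal, 20 at each ITI, matching the totals stated above.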

Analysis

The analyses were similar to those of Experiment 1, except that the independent variable was the ITI.

Results and discussion

Repeated measures ANOVA indicated that ITI had a significant effect on overall accuracy [F(2, 78) = 7.41, p < .05, ηp² = .16]. As revealed by post-hoc tests (Fisher’s LSD, alpha level of .05), the 5-s ITI resulted in lower overall accuracy (67 %), whereas accuracy in the 10-s and 20-s ITI conditions was equal (73 %; Fig. 7). Evaluating the differential effect of ITI on match and nonmatch accuracy again revealed that the decrease in overall accuracy was caused primarily by an increase in nonmatch errors at the shortest ITI (Fig. 8). Repeated measures ANOVA revealed significant main effects of ITI [F(2, 78) = 6.50, p < .05, ηp² = .14] and trial type [F(1, 39) = 7.44, p < .05, ηp² = .16], but no significant interaction between the two factors [F(2, 78) = 0.60, p > .05, ηp² = .02]. Post-hoc analyses indicated no significant differences among the three ITI conditions for match accuracy. As with overall accuracy, nonmatch accuracy was significantly reduced for the 5-s ITI condition, whereas the 10-s and 20-s ITI conditions did not differ from each other.
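The partial eta squared values reported with each test can be checked against the F ratio and its degrees of freedom using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The short Python sketch below applies that identity to the statistics in the preceding paragraph; the function name partial_eta_squared is hypothetical, and the check is approximate because the published F values are rounded.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    ss_ratio = f_value * df_effect  # equals SS_effect / MS_error
    return ss_ratio / (ss_ratio + df_error)


if __name__ == "__main__":
    # (F, df_effect, df_error) for the tests reported in the paragraph above.
    reported = [(7.41, 2, 78), (6.50, 2, 78), (7.44, 1, 39), (0.60, 2, 78)]
    for f_value, df1, df2 in reported:
        print(f_value, df1, df2, round(partial_eta_squared(f_value, df1, df2), 2))
```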

Fig. 7

Overall accuracy as a function of the duration of the ITI. Accuracy improved significantly when the ITI was extended from 5 to 10 s, but no further advantage was gained by increasing the ITI to 20 s. Error bars indicate the standard errors of the means

Fig. 8

Accuracy for match and nonmatch trials as a function of the duration of the ITI. (a) We observed no significant effect of ITI on match accuracy. (b) However, nonmatch accuracy improved significantly by increasing the ITI from 5 to 10 or 20 s. Error bars indicate the standard errors of the means

As in Experiment 1, we examined response latencies by separating the data by trial types. Repeated measures ANOVAs revealed significant effects of trial type [F(1, 39) = 95.04, p < .05, ηp² = .71] and ITI [F(2, 78) = 6.45, p < .05, ηp² = .14], as well as a significant interaction [F(2, 78) = 4.22, p < .05, ηp² = .10]. As in Experiment 1, correct “match” responses were faster (582 ms) than erroneous responses on nonmatch trials (690 ms). Unlike Experiment 1, post-hoc tests revealed no differences for nonmatch errors as a function of ITI (Fig. 9). For correct “match” responses, slower response latencies were observed for the 5-s ITI condition, but no differences were observed between the sessions using 10-s and 20-s ITIs. The latter outcome is consistent with the results from Experiment 1 in suggesting that the decision time for correct match trials increases when PI becomes saturated.

Fig. 9

Response latencies for match and nonmatch trials as a function of the duration of the ITI. (a) Correct “match” responses were significantly slower for sessions using the shortest ITI (5 s). (b) No significant effect of ITI was found for erroneous responses on nonmatch trials. Error bars indicate the standard errors of the means

Although nonmatch accuracy increased significantly by extending the ITI from 5 s (62 %) to 10 s (69 %) or 20 s (70 %), it was still well below the nonmatch accuracy achieved in trial-unique sessions in Experiment 1 (89 %). This outcome suggests that substantial PI might still be caused by reuse of stimuli, even when trials have been separated by as much as 20 s. Thus, as in Experiment 1, we directly investigated the influence of intertrial PI by evaluating nonmatch accuracy on trial n as a function of the trial on which the test stimulus had most recently been presented. Since a four-stimulus set was used in all conditions, it was possible to evaluate the influence of PI originating from trials n – 1 through n – 5 (the most recent PI stimulus was found on trial n – 6 or earlier for only 9.0 % of the nonmatch trials). As can be seen in Fig. 10, accuracy improved for each ITI condition as the number of trials between stimulus repetitions increased. A repeated measures ANOVA revealed a main effect of ITI that was of borderline significance [F(2, 78) = 3.10, p = .05, ηp² = .07] and a main effect of PI location [F(4, 156) = 34.46, p < .05, ηp² = .47], but the interaction of these factors was not significant [F(8, 312) = 0.84, p > .05, ηp² = .02]. These results indicate that, unlike several experiments in pigeons and rats, which eliminated visual PI by increasing the ITI to 15 or 20 s (Dunnett & Martel, 1990; Grant, 1975; Hogan et al., 1981; but see Wright et al., 2012), auditory PI in monkeys can influence subsequent trials even when they are separated by up to 20 s.

Fig. 10

Intertrial proactive interference (PI) as a function of the duration of the ITI. PI had the largest effect in the 5-s ITI condition, whereas the 10-s and 20-s conditions were similar. PI had a significant influence spanning multiple trials for all three ITI conditions. Error bars indicate the standard errors of the means

Post-hoc tests indicated that, for each condition, nonmatch accuracy improved when the test stimulus had most recently occurred on trials n – 2 through n – 5, as compared to when it had occurred on the previous trial, n – 1. For sessions using the 5-s ITI, accuracy on trials for which the test stimulus had most recently occurred on trial n – 5 was significantly higher than when the stimulus had occurred on either trial n – 1 or n – 2. Similarly, for the 10-s ITI condition, accuracy was greater when the test stimulus had not been presented since trial n – 4 or n – 5 than when it occurred on trial n – 1 or n – 2. However, for the 20-s ITI condition, there was no statistically significant difference in accuracy when the test stimulus had most recently occurred on trial n – 2 through n – 5. This suggests that increasing the ITI from 10 to 20 s may have slightly reduced the extent of intertrial PI, although this difference was insufficient to result in any significant change in overall accuracy (Fig. 7) or in averaged nonmatch accuracy (Fig. 8).

In a prior study, we examined the extent of intertrial PI in monkeys performing an auditory DMS task with a stimulus set size of eight and a retention interval of 5 s (Bigelow & Poremba, 2012). A variable ITI averaging approximately 10 s was used, making it most similar to the 10-s ITI condition in Experiment 2. In that study, we found that the monkeys were more likely to commit errors on nonmatch trials when the test stimulus was repeated after as many as ten trials. By contrast, in the present study (Exp. 2, 10-s ITI), the effects of PI appeared to reach asymptote by about trial n – 4. This difference could plausibly result from an increase in the familiarity criterion for “match” responses, resulting from the relatively high-PI conditions produced by the smaller stimulus set size (four as compared to eight). Specifically, if the monkeys were using a relatively lax criterion of familiarity for “match” decisions in the eight-stimulus set condition, they would be more likely to erroneously accept a nonmatching test stimulus as a “match” if it had occurred on trial n – 4 or n – 5, or even earlier.

To summarize the results of Experiment 2, we observed that overall accuracy increased, if only to a small extent, by extending the ITI from 5 to 10 s, but no additional improvement was gained by increasing the ITI to 20 s. These outcomes are comparable to those from a study of visual short-term memory in monkeys reported by Jarrard and Moise (1971), in which a small increase in accuracy was produced by increasing the ITI from 5 to 15 s, but not by further extending the ITI to 30 or 60 s. We further found that the effects of intertrial PI were substantial even when trials were separated by 20 s. Although these results are generally consistent with the previous literature, the decay of PI from previous trials produced by increasing the ITI was less than we had initially expected. Given the significant intertrial effects of PI in each condition, it is likely that future studies will require an ITI substantially longer than 20 s in order to completely eliminate auditory PI in monkeys.

General discussion

Our experiments showed that, like other forms of memory, auditory short-term memory in monkeys is susceptible to PI caused by reusing the same stimuli for multiple trials within an experimental session. In Experiment 1, PI was most severe when only a very small number of stimuli (e.g., two or four) were used throughout the session. The effects of PI diminished steadily as the stimulus set size increased, and maximum accuracy was observed when new stimuli were used for each trial (Fig. 2). In Experiment 2, PI was modestly attenuated by increasing the ITI from 5 to 10 s, but no additional advantage was gained by increasing the ITI to 20 s (Figs. 7, 8, 9, and 10).

In both experiments, PI reduced overall accuracy primarily by increasing the number of erroneous “match” responses on nonmatch trials (Figs. 4 and 8). This finding is similar to what has been reported in experiments using list memory tasks in which a small number of stimuli are reused from trial to trial (Wright, 1998). In these tasks, a list of several sample stimuli is presented, each separated by a brief ISI. The last item in the list is followed by a retention interval, after which a single probe stimulus is presented, and the subject must indicate whether the probe had been presented in the list (“same”) or not (“different”). In both visual and auditory list memory tasks in monkeys, using a small stimulus set throughout the session, and thereby increasing item repetition among trials, increased the rate of erroneous “same” responses on trials in which the probe was different from the items presented in the list (Sands & Wright, 1980; Wright, 1999). This likely occurred because, although the probe did not match one of the list items from the current trial, it may have matched a list item from a recent trial. In other words, the error being committed by the subjects seems to have been forgetting whether the probe had occurred on the current trial or some previous, now irrelevant trial (Wright, 2006). In the present study as well as in previous DMS and list memory experiments, errors on nonmatch trials were most likely if the probe had been presented on the immediately previous trial, and they became less likely as the number of trials since the probe had been presented increased (Bigelow & Poremba, 2012; Hartshorne, 2008; Wright et al., 2012; Wright et al., 1986).

Several results from Experiment 1 provide support for Wright’s (2006, 2007) view that a criterion threshold of familiarity is used to make the same-versus-different choice, and that this familiarity criterion can be increased as a result of committing frequent false “match” errors. Although PI consistently increased the false alarm rate on nonmatch trials, the extremely saturated PI conditions also reduced the number of correct “same” judgments on match trials (Fig. 4). Additional analyses revealed a steady decrease in “match” responses as the session progressed for both match and nonmatch trials, an effect that was most pronounced for the highest-PI conditions (Fig. 5). Furthermore, although the overall nonmatch accuracy was lowest for the two-stimulus set, nonmatch accuracy on trials with recent PI (originating from trial n – 1, n – 2, or n – 3) was higher for the two-stimulus than for the four- or eight-stimulus sets (Fig. 6). Each of these results can be seen as a consequence of increasing the familiarity criterion for “match” responses, which reflected the degree of PI within an experimental session. These observations fit well with several previous animal studies showing gradual improvement over time under consistently high PI conditions (D’Amato, 1973; Grant, 1975, 1976; Wright, 2007), and with data from humans showing that their familiarity criteria could be modified by changes in stimulus presentation frequency (Yonelinas, 2002).

In general, our results show that the effects of stimulus set size and ITI in auditory short-term memory in monkeys are similar to what has previously been reported in visual studies. In Experiment 1, subjects reached 88 % overall accuracy for sessions that used trial-unique stimuli, which compares favorably with the 90 % accuracy reported by Mishkin and Delacour (1975) for monkeys performing a trial-unique visual DMS task. When only two stimuli were repetitively used throughout the session, accuracy fell to 72 % in our study, and 65 % in Mishkin and Delacour’s visual study. In Experiment 2, overall accuracy increased from 67 % when the ITI was 5 s to 73 % when the ITI was increased to 10 or 20 s. This modest increase in accuracy is similar to that observed by Jarrard and Moise (1971), who reported that increasing the ITI from 5 to 15 s increased accuracy by 5 %–10 % for various retention intervals in monkeys performing visual DMS. Moreover, as in our study, in which no additional benefit was gained by extending the ITI from 10 to 20 s, Jarrard and Moise also observed no significant increase in accuracy after increasing the ITI from 15 to 30 or 60 s. In both experiments, we observed significant effects of intertrial PI that extended beyond immediately adjacent trials. PI in visual short-term memory in monkeys has similarly been observed to span multiple trials (Wright, 2007). In Wright’s (2007) study, as well as ours, the effects of PI diminished as a function of the number of trials separating the current trial from the source of the PI. However, a direct comparison of the extents and impacts of intertrial PI in these studies is complicated by differences in the experimental parameters. For instance, in Wright’s (2007) study, the influence of PI was far more severe when a relatively long (20 s) rather than a short (1 s) retention interval was used, whereas our study used an intermediate retention interval (5 s). These differences notwithstanding, the foregoing results imply that the operation of PI is at least qualitatively similar in auditory and visual short-term memory in monkeys. Additional studies using similar task parameters, and ideally the same subjects, will be needed to determine whether any quantitative differences exist between PI in auditory and visual short-term memory.

In view of our experimental outcomes as well as those of previous studies, future attempts to maximize accuracy by decreasing PI should avoid recycling stimuli between trials as much as possible. Ideally, new stimuli should be used for each trial. However, in some experimental paradigms, presenting multiple trials using the same stimuli is unavoidable. For instance, neurophysiological investigations of visual and auditory short-term memory require multiple repetitions of each stimulus in order to establish reliable stimulus-evoked neuronal responses (Bigelow & Poremba, 2012). Under such circumstances, a relatively long ITI may help reduce PI by increasing the decay time for irrelevant memory traces from previous trials. In addition to these factors, previous studies have indicated that PI may be reduced by increasing the stimulus exposure time (Grant, 1975) and by reducing the retention interval (Meudell, 1977; Wright, 2007; Wright et al., 2012; Zentall & Hogan, 1974). Thus, at minimum, optimal performance on DMS and similar tasks will depend on the stimulus set size, ITI, retention interval, and stimulus exposure time, as well as interactions among these variables (see also van Hest & Steckler, 1996).

In summary, auditory memory in monkeys is highly susceptible to PI, which can be minimized by increasing the number of new stimuli that are presented throughout the session. To a lesser extent, PI may be reduced by allowing an adequate interval of time between trials for sessions that reuse stimuli from trial to trial. Whether or not the monkeys will make a “same” judgment (whether correct or incorrect) may depend on whether the test stimulus exceeds a threshold level of familiarity, which may result either from having been recently presented as the sample for the current trial or from presentation on a prior, and currently irrelevant, trial. Following error-related feedback from having incorrectly chosen a nonmatching test stimulus as a “match,” this threshold of familiarity may become more stringent, in order to minimize future nonmatch errors. These findings expand our understanding of the variables governing auditory short-term memory in monkeys. Further studies directly comparing auditory, visual, and tactile short-term memory will be needed in order to reveal the extent to which these findings generalize across sensory modalities.