The ability of an animal to adapt to environmental change depends on its ability to learn that the reward value of a stimulus can suddenly change. To the degree that the animal is influenced only by the accumulated reinforcement history associated with the stimuli that it has experienced, its ability to respond to a change in reward value should be rather slow, whereas if it is able to benefit from feedback from the outcome of the preceding trial, it should be able to acquire new learning faster, perhaps by learning to ignore irrelevant cues and learning to quickly inhibit earlier behavior.

One approach to assessing an animal’s learning-to-learn ability is to train the animal on a serial reversal task in which the animal is given a simultaneous discrimination. Following acquisition, the discrimination is reversed (i.e., what was once correct is now incorrect), and once the animal has acquired the reversal, the discrimination is reversed repeatedly (e.g., Mackintosh, McGonigle, Holgate, & Vanderver, 1968). If one uses original learning as a baseline against which to measure improvement, one should be able to control for the difficulty of the original discrimination and thereby assess individual differences, and even make comparisons among different species. That is, the degree of improvement with successive reversals, relative to the original acquisition baseline, should be a measure of the animal’s cognitive flexibility (Bitterman, 1975). Research has shown that a variety of animals, including apes and monkeys (Beran et al., 2008; Warren, 1966), horses (Martin, Zentall, & Lawrence, 2006), rats (Bushnell & Stanton, 1991; Reid & Morris, 1992; Williams, 1972), and birds (Bond, Kamil, & Balda, 2007; Ploog & Williams, 2010) show substantial improvement with reversals, suggesting that, since it is so prevalent, this type of flexibility should have adaptive value (Shettleworth, 1998). Furthermore, serial reversal studies have shown that the variability in improvement over reversals differs among species, suggesting that some species can quickly adjust to changes in the value of stimuli as a function of feedback.

When a reversal has been experienced repeatedly, the optimal strategy with this task would be to base one’s choice on the consequences of the last trial. If the previous response was rewarded, one should stay with it; if it was not rewarded, one should shift to the alternative response. Humans are quite good at adopting this optimal strategy (Bechara, Tranel, & Damasio, 2000), whereas other species typically do not show such optimal reversal performance.
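The stay/shift rule just described can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the function name `win_stay_lose_shift` is hypothetical, and the session length (80 trials) and reversal point (Trial 41) are taken from the midsession procedure described later in the text. Under these assumptions, an ideal chooser makes exactly one error per session, on the first trial after the contingencies change.

```python
def win_stay_lose_shift(n_trials=80, reversal_trial=41, first_choice="S1"):
    """Simulate an ideal win-stay/lose-shift chooser for one session.

    Returns a list of (trial, choice, rewarded) tuples; trials are 1-indexed.
    S1 is correct before `reversal_trial`, S2 from `reversal_trial` onward.
    """
    other = {"S1": "S2", "S2": "S1"}
    choice = first_choice
    history = []
    for trial in range(1, n_trials + 1):
        correct = "S1" if trial < reversal_trial else "S2"
        rewarded = choice == correct
        history.append((trial, choice, rewarded))
        # Win-stay: repeat a rewarded choice. Lose-shift: switch after nonreward.
        choice = choice if rewarded else other[choice]
    return history


session = win_stay_lose_shift()
errors = [trial for trial, _, rewarded in session if not rewarded]
print(errors)  # [41]
```

The single error on Trial 41 is unavoidable, because the reversal is signaled only by the absence of reinforcement on that trial.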

A variation of the serial reversal procedure is one in which each session, involving a simple simultaneous discrimination, begins with one stimulus (S1) as the correct (positive, S+) stimulus and a different one (S2) as the incorrect (negative, S–) stimulus (S1+, S2–), and halfway through the session the discrimination reverses (S2+, S1–; Rayburn-Reeves, Molet, & Zentall, 2011; see also Cook & Rosen, 2010; Mackintosh et al., 1968). Unlike other serial reversal procedures, two novel aspects of this procedure are that the correct response at the start of each session is predictable and that the reversal occurs at a predictable point in each session.

In Rayburn-Reeves et al.’s (2011) procedure, for each pigeon, one stimulus was randomly assigned as the first correct stimulus (S1), and responses to that stimulus and not the other (S2) were reinforced for the first half of each 80-trial session (S1+, S2–). However, for the last half of the session, the contingencies were reversed (S1–, S2+). For a given pigeon, S1 was always the same from session to session. The results indicated that the pigeons made two distinct types of errors: anticipatory errors (choosing S2 prior to the reversal) and perseverative errors (choosing S1 after the reversal). This pattern of results suggests that the pigeons were not very sensitive to the feedback from the outcome of the most recent trial(s) and that they were using the number of trials or the passage of time into the session as a cue to estimate the point of the reversal. That is, although use of the number of trials or the time into the session resulted in reasonable overall choice accuracy (near 90 % correct), the pigeons responded suboptimally: They received less reinforcement than they would have, had they used the cues provided by their recent history of reinforcement (which would have resulted in only a single error per session).
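The two error types defined above can be counted from a session's sequence of choices. The sketch below is illustrative only: the function name `count_reversal_errors` and the example session are hypothetical, and it assumes the fixed midsession reversal (S2 becomes correct on Trial 41 of 80).

```python
def count_reversal_errors(choices, reversal_trial=41):
    """Count anticipatory and perseverative errors in one session.

    choices: sequence of "S1"/"S2", one entry per trial (index 0 is Trial 1).
    Anticipatory error: choosing S2 before the reversal.
    Perseverative error: choosing S1 once S2 has become correct.
    """
    anticipatory = 0
    perseverative = 0
    for trial, choice in enumerate(choices, start=1):
        if trial < reversal_trial and choice == "S2":
            anticipatory += 1
        elif trial >= reversal_trial and choice == "S1":
            perseverative += 1
    return anticipatory, perseverative


# Hypothetical session: early switches on Trials 37 and 39,
# then continued choice of S1 on Trials 41-43.
example = ["S1"] * 36 + ["S2", "S1", "S2", "S1"] + ["S1"] * 3 + ["S2"] * 37
print(count_reversal_errors(example))  # (2, 3)
```

Note that this count treats the choice of S1 on Trial 41 as a perseverative error, even though an ideal win-stay/lose-shift chooser would also make it.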

In an attempt to reduce the relevance of time into the session as a dominant cue, Rayburn-Reeves et al. (2011) varied the location of the reversal in an unpredictable manner from session to session. Specifically, in each session, the reversal could occur at one of five different locations (after Trial 10, 25, 40, 55, or 70). Surprisingly, after 100 sessions of training, they found that when the reversal occurred early in the session, the pigeons showed a large number of perseverative errors but few anticipatory errors, and when the reversal occurred late in the session, the pigeons showed a large number of anticipatory errors but few perseverative errors. In addition, the total number of errors increased the further the reversal occurred from the middle of the session. Thus, even when the reversal location was unpredictable, the number of trials or the time into the session continued to influence sensitivity to the reversal. Furthermore, when the reversal occurred at an unpredictable point in the session, the total number of errors actually increased relative to when the reversal occurred at a predictable point in the session. That is, in part, the pigeons appeared to continue to use time or trials into the session as a cue to reverse, suggesting that their ability to use cues provided by changes in the value of the visual stimuli was somehow impaired.

It is possible, however, that because the visual cues alternated randomly between the two locations (left and right), the only relevant cue for the pigeons was whether a response to the previously chosen color was reinforced. In this respect, optimal choice required that the spatial location of the stimulus associated with reinforcement had to be ignored. To encourage the pigeons to use the stimulus selected and the outcome of the previous trial as the basis for choice on the current trial, Rayburn-Reeves, Stagner, Kirk, and Zentall (2013) converted the task to a spatial (left, right) midsession reversal discrimination, rather than the color midsession reversal discrimination that had been used by Rayburn-Reeves et al. (2011); however, once again, the pigeons made many anticipatory errors just prior to the reversal and many perseverative errors after the reversal. Another hypothesis for the pigeons’ poor performance with this task was that a component of the key-peck response remained under the control of time into the session or trial number. To determine whether the key-pecking response was responsible for the pigeons’ inability to use the prior response and its outcome as the basis for choice, Stagner, Michler, Rayburn-Reeves, Laude, and Zentall (2013) had the pigeons use a spatial treadle response rather than the key-peck response that had been used earlier. Once again, the pigeons were relatively insensitive to the change in contingency.

Recently, Rayburn-Reeves et al. (2013) proposed that the reversal task for pigeons involved memory for the response made on the previous trial and the outcome of that response, maintained in memory over the 5-s intertrial interval that had been used in previous research. They suggested that the task could be thought of as a biconditional discrimination in which a 5-s delay had been inserted between the sample stimulus (consisting of the stimulus chosen and the outcome of that response) and the comparison stimuli (the stimuli presented on the following trial). If the intertrial interval was at least partially responsible for the errors around the reversal, Rayburn-Reeves, Laude, and Zentall suggested that manipulation of the duration of the intertrial interval might affect the error rate before and after the reversal. Consistent with this hypothesis, they found that in a spatial version of the midsession reversal task, when the intertrial interval was 5 or 10 s, the pigeons were less sensitive to the reversal, whereas when the intertrial interval was only 1.5 s, near-optimal choice was found. That is, the pigeons appeared to be using their recent reinforcement history, as opposed to or in addition to the time or number of trials into the session, as a cue to reverse their choice. Rayburn-Reeves, Laude, and Zentall concluded that when the intertrial interval was sufficiently short, the pigeons were able to use their memory for the previous response as well as the outcome of that response as the basis for the choice on the current trial, and that this pattern of responding approximated a win–stay/lose–shift strategy.

However, the improved performance that resulted from the spatial discrimination with very short intertrial intervals also may have resulted from a repetitive response pattern involving the S1 location and the feeder (e.g., peck left, eat, peck left, eat, peck left, eat, . . .) that was interrupted by the omission of reinforcement on Trial 41 (i.e., the first trial on which choice of S1 was no longer reinforced). When the intertrial interval was 5 s, the reason that the pigeons’ accuracy on the spatial midsession reversal was so poor may have been because the time between trials was sufficient for the pigeons to engage in interfering behavior (i.e., behaviors other than the correct response based on the feedback from the last trial).

The purpose of the present experiment was to test the repetitive-response-pattern hypothesis by replicating the intertrial-interval manipulation used by Rayburn-Reeves et al. (2013) using a visual (color) midsession reversal task. In the present experiment, pigeons were trained on a midsession reversal involving either a spatial or a color discrimination and either a 1.5-s or a 5.0-s intertrial interval. If poor memory for the stimulus to which a response had been made on the preceding trial and the outcome of that response was responsible for the large number of anticipatory and perseverative errors when the intertrial interval was 5.0 s, then we should see fewer errors with both the spatial and color discriminations when the intertrial interval was short (i.e., 1.5 s) than when it was long (i.e., 5.0 s). On the other hand, if the improved performance with the short 1.5-s intertrial interval found by Rayburn-Reeves, Laude, and Zentall was due to the repetitive response pattern, then errors should be reduced only when the spatial task is combined with the short intertrial interval, and not in the other three conditions. This outcome would be predicted because in the color discrimination task, finding the correct response location requires that the pigeon track the positive discriminative stimulus, which can appear on either the left or the right response key from trial to trial. That is, no repetitive response pattern would be possible in the color version of the task, as it is in the spatial version.

Method

Subjects

A total of 16 unsexed White Carneaux pigeons (ages 3–12 years) purchased from the Palmetto Pigeon Plant, Sumter, South Carolina, served as subjects. All of the pigeons had had experience with successive color discriminations, but not with simultaneous discriminations or with a reversal-learning task. The subjects were maintained at 85 % of their free-feeding weight and were individually housed in wire cages with free access to water and grit in a colony room that was maintained on a 12-h/12-h light/dark cycle. The pigeons were cared for in accordance with University of Kentucky animal care guidelines.

Apparatus

The experiment took place in a BRS/LVE (Laurel, MD) standard sound-attenuating operant test chamber measuring 34 cm high, 30 cm wide, and 35 cm across the response panel. Three circular response keys (2.54-cm diameter) were horizontally aligned on the response panel (spaced 6.0 cm apart from edge to edge) and were located 25 cm from the floor. A 12-stimulus in-line projector (Industrial Electronics Engineering, Van Nuys, CA) with 28-V, 0.1-A lamps (GE 1820) was mounted behind the left and right response keys to project green and red hues (Kodak Wratten Filter Nos. 2 and 60, respectively). Reinforcement consisted of 1.5-s access to mixed grain (Purina Pro Grains, a mixture of corn, wheat, peas, kaffir, and vetch) that was provided from a food hopper. A 28-V, 0.04-A lamp illuminated the hopper when reinforcement was delivered. Experimental events were controlled by a microcomputer and interface located in an adjacent room.

Procedure

Subjects were randomly assigned to one of four conditions (n = 4 in each group) that varied in the duration of the intertrial interval (ITI; 1.5 or 5.0 s) and the discrimination type (visual or spatial): Group 1.5–Spatial, Group 5–Spatial, Group 1.5–Visual, and Group 5–Visual.

For subjects assigned to the spatial version of the task, at the start of each trial, both the left and right response keys were illuminated with the same hue (green for half of the subjects, red for the other half). For half of the subjects, a single peck to the left side key (i.e., the first correct stimulus, or S1) turned off both key lights, started the ITI (1.5 s for pigeons in Group 1.5–Spatial, and 5.0 s for pigeons in Group 5–Spatial), and provided reinforcement consisting of 1.5-s access to mixed grain. A response to the right key (i.e., the second correct stimulus, or S2) turned off both keys and resulted in the ITI alone. For the other half of the pigeons, initial choice of the right key (S1) was reinforced, and choice of the left key (S2) was not. For the first 40 trials of each 80-trial session, the subjects were trained with S1+/S2−. On Trial 41 a reversal occurred, such that for Trials 41–80, choice of the previously nonreinforced spatial location was reinforced (S2+/S1−).

Pigeons assigned to the visual version of the task were presented with red and green hues, illuminated on the left and right response keys. The positive discriminative stimulus (S1) appeared randomly on either the left or the right key from trial to trial. For half of the subjects, a response to the red key (S1) turned off both keys, started the ITI (1.5 s for pigeons in Group 1.5–Visual, and 5.0 s for pigeons in Group 5–Visual), and provided reinforcement consisting of 1.5-s access to mixed grain. Responding to green (S2) turned off both keys and resulted in the ITI alone. For the other half of the subjects, choice of the green key (S1), not the red key (S2), was initially reinforced. For the first 40 trials of each 80-trial session, subjects were trained with S1+/S2−, and on Trial 41 a reversal occurred, such that for Trials 41–80, the previously nonreinforced color was reinforced (S2+/S1−). All subjects were trained for 60 sessions.
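The difference between the two task versions can be made concrete with a small sketch of the trial schedule. The code is illustrative and rests on assumptions: the function name `build_session` is hypothetical, S1 is placed on the left key for the spatial example, and the side of S1 in the visual task is drawn independently and uniformly on each trial (the text says only that the side was random).

```python
import random


def build_session(task, n_trials=80, reversal_trial=41, s1_side="left", rng=None):
    """Return one dict per trial giving the side of the reinforced stimulus.

    task: "spatial" (S1 is a fixed key location) or "visual" (S1 is a color
    whose side is redrawn on every trial; assumed uniform and independent).
    """
    if task not in ("spatial", "visual"):
        raise ValueError("task must be 'spatial' or 'visual'")
    rng = rng or random.Random()
    opposite = {"left": "right", "right": "left"}
    trials = []
    for trial in range(1, n_trials + 1):
        correct = "S1" if trial < reversal_trial else "S2"
        # Spatial: S1 stays on one key. Visual: S1's side varies unpredictably.
        side_of_s1 = s1_side if task == "spatial" else rng.choice(["left", "right"])
        correct_side = side_of_s1 if correct == "S1" else opposite[side_of_s1]
        trials.append({"trial": trial, "correct": correct, "correct_side": correct_side})
    return trials


spatial = build_session("spatial")
visual = build_session("visual", rng=random.Random(0))
print({t["correct_side"] for t in spatial[:40]})  # {'left'}
print({t["correct_side"] for t in spatial[40:]})  # {'right'}
print({t["correct_side"] for t in visual[:40]})
```

In the spatial schedule the reinforced side is constant within each half of the session, so a repeated peck-and-eat sequence at one key is always correct until Trial 41. In the visual schedule the reinforced side changes from trial to trial, so no such fixed motor pattern can be correct.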

Results

Acquisition

In line with our previous research (Rayburn-Reeves et al., 2011, 2013), in approximately 20–30 sessions, the discrimination reversals were acquired to a level at which accuracy changed little with continued training. The pigeons in both spatial groups acquired the midsession reversals very quickly, whereas the pigeons in the two visual groups were slower to acquire the midsession reversals. In fact, from the first session of training, pigeons in the two spatial groups performed the midsession reversal at better than 90 % correct. To compare the rates of acquisition among the four groups, a three-factor mixed-model analysis of variance (ANOVA) was conducted on the total percentages of choices correct for each session with ITI Duration (1.5 or 5.0 s) and Task Type (visual or spatial) as between-subjects factors, with repeated measures on the third factor, Session (1–30). The analysis indicated a main effect of session, indicative of an increase in overall accuracy with training, F(29, 348) = 3.39, p < .001. We also observed a main effect of task type, F(1, 12) = 16.97, p = .001, which was qualified by a significant Task Type × Session interaction, F(29, 348) = 2.83, p < .0001, indicating that the spatial task was acquired at a faster rate than the visual task (see Fig. 1). None of the other terms in the model was statistically significant (all ps > .25). Minimal improvements in task accuracy were observed over the subsequent 30 sessions. When the data were pooled over the last 20 sessions, a one-way ANOVA revealed that terminal, overall performance was not significantly different as a function of condition: Group 1.5–Spatial, M = 94.1, SE = 1.42; Group 1.5–Visual, M = 88.0, SE = 2.47; Group 5–Spatial, M = 89.4, SE = 0.89; and Group 5–Visual, M = 89.5, SE = 1.68; F(3, 12) = 2.44, p = .11.
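As a reading aid, the degrees of freedom reported above follow directly from the design (16 pigeons, four groups, 30 sessions). The short sketch below only reproduces that arithmetic; the variable names are illustrative, and no F or p values are recomputed because the raw data are not given here.

```python
n_subjects = 16   # pigeons in the experiment
n_groups = 4      # 2 ITI durations x 2 task types
n_sessions = 30   # sessions entered into the acquisition analysis

df_two_level_factor = 2 - 1                          # ITI Duration or Task Type
df_between_error = n_subjects - n_groups             # error term for between-subjects effects
df_session = n_sessions - 1                          # Session main effect
df_task_by_session = df_two_level_factor * df_session
df_within_error = df_session * df_between_error      # error term for effects involving Session
df_one_way_groups = n_groups - 1                     # one-way ANOVA across the four groups

print(df_two_level_factor, df_between_error)   # 1 12
print(df_session, df_within_error)             # 29 348
print(df_task_by_session, df_within_error)     # 29 348
print(df_one_way_groups, df_between_error)     # 3 12
```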

Fig. 1

Acquisition (Sessions 1–30) by pigeons of a spatial midsession reversal with a 5.0-s ITI (Group 5–Spatial) or a 1.5-s ITI (Group 1.5–Spatial), or of a visual (color) midsession reversal with a 5.0-s ITI (Group 5–Visual) or a 1.5-s ITI (Group 1.5–Visual)

Sensitivity to the reversal

The percentage of choices of the first correct stimulus (S1) as a function of trial number (in blocks of five trials), averaged over subjects for the last 20 training sessions (Sessions 41–60), is shown in Fig. 2. As can be seen in the figure, Group 1.5–Spatial appears to have greater sensitivity to the reversal, whereas the other three groups show considerably less sensitivity to the change in contingency. Those three groups, however, show comparable degrees of insensitivity to the reversal.

Fig. 2

Asymptotic choice of the stimulus that was correct on the first 40 trials of each session, plotted in blocks of five trials, for pigeons trained on a spatial midsession reversal with a 5.0-s ITI (Group 5–Spatial) or a 1.5-s ITI (Group 1.5–Spatial) and for pigeons trained on a visual (color) midsession reversal with a 5.0-s ITI (Group 5–Visual) or a 1.5-s ITI (Group 1.5–Visual)

Trial-by-trial analysis

To more closely examine each group of pigeons’ sensitivity to the reversal, we examined the trial-by-trial accuracy of each group on the trials immediately before and after the reversal (see Fig. 3). As can be seen in Fig. 3, the pigeons in the spatial groups made fewer anticipatory errors than did the pigeons in the visual groups, and the pigeons in the 1.5-s groups made somewhat fewer anticipatory errors than did the pigeons in the 5.0-s groups; however, the groups did not appear to differ in the numbers of perseverative errors made. To determine whether the groups differed in their sensitivity to the feedback from the reversal, a one-way ANOVA was conducted on the difference between the average of the percentage choices of the first correct stimulus (S1) on Trials 37–41 (the five trials that occurred just prior to the feedback from the first reversal trial) and on Trials 42–46 (the first five trials following the reversal), involving the data from all four conditions. The analysis indicated an overall effect of condition, F(3, 12) = 10.61, p = .001. A planned comparison was then conducted comparing the sensitivity to the reversal (the difference between the averages of the percentages of choices of the first correct stimulus (S1) on Trials 37–41 and Trials 42–46) for Group 1.5–Spatial (M = 71.8, SE = 8.57) to the other three groups: Group 1.5–Visual (M = 30.0, SE = 7.82), Group 5–Spatial (M = 42.2, SE = 4.75), and Group 5–Visual (M = 25.8, SE = 2.29). The analysis revealed that pigeons in Group 1.5–Spatial were significantly more sensitive to the change in contingency, F(1, 12) = 28.23, p < .05 (see Fig. 3).
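The sensitivity measure used in this analysis is a simple difference score, which the following sketch computes for one pigeon or group. The function name `reversal_sensitivity` and the two example profiles are hypothetical and are not data from the experiment; the trial windows (37–41 and 42–46) are those given in the text.

```python
def reversal_sensitivity(pct_s1_by_trial):
    """Difference score for sensitivity to the reversal.

    pct_s1_by_trial: dict mapping trial number -> percentage choice of S1.
    Returns mean % S1 on Trials 37-41 minus mean % S1 on Trials 42-46.
    """
    before = [pct_s1_by_trial[t] for t in range(37, 42)]  # Trials 37-41
    after = [pct_s1_by_trial[t] for t in range(42, 47)]   # Trials 42-46
    return sum(before) / len(before) - sum(after) / len(after)


# Ideal win-stay/lose-shift profile: always S1 through Trial 41, never afterward.
ideal = {t: 100.0 if t <= 41 else 0.0 for t in range(37, 47)}
# Completely insensitive profile: choice does not change across the reversal.
flat = {t: 50.0 for t in range(37, 47)}

print(reversal_sensitivity(ideal))  # 100.0
print(reversal_sensitivity(flat))   # 0.0
```

On this scale, 100 corresponds to an immediate and complete switch after the first nonreinforced trial, and values near 0 indicate that choice was essentially unaffected by the feedback from the reversal.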

Fig. 3

Asymptotic choice of the stimulus that was correct on the first 40 trials of each session for the final five trials before and the first five trials after the reversal of each session (trial by trial for Trials 37–46) for pigeons trained on a spatial midsession reversal with a 5.0-s ITI (Group 5–Spatial) or a 1.5-s ITI (Group 1.5–Spatial) and for pigeons trained on a visual (color) midsession reversal with a 5.0-s ITI (Group 5–Visual) or a 1.5-s ITI (Group 1.5–Visual)

A further planned comparison was conducted comparing Group 5–Spatial with the two visual groups (Group 1.5–Visual and Group 5–Visual). The analysis revealed that the difference was not statistically significant, F(1, 12) = 3.39, p = .09. Finally, a planned comparison was conducted comparing Group 1.5–Visual with Group 5–Visual, and the analysis revealed that the difference was not statistically significant, p > .05.

Discussion

Rayburn-Reeves et al. (2011) found that when pigeons were trained on a color-discrimination midsession reversal with 5-s ITIs, the subjects were relatively insensitive to the point of the reversal. The results of the present study confirm the earlier finding that when pigeons are trained on a midsession reversal involving a visual (color) discrimination with an ITI of 5 s, the reversal is guided largely by time or the number of trials into the session, rather than by the local feedback cues resulting from the stimulus and outcome from the last response made. In later research, it was hypothesized that the reversal might be made more salient if the discrimination was spatial (Rayburn-Reeves et al., 2013), but only a small decline in anticipatory errors was found. Rayburn-Reeves, Laude, and Zentall suggested that the task could be considered a biconditional discrimination, with the stimulus selected on a given trial and the outcome of the choice serving as the biconditional sample and the 5.0-s ITI serving as a delay between the offset of the sample and the onset of the comparison stimuli. When viewed in this way, the relatively poor accuracy prior to and following the reversal could be attributed largely to the 5.0-s ITI (see, e.g., Randall & Zentall, 1997). If this characterization is correct, Rayburn-Reeves, Laude, and Zentall reasoned that anticipatory and perseverative errors might be reduced by shortening the ITI. In fact, when they shortened the ITI to 1.5 s, no evidence of systematic anticipatory errors emerged, and perseverative errors were largely eliminated, as well.

In the present experiment, we manipulated the nature of the task (spatial or visual) and the intertrial interval (1.5 or 5.0 s), and we found that, in line with the repetitive-response-pattern hypothesis, pigeons trained on the spatial discrimination with a 1.5-s ITI showed fewer errors around the reversal than did the other three groups: Group 1.5–Visual, Group 5–Visual, and Group 5–Spatial. Consistent with earlier research (Rayburn-Reeves et al., 2013), pigeons trained on the spatial discrimination midsession reversal with short ITIs showed a pattern of errors that approximated a win–stay/lose–shift strategy.

The results of the present experiment suggest that with the visual-discrimination midsession reversal with relatively long ITIs used by Rayburn-Reeves et al. (2011), poor memory for the events from the preceding trial was largely responsible for the reliance on time or number of trials into the session as a cue to reverse (but see McMillan & Roberts, 2012, who found that their pigeons showed evidence of timing when they increased or decreased the ITI). The advantage of the spatial discrimination was that proprioceptive and kinesthetic cues provided by pecking the key and moving from the key to the feeder provided relevant cues that, together with the outcome of the choice, were sufficient to guide choice on the following trial. Furthermore, the shorter ITI greatly reduced the memory load for the events from the previous trial. This suggests that when the difficulty of the biconditional discrimination is increased by adding time between the events that cue future reinforcement, pigeons tend to rely more heavily on the passage of time or the number of trials within the session, which influence the pigeon’s choice more than the cues from the preceding trial.

Although the conditions in which pigeons continue to make errors around the reversal could have involved the use of either time or trials into the session as a cue to estimate the reversal point, the results reported by Rayburn-Reeves et al. (2013) suggest that estimation of trial number was the more likely cue. In their study, they found that manipulation of the ITI resulted in near-identical anticipatory and perseverative error rates among pigeons that had 5.0-s ITIs and 10.0-s ITIs. Had the pigeons been using time into the session as a cue to reverse, one would have expected timing to the middle of the session to have been somewhat poorer when the ITI was 10 s than when it was only 5 s; however, estimation of the number of trials into the session would be expected to be about the same in both conditions.

When Rayburn-Reeves et al. (2011) reported that pigeons are relatively insensitive to the local cues provided in the midsession reversal, they focused on the presumed simplicity of the simultaneous visual discrimination, and failed to consider the contribution of the 5.0-s ITI. When, in the present experiment, we simplified the discrimination by making it spatial and shortened the ITI from 5.0 s to 1.5 s, performance on the midsession reversal approached ideal win–stay/lose–shift performance.

It is interesting to note that rats trained on a spatial discrimination midsession reversal have shown virtually no anticipatory errors, even when the ITI was relatively long (5.0 s). The difference between rats and pigeons on this version of the task may be related to inherent differences in the abilities of the animals to benefit from experience with tasks involving multiple reversals (serial reversal tasks; see Bitterman, 1975), but it is also possible that differences in the ways that the two species make their responses contribute to the differences found. Pigeons must peck and eat with their beaks. Thus, they must move their head away from the key when they eat, and the 5.0-s ITI introduces a memory load for the response key last pecked. Rats, on the other hand, press the lever with their paw and can maintain contact with the lever (or stay in close proximity to it) when they eat from the magazine. Thus, the spatial location of their paw between trials can serve as a salient cue for the next response 5.0 s later, and at the point of the reversal (Trial 41), the absence of reinforcement serves as a salient cue to press the other lever. For the pigeons, the short, 1.5-s ITI reduces the memory load and allows for a continuous sequence of key-peck, consummatory response, and back to the pecking key for the pigeon, disrupted only by the absence of reinforcement on Trial 41. Thus, the combination of a spatial midsession reversal together with a short ITI allowed the pigeons to show a near win–stay/lose–shift reversal pattern.

With the short, 1.5-s ITI, the pigeons should have had little difficulty remembering both the color of the stimulus pecked on the preceding trial and the outcome of that choice. The fact that they could not do so efficiently with the visual discrimination reversals suggests that, in the spatial task, the repetitive response pattern involving the S1 location and the feeder was what allowed the pigeons to overcome their tendency to use the time into the session or an estimate of the number of trials experienced as a cue to begin to reverse, even though reliance on that cue is less efficient than a win–stay/lose–shift response pattern. Thus, pigeons’ performance on the repeated midsession reversal continues to present animal learning researchers with a curious pattern of responding.