Behavioral flexibility allows organisms to adapt quickly to environmental change while avoiding the repeated negative consequences of obsolete strategies. When humans are given a serial reversal task, in which the reinforcement contingencies of a simultaneous discrimination reverse abruptly, they often quickly develop what might be described as a win–stay/lose–shift response rule. Specifically, after the first trial on which a choice is not reinforced, they switch to the previously nonreinforced alternative. This type of cognitive flexibility, the ability to change behavior in accordance with changes in the environment, has been suggested to be positively correlated with intelligence (Bitterman, 1965).

Early work investigating behavioral flexibility used reversal learning tasks in which a subject is given a simultaneous discrimination: responses to a positive stimulus (S+) are reinforced and responses to a negative stimulus (S−) are not. At a point determined by an acquisition criterion or a fixed number of training trials, the discrimination is reversed (the S+ becomes the S− and the S− becomes the S+). How quickly subjects learn to respond consistently to the former S− following the reversal, and the degree to which the number of errors per reversal decreases over successive reversals, have been taken as measures of behavioral flexibility (Bitterman, 1965), and species differences in the degree of improvement over successive reversals have been interpreted as differences in flexibility (Bitterman & Mackintosh, 1969; Mackintosh, 1969).
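To make the serial reversal measures concrete, the following Python sketch counts errors in each phase of a trial log and computes one simple improvement index (errors on the first reversal minus errors on the last). The function names, the log format, and the index itself are illustrative assumptions rather than the measures used in the studies cited; the core quantity is the tally of errors per reversal.

```python
from typing import List, Sequence


def errors_per_phase(choices: Sequence[str], s_plus: Sequence[str],
                     phase_starts: Sequence[int]) -> List[int]:
    """Count errors in each phase of a serial reversal log (hypothetical format).

    choices[i] is the stimulus chosen on trial i and s_plus[i] is the S+ on that
    trial. phase_starts lists the index of the first trial of each phase:
    phase 0 is the original discrimination, phase k is the k-th reversal.
    """
    bounds = list(phase_starts) + [len(choices)]
    counts = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        # An error is any choice of the stimulus that is currently the S-.
        counts.append(sum(c != s for c, s in zip(choices[start:end], s_plus[start:end])))
    return counts


def reversal_improvement(counts: Sequence[int]) -> int:
    """Hypothetical index: errors on the first reversal minus errors on the last.

    Positive values mean fewer errors per reversal as reversals accumulate.
    """
    reversals = counts[1:]  # drop the original discrimination
    if len(reversals) < 2:
        raise ValueError("need at least two reversals")
    return reversals[0] - reversals[-1]
```

With real data, the per-reversal counts would typically be plotted against reversal number rather than reduced to a single difference.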

The logic behind using improvement over successive reversals as a measure of behavioral flexibility is that it assesses reversal performance relative to the difficulty of the original discrimination. That is, it controls for the difficulty of the original discrimination, which may depend on the sensory apparatus of the species (e.g., pigeons are much more visual than rats). However, there is evidence that procedural variables can affect not only the difficulty of the original discrimination, but also the improvement in reversal learning with successive reversals (Warren, 1965).

In a variation of the serial reversal procedure, each session begins with the same S+ and S−, and the reversal occurs halfway through the session (Rayburn-Reeves, Stagner, Kirk, & Zentall, in press; see also Cook & Rosen, 2010). The question is how animals deal with the fact that the reversal occurs at a predictable point in each session. Interestingly, when this task has been used with pigeons, they begin to anticipate the reversal well before it occurs, and once the reversal occurs, they typically continue to perseverate by choosing the former S+.

Rayburn-Reeves et al. (in press) reasoned that because the reversal always occurred at the midpoint of the session, the pigeons may have been encouraged to use the time elapsed since the start of the session as a cue for reversal (albeit not a very efficient one), rather than the feedback from their first error, a nonreinforced choice of the original S+. To discourage timing, they varied the point of reversal within the session randomly across sessions. Even though these pigeons had not been trained with the reversal exclusively at the midpoint of the session and the point of the reversal was much less predictable, they continued to use the passage of time as the primary cue for reversal, and their accuracy was even poorer. Specifically, when the reversal occurred early in a test session, the pigeons committed few anticipatory errors but many perseverative errors, and when the reversal occurred late in a test session, they committed many anticipatory errors but few perseverative errors. It appeared as if the pigeons were averaging over all of the reversal points they had experienced and continued to use time into the session as the cue for reversal. To make the pigeons' choice more salient and memorable to them, Rayburn-Reeves, Molet, and Zentall (2011) required the pigeons to peck the chosen stimulus 20 times, but they found no reduction in either anticipatory or perseverative errors.
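The error pattern described above is what one would expect from a chooser that ignores feedback and switches according to elapsed time, centered on the average reversal point it has experienced. The Python sketch below is a hypothetical model, not one fitted by Rayburn-Reeves et al.: it uses trial number as a stand-in for elapsed time (assuming trials of constant duration), and the logistic switching function's center (40.5) and spread (3 trials) are arbitrary illustrative values. Under these assumptions, early reversals yield mostly perseverative errors and late reversals mostly anticipatory errors.

```python
import math
from typing import Tuple


def p_choose_s2(trial: int, expected_reversal: float, spread: float) -> float:
    """Hypothetical timing rule: P(choose S2) rises logistically with trial number."""
    return 1.0 / (1.0 + math.exp(-(trial - expected_reversal) / spread))


def expected_errors(last_s1_trial: int, expected_reversal: float = 40.5,
                    spread: float = 3.0, n_trials: int = 80) -> Tuple[float, float]:
    """Expected (anticipatory, perseverative) errors in one session.

    S1 is reinforced on trials 1..last_s1_trial and S2 thereafter. The modeled
    chooser ignores feedback and uses only trial number, centered on the
    average reversal point.
    """
    anticipatory = sum(p_choose_s2(t, expected_reversal, spread)
                       for t in range(1, last_s1_trial + 1))
    perseverative = sum(1.0 - p_choose_s2(t, expected_reversal, spread)
                        for t in range(last_s1_trial + 1, n_trials + 1))
    return anticipatory, perseverative


for reversal in (10, 40, 70):
    a, p = expected_errors(reversal)
    print(f"reversal after trial {reversal}: anticipatory {a:.1f}, perseverative {p:.1f}")
```

A feedback-sensitive chooser would not show this dependence on the reversal point, which is why the early/late asymmetry is diagnostic of timing.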

Even more interesting, perhaps, is the fact that rats are much more efficient at this task than are pigeons (Rayburn-Reeves et al., in press). When given a spatial version of this task (e.g., pressing the left lever but not the right lever provides a pellet of food for the first 40 trials of each session, and pressing the right lever but not the left lever provides a pellet for the remaining 40 trials), the rats showed no evidence of using the passage of time as a cue for reversal. Instead, they approached a win–stay/lose–shift response strategy, showing no anticipatory responding and virtually no perseverative responding. Furthermore, the rats transferred readily when the point of the reversal within the session was varied, making it unpredictable, and after very few sessions of adjustment, they transferred to two and then to three reversals per session.
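For comparison, an ideal win–stay/lose–shift chooser, the strategy the rats approached, makes exactly one error per reversal (on the first trial after it), regardless of where the reversal falls or how many reversals occur in a session. The Python sketch below illustrates this; it assumes 80 trials per session and that the first choice of each session is the initial S+, and the function names are hypothetical.

```python
from typing import List, Sequence


def s_plus(trial: int, reversal_points: Sequence[int]) -> str:
    """S1 is the S+ until the first reversal; each later reversal swaps the S+."""
    swaps = sum(trial > r for r in reversal_points)
    return "S1" if swaps % 2 == 0 else "S2"


def wsls_errors(reversal_points: Sequence[int], n_trials: int = 80) -> List[int]:
    """Trials on which an ideal win-stay/lose-shift chooser makes an error."""
    choice = "S1"  # assumption: the session's first choice is the initial S+
    errors = []
    for trial in range(1, n_trials + 1):
        if choice != s_plus(trial, reversal_points):
            errors.append(trial)
            choice = "S2" if choice == "S1" else "S1"  # lose-shift
        # otherwise win-stay: the same choice is repeated on the next trial
    return errors


print(wsls_errors([40]))          # one midsession reversal
print(wsls_errors([20, 40, 60]))  # three reversals per session
```

The chooser never makes an anticipatory error and makes only the one unavoidable perseverative error on the first trial after each reversal.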

One might hypothesize that the spatial discrimination and reversal was easier for the rats to acquire than the visual (color) discrimination and reversal was for the pigeons, and that this difference accounted for the species difference. However, when pigeons were trained on a spatial discrimination and reversal, they did no better than pigeons trained on the visual discrimination (Rayburn-Reeves et al., in press).

The purpose of the present experiments was to examine several variables that might encourage pigeons to use the feedback from the preceding trial as a choice cue and to rely less on the passage of time to determine when to reverse their choice. The first variable we examined was the difference between the pigeon's key peck and the rat's leverpress. There is evidence that the pigeon's key peck consists of two components, an operant component and a Pavlovian component (Gamzu & Schwartz, 1973). The Pavlovian component is elicited by the signaling value of the stimulus, as in the pecking that develops with an autoshaping procedure (Brown & Jenkins, 1968). No comparable Pavlovian component has been found in the rat's leverpress. It may be that Pavlovian pecks are less sensitive to the outcome of the preceding trial than operant pecks. If so, the difference between rats and pigeons may lie in the nature of the response that they make (i.e., responding with the beak for pigeons vs. with the paw for rats). In Experiment 1, we asked whether requiring pigeons to step on a treadle, rather than peck a response key, would result in a different pattern of anticipatory and perseverative errors.

In Experiment 2, we asked whether a salient but irrelevant event that signaled the reversal would alter the pattern of anticipatory and perseverative errors. To answer this question, we inserted at the point of the reversal four simultaneous discrimination trials involving colors different from those used in the original discrimination and its reversal. The appearance of these trials could serve as a signal that the reversal was about to occur, and it could affect not only perseverative errors but also anticipatory errors. It might also encourage the development of a win–stay/lose–shift response rule.

In Experiment 2, we also asked whether making errors somewhat more aversive would alter the pattern of errors. To make errors more aversive, a single peck to the incorrect color turned off the correct color, while the incorrect color remained on for an additional 5 s (a time-out period).

Experiment 1

The purpose of Experiment 1 was to make the method used with pigeons more similar to that used with rats by Rayburn-Reeves et al. (in press), to determine whether the pecking response was responsible for the pigeons' use of temporal cues, rather than the outcome of the preceding trial, as the cue to switch from one discriminative stimulus to the other. Thus, in Experiment 1, we gave pigeons a spatial reversal learning task in which they responded by stepping on a left or a right treadle, rather than by pecking a left or a right response key.

Method

Subjects

Eight White Carneaux pigeons (Columba livia) ranging in age from 2 to 12 years served as subjects. All subjects had been given experience in a previous study involving a simultaneous color discrimination, but they had never been exposed to a discrimination reversal procedure. Subjects were maintained at 85 % of their free-feeding weight throughout the experiment and were individually housed in wire cages with free access to water and grit in a colony room that was maintained on a 12:12-h light:dark cycle. The pigeons were maintained in accordance with a protocol approved by the Institutional Animal Care and Use Committee at the University of Kentucky.

Apparatus

The experiment was conducted in an operant chamber (Coulbourn Instruments, Lehigh Valley, PA) measuring 25.7 cm across the response panel, 33 cm from ceiling to floor, and 31 cm from the response panel to the back wall. The chamber had a white houselight, centered on the response panel, 1.3 cm from the ceiling. A pellet dispenser delivered pellets (45-mg grain-based pigeon pellets, Bioserv, Frenchtown, NJ) to a food well centered on the response panel, 5.6 cm from the floor. Two response treadles, each 5.08 cm wide and 5.08 cm deep, were located one on each side of the food well, 5.08 cm from the adjacent side wall and 0.64 cm from the floor. The experimental chamber was located in a small isolated room to reduce extraneous visual and auditory stimulation. The experiment was controlled by a microcomputer and interface located in an adjacent room.

Procedure

Pigeons were initially shaped to step on each treadle by the method of successive approximation. At the start of each experimental session, the houselight was illuminated, indicating that both treadles were operable. For half of the subjects, a single response to the left treadle (S1) resulted in the feeder light turning on and a single pellet being delivered to the food well. After 2 s, both the feeder light and houselight turned off for a 3-s dark intertrial interval (ITI). If the pigeon chose the right treadle (S2), the houselight turned off for a 5-s dark ITI, and no food was delivered. Immediately following the ITI, the houselight turned on, indicating the start of the next trial. After 40 trials, the contingencies were reversed for the last 40 trials such that responses to S2 were reinforced and responses to S1 were no longer reinforced. For the other half of the subjects, choice of the right treadle (S1), and not the left (S2), was reinforced for the first half of each session. Subjects were trained for 50 sessions.
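The trial contingencies of Experiment 1 can be summarized in a short Python sketch (the names and representation are hypothetical). As described, a reinforced choice is followed by 2 s of feeder light plus a 3-s dark ITI and a nonreinforced choice by a 5-s dark ITI, so both outcomes add 5 s after the response; response latency is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    reinforced: bool
    feeder_s: float    # feeder light on with pellet delivery (houselight still on)
    dark_iti_s: float  # houselight off before the next trial


def treadle_outcome(trial: int, choice: str, s1_side: str = "left") -> Outcome:
    """Consequence of stepping on `choice` ("left" or "right") on trial 1-80."""
    s2_side = "right" if s1_side == "left" else "left"
    correct = s1_side if trial <= 40 else s2_side  # contingencies reverse after trial 40
    if choice == correct:
        return Outcome(True, feeder_s=2.0, dark_iti_s=3.0)
    return Outcome(False, feeder_s=0.0, dark_iti_s=5.0)


print(treadle_outcome(41, "left"))  # former S+ after the reversal: not reinforced
```

Because the postchoice interval is the same for both outcomes, the elapsed session time up to the reversal does not depend on how many errors a pigeon makes.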

Results and discussion

Pigeons reached stable choice accuracy in about 30 sessions of training. Asymptotic performance for sessions 41–50 is shown in Fig. 1, together with the results of the spatial discrimination reversal task using key pecking (from Rayburn-Reeves et al., 2011). As Fig. 1 shows, the pigeons in Experiment 1 did not perform as well as those trained with the Rayburn-Reeves et al. (2011) procedure. Relative to pigeons in the spatial key-peck discrimination task, pigeons in the present treadle task made more anticipatory errors during the first half of each session and more perseverative errors during the second half. Overall, for sessions 41–50, pigeons in the present experiment were 79.4 % correct, whereas those in Rayburn-Reeves et al. (2011) were 89.6 % correct, t(16) = 3.49, p = .003. In the present experiment, pigeons chose S2 before it was correct on 28.8 % of trials 36–40, and they continued to choose S1 on 53.0 % of trials 41–45, after the reversal.
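Anticipatory and perseverative error rates of the kind reported here can be computed from the choice sequence of each session. This Python sketch assumes a hypothetical data format (one list of 80 "S1"/"S2" choices per session) and pools choices across sessions; when every pigeon contributes the same number of sessions, this equals the mean of the pigeons' individual percentages.

```python
from typing import Sequence, Tuple


def anticipatory_and_perseverative(sessions: Sequence[Sequence[str]]) -> Tuple[float, float]:
    """Percentage of S2 choices on trials 36-40 and of S1 choices on trials 41-45.

    Each session is a sequence of 80 choices ("S1" or "S2"), trial 1 first.
    """
    pre = [c for s in sessions for c in s[35:40]]    # trials 36-40 (0-based 35-39)
    post = [c for s in sessions for c in s[40:45]]   # trials 41-45 (0-based 40-44)
    anticipatory = 100.0 * pre.count("S2") / len(pre)
    perseverative = 100.0 * post.count("S1") / len(post)
    return anticipatory, perseverative
```

For example, a session with S2 chosen on trials 39 and 40 and S1 chosen on trials 41 to 43 scores 40 % anticipatory and 60 % perseverative.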

Fig. 1

Experiment 1: Percentage choice of S1 as a function of five-trial block number averaged over pigeons for sessions 41–50 (solid circles), as compared with spatial reversal data (open circles) from Rayburn-Reeves, Stagner, Kirk, and Zentall (in press). The dotted line indicates the point at which the reversal occurred in the session

Fig. 2 presents the same data as Fig. 1 in more detail, trial by trial, for the five trials before the reversal (trials 36–40) and the five trials after it (trials 41–45). On trial 41, the first reversal trial and the first trial that provided feedback about the reversal, pigeons chose the previously correct stimulus 56.2 % of the time. On trial 42, the first trial following feedback that the reversal had occurred, pigeons still chose the previously correct stimulus 55.0 % of the time. Thus, the feedback from the first reversal trial had very little effect.

Fig. 2

Experiment 1: Percentage choice of the first correct stimulus (S1) as a function of trial number for trials 36–45, averaged over subjects, for sessions 41–50 (closed circles), as compared with spatial reversal data (open circles) from Rayburn-Reeves, Stagner, Kirk, and Zentall (in press). The dotted line indicates the point at which the reversal occurred in the session

The results of Experiment 1 indicate that having the pigeons respond by stepping on a treadle, rather than pecking a response key, did not improve choice accuracy on the midsession spatial reversal. In fact, judging from the overall error rate, these pigeons had more trouble with the treadle reversal than pigeons had with the key-peck spatial reversal.

Experiment 2

Because the use of spatial discriminations did not improve reversal accuracy either in Rayburn-Reeves et al. (in press) or in Experiment 1 of the present research, in Experiment 2 we returned to a simultaneous color discrimination reversal procedure. To make the point of the reversal more salient, for one group (the irrelevant-trial group) we inserted, at the point of the reversal, four blue/yellow discrimination trials unrelated to the red/green discrimination used on the first 40 and last 40 trials. Our rationale was that the irrelevant trials might signal the reversal and, thus, facilitate detection of the change in contingency, so we expected them to reduce the number of perseverative errors. We were also interested in whether the expectation of the irrelevant trials might reduce the number of anticipatory errors. If the pigeons learned that an irrelevant discrimination would be presented, they might forgo using the passage of time as a cue and instead wait for the irrelevant trials before switching from S1 to S2.

In previous research with this midsession reversal procedure, one reason that pigeons may have continued to use the passage of time as a cue is that the consequences of an error were not aversive enough to discourage errors. With this procedure, an error merely terminates the stimuli and is followed by a 5-s ITI before the start of the next trial. Previous research with matching-to-sample procedures has shown that if comparison choice errors result in maintaining the stimulus display for several seconds (a form of time-out), acquisition of matching can be facilitated (Martin & Zentall, 2005; Strength & Zentall, 1991). For this reason, in Experiment 2, we added a group for which there was mild negative punishment for errors (the chosen S− stayed on for a limited time rather than turning off). Our hypothesis was that this procedure might encourage the pigeons either to be more careful in making their choices or to review incorrect choices after they were made, and thus to rely more on their memory for the stimulus previously selected and the outcome obtained, rather than on the time from the start of the session, as a cue for reversal. In addition, a time-out following errors should make the duration of trials more variable and, thus, should make it more difficult for the pigeons to use the total time from the start of the session as a cue for reversal. Thus, in Experiment 2, for the time-out group, we added a 5-s time-out following each incorrect choice, during which the correct stimulus was turned off and the incorrect stimulus remained on.
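The argument that a time-out makes elapsed time a less reliable cue can be checked with simple arithmetic. The Python sketch below computes the session time elapsed before trial 41 from the Experiment 2 timings; response latencies are ignored by default, which is a simplifying assumption, and the function name is hypothetical. For controls, the total is 200 s however many errors occur; for the time-out group, each error in the first 40 trials adds 5 s.

```python
def seconds_to_reversal(n_errors: int, timeout: bool, n_trials: int = 40,
                        latency_s: float = 0.0) -> float:
    """Elapsed session time before trial 41 under the Experiment 2 timings.

    A correct choice is followed by 1.5-s food + 3.5-s ITI and an error by a
    5-s ITI, so every trial adds 5 s; in the time-out group an error adds a
    further 5 s during which the chosen S- stays lit. latency_s is a
    hypothetical constant choice latency (0 by default, i.e., ignored).
    """
    per_trial = latency_s + 5.0
    penalty = 5.0 if timeout else 0.0
    return n_trials * per_trial + n_errors * penalty


print(seconds_to_reversal(8, timeout=False), seconds_to_reversal(8, timeout=True))
```

In other words, the time-out decouples elapsed time from trial count, so the two cues that pigeons might use no longer coincide.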

Method

Subjects

Twenty-one White Carneaux pigeons (Columba livia) similar in age and experience to those used in Experiment 1 served as subjects. They were maintained and housed as in Experiment 1.

Apparatus

A standard (LVE/BRS, Laurel, MD) test chamber was used, with inside measurements 35 cm high, 30 cm long, and 35 cm across the response panel. The response panel in the chamber had a horizontal row of three response keys, 25 cm above the floor. The rectangular keys (2.5 cm high × 3.0 cm wide) were separated from each other by 1.0 cm, and behind each key was a 12-stimulus inline projector (Industrial Electronics Engineering, Van Nuys, CA) that projected red, yellow, blue, and green (Kodak Wratten Filter Nos. 26, 9, 38, and 60, respectively). In the chamber, the bottom of the center-mounted feeder (filled with Purina Pro Grains) was 9.5 cm from the floor. When the feeder was raised, it was illuminated by a 28-V, 0.04-A lamp. A 28-V, 0.1-A houselight was centered above the response panel, and an exhaust fan was mounted on the outside of the chamber to mask extraneous noise. A microcomputer in the adjacent room controlled the experiment.

Procedure

For pigeons in the control group (n = 7), each trial began with red and green presented on the left and right response keys, with their positions varying randomly from trial to trial. For half of the subjects, a response to red (S1) turned off both keys and produced 1.5-s access to food followed by a 3.5-s dark ITI, whereas a response to green (S2) immediately turned off both keys and was followed by a 5-s dark ITI. For the other half of the subjects, green served as S1 and red as S2. For the first 40 trials of each session, all subjects were trained with S1+/S2−; for trials 41–80, the contingencies were reversed (S2+/S1−).

Subjects in the irrelevant-trial group (n = 8) were treated like those in the control group, with one exception: following trial 40, there were four irrelevant trials on which blue and yellow were presented on the left and right keys in random positions, and a response to yellow was always reinforced. After these four blue/yellow trials, the same red/green reversal contingency as for the control group (S2+/S1−) was in effect for trials 41–80 of the red–green discrimination.

Subjects in the time-out group (n = 6) were treated like those in the control group, with one exception: following an error, the correct stimulus was turned off and the incorrect stimulus remained on for 5 s, after which it was turned off and a 5-s ITI began. All subjects were trained for 80 sessions.
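The session structure for the control and irrelevant-trial groups can be sketched as a trial list in Python. The dictionary representation, the function name, and the use of Python's `random` module to place the colors on the left and right keys are illustrative assumptions; the original experiment may have used a constrained (pseudorandom) sequence. The time-out group differs only in what happens after an error, not in the trial list.

```python
import random
from typing import Dict, List


def build_session(irrelevant_trials: bool, s1: str = "red", seed: int = 0) -> List[Dict[str, str]]:
    """Trial list for one Experiment 2 session (hypothetical representation)."""
    rng = random.Random(seed)
    s2 = "green" if s1 == "red" else "red"
    trials: List[Dict[str, str]] = []

    def add(positive: str, negative: str, phase: str) -> None:
        left = rng.choice([positive, negative])  # left/right position varies randomly
        right = negative if left == positive else positive
        trials.append({"phase": phase, "left": left, "right": right, "s_plus": positive})

    for _ in range(40):
        add(s1, s2, "S1+/S2-")
    if irrelevant_trials:
        for _ in range(4):
            add("yellow", "blue", "irrelevant")  # yellow always reinforced
    for _ in range(40):
        add(s2, s1, "S2+/S1-")
    return trials
```

The red–green trials are still numbered 1–80 in the text; the four blue/yellow trials sit between red–green trials 40 and 41.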

Results and discussion

In Experiment 2, the control group performed much like the pigeons in the reversal procedure of Rayburn-Reeves et al. (2011). Percentage choice of S1, pooled over sessions 71–80, is plotted in five-trial blocks for all three groups in Fig. 3. On trials 36–40 (the last trials before the reversal), the control group chose S1 76 % of the time, and on trials 41–45 (the first trials after the reversal), they chose S1 43.7 % of the time. The irrelevant-trial group chose S1 76 % of the time on trials 36–40 (the same as the controls) and 28.5 % of the time on trials 41–45, 15.2 percentage points less often than the controls, a statistically significant difference, t(13) = 3.0, p = .01. The time-out group chose S1 72 % of the time on trials 36–40 and 30.7 % on trials 41–45, 13 percentage points less often than the controls, but this difference was not quite significant, t(11) = 2.12, p = .06.
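The group comparisons above are differences in percentage points, and the degrees of freedom follow from the group sizes for an independent-samples t test (n1 + n2 − 2). The following Python snippet reproduces the differences and the df values (13 and 11) from the numbers reported in the text; no new data are involved.

```python
# % choice of S1 on trials 41-45, sessions 71-80, as reported in the text
s1_choice = {"control": 43.7, "irrelevant-trial": 28.5, "time-out": 30.7}
group_n = {"control": 7, "irrelevant-trial": 8, "time-out": 6}

for group in ("irrelevant-trial", "time-out"):
    diff = s1_choice["control"] - s1_choice[group]  # percentage points
    df = group_n["control"] + group_n[group] - 2    # independent-samples t test
    print(f"{group}: {diff:.1f} points fewer S1 choices than control, df = {df}")
```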

Fig. 3

Experiment 2: Percentage choice of S1 as a function of block number averaged over pigeons in the control group (open circles), time-out group (solid circles), and irrelevant-trial group (solid circles, dashed line) for sessions 71–80. The dotted line indicates the point at which the reversal occurred in the session

The trial-by-trial data for the red–green discrimination (trials 36–45) allow a finer comparison of the groups (see Fig. 4). On trial 41, the first reversal trial, choice of S1 was 62.9 % for the control group and 65 % for the time-out group, whereas the irrelevant-trial group chose S1 only 43.8 % of the time. The difference between the control group and the irrelevant-trial group on trial 41 was statistically significant, t(19) = 2.63, p = .016. Thus, the four irrelevant trials served as an effective signal for the reversal. When errors (choices of S1) were pooled over the first four postreversal trials, the difference between the irrelevant-trial group (43.9 %) and the control group (37.8 %) remained significant, t(13) = 3.23, p = .007. Thus, the irrelevant trials not only signaled the reversal but also continued to affect errors on the trials that followed.

Fig. 4

Experiment 2: Percentage choice of S1 as a function of trial number for trials 36–45 averaged over pigeons in the control group (open circles), time-out group (solid circles), and irrelevant-trial group (solid circles, dashed line) for sessions 71–80. The dotted line indicates the point at which the reversal occurred in the session

As expected, the time-out had little effect on anticipatory errors, but it did strengthen the influence of the outcome of trial 41, the first reversal trial, on performance on trial 42. Importantly, on trial 42, choice of S1 differed significantly between the time-out group (20.0 %) and the control group (48.6 %), t(11) = 3.17, p = .009. Thus, although the time-out group could not predict when the reversal would occur, its choice of S1 declined remarkably after the first reversal trial, by 45 percentage points (from 65.0 % on trial 41 to 20.0 % on trial 42).

General discussion

Pigeons given considerable experience with a midsession reversal task show a surprising pattern of responding. They make an increasing number of anticipatory errors as the reversal approaches, and they continue to make perseverative errors once the reversal has occurred. This pattern of errors suggests that the pigeons time the occurrence of the reversal, rather than (or in addition to) using feedback (reinforcement and its absence) from the preceding trial(s) as a cue for reversal. Interestingly, when the point of the reversal was made variable, so that timing it would be even less efficient, pigeons continued to use the passage of time as a cue for reversal (Rayburn-Reeves et al., in press). These results are all the more surprising given that rats show a very different, more efficient, humanlike win–stay/lose–shift response strategy (Rayburn-Reeves et al., in press). The purpose of the present experiments was to examine several variables that might influence the reversal performance of pigeons on this midsession reversal task.

The first variable considered was the difference in response topography between pigeons and rats. Pigeons use their beak to respond (their primary means of eating), whereas rats use their paw. We tested the hypothesis that pigeons may be able to make more efficient choices if they are required to make a response that has less of a Pavlovian component (e.g., treadle stepping). In Experiment 1, we found no evidence that the pigeons were more effective in using their choice and its outcome on the most recent trial(s) as a basis for their choice on the next trial when the required response was treadle stepping than when it was key pecking.

In Experiment 2, we signaled the reversal for one group by presenting the pigeons with four trials involving an irrelevant discrimination. Relative to a control group that did not receive those irrelevant trials, the four irrelevant trials resulted in fewer perseverative errors on the first reversal trial, and the benefit of those irrelevant trials persisted for several additional trials.

In Experiment 2, we also tried to make errors more salient and, perhaps, more aversive by extending the trial by 5 s following an error. This extension significantly improved the pigeons' ability to use the feedback from the first reversal trial as a cue for reversal.

In the present research, we have referred to pigeons' use of time into the session as the cue that they use to determine when to reverse their choice, but, as we have acknowledged, they may instead use an estimate of the number of trials experienced to decide when to reverse. One way to distinguish between these two alternatives would be to train pigeons with a 5-s ITI and then test them with longer and shorter ITIs. If the pigeons used time into the session as a cue for reversal, anticipatory errors should appear later in the session (i.e., after more trials) when the test ITIs were shorter, because trials would accumulate faster than the timed interval, and earlier in the session when the test ITIs were longer. Alternatively, if the pigeons based their decision to reverse on an estimate of the number of trials into the session, manipulating the duration of the ITI should have little effect on where in the session anticipatory errors appeared.
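The two predictions can be stated quantitatively. In this Python sketch, a pigeon that times the session switches once the elapsed time learned in training (40 trials of training ITI plus a fixed per-trial overhead) has passed, whereas a pigeon that counts trials switches after trial 40 whatever the ITI. The 1-s per-trial overhead and the function name are hypothetical, and the model treats every trial as lasting ITI plus overhead. With a 5-s training ITI, a 2-s test ITI moves the timing prediction from trial 40 to trial 80, and a 10-s test ITI moves it to about trial 22.

```python
def predicted_switch_trial(train_iti_s: float, test_iti_s: float, cue: str,
                           trials_before_reversal: int = 40,
                           other_s: float = 1.0) -> float:
    """Trial on which a pigeon would be expected to switch to S2 in a test session.

    other_s is a hypothetical fixed per-trial time (choice latency, etc.) added
    to the ITI; the values are illustrative, not measured.
    """
    if cue == "count":
        # Counting trials: the switch point does not depend on the ITI.
        return float(trials_before_reversal)
    if cue == "time":
        # Timing: the pigeon switches once the learned elapsed time has passed.
        learned_s = trials_before_reversal * (train_iti_s + other_s)
        return learned_s / (test_iti_s + other_s)
    raise ValueError("cue must be 'time' or 'count'")


for test_iti in (2.0, 5.0, 10.0):
    print(test_iti, predicted_switch_trial(5.0, test_iti, "time"),
          predicted_switch_trial(5.0, test_iti, "count"))
```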

It has been suggested that better performance on reversal tasks is a measure of the intelligence of a species (Bitterman, 1965), but other factors are likely to contribute to reversal learning. Perhaps pigeons' reduced sensitivity to the outcome of the preceding trial is related to their foraging ecology. Pigeons often travel quite far to find a patch of food (e.g., a field of grain), but such a patch is unlikely to be depleted quickly, so returning to it may be a predisposed behavior. This tendency, together with a tendency toward neophobia, may predispose pigeons to stay rather than switch (see, e.g., Zentall, Steirn, & Jackson-Smith, 1990). Rats, on the other hand, tend to forage in smaller patches that deplete faster and, thus, may be predisposed to shift (see, e.g., Olton & Samuelson, 1976). The predisposition to stay within a large patch may also make pigeons less sensitive to nonreinforcement than rats. For a pigeon, failing to find food in a particular patch for a short time may not be sufficient grounds to move to a new patch. Thus, relative to rats, pigeons may be predisposed to accumulate further evidence that a patch is depleted before moving to a new one. Such predispositions could account for the more rapid switching by rats than by pigeons, and hence for faster reversal learning, but they would not account for the anticipatory errors made by pigeons in the present midsession reversal task. It may be, however, that their relative insensitivity to nonreinforcement has led pigeons to use other cues to determine when to switch to the alternative discriminative stimulus. It appears that pigeons first learn that one alternative is an effective source of food early in a session, whereas the other alternative is an effective source of food late in a session. This initial learning may encourage the pigeons to use the passage of time as the basis for when to switch, and even though this may not be the most efficient strategy, it works well enough to be maintained.

An alternative explanation for why pigeons do not perform better on this task, even with a salient cue identifying the reversal or with a time-out following an error, as in Experiment 2 of the present study, is that they have difficulty remembering not only the result of their choice (reinforcement or its absence), but also the stimulus that they chose. In a sense, this task can be thought of as a delayed biconditional matching task: the sample is the successive compound consisting of the stimulus chosen on the preceding trial and the outcome of that choice, the 5-s ITI is the delay, and the choice on the following trial is the comparison choice. If the pigeon makes an error, it must remember over the 5-s ITI both what it chose and what the result of that choice was. If the pigeon's problem is one of memory for the previously chosen stimulus and its outcome (i.e., the sample), shortening the ITI should improve performance.
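The biconditional structure of this matching analogy can be made explicit. In the Python sketch below (names hypothetical), the correct comparison depends jointly on the stimulus chosen on the preceding trial and its outcome; neither element alone determines it, which is why the whole compound sample must be remembered across the ITI. The rule holds on every trial except the one on which the contingency itself reverses.

```python
from itertools import product


def correct_choice(prev_choice: str, reinforced: bool) -> str:
    """Correct comparison given the 'sample' (previous choice, its outcome).

    Holds on every trial except the one on which the contingency itself reverses.
    """
    other = "S2" if prev_choice == "S1" else "S1"
    return prev_choice if reinforced else other


table = {(s, r): correct_choice(s, r) for s, r in product(("S1", "S2"), (True, False))}
for (stim, reinforced), answer in table.items():
    print(f"chose {stim}, {'reinforced' if reinforced else 'not reinforced'} -> choose {answer}")

# Biconditional: each element alone is compatible with both correct choices.
by_stimulus = {s: {table[(s, r)] for r in (True, False)} for s in ("S1", "S2")}
by_outcome = {r: {table[(s, r)] for s in ("S1", "S2")} for r in (True, False)}
```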

Of course, it may be simpler to attribute the perseverative S1 errors to within-session interference from the preceding S1-reinforced trials. And although one might attribute the anticipatory S2 errors to intersession interference from the last 40 (S2-reinforced) trials of the previous session, one would then have expected those errors to occur early in the following session, rather than as the pigeons approached the end of the S1-reinforced trials. Instead, it appears that the pigeons estimated the time to the reversal point in the session. It may be, however, that errors made shortly before the midpoint of the session contributed to perseverative errors because, as noted earlier, for the preceding trial to serve as a cue that could guide choice on the current trial, the pigeons would have had to remember not only the stimulus that had been selected, but also the consequence (reinforcement or its absence) of selecting it. In any case, identifying the determinants of this suboptimal performance by pigeons should lead to a better understanding of their natural predispositions, as well as of their flexibility in dealing with this form of reversal learning.