A three-stimulus midsession reversal task in pigeons with visual and spatial discriminative stimuli
In a two-stimulus visual discrimination task with a reversal in reward contingencies midway through each session, pigeons produce a surprising number of both anticipatory (i.e., before the reversal) and perseverative (i.e., after the reversal) errors. In the current work, we examined pigeons’ (Columba livia) patterns of responding on a 90-trial, three-stimulus visual or spatial discrimination task with two changes in reward contingency (one after Trial 30 and one after Trial 60) during each session. On probe sessions where pecking the first-correct stimulus was rewarded for the first 60 rather than 30 trials, pigeons on a spatial discrimination pecked the first-correct stimulus until it was no longer rewarded, while visual discrimination birds ceased responding to the first-correct stimulus even while it was still being rewarded. On probe sessions where the onset of the first trial was delayed 7 min, pigeons’ performance on the visual discrimination was disrupted by the interval delay, but performance in the spatial condition was more similar to baseline. Pigeons use different strategies (temporal control vs. local reinforcement) on midsession reversal tasks with visual versus spatial stimuli, suggesting that they are selectively permeable to changes in information (global vs. local reinforcement rates).
Keywords: Pigeons, Timing, Midsession reversal, Cue dimension
Pigeons make surprising errors on discrimination tasks with a reversal of reinforcement contingencies partway through individual sessions (Cook and Rosen 2010; Laude et al. 2014; McMillan et al. in press; McMillan and Roberts 2012; Rayburn-Reeves et al. 2011, 2013a, b; Rayburn-Reeves and Zentall 2013; Stagner et al. 2013): they make many anticipatory errors (i.e., responding to the second-correct stimulus before doing so is rewarded) and perseverative errors (i.e., responding to the first-correct stimulus after doing so is no longer rewarded). More than simply failures to use reward-efficient strategies, however, these errors have been interpreted as informative about pigeons’ use of time (Cook and Rosen 2010) and the way in which these animals perceive the underlying structure of the tasks presented to them (McMillan et al. in press).
In a typical midsession reversal task, a pigeon is presented with a choice between two options (e.g., a red key and a green key), where responses to one option are reinforced during the first half of the session and responses to the other nonreinforced; these contingencies reverse in the second half of the session. The optimal strategy for an animal to use in the midsession reversal task is win-stay, lose-shift, that is, it should attend to local reinforcement rates and only shift its responding from one stimulus to the other when the previous response fails to produce reinforcement. This is the strategy that humans have been shown to use, especially when the reversal point is unpredictable (Rayburn-Reeves et al. 2011). Conversely, pigeons are largely insensitive to changes in local reinforcement at reversal points, showing gradual shifts in choice and no significant change in behavior immediately after the first trial on which contingencies are changed. This imprecise reversal behavior holds even when effort is increased by using fixed-ratio schedules and the reversal point is made unpredictable (Rayburn-Reeves et al. 2011), and when the number of trials per session is as few as five (Rayburn-Reeves and Zentall 2013). Why pigeons fail to use local reinforcement to maximize reward in this task is not well understood, though the extent of this effect and the alternate strategies pigeons use to solve the task have been the focus of several recent studies.
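To make the win-stay, lose-shift rule concrete, the following is a minimal Python sketch rather than a model fitted to any data. The 80-trial session, the reversal after Trial 40, and all function names are illustrative assumptions that do not come from the studies cited above.

```python
# Hypothetical two-option midsession reversal session.
# Options are coded 0 (first-correct) and 1 (second-correct).

def reward_schedule(n_trials, reversal_after):
    """Return the rewarded option for each trial (list index 0 = Trial 1)."""
    return [0 if t < reversal_after else 1 for t in range(n_trials)]


def win_stay_lose_shift(correct, first_choice=0):
    """Choose on each trial using only the previous trial's outcome."""
    choices = []
    choice = first_choice
    for rewarded_option in correct:
        choices.append(choice)
        if choice != rewarded_option:
            # Lose: shift to the other option on the next trial.
            choice = 1 - choice
        # Win: stay with the same option (no change needed).
    return choices


correct = reward_schedule(n_trials=80, reversal_after=40)  # assumed values
choices = win_stay_lose_shift(correct)

# 1-indexed trial numbers on which the agent chose the unrewarded option.
errors = [t + 1 for t, (c, r) in enumerate(zip(choices, correct)) if c != r]
print(errors)
```

Under these assumptions the agent makes a single error, on the first trial after the reversal, which is the sense in which the strategy is optimal; pigeons' gradual shifts in choice depart from this pattern.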
Cue dimension and strategy use
Recent findings with the midsession reversal task suggest that use of a local reinforcement strategy (and thus improved overall performance) is dependent upon the modality of the stimulus discrimination. McMillan and Roberts (2012) showed that presenting red always on one side key and green on the other (i.e., a confounded visual-spatial discrimination) led to better reversal performance than in a red–green (visual) discrimination midsession reversal. Rayburn-Reeves et al. (2013a) showed that pigeons performed optimally on a spatial discrimination midsession reversal task (switching between left and right keys that were visually identical), but only with 1.5-s intertrial intervals (ITIs); this increase in optimal performance with decreasing ITIs does not hold for pigeons on visual discriminations (Laude et al. 2014). Rats have been shown to produce near-optimal performance on a spatial discrimination midsession reversal task with 5-s ITIs where pigeons do not (Rayburn-Reeves et al. 2013b), suggesting a possible species difference in midsession reversal performance. These results highlight that different experimental conditions (and, possibly, species) may lead to different performance on the task; these results have led previous researchers to suggest that pigeons do not rely on local reinforcement rates because their working memory for the previous trial’s choice and outcome is poor (Laude et al. 2014; Rayburn-Reeves and Zentall 2013). Most experiments examining the effect in pigeons have used ITIs in the 5–6 s range, which is sufficient to cause substantial working memory deficits in pigeons (Roberts 1972).
More recent work in our laboratory (McMillan et al. in press) showed near-perfect reversal performance in pigeons on a confounded visual-spatial stimulus discrimination combined with a variable midsession reversal point. Compared with previous research, our birds also showed better reversal performance with a single midsession reversal point on a spatial discrimination alone than on a visual discrimination alone (each with 6-s ITIs). Finally, we also found that rats on a T-maze spatial discrimination midsession reversal task (e.g., left rewarded for 12 trials, right rewarded for a subsequent 12 trials) showed poor reversal performance similar to pigeons on visual discrimination midsession reversal tasks, illustrating that a spatial discrimination alone is not sufficient to produce optimal midsession reversal performance. Together, these results suggest that animals may use spatially presented stimuli to ‘cheat’ the memory component of the midsession reversal task in the operant chamber, that is, they can make a response, collect reward, and then spatially orient during the ITI toward the to-be-reinforced alternative. This strategy is much more difficult to use in a T-maze than in an operant chamber because the rat spends the ITI in the start box and cannot make a choice until it reaches the choice point on the maze. By reducing or removing the demands on working memory, the animal presented with a spatial discrimination in the operant chamber is better able to use a win-stay, lose-shift strategy rather than relying on other strategies to predict the reversal.
Temporal representation of the midsession reversal session
When pigeons fail to use local reinforcement to obtain optimal reward in midsession reversal tasks, it has been shown that they are primarily using interval time to predict the reversal of reward contingencies (Cook and Rosen 2010; McMillan and Roberts 2012). Pigeons appear to time from some point at the beginning of the session, and the typically observed gradual reversal behavior across the session arises from an imperfect timing system. Where the vast majority of interval timing research has focused on timing within a single trial (or between reinforcers), timing across multiple trials (and reinforcers) is not always explicitly built into models of timing. Since temporal representations could also be used to solve the midsession reversal task (e.g., ‘It usually takes 5 min until the reinforcer contingencies reverse’), this procedure allows for study of timing across discrete trials. Interval timing is normally conceptualized as the measurement of the time between the onset of some stimulus and the delivery of reinforcement, but in midsession reversal, pigeons clearly time between the beginning of the session and a change in reinforcement contingency across many trials.
The midsession reversal task bears theoretical similarity to time-place learning tasks (e.g., Biebach et al. 1989, 1991; Daan and Koene 1981; Wilkie 1995). In these procedures, one location in a spatial arena is baited with food on sessions occurring at one time of day (e.g., morning) with other locations baited at a different time of day (e.g., in the afternoon). Over many sessions, animals will commonly search in the currently correct location, using either an ordinal representation of time (Carr and Wilkie 1997) or both interval and circadian timing (Pizzo and Crystal 2002). A midsession reversal task could also be solved through the use of a temporal representation (with S1 and S2 variably as the ‘temporally correct location’: Crystal 2009), though timescales may differ between the procedures. Indeed, anticipatory and perseverative errors similar to those seen in midsession reversal have also been recorded in time-place tasks when responses to alternatives are recorded continuously through sessions (Wilkie 1995).
Research on midsession reversal has almost entirely used only two choice alternatives, which limits understanding of how complex this temporal representation of sessions may be. In one notable exception, Cook and Rosen (2010, Experiment 2) trained pigeons on a three-phase, delayed-matching-to-sample/matching-to-oddity task, where (for example) when a red sample was followed by a choice of three test stimuli, responses to red were reinforced for Phase 1 during a session, green was reinforced in Phase 2, and blue was reinforced in Phase 3. Because this task was found to be ‘too demanding’ partway through training, the third phase was dropped. After successful acquisition of two-phase sessions, the researchers compared mean choice frequency for each alternative across time. They found that responses to the Phase 1-relevant cue (e.g., red for a red sample) dropped as responses to the Phase 2-relevant cue (e.g., green for a red sample) increased around the reversal point. Importantly, responses to the ‘irrelevant’ cue (e.g., blue for a red sample) did not rise through the session. The researchers suggested this ruled out the possibility that pigeons were simply failing to discriminate toward the middle of the session, implying instead that they were successfully task switching (matching-to-sample vs. matching-to-oddity) based on time. However, due to the complexity of the task and the failure of pigeons to learn the three-phase version, it cannot be ruled out that pigeons learned that blue was never rewarded after a red sample, but otherwise failed to encode the temporal change in discrimination of the task (i.e., errors are due to loss of stimulus control due to ambiguity at the reversal point). 
It is possible that pigeons’ temporal representation of midsession reversal sessions is not so complex as to include the correct sequence of stimulus–reward contingencies, but rather simply involves learning the anchoring discriminations (e.g., red at the beginning of the session and green at the end) and responding to ambiguity near the midpoint of the session.
We recently studied the ability of pigeons to track the order of stimuli across rewarded and nonrewarded temporal sequences (McMillan and Roberts 2013). We presented pigeons with a sequence of colored keys (blue, red, and green) across an interval (30–90 s) terminating either in food (for a single ‘correct’ sequence) or the intertrial interval. Pigeons responded less to the terminal stimulus (e.g., blue) when it was preceded by a sequence of stimuli predicting nonreward (e.g., green followed by red) than a sequence predicting reward (e.g., red followed by green). This result also held with sequences of five different stimuli with one rewarded sequence and multiple nonrewarded sequences. We suggested based on these results that pigeons represented the order of stimuli in time, though this representation had only weak control over behavior. However, as is common with other timing studies, pigeons in this task were only rewarded once per interval (i.e., at the end of the elapsed interval sequence). Given the observed weak-but-present control of temporal order information on behavior in pigeons, we were interested in how well the pigeons represent the ordinal/temporal structure of midsession reversal sessions.
Rationale for present research
Here, we present two experiments in which birds were trained with three stimuli presented simultaneously across three keys. In Experiment 1, each key was a different color (red, green, or blue), with spatial arrangement varying across trials and with responses to each color differentially reinforced (i.e., a visual discrimination) in different trial blocks throughout the session. In Experiment 2, all three keys were white and only responses to the spatial locations differentially reinforced (i.e., a spatial discrimination) in different trial blocks throughout the session.
Utilizing a three-stimulus reversal task offers the opportunity to understand pigeons’ behavior on this task in several key ways. Firstly, having two reversals increases the costs of using a timing strategy, as subjects make anticipatory and perseverative errors at each reversal point and thus miss more reinforcement than with a single reversal. This increased cost could lead birds to avoid a timing strategy in favor of a less error-prone reward-following strategy. Having three keys to choose from also provides more opportunity to observe what errors pigeons make near the reversal points, in much the same way as the ‘irrelevant’ response did in Cook and Rosen (2010). We were interested in whether all anticipatory and perseverative errors would be made to the next or previously correct key (respectively), or whether reversals would increase the number of errors to the currently irrelevant third option, indicating that pigeons’ errors are due to ambiguity near the reversals. All three alternatives provided equal total reinforcement across the session, and the relevance of each key was thus based only upon the current temporal location within the session.
It should be noted that this procedure does not feature what most would consider a ‘reversal’ as seen in previous midsession reversal studies. In purely semantic terms, the pigeons on this task do not ‘reverse’ their behavior so much as ‘transition’ from reinforcement for pecking one particular stimulus to pecking one of the other stimuli. However, the theoretical implications of a three-stimulus, two-transition point task are identical to those of a two-stimulus task with one reversal point; the subjects are still required to adjust their behavior by responding on different stimuli over time, identically to a reversal procedure. As such, we have retained the terminology used in previous studies for the sake of consistency. Throughout this manuscript, ‘reversal’ and ‘transition’ will be used interchangeably to mean any point where an S+ becomes an S− and an S− becomes an S+.
In addition to the baseline performance of groups of pigeons in the three-stimulus reversal task in Experiments 1 and 2, we were interested in what each group learned about the structure of the session. To test the pigeons’ attention to local reinforcement rates, on select probe sessions after training, we rewarded pecks to the first-correct stimulus for the first 60 (rather than 30) trials; pecks to either of the other two stimuli were rewarded only on Trials 61–90. If pigeons utilized the outcome of each previous trial to inform their behavior on the next, we expected they would continue pecking the first-correct stimulus as long as it was rewarded, but if pigeons attended largely to interval cues, it was expected that they would show little or no change in behavior on these probes relative to baseline. In the event that birds responded to the first-correct stimulus for the entirety of its reward block, we were also interested in whether birds switched to the typically second-correct stimulus (i.e., the correct stimulus according to order) or the typically third-correct stimulus (i.e., the correct stimulus according to the interval time since the beginning of the session).
We were also interested in what pigeons would use as the ‘start’ point for their timer in a midsession reversal task. Where previous studies have found increased anticipatory errors when a long break was inserted partway through a session (Cook and Rosen 2010) or when the ITIs were increased in duration (McMillan and Roberts 2012), here we specifically studied whether an unexpected delay before the onset of the first trial would lead pigeons to respond as if the session had already started and thus to show earlier than normal reversal of behavior. It could be assumed that pigeons ‘start the clock’ with the first presentation of stimuli, and thus, we tested the effects of delaying the start of the first trial by 420 s (the average time to Trial 60, the ‘second reversal’) in several probe sessions. If pigeons started timing from the onset of the first trial (or attended more strongly to local reinforcement than to interval time), they were expected to be largely unaffected by a change in the delay to start the session. Conversely, if pigeons started timing earlier than the first trial (e.g., from the time they are first placed in the operant chamber), then they were expected to produce more anticipatory errors on these trials, because they would have expected an earlier reversal based on their internal clock.
In this experiment, birds were presented with three colored keys and trained to peck red, green, and blue in turn across 90-trial sessions, with pecks to each color reinforced for one 30-trial block of the session (in that order). Once trained, these pigeons were tested with two types of probe sessions: those on which the first reversal point was delayed (i.e., the red stimulus was correct for 60 trials, and both green and blue were correct for the remaining 30 trials) and those on which the onset of the first trial was delayed (i.e., the chamber remained dark for 7 min before the session began).
Eight adult White King pigeons (Columba livia) were used. These subjects had previously been used in operant timing experiments requiring discrimination of local reinforcement based on color and spatial location. Birds were maintained at approximately 85 % of free-feeding weight throughout the experiment, with constant access to water and health grit. They were individually housed in cages in an environmentally controlled room kept at 22 °C. Fluorescent lights were turned on at 8:00 a.m. and off at 8:00 p.m. each day. Testing was performed between 9 a.m. and 4 p.m. for 5 days each week.
Three enclosed, sound-attenuating operant chambers measuring 31 × 35.5 cm (floor) × 35.3 cm (height) were used. The front wall of each chamber held three pecking keys, 2.5 cm in diameter and level with the pigeon’s head, in a row, spaced 8 cm apart. Projectors behind each key projected filtered light, presenting different colors or patterns on the keys. Grain reinforcement was delivered by an electromechanical hopper through a 6 × 6 cm opening in the front wall located near the floor, directly below the center key. Presentation of stimuli, reinforcement, and recording of responses were carried out by microcomputers, in another room, interfaced to the operant chambers.
During each of 50 training sessions, birds were presented with 90 trials. On each trial, one key was lit with a red hue, another with a blue hue, and the third key with a green hue; spatial location of each hue was randomized across trial blocks. Pecks to the red key were reinforced for the first 30 trials, pecks to the green key were reinforced for Trials 31–60, and pecks to the blue key were reinforced on Trials 61–90. Reinforcement was provided on a fixed-ratio 1 schedule. A correct response turned off all three keys and was reinforced with 2 s of access to grain, followed by a 4-s darkened ITI; incorrect responses led directly to a 6-s darkened ITI.
After the initial 50 sessions, pigeons received 20 sessions of Order Delay probe testing. Test sessions were identical to training, as described previously, except that on test Sessions 1, 6, 11, and 16, the red key was reinforced for the first 60 trials, while pecking either of the green or blue keys was reinforced for the remaining 30 trials of the session.
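The baseline and Order Delay contingencies described above can be summarized in a short Python sketch. The function and constant names are hypothetical and are not taken from the software used to run the experiment; only the trial blocks, colors, and probe session numbers come from the text.

```python
# Baseline reward blocks: color -> trials (1-indexed) on which it is rewarded.
BASELINE_BLOCKS = {
    "red": range(1, 31),     # Trials 1-30
    "green": range(31, 61),  # Trials 31-60
    "blue": range(61, 91),   # Trials 61-90
}

# Test sessions (within each 20-session test phase) that were probes.
PROBE_SESSIONS = {1, 6, 11, 16}


def rewarded_keys(trial, order_delay_probe=False):
    """Return the set of key colors whose pecks are reinforced on a trial."""
    if not 1 <= trial <= 90:
        raise ValueError("trial must be between 1 and 90")
    if order_delay_probe:
        # Red stays correct for 60 trials; green or blue is correct afterwards.
        return {"red"} if trial <= 60 else {"green", "blue"}
    for color, trials in BASELINE_BLOCKS.items():
        if trial in trials:
            return {color}


def is_probe_session(test_session):
    """True if this test session number (1-20) was run as a probe."""
    return test_session in PROBE_SESSIONS


print(rewarded_keys(45), rewarded_keys(45, order_delay_probe=True))
```

On Trials 31–60 of a probe session, a bird following local reinforcement keeps earning food on red, whereas a bird that has switched to green on the basis of elapsed time is no longer rewarded, which is what makes the probe diagnostic.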
After the 20 sessions of Order Delay testing, the birds received a final 20 sessions of Interval Delay probe testing. These test sessions were identical to training, as described previously, except that on test Sessions 1, 6, 11, and 16, onset of the first trial was delayed by 420 s (i.e., the chamber was dark for 7 min before the session began). This delay was used as an approximation of the typical time that subjects took to get from Trial 1 to Trial 60 (M = 421.57 s, SEM = 4.80).
Baseline measures were taken from the last 25 sessions of training, to remove early training effects from the data. All post hoc tests used Bonferroni correction.
Results and discussion
One noteworthy aspect of the data presented in Fig. 1 is how little the pigeons responded to currently irrelevant cues, such as responding to the third-correct stimulus before the first reversal. We also measured this responding as the number of choices of the third-correct stimulus during the first 30 trials, or the choice of the first-correct stimulus during Trials 61–90. On average, subjects made 0.5 responses (SEM = 0.16) to the third-correct stimulus in the 30 trials before the first reversal, and 0.3 responses (SEM = 0.07) to the first-correct stimulus in the 30 trials after the second reversal. While these response rates were significantly greater than zero (ts > 3.00, Ps < 0.02), they were significantly lower than responding to the second-correct stimulus in the first 30 trials (M = 5.86, SEM = 1.27; paired t = 9.80, P < 0.001) or last 30 trials (M = 8.51, SEM = 1.45; paired t = 16.63, P < 0.001), respectively, showing that the birds’ errors were largely directed to the temporally relevant stimulus rather than a result of random responding to ambiguity near the reversals.
Pigeons appeared to be at least marginally sensitive to local reinforcement changes after the first reversal, but were not at all sensitive to local reinforcement changes after the second reversal. We measured sensitivity to the reversal by comparing the change in responding to the previously correct stimulus from the first trial after the reversal to the following trial (i.e., responses to S1 at Trials 31–32 and responses to S2 at Trials 61–62, respectively), and contrasting this drop in responding to the change in responses to the same stimulus across the five trials before (‘Anticipatory’; i.e., Trials 26–31 and Trials 56–61, respectively) and the five trials after (‘Perseveratory’; i.e., Trials 32–37 and Trials 62–67, respectively) the critical trials. For example, one subject showed a mean drop in S1 responding of 15 % between Trials 31–32, with a mean drop of only 5 % across each of the previous five trials and 3 % across each of the following five trials. Paired t contrasts showed a significant change in responding to S1 immediately after the first reversal compared to the trials before it (t7 = 2.69, P = 0.031, d = 1.36) and nearly significant compared to the trials after it (t7 = 2.31, P = 0.055, d = 1.35); contrasts of the change in responding to S2 after the second reversal were not significantly different from the trials preceding (t7 = 0.61, P = 0.56, d = 0.34) or trailing (t7 = 0.10, P = 0.92, d = 0.05) the reversal. At least part of the observed lack of sensitivity after the second reversal can be explained by the delay to reverse for several trials after the second reversal; it appears that the pigeons consistently underestimated the time to the second reversal.
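The sensitivity contrast described above can be sketched in Python as follows. The data are invented to mirror the worked example in the text (a 15 % drop at the reversal against 5 % and 3 % per trial before and after); the helper name and the per-trial percentages are assumptions, not the authors' analysis code.

```python
def mean_step_drop(pct, first_trial, last_trial):
    """Mean trial-to-trial decrease in responding between two trials.

    pct[t - 1] holds the percentage of S1 choices on Trial t, so the
    span 26-31 covers the five steps 26->27, 27->28, ..., 30->31.
    """
    steps = [pct[t - 1] - pct[t] for t in range(first_trial, last_trial)]
    return sum(steps) / len(steps)


# Hypothetical percentage of S1 choices on each of the 90 trials for one bird.
pct_s1 = (
    [100] * 26                  # Trials 1-26
    + [95, 90, 85, 80, 75]      # Trials 27-31: falls 5 points per trial
    + [60]                      # Trial 32: 15-point drop after the reversal
    + [57, 54, 51, 48, 45]      # Trials 33-37: falls 3 points per trial
    + [45] * 53                 # Trials 38-90 (placeholder values)
)

anticipatory = mean_step_drop(pct_s1, 26, 31)   # five steps before
critical = mean_step_drop(pct_s1, 31, 32)       # the reversal step itself
perseverative = mean_step_drop(pct_s1, 32, 37)  # five steps after
print(anticipatory, critical, perseverative)
```

A critical drop that is reliably larger than both neighbouring averages, tested across birds with paired contrasts, is what the text treats as evidence of sensitivity to the change in local reinforcement.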
Figure 2b illustrates the performance of the birds on probe sessions in which the start of the session was delayed by 420 s. This manipulation produced strong effects on the behavior of the birds, despite having no impact on the actual reward delivered to the animals. Notably, responding to S1 on probe sessions on Trials 1–30 (M = 15.3, SEM = 2.4) was lower than during baseline (M = 26.8, SEM = 0.60), t7 = 4.67, P = 0.002, d = 2.35. The birds appeared to respond indifferently to S2 (M = 8.4, SEM = 1.16) and S3 (M = 6.3, SEM = 1.5) during this time, t7 = 1.87, P = 0.103, d = 0.57. It should be noted that, while all eight birds showed a decrease in S1 responding on these probes during the first 30 trials, some birds were much more affected than others. For example, one bird responded to S1 an average of only 1.75 times on Trials 1–30 (compared to an average of 26.6 on baseline), while another responded an average of 21.5 times (compared to 27.6). Importantly, however, each of the eight birds did respond more to S2 and S3 on probe session Trials 1–30 than during the same period on baseline, suggesting that all of the birds were affected by the delay before the session.
Generally, pigeons trained with a visual discrimination, three-stimulus midsession reversal task used interval timing to predict when reversals would occur. While they showed some sensitivity to the first reversal, they delayed reversing their behavior from S2 to S3 until several trials after the second reversal. The vast majority of errors made were to the stimulus that, though currently incorrect, was proximally relevant (i.e., was recently rewarded or was soon to be rewarded). This suggests that reversal errors were determined by the temporal structure of the session and were not simply random responses made at each reversal point. Finally, birds in this experiment were strongly affected by a delay to start the session but were barely affected by changes in the number of trials for which S1 reward was maintained, despite the fact that the latter probe session was the only condition in which actual reward contingencies changed.
Pigeons have previously been shown to use a local reinforcement strategy on spatial discrimination midsession reversal tasks (McMillan et al. in press), and we were interested in replicating this result with a three-stimulus midsession reversal task. A new set of pigeons was used in Experiment 2; on each trial, they were presented with three white keys. Responses to these keys were reinforced in trial blocks as in Experiment 1 (e.g., left for the first 30 trials, middle for the next 30 trials, and right for the last 30 trials), with left/middle/right contingencies counterbalanced across birds; all other aspects of the procedure, including probe sessions, were identical to Experiment 1.
Subjects and apparatus
Eight new pigeons, with equivalent experimental histories to those used in Experiment 1, were used in Experiment 2. All aspects of animal husbandry and experimental apparatus were the same as in Experiment 1.
During each of 50 training sessions, birds were presented with 90 trials. On each trial, all three keys were lit white. Pecks to S1 were reinforced for the first 30 trials, pecks to S2 were reinforced for Trials 31–60, and pecks to S3 were reinforced on Trials 61–90. Two subjects were rewarded in the S1–S2–S3 order of left–center–right, two subjects in the order right–center–left, two subjects in the order center–left–right, and the two remaining subjects in the order center–right–left. Reinforcement was provided on a fixed-ratio 1 schedule. A correct response turned off all three keys and was reinforced with 2 s of access to grain followed by a 4-s darkened ITI; incorrect responses led directly to a 6-s darkened ITI.
After the initial 50 sessions, pigeons received 20 sessions of Order Delay probe testing. Test sessions were identical to training, as described previously, except that on test Sessions 1, 6, 11, and 16, S1 was reinforced for the first 60 trials, while pecking either of S2 or S3 was reinforced for the remaining 30 trials of the session.
After the 20 sessions of Order Delay testing, the birds received a final 20 sessions of Interval Delay probe testing. These test sessions were identical to training, as described previously, except that on test Sessions 1, 6, 11, and 16, onset of the first trial was delayed by 420 s (i.e., the chamber was dark for 7 min before the session began).
Results and discussion
As in Experiment 1, sensitivity to each reversal was measured by comparing the change in responding across the reversal (e.g., Trials 31–32) with the change across the five trials before and the five trials after it. Pigeons appear to have been at least marginally sensitive to local reinforcement changes after the second reversal, but did not appear sensitive to the first reversal. Paired t contrasts showed no significant change in responding to S1 after the first reversal compared to the trials before it (t7 = 1.57, P = 0.16, d = 0.88) nor compared to the trials after it (t7 = 1.44, P = 0.19, d = 0.80); contrasts of the change in responding to S2 after the second reversal were significantly different from the trials preceding (t7 = 2.49, P = 0.04, d = 1.25) but not trailing (t7 = 1.61, P = 0.15, d = 0.89) the reversal. One difficulty with interpreting these results is that we observed large between-subjects variance in our data, driven by obvious individual differences between birds in sensitivity to the reversals. We have previously encountered individual differences in using a local reinforcement strategy versus a timing strategy in midsession reversal using a visual-spatial discrimination (McMillan et al. in press), similar to the individual differences noted with a spatial discrimination here.
Figure 4b illustrates the performance of the birds on probe sessions in which the start of the session was delayed by 420 s. This manipulation produced strong effects on the behavior of the animals, despite having no impact on the actual reward delivered to the animals. Notably, responding to S1 on probes on Trials 1–30 (M = 19.8, SEM = 2.6) was lower than during baseline (M = 24.8, SEM = 1.9), t7 = 2.94, P = 0.022; however, contrary to the results of Experiment 1, the birds responded more to S2 (M = 5.0, SEM = 1.9) than to S3 (M = 0.2, SEM = 0.1) during these trials on probes, t7 = 2.59, P = 0.036. Importantly, the same four birds that previously showed better-than-average sensitivity to reversals in baseline were the only four birds that showed a preference for responding to S1 during Trials 1–30 on probe sessions (M = 24.9, SEM = 2.7) relative to S2 and S3; the other four birds showed substantially fewer S1 responses during this period (M = 14.7, SEM = 2.5), which was a significant difference, t6 = 2.79, P = 0.031.
Generally, pigeons trained with a spatial discrimination, three-stimulus midsession reversal task compromised between interval timing and local reinforcement strategies to predict when reversals would occur. There were individual differences among birds in the degree to which each strategy was used. As seen in Experiment 1, the vast majority of errors made were to the stimulus that, though currently incorrect, was proximally relevant (i.e., was recently rewarded or was soon to be rewarded). This suggests that reversal errors were determined by the structure of the session and were not simply random responses made at each reversal point. Finally, birds in this experiment were strongly affected by changes in the number of trials for which S1 reward was maintained but were not as uniformly affected by a delay in starting the session as were the birds in Experiment 1.
Comparison between visual and spatial discriminations
The overall mean number of correct choices during baseline training was significantly higher for the birds in Experiment 2 (henceforth Spatial group) than for the birds in Experiment 1 (henceforth Visual group), t14 = 5.04, P < 0.001, d = 2.69.
We compared the average number of correct responses from the 10 trials around each reversal (i.e., Trials 26–36 and Trials 56–66, omitting Trials 31 and 61 for which the optimal response is not the rewarded response) across both groups. A 2 × 2 mixed ANOVA [Trial Block (26–36, 56–66) × Group (Visual, Spatial)] showed a significant main effect of Group (F1,14 = 31.39, P < 0.001, η2 = 0.69), no significant effect of Trial Block (F1,14 = 1.63, P = 0.22, η2 = 0.10), and an interaction which did not reach significance (F1,14 = 4.09, P = 0.063, η2 = 0.23). The spatial group showed significantly better performance (M = 67.3 %, SEM = 1.7) near the reversals than did the visual group (M = 53.5 %, SEM = 1.7).
We also compared the visual and spatial groups on performance on Trials 56–60 in the Ordinal Delay condition. The Spatial group responded significantly more in this period to the first-correct stimulus (M = 40.4 %, SEM = 9.7) than did the Visual group (M = 9.4 %, SEM = 2.9), t8 = 3.04 (degrees of freedom adjusted for unequal variances), P = 0.016, d = 1.63. Despite large individual differences, birds in the spatial group tended to respond to the first-correct stimulus until it was no longer rewarded on probe sessions, while the visual birds tended to reverse their behavior before responding to the stimulus ceased to provide reward.
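The adjustment of degrees of freedom for unequal variances can be illustrated with a short sketch. It assumes the adjustment is the Welch–Satterthwaite approximation and that each group contained eight birds (as implied by the degrees of freedom reported elsewhere); the function name is hypothetical, and the inputs are the rounded means and SEMs given above, so the result only approximates the reported statistic.

```python
import math

def welch_t_from_sems(mean_a, sem_a, mean_b, sem_b, n_a, n_b):
    """Welch t and Welch-Satterthwaite df from group means and SEMs.

    A squared SEM equals s^2 / n, which is the per-group term that
    enters both the t statistic and the df approximation.
    """
    var_a = sem_a ** 2
    var_b = sem_b ** 2
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1)
    )
    return t, df

# Spatial vs. Visual group, Trials 56-60 of the Ordinal Delay condition.
# n = 8 per group is an assumption inferred from the reported df.
t, df = welch_t_from_sems(40.4, 9.7, 9.4, 2.9, 8, 8)
print(round(t, 2), round(df, 1))
```

With these rounded inputs, t comes out slightly above 3 and the adjusted degrees of freedom fall just above 8, in line with the reported t8 = 3.04; the small discrepancy in t is expected from rounding of the published means and SEMs.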
Finally, we compared the visual and spatial groups on performance in the Interval Delay condition. A 2 × 2 mixed ANOVA [Trial Block (26–36, 56–66) × Group (Visual, Spatial)] for probe sessions which were delayed by 420 s showed a significant interaction (F1,14 = 6.04, P = 0.028, η2 = 0.30), as well as a significant main effect of Group (F1,14 = 17.26, P = 0.001, η2 = 0.55) but no significant effect of Trial Block (F1,14 = 2.69, P = 0.12, η2 = 0.16). At the first reversal, the spatial group showed significantly better performance (M = 58.1 %, SEM = 4.2) than did the visual group (M = 35.9 %, SEM = 4.2), but by the second reversal, the difference between the spatial (M = 54.7 %, SEM = 2.9) and visual groups (M = 53.1 %, SEM = 2.9) had narrowed.
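As a check on how the effect sizes relate to the F ratios, the following sketch recomputes them under the assumption that the reported η2 values are partial eta-squared, which for a single effect can be recovered from F and its degrees of freedom. The function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta-squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# F ratios from the Interval Delay ANOVA, each with df = (1, 14).
effects = {"Interaction": 6.04, "Group": 17.26, "Trial Block": 2.69}
for label, f_value in effects.items():
    print(label, round(partial_eta_squared(f_value, 1, 14), 2))
# Interaction 0.3
# Group 0.55
# Trial Block 0.16
```

Under that assumption, the recomputed values agree with the three effect sizes reported for this analysis.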
Pigeons appeared to represent the temporal structure of sessions in the present three-stimulus midsession reversal procedures, and the interval structure of the session held strong control over behavior in a visual discrimination when local reinforcement cues were difficult to use. In Experiment 1, pigeons responded to each of the red, green, and blue stimuli according to the elapsed duration of the session, even on sessions in which the first-correct stimulus was rewarded for twice as long as it was during training or the onset of the first stimulus was delayed for 7 min. In Experiment 2, pigeons responded to each of the left, center, and right stimuli according to a compromise between session duration and local reinforcement. These birds generally showed better performance than the subjects in Experiment 1 as a result, especially on probe sessions where the stimulus–reward order or time of the session was manipulated. In both experiments, responses to the currently irrelevant stimulus (e.g., the blue stimulus during baseline Trials 1–30) were much lower than responses to the ‘incorrect but proximally correct’ stimulus, extending previous data suggesting that the errors made near the reversal point are not simply random responding to ambiguity (Cook and Rosen 2010). Instead, pigeons appear to track the probability of being reinforced on each stimulus across the duration of the session.
Interestingly, pigeons in Experiment 1 tended to overestimate the time of the second reversal point (i.e., green–blue transition) relative to the first reversal point (i.e., red–green transition) in the same subjects. Where previous work has generally shown preference for the previously rewarded key to drop quickly after the reversal point, pigeons on average took several trials to reach equivalence (50 % preference for each of the green and blue keys) after the reversal. This effect was not observed in Experiment 2 and is in the opposite direction of typical performance on ‘late’ reversals in previous research (reversals after Trial 70 in an 80-trial session: see McMillan et al. in press; Rayburn-Reeves et al. 2011), where anticipatory errors were more common than perseverative errors. It is unclear why this asymmetry in latency to response reversal occurred.
Pigeons in Experiment 2 on average showed better sensitivity to local reinforcement changes at the second reversal than at the first (though they also showed more near-optimal performance across both reversal points than did the Visual group). Although some birds showed sharper reversal performance around Trial 30 than others, the general finding of better performance around Trial 60 may indicate that the birds were compromising between interval timing and local reinforcement strategies. Because timing is scalar (Gibbon 1977), it is thus easier (i.e., there is less error) to time a ~210-s interval than a ~420-s interval; pigeons may have been relying more on time to predict the first reversal when time was a more reliable cue, and more on local reinforcement to transition through the second reversal that was more difficult to time.
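The scalar property invoked here can be made concrete with a small sketch. It assumes that the standard deviation of a timed estimate is a constant fraction (the Weber fraction) of the interval; the Weber fraction of 0.15 and the 7 s per trial (taken from ~210 s for 30 trials) are illustrative assumptions, not values estimated from these birds.

```python
WEBER_FRACTION = 0.15     # hypothetical value, for illustration only
SECONDS_PER_TRIAL = 7.0   # assumed: ~210 s elapsed over 30 trials

def timing_sd(interval_s, weber=WEBER_FRACTION):
    """SD of a timed estimate under the scalar property (SD proportional to interval)."""
    return weber * interval_s

for interval_s in (210.0, 420.0):
    sd_s = timing_sd(interval_s)
    sd_trials = sd_s / SECONDS_PER_TRIAL
    print(f"{interval_s:.0f} s: SD = {sd_s:.1f} s (about {sd_trials:.1f} trials)")
```

Whatever Weber fraction is assumed, doubling the interval doubles the expected timing error, so a timing-based estimate of the second reversal should be roughly twice as variable (in trials) as that of the first.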
A remaining question is why pigeons, when able to use either time or local reinforcement rates to reverse their behavior on these midsession reversal tasks, compromise between the two and in many cases prefer to use time. In previous single-reversal tasks, relying on timing still allowed for a relatively high rate of reinforcement; however, in the current studies, the contribution of timing led on average to missing food on roughly 21/90 trials (Experiment 1), whereas relying on local reinforcement alone could lead to ‘perfect’ sessions with reward on 88 out of 90 trials. One possible explanation is a working memory account, where a 6-s intertrial interval (counting both the reward and darkened delay) is long enough to lead to forgetting of the previous trial, making a local reinforcement strategy less reliable. It has been suggested that spatial (McMillan et al. in press) or visual-spatial discriminations (McMillan and Roberts 2012) allow pigeons to ‘cheat’ the memory components of the task by orienting during the delay, and this effect is strengthened (and behavior more optimal) when the delay is especially short (Laude et al. 2014; Rayburn-Reeves and Zentall 2013) or when the reversal point is unpredictable (McMillan et al. in press). Pigeons appear to rely progressively more on a local reinforcement strategy when working memory is reliable, and less when working memory for the response and outcome of the previous trial is poor.
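The ceiling for a pure local reinforcement strategy can be sketched with a small simulation. It assumes a 90-trial session with reversals after Trials 30 and 60, an idealized agent with perfect memory for the previous trial that starts on S1, and a shift rule that moves to the next stimulus in the session order after any unrewarded peck; all names are hypothetical and the agent is not a model of the birds' actual behavior.

```python
def rewarded_stimulus(trial):
    """Index of the rewarded stimulus (0 = S1, 1 = S2, 2 = S3) on a given trial."""
    if trial <= 30:
        return 0
    if trial <= 60:
        return 1
    return 2

def win_stay_lose_shift(n_trials=90):
    """Count rewards earned by an idealized win-stay, lose-shift agent."""
    choice = 0   # assumption: the agent begins each session on S1
    rewards = 0
    for trial in range(1, n_trials + 1):
        if choice == rewarded_stimulus(trial):
            rewards += 1                 # win: stay on the same stimulus
        else:
            choice = (choice + 1) % 3    # lose: shift to the next stimulus
    return rewards

print(win_stay_lose_shift())  # 88
```

Under these assumptions the agent loses food only on the first trial after each reversal (Trials 31 and 61), which is the sense in which local reinforcement alone permits a near-perfect session.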
As we have found with a previous midsession reversal procedure (McMillan et al. in press), in the present Experiment 2 we noted individual differences among birds in strategy use, with about half of the subjects relying more on a win-stay, lose-shift strategy and the other half relying on interval time. These individual differences carried forward from baseline performance to performance on probe sessions. One possible reason for these differences in strategy use may simply be related to the pigeons’ behavior during the ITI. We have previously suggested that orienting during the ITI is the most important determinant of optimal performance in a spatial midsession reversal task (McMillan et al. in press), and that previous observation of more quantitatively optimal performance in rats relative to pigeons in a similar task (Rayburn-Reeves et al. 2013b) may be due to an increased likelihood that rats orient during the ITI rather than wander or engage in ‘other’ behavior. Similarly, pigeons that keep still during the ITI are by definition better able to use a spatial-orienting strategy than those that engage in numerous ‘other’ behaviors during this time. Unfortunately, we have no video of our pigeons with which to verify this hypothesis, but individual differences may explain prior inability to find optimal performance in pigeons (Rayburn-Reeves et al. 2013b) and may be of interest to future studies in longitudinal individual differences (i.e., personality).
While pigeons’ responding in Experiment 1 was profoundly affected by a delay of 7 min before the start of the session, they did not respond predominantly to S3 for the first several trials (as would be expected if they used only an interval timer started before Trial 1). This may indicate a compromise between several interval timers (e.g., one that begins when the chamber door is shut and another that begins with the onset of the first stimulus) or between timing and ordinal reinforcement rates (i.e., that S1 and S2 are rewarded before S3). Regardless of mechanism, responses to S3 on these sessions increased to become the dominant response well in advance of actually being reinforced. Pigeons in Experiment 2 performed more optimally than those in Experiment 1 on these probe sessions, though still not particularly well. Taken together with the rest of our results, these findings suggest that choice behavior on this midsession reversal task was driven by interval time and (generally to a lesser extent) reinforcement rates; the modality of the stimulus discrimination only modulated the degree to which one strategy was used over the other. Pigeons may form a complex representation of the temporal structure of sessions, including the temporal order in which responses to particular stimuli are reinforced. Use of time competes with a win-stay, lose-shift strategy, dependent upon the reliability of interval time versus working memory.
Timing is often thought to be an automatic process, prevalent wherever temporal regularities exist in the environment (Macar and Vidal 2009). Given access to such a powerful tool, temporal representation may simply be a preferred cue for reversal in pigeons because it is always available. It should also be noted that, while interval timing is a ‘fuzzy’ process subject to scalar variability (Gibbon 1977), the instances which are likely to be timed in the wild are generally not fixed to the highly rigid schedules used in the lab, and reinforcement rates are unlikely to be sharply 100 or 0 % between alternatives. As noted previously, animals use both interval and circadian timing to solve time-place tasks (Crystal 2009; Pizzo and Crystal 2002), and timing has also been implicated in a number of other foraging tasks in field and laboratory settings (for a review, see Carr and Wilkie 1997). Using interval timing in conjunction with local reinforcement rates may actually be the most ecologically valid strategy to maximize obtained food from gradually depleting and replenishing foraging patches, which may in turn influence animals’ preferences and capabilities in the laboratory. Thus, in the midsession reversal procedure, when pigeons (and rats: McMillan et al. in press) are faced with a discrimination wherein they cannot ‘cheat’ the working memory component, the animals almost exclusively rely on interval timing to predict the reversal. Given that it is difficult to imagine natural fixed intervals that end with a discrete food reward, the midsession reversal procedure may actually tap into a more naturally relevant use for time (tracking the dynamic change in probability of reward for different alternatives over time) than typical interval timing studies.
This research was supported by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada to W. A. Roberts. We thank Chelsea Kirk and Krista Macpherson for assistance in running subjects, and Jacek Majewski for animal care.
Conflict of interest
The authors declare they have no conflict of interest.
This research was conducted with the approval of the University of Western Ontario Animal Use Subcommittee, meeting the standards of the Canadian Council on Animal Care.
- Biebach H, Gordijn M, Krebs JR (1989) Time-and-place learning by garden warblers, Sylvia borin. Anim Learn Behav 14:241–248
- McMillan N, Kirk CR, Roberts WA (in press) Pigeon and rat performance in the midsession reversal procedure depends upon cue dimensionality. J Comp Psychol