In the Monty Hall dilemma (MHD), subjects are presented with a choice of three alternatives, one of which will result in a prize. After making a choice, but before checking to see whether the prize has been won, the subject is shown that one of the remaining alternatives does not have the prize. The subject is then asked whether he or she wants to stay with the initial choice or switch to the remaining alternative. Most people stay, in the mistaken belief that staying and switching each result in a 50 % chance of winning and that they might as well stay, because it is better to stay and be wrong than to switch and be wrong (an endowment or ownership effect, Gilovich, Medvec, & Chen, 1995; or a sunk cost effect, Arkes & Ayton, 1999).

In fact, when this problem has been studied experimentally, even after considerable training (50 repeated problems), few subjects switched reliably, in spite of the fact that switching results in winning a prize two thirds of the time (Granberg & Brown, 1995). Humans eventually do learn to match probabilities by switching two thirds of the time (Granberg & Brown, 1995), but the optimal strategy is to switch all of the time.
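The two-thirds advantage for switching can be checked with a short simulation. The following is a minimal Python sketch of the classic three-door game, not a procedure from the cited studies; the function names, the seed, and the number of simulated games are illustrative assumptions.

```python
import random


def play_monty_hall(switch, rng):
    """Play one classic three-door game; return True if the prize is won."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    first_choice = rng.choice(doors)
    # The host reveals a door that is neither the chosen door nor the prize door.
    revealed = rng.choice([d for d in doors if d != first_choice and d != prize])
    if switch:
        # Exactly one door is neither the first choice nor the revealed door.
        final_choice = next(d for d in doors if d != first_choice and d != revealed)
    else:
        final_choice = first_choice
    return final_choice == prize


def win_rate(switch, n_games=100_000, seed=0):
    """Estimate the proportion of wins for an always-switch or always-stay player."""
    rng = random.Random(seed)
    wins = sum(play_monty_hall(switch, rng) for _ in range(n_games))
    return wins / n_games


if __name__ == "__main__":
    print("always switch:", win_rate(True))   # close to 2/3
    print("always stay:  ", win_rate(False))  # close to 1/3
```

With enough simulated games, the estimate for always switching settles near two thirds and the estimate for always staying settles near one third.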

Herbranson and Schroeder (2010) asked whether suboptimal choice with this task was a general phenomenon. They created a nonverbal version of the task and gave it to human and pigeon subjects. Humans were given 200 trials with feedback to observe whether extended experience with the task would increase participants’ use of the optimal switching strategy, but the results were very similar to those in Granberg and Brown (1995), in which humans eventually learned to match probabilities. Interestingly, even though pigeons initially showed a stronger bias to stay with their initial choice than did the humans, they acquired the switching strategy and, after 30 sessions of training, used it almost exclusively. From these results, it appears that pigeons, but not humans, learn to effectively solve the task. It could be that nonhuman animals are evolutionarily prepared to encounter conditions in which outcomes following choice are probabilistic (e.g., foraging for food), whereas modern humans may have learned to overcome that tendency and search for outcomes that are more often correct.

There has been some interest in determining why humans fail to choose more optimally when performing this task. Probability matching results in reinforcement about 56 % of the time, whereas if subjects choose to switch all of the time, it will result in about 67 % reinforcement (the maximum amount of reinforcement possible under these probabilistic reinforcement conditions). Gaissmaier and Schooler (2008) have suggested that probability matching results from trying to find a complex pattern in the random sequence of stay and switch responses. However, distributing responses across stimuli in an attempt to improve reinforcement does not maximize reinforcement in human (Fantino & Asafandiari, 2002) or nonhuman (Mazur, 1981) animals. On the other hand, many studies have found that animals often learn to perform probability learning tasks nearly optimally (Shimp, 1966, 1973).
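The 56 % and 67 % figures follow from simple arithmetic. As a hedged illustration (the function name and its parameters are ours, not the authors'), the expected reinforcement rate for a subject that switches with a fixed probability can be computed exactly:

```python
from fractions import Fraction


def expected_reinforcement(p_choose_switch, p_win_switch=Fraction(2, 3)):
    """Expected proportion of reinforced trials for a subject that switches
    with probability p_choose_switch and stays otherwise.

    Assumes staying wins with probability 1 - p_win_switch, as in the MHD.
    """
    p_win_stay = 1 - p_win_switch
    return p_choose_switch * p_win_switch + (1 - p_choose_switch) * p_win_stay


# Probability matching: switch on two thirds of the trials.
matching = expected_reinforcement(Fraction(2, 3))   # Fraction(5, 9), about 56 %
# Maximizing: switch on every trial.
maximizing = expected_reinforcement(Fraction(1))    # Fraction(2, 3), about 67 %

if __name__ == "__main__":
    print(f"matching:   {float(matching):.3f}")
    print(f"maximizing: {float(maximizing):.3f}")
```

Probability matching yields (2/3)(2/3) + (1/3)(1/3) = 5/9, so any mixture of staying and switching earns less than switching on every trial.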

The tendency to perceive the probabilities associated with the two remaining doors in the MHD as being equal has been attributed to an equiprobability bias (Lecoutre, 1992). That is, with two alternatives, it is thought that the odds of winning for either staying or switching are equal. This classic means of probability estimation is typical of university students, whereas younger children have been found to switch at a higher level (DeNeys, 2006). It may be that education teaches us that there is a solution (that provides 100 % reinforcement) to every problem (Granberg, 1999) and this cultural experience might make solving the MHD more difficult.

In the MHD, humans may be more likely to stick with their initially chosen door because they feel some ownership of it. The effect commonly referred to as the endowment effect can be seen when people demand more to give up an object they have been told that they own than what they would pay for it if it were not theirs (Kahneman, Knetsch, & Thaler, 1986; Thaler, 1980; for related research with pigeons, see Pattison, Zentall, & Watanabe, 2012). Support for the influence of ownership on performance in the MHD was found by Granberg and Dorr (1998). In their research, participants showed a tendency to switch more often when someone else made the initial door selection. It may be that humans, but not pigeons, take ownership of their initial choice. In the present research, we asked whether pigeons that were required to “invest” more in their initial MHD choice by making 20 pecks rather than 1 peck would perform more like humans.

We considered two possible outcomes of this manipulation. First, by increasing the pecking requirement to make their original choice, we may create an endowment or ownership-like effect in the pigeons. However, having to invest more in the initial choice may also make the outcome of staying with the initial choice versus switching to the remaining alternative more discriminable. That is, increasing the pecking requirement may make it easier for the pigeons to remember their initial choice and, thus, make it easier for them to discriminate the difference in outcome following a stay and a switch response.

Experiment 1

Method

Subjects

Twelve White Carneau pigeons (Columba livia) ranging in age from 2 to 12 years served as subjects. All pigeons had received experience in previous, unrelated studies involving simple simultaneous and successive hue discriminations but had never been exposed to a probability learning task. The pigeons were maintained at 85 % of their free-feeding weight throughout the experiment. They were individually housed in wire cages with free access to water and grit in a colony room that was maintained on a 12:12-h light:dark cycle. The pigeons were maintained in accordance with a protocol approved by the Institutional Animal Care and Use Committee at the University of Kentucky.

Apparatus

The experiment was conducted in a BRS/LVE (Laurel, MD) sound-attenuating standard operant test chamber measuring 34 cm high, 30 cm from the response panel to the back wall, and 35 cm across the response panel. Three circular response keys (2.5-cm diameter) were aligned horizontally on the response panel and separated from each other by 6.0 cm. The bottom edge of the response keys was 24 cm from the wire-mesh floor. A 12-stimulus in-line projector (Industrial Electronics Engineering, Van Nuys, CA) with 28-V, 0.1-A lamps (GE 1820) that could project blue hues (Kodak Wratten Filter No. 38) was mounted behind each response key. Mixed grain reinforcement (Purina Pro Grains, a mixture of corn, wheat, peas, kafir, and vetch) was provided from a raised and illuminated grain feeder located behind a 5.1 × 5.7 cm aperture horizontally centered and vertically located midway between the response keys and the floor of the chamber. Reinforcement consisted of 1.5-s access to mixed grain. The experiment was controlled by a microcomputer and interface located in an adjacent room.

Procedure

Pretraining

All pigeons were pretrained to peck at each of the three keys to receive reinforcement. Each session consisted of 15 trials, 3 with each key. A single response turned off the key light and resulted in 3.0 s of reinforcement. For pigeons assigned to the 20-peck group, responses were gradually increased from 1 to 20 pecks over the pretraining sessions.

Training

Each training session consisted of 96 trials. At the start of each trial, all three response keys were illuminated white. For pigeons in the single-peck group, a single peck to any key turned off all three keys for 1 s. For pigeons in the 20-peck group, a single peck to any key turned off the two unchosen keys, and 19 more pecks were required to turn off the chosen key for 1 s. At the end of the delay, two blue keys were illuminated, the key that the pigeon had initially chosen and one of the two keys (randomly selected) that the pigeon had not initially selected. If the pigeon pecked the key that it had initially chosen again (i.e., a stay response), it received 3.0 s of reinforcement with a probability of .33. If the pigeon chose the other key (i.e., a switch response), it received 3.0 s of reinforcement with a probability of .67. Trials were separated by a 5-s intertrial interval. Pigeons were trained 6 days a week for 70 sessions.
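The trial structure described above can be sketched in a few lines of Python. This is an illustrative simulation under stated assumptions, not the control program used in the experiment: the names `run_trial` and `run_session` are hypothetical, and the simulated bird switches with a fixed probability, which real pigeons need not do.

```python
import random

KEYS = ("left", "center", "right")


def run_trial(first_key, switch, rng, p_stay=0.33, p_switch=0.67):
    """Simulate the second choice of one training trial.

    Returns (second_key, reinforced).
    """
    # The initially chosen key and one randomly selected unchosen key are lit blue.
    alternative = rng.choice([k for k in KEYS if k != first_key])
    second_key = alternative if switch else first_key
    # Reinforcement depends only on whether the response was a stay or a switch.
    p_reinforce = p_switch if switch else p_stay
    return second_key, rng.random() < p_reinforce


def run_session(p_choose_switch, n_trials=96, seed=0):
    """Return (percent switches, percent reinforced) for one simulated session.

    Assumption: the bird picks its first key at random and switches with a
    fixed probability p_choose_switch.
    """
    rng = random.Random(seed)
    switches = 0
    reinforced = 0
    for _ in range(n_trials):
        first_key = rng.choice(KEYS)
        switch = rng.random() < p_choose_switch
        _, fed = run_trial(first_key, switch, rng)
        switches += switch
        reinforced += fed
    return 100 * switches / n_trials, 100 * reinforced / n_trials


if __name__ == "__main__":
    print(run_session(p_choose_switch=1.0))  # always switch
    print(run_session(p_choose_switch=0.0))  # always stay
```

The sketch ignores the peck requirement and the delays, which affect timing but not the reinforcement contingency.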

Results

Single-peck group

Herbranson and Schroeder (2010) trained their pigeons for 30 sessions, whereas we trained our pigeons for 70 sessions. To better compare our results with theirs, Fig. 1 shows the percentage of switch responses for sessions 1 and 30 (following Herbranson & Schroeder’s data presentation) and session 70. As can be seen in the figure, pigeons in the single-peck group switched on 44.4 % of the trials on session 1, 62.0 % on session 30, and 79.0 % on session 70. The difference in the percentage of switches between session 1 and session 30 was not statistically significant, as indicated by a correlated samples t-test, t(5) = 1.25, p = .27. However, the difference in switch responses between session 1 and session 70 was statistically reliable, t(5) = 3.29, p = .02. Thus, pigeons in the single-peck group switched significantly more often on session 70 than on session 1.
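For readers unfamiliar with the correlated samples t-test, the statistic can be computed as sketched below. The data in the example are hypothetical and are NOT the values from this study; the function name `paired_t` is likewise an illustrative assumption. The 5 degrees of freedom reported above are consistent with six pigeons per group (n - 1).

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Return (t, df) for a correlated (paired) samples t-test."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    # The test operates on the within-subject difference scores.
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Hypothetical percentages of switches for six birds (not the study's data).
session_1 = [40.0, 35.0, 50.0, 45.0, 30.0, 55.0]
session_70 = [85.0, 60.0, 90.0, 70.0, 75.0, 80.0]

if __name__ == "__main__":
    t, df = paired_t(session_1, session_70)
    print(f"t({df}) = {t:.2f}")
```

The p value would then be read from the t distribution with the returned degrees of freedom.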

Fig. 1 Percentage of switches for pigeons in the single-peck group (gray bars) and the 20-peck group (white bars) on sessions 1 and 30 (for comparison with Herbranson & Schroeder, 2010, and Mazur & Kahlbaugh, 2012) and session 70

Twenty-peck group

As is also shown in Fig. 1, pigeons in the 20-peck group switched on 38.5 % of the trials on session 1, 74.6 % on session 30, and 81.2 % on session 70. For the 20-peck group, the difference in switch responses between session 1 and session 30 was statistically significant, t(5) = 3.88, p = .012. There was also a greater preference to switch on session 70 than on session 1, t(5) = 5.69, p = .002.

Comparison of the single-peck and 20-peck groups

The percentage choice of switching for both groups for sessions 1–70 can be seen in Fig. 2. Although both groups reached a similar asymptotic level of switching, the 20-peck group reached it sooner. To get a sense of the difference in acquisition of the switch response for the two groups, we calculated two sessions-to-criterion scores for each pigeon: the number of sessions to reach a criterion of 70 % switches and the number of sessions to reach a criterion of 80 % switches. Using the more lax criterion, the 20-peck group reached the 70 %-switches criterion in a mean of 6.0 sessions, whereas the single-peck group reached that criterion in a mean of 38.5 sessions. An independent samples t-test indicated that the difference between groups in the number of sessions to the 70 % criterion was statistically significant, t(10) = 2.63, p = .025. Using the more stringent criterion, the 20-peck group reached the 80 %-switches criterion in a mean of 9.2 sessions, whereas the single-peck group reached that criterion in a mean of 39.3 sessions. Again, an independent samples t-test indicated that the difference between groups was statistically significant, t(10) = 2.45, p = .03. Over the last 5 sessions of training, pigeons in the 20-peck group switched on 78.5 % of the trials, and pigeons in the single-peck group switched on 75.1 % of the trials. Although this level of switching was not significantly different from probability matching (66.7 %), 4 of the pigeons (2 in each group) switched on more than 90 % of the trials, and on session 70, 8 of the 12 pigeons switched on significantly more than 67 % of the trials (i.e., more than probability matching).
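A sessions-to-criterion score of the kind described above could be computed as in the following sketch. The text does not spell out the exact scoring rule, so the sketch assumes that the score is the first session on which a pigeon's percentage of switches met or exceeded the criterion; the function name and the example data are hypothetical.

```python
def sessions_to_criterion(percent_switches, criterion):
    """Return the 1-based number of the first session at or above criterion.

    percent_switches: per-session percentages of switch responses, in order.
    Returns None if the criterion is never reached (how such birds would be
    scored is not stated in the text and is left open here).
    """
    for session, pct in enumerate(percent_switches, start=1):
        if pct >= criterion:
            return session
    return None


# Hypothetical acquisition curve for one bird (not the study's data).
example_bird = [40.0, 65.0, 72.0, 81.0, 79.0, 85.0]

if __name__ == "__main__":
    print(sessions_to_criterion(example_bird, 70))  # 3
    print(sessions_to_criterion(example_bird, 80))  # 4
```

Group means of these scores are what the independent samples t-tests above compare.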

Fig. 2 Experiment 1: Percentage choice of switches for all 70 sessions for both the single-peck group (filled circles) and the 20-peck group (open circles) when the probability of reinforcement for switching was 67 %

Patterns of responding

The pattern of choices by individual pigeons in each group was also analyzed: which key was initially chosen and whether the second choice was to stay or switch depending on which nonselected key was lit. Table 1 shows the response patterns from sessions 1, 30, and 70 for pigeons in the single-peck group, and Table 2 shows similar results for pigeons in the 20-peck group. Many of the pigeons developed an identifiable response pattern.

Table 1 Experiment 1: Proportion of each response sequence for pigeons in the single-peck group
Table 2 Experiment 1: Proportion of each response sequence for pigeons in the 20-peck group

In the single-peck group, 3 of the pigeons (19338, 3391, and 5125) chose the center key first and then switched to the alternative key that was lit, whereas 1 of the pigeons (19276) chose the right key first and then switched to whatever alternative key was lit. One pigeon (18251) always chose the center key first and switched to the right key if it was lit but stayed with the center key if the left key was lit. Finally, pigeon 10006 chose the left key first on almost half of the trials and stayed about half of the time but otherwise showed a more random response pattern.

In the 20-peck group, 3 of the pigeons (20895, 19205, and 17878) chose the center key first and then switched to the alternative key that was lit, while 1 of the pigeons (10053) chose the left key first and then switched to whatever alternative key was lit. Pigeon 11746 chose the center key first on more than half of the trials but switched on only half of those; however, it chose the left key first on the remaining trials and switched on all of those. Thus, this pigeon avoided the left key on its second choice. Finally, pigeon 19243 chose the center key first on almost half of the trials and switched whenever the left key was lit but stayed whenever it was not lit. In general, pigeons that did not show optimal switching performance had a bias to avoid one of the keys, especially on their second choice.

Discussion

By session 70, most of the pigeons in both the single-peck and 20-peck groups had learned to switch rather than stay, switching on average 79.0 % and 81.2 % of the time, respectively. This result is consistent with earlier findings that pigeons perform the MHD as well as or better than humans (Herbranson & Schroeder, 2010).

Although Herbranson and Schroeder’s (2010) pigeons learned to switch faster and reached a higher asymptotic level of switching than our single-peck pigeons, Mazur and Kahlbaugh (2012) found that comparably trained pigeons switched about 60 % of the time with 30 sessions of training (comparable to our single-peck pigeons). However, with further training, most of the pigeons in both of our groups reached a level of switching that was higher than probability matching (66.7 %). Unlike Herbranson and Schroeder’s pigeons and more like Mazur and Kahlbaugh’s pigeons, our pigeons showed considerable variability in their terminal level of switching. Thus, although the mean level of switching was not significantly greater than probability matching, the level of switching was significantly greater for 8 of our 12 pigeons. However, whether the pigeons switched more than would be predicted by probability matching is less important than the fact that they switched significantly more than chance.

Although both groups in the present study reached a similar level of asymptotic performance, pigeons in the 20-peck group reached it sooner than pigeons in the single-peck group. Thus, it appears that the additional investment in the initial choice response by the pigeons in the 20-peck group facilitated the pigeons’ tendency to switch.

One possible reason for this facilitation is that the added pecking requirement extended each trial, thus making the second choice occur relatively closer to reinforcement for the 20-peck group than for the single-peck group. Fantino’s (1969) delay reduction theory predicts that a stimulus (or in the present case, a response pattern—switching) that comes relatively closer to reinforcement becomes a better conditioned reinforcer. That is, on 20-peck trials, the second choice (to switch or stay) comes closer to reinforcement, relative to the duration of the trial, than it does on single-peck trials. Thus, the stimuli associated with switching become better conditioned reinforcers for the 20-peck group than for the single-peck group.
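The delay reduction argument can be made concrete with a small calculation. The sketch below uses a simplified reading of the theory, in which the value of a stimulus as a conditioned reinforcer grows with (T - t) / T, where T is the time from trial onset to reinforcement and t is the time to reinforcement remaining when the stimulus appears. The durations are hypothetical and were chosen only for illustration; they were not measured in this experiment.

```python
def relative_delay_reduction(trial_duration, time_to_reinforcement):
    """Return (T - t) / T, the proportion of the trial already elapsed
    when the second-choice stimuli appear."""
    if trial_duration <= 0:
        raise ValueError("trial_duration must be positive")
    if not 0 <= time_to_reinforcement <= trial_duration:
        raise ValueError("time_to_reinforcement must lie within the trial")
    return (trial_duration - time_to_reinforcement) / trial_duration


# Hypothetical timings in seconds (assumptions, not measured values):
# the second choice leads to reinforcement after about 1 s in both groups,
# but the initial link lasts longer when 20 pecks are required.
single_peck = relative_delay_reduction(trial_duration=3.0, time_to_reinforcement=1.0)
twenty_peck = relative_delay_reduction(trial_duration=11.0, time_to_reinforcement=1.0)

if __name__ == "__main__":
    print(f"single-peck: {single_peck:.2f}")
    print(f"20-peck:     {twenty_peck:.2f}")
```

Under these assumed timings, the second choice signals a larger relative reduction in delay for the 20-peck group, which is the direction of the effect the account requires.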

A second possibility is that with trials spaced farther apart, reinforcement per unit time decreases and the consequences of differential reinforcement (for staying vs. switching) become greater for pigeons in the 20-peck group. Thus, pigeons in the 20-peck group may more quickly learn to attend to the response–outcome contingency.

A third possibility for faster acquisition by pigeons in the 20-peck group is that the higher peck requirement for this group resulted in extinction of responding to the originally chosen location. We tested this hypothesis in Experiment 2, in which we again manipulated the number of pecks required to the initially chosen location but the probabilities of reinforcement for staying and switching were equated at 50 %. If the increased peck requirement to the initially chosen location in Experiment 1 resulted in greater extinction to that location, a similar result should be found in Experiment 2. That is, pigeons in the 20-peck group should switch faster than pigeons in the single-peck group. However, if pigeons in the 20-peck group in Experiment 1 simply acquired the switching strategy more quickly because they attended more to the outcome following their effort, pigeons in neither group should show a preference for switching, because there should be no advantage to using this strategy.

Experiment 2

Method

Subjects, apparatus, and procedure

Twelve White Carneau pigeons similar to those in Experiment 1 served as subjects. The apparatus was the same as that used in Experiment 1, and the procedure was the same as that in Experiment 1, with the exception that the probability of reinforcement for switching versus staying was equated at 50 %.

Results

Both groups of pigeons began by switching on only about 35 % of the trials and showed little tendency to deviate from that percentage for the 70 sessions of training (see Fig. 3). Although it appeared that pigeons in the 20-peck group were beginning to show a greater preference for switching toward the end of training, an independent samples t-test comparing the proportion of switching responses for the single-peck and 20-peck groups pooled over the last 5 sessions showed no significant difference between the groups, t < 1. Performance by both groups, averaged across the last 5 sessions and compared with indifference between staying and switching (50 %), showed a reliable tendency for the pigeons to stay, t(11) = 2.60, p = .025.

Fig. 3 Experiment 2: Percentage choice of switches for all 70 sessions for both the single-peck group (filled circles) and the 20-peck group (open circles) when the probability of reinforcement for switching was 50 %

Patterns of responding

As in Experiment 1, the pattern of choices by individual pigeons in each group was also analyzed. Tables 3 and 4 show the response patterns from sessions 1, 30, and 70 for pigeons in the single-peck and 20-peck groups, respectively. Only 1 pigeon in each group switched. Pigeon 19831 generally chose the center key first and then switched to the remaining side key, whereas pigeon 19824 generally chose the left key first and then switched to the center or right key or stayed with the left key about equally often. The 6 pigeons that generally stayed chose the center key. For the 4 pigeons that were relatively indifferent, 2 started with the center key and 2 started with the right key, and they switched to the other lit key about half of the time.

Table 3 Experiment 2: Proportion of each response sequence for pigeons in the single-peck group
Table 4 Experiment 2: Proportion of each response sequence for pigeons in the 20-peck group

Discussion

The results of Experiment 2 indicate that in Experiment 1, the faster acquisition of switching by the pigeons in the 20-peck group than by pigeons in the single-peck group did not result from faster extinction of pecking at the initially chosen location. In fact, pigeons in both groups showed a significant preference for staying with their initial choice.

General discussion

In Experiment 1, there was no evidence of an ownership-like effect. That is, the increased initial response requirement for the 20-peck group did not increase the pigeons’ choice to stay with their initial investment. Although the ownership or endowment effect has been used as an explanation for why humans tend to stay with their initial choice (Lichtenstein & Slovic, 1971, 1973), apparently either pigeons do not show such a bias or the present manipulation was not appropriate to make it appear.

Given the fact that humans misperceive the probability of reinforcement for staying versus switching as being equal, the endowment effect may account for the fact that humans tend to stay rather than switch. But why do they misperceive the probability of reinforcement? The answer may be related to the fact that generally, humans do not do well with probability learning (see, e.g., Herrnstein, 1970), whereas nonhuman animals generally perform more optimally (Bitterman, 1965).

It is interesting to speculate about the difference. One hypothesis is that humans have extensive experience with tasks based on rules that work all of the time. Most puzzles and games that humans attempt have a solution that is not probabilistic. Thus, unlike other animals, when humans encounter a probabilistic solution—for example, one that works 67 % of the time, as does the MHD—they most often try to find one that works better. As was noted earlier, it has been suggested that probability matching with the MHD may result from trying to find a complex pattern in the random sequence of stay and switch responses (Edwards, 1961; Gaissmaier & Schooler, 2008). Interestingly, humans may begin to choose more optimally when they stop trying to do better than always selecting the alternative with the higher probability of being correct (Edwards, 1961). Thus, this is a case in which failing to consistently choose the more probable alternative (in the present case switching rather than staying), in an attempt to find a strategy that results in better than the programmed probability of reinforcement, typically leads to a lower probability of reinforcement.