Monty Hall three door 'anomaly' revisited: a note on deferment in an extensive form game

The Monty Hall game is one of the most discussed decision problems, yet a convincing behavioral explanation of the systematic deviations from probability theory is still lacking. That most people do not change their initial choice, even though switching is beneficial under information updating, demands further explanation. Trust and the incentive of prolonging the game in an interesting way for the audience can explain this kind of behavior, but the strategic setting can also be modeled in a more sophisticated way. When the aim is to increase the odds of winning while Monty's incentives are unknown, not switching doors can be considered the most secure strategy: it avoids a sure loss when Monty's guiding aim is not to give away the prize. Understanding and modeling the Monty Hall game can be regarded as an ideal teaching example for fundamental statistical understanding.


Introduction
Since Friedman (1998), the Monty Hall decision problem¹ has been intensively discussed. While the experimental observations appear interesting, their behavioral explanation remains disappointing. The investigated decision frame is modeled on a television show with three doors, of which only one conceals the winning prize, while the other two yield zero profit. After you, as the contestant, have picked a door of your choice, the show master Monty opens one of the unchosen doors, which does not reveal the prize. The question then is: do you want to switch to the remaining door, or do you want to stick with your original choice? In other words, what are the winning probabilities for changing and not changing doors? In the standard construction, Monty always opens one of the unchosen doors (the one without the prize if the prize has not been chosen, and otherwise one of the two at random) and offers the contestant the option to change doors. Under these simplifying specifications, the undisputed consensus is to always change doors, as the remaining door's probability of being the winning door is (at least) larger than 1/3,² while the probability has not changed for the initially chosen door. Why is it then that so many of us stay with our initial choice and do not want to change to the other door with the higher winning probability?
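The standard construction described above can be checked by exact enumeration. The following is a minimal sketch, assuming a uniformly placed prize, a uniform first pick, and a Monty who always opens a losing unchosen door (at random if two are available); the function name `standard_monty_win_probabilities` is hypothetical and introduced only for illustration.

```python
from fractions import Fraction
from itertools import product


def standard_monty_win_probabilities():
    """Exact win probabilities (stay, switch) under the standard rules."""
    doors = (0, 1, 2)
    stay = Fraction(0)
    switch = Fraction(0)
    # Prize location and first pick are assumed independent and uniform.
    for prize, pick in product(doors, repeat=2):
        weight = Fraction(1, 9)
        # Monty may open any unchosen door that does not hide the prize.
        openable = [d for d in doors if d != pick and d != prize]
        for opened in openable:
            # With two openable doors (pick == prize) Monty chooses at random.
            branch = weight / len(openable)
            remaining = next(d for d in doors if d not in (pick, opened))
            if pick == prize:
                stay += branch
            if remaining == prize:
                switch += branch
    return stay, switch


stay, switch = standard_monty_win_probabilities()
print(f"stay: {stay}, switch: {switch}")  # stay: 1/3, switch: 2/3
```

The enumeration only covers the case in which Monty is bound to open a door; it says nothing about a host who can decide whether to open one.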
Is it then necessary to resort to things like reverse psychology, a possibility raised by Kevin Spacey in the role of MIT Professor Micky Rosa (in the movie "21", released in 2008 by Columbia Pictures)?³ Interestingly, not only is the first intuition to stick with the initially chosen door, but experimental investigations also show that many participants remain reluctant to change and do not switch to the other unopened door. Playing the Monty game repeatedly, however, documents a robust learning effect toward increased switching, close to or slightly above 50% (e.g. Friedman 1998). Palacios-Huerta (2003) shows that incentives, ability, and social interaction can further strengthen learning effects in the repeated game. In a similar vein, Slembeck and Tyran (2004) conclude that communication and competition between participants support learning towards increased switching, especially over the first rounds. Repetitions seem to help, although they do not lead to optimal behavior. Granberg (1999a) shows in a cross-cultural comparison study that sticking with the initial choice in the Monty game is a rather universal phenomenon. Cognitive illusions (e.g. of control) or cognitive biases (e.g. status quo) have been proposed as possible explanatory concepts for such kinds of behavior (compare Granberg and Brown 1995; Granberg 2014). Can game theory provide alternative solutions besides explanatory concepts and post hoc rationalizations?

¹ The Monty Hall Show was a television broadcast in which participants chose between different doors, with only one hiding the winning prize (i.e. a sports car) and the others nothing (i.e. goats). Frequently (but definitely not always) the host opened one of the unchosen doors, always showing that it did not contain the prize. Participants were then asked whether to change from the originally chosen door to the unchosen but closed door. The emotional difficulty of changing the door or not was a key feature of the show. Defining the optimal choice appeared to be an interesting puzzle (e.g. Nalebuff 1987). After Vos Savant (1990) proclaimed that in the so-called Monty Hall dilemma the probabilities are actually two to one in favor of the change iff Monty opened the other door, an academic discussion of the decision problem began (see for example Morgan et al. 1991; Gill 2011). This went so far as to develop models or simulations to help people better understand the probabilities, for example with decision tree illustrations or by increasing the number of opened doors (see for example Shaughnessy and Dick 1991; Page 1998; Franco-Watkins et al. 2003; Krauss and Wang 2003).
² Various arguments were provided for the range between 2/3 and 1/2 winning probability for switching (see Rosenhouse 2009 for a broader overview of the decision problem and the corresponding literature), and most contributions agree that it is profitable to switch from the initial choice to the other unopened door.
³ The following dialog is transcribed from a scene where the Monty Hall problem is taught in class. Prof. Rosa (Kevin Spacey): "Is it in your interest to switch your choice?" Ben (Jim Sturgess): "Yeah." Prof. Rosa: "Wait! Remember the host knows where the car is. So how are you knowing he is not playing a trick on you? Trying to use reverse psychology to get you to pick a goat?"

Definitions and Solutions
The Monty game can be defined as a sequential two player constant sum game with asymmetric information and the following specific characteristics.
(i) Player 1 (i.e. you) chooses between three options with only one holding the winning prize, but you do not know which. Therefore, the probability of having chosen the winning option (W) is 1/3 and the probability of having chosen the losing option (L) is 2/3.
(ii) Player 2 (i.e. Monty) has the possibility to expose (e) or not expose (e′) one of the unchosen options which is not holding the prize.
(iii) Player 2 knows, before deciding between e and e′, whether W or L holds. The prize is never exposed and is revealed to player 1 only in the final stage of the game.
(iv) Iff e, player 1 decides between changing to the unexposed and unchosen option (c) or staying with the initial choice (c′).
(v) The incentive for player 1 is to win the prize and for player 2 not to give away the prize.
Furthermore, assume fully rational players who completely abide by these rules and always act purposefully and without error. Simplified Monty decides, as player 2, only between e and e′. Sophisticated Monty fully takes the information under (iii) into account and, as player 2, chooses separately for e_W and for e_L, or the respective odds. First pure and then mixed strategies are investigated. The utility structure under (v) is strongly simplified. The easiest representation of individual utility is in monetary terms, here as winning or not winning the prize. Monetary rewards are not necessarily the only outcome that is taken into account; social considerations or anticipated feelings can determine the resulting utility as well. Plausible utility extensions for player 2 and player 1 are investigated under Monty game expansions. These additional interdependent components are introduced by adding complexity step by step.

Simplified Monty game
The simplest representation of the Monty game as a strategic game is in normal form. This defines the full strategy space of every player and all possible strategy combinations, with the resulting payoff for each player. All possible strategy combinations are represented in a static matrix, which can also serve as a contingent representation of a sequential game. Without considering the information on whether the winning or the losing option (W or L) was chosen, the Monty game can be considered a simultaneous move game, as shown in Table 1. The solution concept here is the Nash equilibrium, a situation in which no player would be better off by switching to an alternative strategy. With two players and two strategies each, this simply means that a player cannot increase his/her payoff by choosing the other strategy, given the current strategy of the other player. This must hold for both players.

Proposition 1
The only equilibria in pure strategies are those with player 2 not exposing: (e′, c) and (e′, c′).
As a sequential game in extensive form, the simplified Monty game reduces to one subgame perfect equilibrium at (e′, c) through backwards induction (see Fig. 1). Given that player 2 decides without knowing whether W or L, there is no mixed strategy equilibrium, as player 2 can only improve by increasing the proportion of e′: e′ weakly dominates e (if c then e′ is better, and if c′ then e′ is not worse). The maximum gain for player 1 is an increase in the winning probability from 1/3 to 2/3 by choosing c after e. In the literature this gain is obtained by simply taking e as given, although without further assumptions player 2 would prefer e′ (i.e. would never open a door to expose that it does not hide the prize).
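The normal form reasoning above can be reproduced with a small enumeration. This is only a sketch under stated assumptions: the payoff entries are reconstructed from the winning probabilities given in the text (2/3 for c after e, 1/3 otherwise) and from the constant-sum property, and it is assumed that after e′ player 1 simply keeps the initial option, so that c and c′ coincide. The names `WIN_PROB`, `payoffs` and `pure_nash_equilibria` are hypothetical.

```python
from fractions import Fraction
from itertools import product

THIRD = Fraction(1, 3)

# Player 1's winning probability for each (player 2, player 1) strategy pair.
# Assumption: without an exposed door (e') the initial choice stands.
WIN_PROB = {
    ("e", "c"): 2 * THIRD,
    ("e", "c'"): THIRD,
    ("e'", "c"): THIRD,
    ("e'", "c'"): THIRD,
}


def payoffs(p2, p1):
    """Return (player 1 payoff, player 2 payoff) in a constant-sum game."""
    win = WIN_PROB[(p2, p1)]
    return win, 1 - win


def pure_nash_equilibria():
    """List all strategy pairs from which no player gains by deviating alone."""
    equilibria = []
    for p2, p1 in product(("e", "e'"), ("c", "c'")):
        u1, u2 = payoffs(p2, p1)
        player1_content = all(payoffs(p2, alt)[0] <= u1 for alt in ("c", "c'"))
        player2_content = all(payoffs(alt, p1)[1] <= u2 for alt in ("e", "e'"))
        if player1_content and player2_content:
            equilibria.append((p2, p1))
    return equilibria


print(pure_nash_equilibria())
```

Under these assumed payoffs the enumeration returns the two profiles (e′, c) and (e′, c′), in line with Proposition 1.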

Sophisticated Monty game
In addition, previous investigations stress that player 2 knows whether the winning door was chosen (W or L), and this knowledge can be acknowledged in a formal representation of the Monty game. Monty as player 2 knows whether player 1 has initially picked the winning option (i.e. the door with the prize behind it) or not, and it is reasonable to assume two variants of e in the sequential form game: one if it was the winning choice, e_W (or e′_W), and another one if it was the losing choice, e_L (or e′_L). Furthermore, these can be chosen with different probabilities in a mixed strategy equilibrium. A comparable differentiation between probabilities for e has been made by Morgan et al. (1991, page 286), Mueser et al. (1999, pages 43-46), and Whitmeyer (2017, pages 5-7). Schuller (2012) more generally stresses that with unknown expose probabilities in winning versus losing cases the safe strategy for player 1 is not to change, which secures a 1/3 winning probability. As a consequence, all sophisticated Monty game equilibria restrict player 1 to c′.
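The safe-strategy argument can be illustrated with a short calculation. This is a sketch with hypothetical names: `r` and `s` denote the probabilities that Monty exposes a door in the winning (W) and losing (L) case, `p` is the probability that player 1 changes after a door has been exposed, and it is assumed that without an exposed door the initial choice stands.

```python
from fractions import Fraction
from itertools import product

PRIOR_W, PRIOR_L = Fraction(1, 3), Fraction(2, 3)


def win_probability(r, s, p):
    """Player 1's overall winning probability for exposure odds (r, s) and switching odds p."""
    # W: player 1 wins unless a door is exposed and (s)he switches away.
    win_if_w = 1 - r * p
    # L: player 1 wins only if a door is exposed and (s)he switches.
    win_if_l = s * p
    return PRIOR_W * win_if_w + PRIOR_L * win_if_l


def security_level(p):
    """Worst case for player 1 over Monty's pure exposure plans.

    The winning probability is linear in r and s, so checking the four
    pure plans also covers all mixed plans.
    """
    return min(win_probability(r, s, p) for r, s in product((0, 1), repeat=2))


print("never change:", security_level(0))   # 1/3
print("always change:", security_level(1))  # 0
```

Never changing guarantees 1/3 whatever Monty does, while always changing can be driven down to 0 by a Monty who exposes a door only in the winning case.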

Proposition 2
The only Nash equilibria in pure strategies are with player 1 not changing: ((e_W, e′_L), c′) and ((e′_W, e′_L), c′).
Proof Player 2 is indifferent (e = e′) iff player 1 does not change (c′); otherwise player 2 prefers e_W and e′_L, in which case player 1 prefers c′ over c. Only for ((e_W, e′_L), c′) and ((e′_W, e′_L), c′) does neither player gain by deviating.
Player 2 exposing doors dependent on the initial choice of player 1 (e conditional on W or L) is an informational advantage and does change the equilibria. With asymmetric information the game is represented in extensive form. In pure strategies it makes player 1 choose c′, which is consistent with most people's intuition. Mixed strategies can be derived for player 1 with p for c and 1 − p for c′. Player 2 can mix with probability r for e_W (1 − r for e′_W) and probability s for e_L (1 − s for e′_L).
Proof Indifference for player 2 between e_W and e′_W as well as between e_L and e′_L requires p = 0, as otherwise r = 1 and s = 0. Determining r and s so that player 1 is indifferent between c and c′ requires (1/3) r = (2/3) s, i.e. r = 2s. All combinations of e_W and e_L with r = 2s (and c′) are equilibria. It pays for player 1 to choose c only when r < 2s, but this again would contradict player 2's interests. Player 2 keeps this combination only for c′, as otherwise increasing r and decreasing s would be beneficial. Naturally, player 2 can have different incentives in this game, deriving for example from extending the game or from receiving something back if the prize is won.
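Both the pure strategy equilibria and the indifference condition r = 2s can be checked numerically. The following is a sketch with hypothetical function names, assuming the constant-sum payoffs under (v) and that the initial choice stands when no door is exposed; `r`, `s` and `p` are the exposure and switching probabilities used in the text.

```python
from fractions import Fraction
from itertools import product

PRIOR_W, PRIOR_L = Fraction(1, 3), Fraction(2, 3)


def win_probability(r, s, p):
    """Player 1's overall winning probability (player 2 receives the complement)."""
    return PRIOR_W * (1 - r * p) + PRIOR_L * (s * p)


def pure_nash_equilibria():
    """Enumerate pure profiles ((e_W or e'_W, e_L or e'_L), c or c')."""
    plans = list(product((1, 0), repeat=2))  # (r, s) with 1 = expose, 0 = not
    equilibria = []
    for (r, s), p in product(plans, (1, 0)):
        u1 = win_probability(r, s, p)
        # Player 1 maximises u1, player 2 minimises it (constant sum).
        player1_content = all(win_probability(r, s, q) <= u1 for q in (0, 1))
        player2_content = all(win_probability(a, b, p) >= u1 for a, b in plans)
        if player1_content and player2_content:
            label = ("e_W" if r else "e'_W", "e_L" if s else "e'_L")
            equilibria.append((label, "c" if p else "c'"))
    return equilibria


def win_given_exposure(r, s):
    """Winning probabilities (stay, switch) conditional on a door being exposed.

    Requires r + s > 0, since otherwise no door is ever exposed.
    """
    r, s = Fraction(r), Fraction(s)
    exposed = PRIOR_W * r + PRIOR_L * s
    return PRIOR_W * r / exposed, PRIOR_L * s / exposed


print(pure_nash_equilibria())
print(win_given_exposure(1, Fraction(1, 2)))  # r = 2s: both equal 1/2
print(win_given_exposure(1, 1))               # standard case: 1/3 and 2/3
```

With r = 2s staying and changing are equally good after an exposure, and changing only pays once r falls below 2s, as in the standard case r = s = 1.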

Monty game expansions
Additional assumptions can be introduced as explanatory concepts for the observed behavior. Two game expansions are proposed here for illustration purposes. First, the process of opening a door (e) is beneficial for the host, and the derived utility needs to be added for player 2. Second, social concerns like reciprocity might play a role and can be taken into account. It appears reasonable that the host is fickle and alternates between e and e′. Furthermore, these frequencies can be chosen purposefully when the prolongation of the game is enjoyed per se.⁴ This is represented in Fig. 3a by adding constant utility for player 2 when reaching the second stage. The only equilibrium in pure strategies would then be ((e_W, e_L), c), as e weakly dominates e′ and for e player 1 prefers c. Note that this only holds if the value of prolonging is equal to the prize. This value can be expected to be lower, and then only one mixed strategy equilibrium remains. As the payoffs for player 1 are always the same, r = 2s remains unchanged. e_W is strictly preferred (i.e. r = 1), and indifference between e_L and e′_L pins down p: if prolonging is smaller in value than the prize, p equals the ratio of the two (i.e. p = 0.5 if the value of prolonging is half the value of the prize).
Only if the values are equal does the pure strategy equilibrium result. Otherwise the question for player 1 to answer is "what is prolonging worth to player 2" in order to determine p. Interestingly, the proclaimed advantage of c can result, but the value of simply prolonging the show can be comparably small. Another game expansion is to assume social motives in the form of reciprocity. In the setting of the Monty Hall game show this could take the form of showing extra joy for winning after having to reconsider the choice (being valuable for the show master by increasing the number of viewers). The expanded game in Fig. 3b acknowledges this, but without taking negative reciprocity into account. Concerning pure strategy equilibria nothing changes, and mixed strategy equilibria still require p = 0 for player 2 to be indifferent. The only difference concerns the relation between s and r, which now need to be equal for a payback of 0.5, as shown in Fig. 3b. For a reasonably lower payback than 50% of the prize, r > s (2s[1 − payback] = r). The higher the payback, the lower is the proportion of r. The question for player 2 then shifts towards the question of reciprocity ("how much can I expect back") when exposing the door without the prize behind it (i.e. in terms of show value). Both expansions together provide a more specific characterisation of the Monty Hall problem than its simplified representation in the literature, one which is more in line with the natural understanding of this strongly framed choice task under uncertainty.

Fig. 3 Monty game expansions
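The two expansions can be made concrete with a few lines of arithmetic. This is a sketch under assumptions that are not taken from Fig. 3 itself: the prize is normalised to 1, Monty earns 1 whenever the prize is not won, and he additionally earns a hypothetical value `v` whenever a door is exposed. For reciprocity, only the relation 2s[1 − payback] = r stated in the text is evaluated. All function names are hypothetical.

```python
from fractions import Fraction


def monty_payoff(state, expose, p, v):
    """Monty's expected payoff (assumed: keeps 1 if prize not won, plus v if a door is exposed)."""
    if state == "W":
        # The prize is kept only if a door is exposed and player 1 switches away.
        keeps_prize = p if expose else 0
    else:
        # In L the prize is lost only if a door is exposed and player 1 switches.
        keeps_prize = (1 - p) if expose else 1
    return keeps_prize + (v if expose else 0)


def indifferent_switch_probability(v):
    """Switching probability p that makes Monty indifferent in L: v + (1 - p) = 1."""
    return v


def exposure_ratio(payback):
    """Ratio r / s implied by the stated relation r = 2 s (1 - payback)."""
    return 2 * (1 - payback)


v = Fraction(1, 2)
p = indifferent_switch_probability(v)
print("p:", p)
print("L, expose vs not:", monty_payoff("L", True, p, v), monty_payoff("L", False, p, v))
print("W, expose vs not:", monty_payoff("W", True, p, v), monty_payoff("W", False, p, v))
print("r/s for payback 0, 1/4, 1/2:", [exposure_ratio(Fraction(b, 4)) for b in (0, 1, 2)])
```

Under these assumptions Monty is indifferent in the losing case exactly when p equals the relative value of prolonging, strictly prefers exposing in the winning case, and the ratio r/s falls from 2 to 1 as the payback rises from 0 to one half of the prize.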

Conclusion and discussion
Psychological expansions can rationalize the popular solution, although simple mixed strategy equilibria and conditional probabilities suffice here. An interesting psychological aspect is to take first associations or the initial intuition into account. This need not only apply to the equilibrium selection problem (i.e. focal points or prominence), but could also enrich the understanding of other behavioral regularities. Perceived risk is the fundamental characteristic investigated by the Monty Hall game. The derived results match the (persistent) perception of many that switching doors is more risky. This is true not only under bounded rationality, when the odds are unknown, but also in a strategic setting where the host prefers not to give away the prize. Only for a simplified Monty who always opens a door, or if Monty is assumed to make many errors when revealing a losing door (i.e. opening doors in winning and losing cases more equally), does switching doors become the more successful strategy.
Most controversies about the Monty Hall problem might be due to unclear player incentives (see Mueser et al. 1999). The experimental evidence of many participants not switching is robust even under the experimenters' explicit claim of always opening an unchosen door with no prize behind it (compare Granberg 1999b). Uncertainty might prevail, as this experimental promise is non-binding, and the choice situation can be represented as a normal form game with two players who each have two strategies, as in the Simplified Monty game. The sequential game representation, as in the Sophisticated Monty game, illustrates this uncertainty as an information set in which the contestant does not know which state of the world, W or L, (s)he is in. Furthermore, under bounded rationality the complexity of the task could make not switching the more robust strategy, and we do not need to refer to reverse psychology or other forms of psychological tricks to influence the other player's behavior. Only if there is an additional utility from prolonging the game, and this crucial utility of the host is acknowledged by the contestant, should switching be preferred to not switching. An alternative explanation is social preferences. In the form of sequential reciprocity, this can work similarly to forward induction in the trust game (compare Kohlberg and Mertens 1986; Dufwenberg and Kirchsteiger 2004; Battigalli and Dufwenberg 2009). The (anticipated) effect of trusting or not can be seen as a serious competitor to mixed strategy equilibria, but Monty's motivation mostly remains unclear. Various Monty types have been proposed for this (e.g. mean, altruistic, etc.), but the general grounds for cooperative versus uncooperative behavior remain dubious. The Monty game is usually specified as a one-shot game (though investigated experimentally as a repeated game). Signaling the Monty type by opening a door does not work either (compare common priors in Whitmeyer 2017).
That joy will be shown by the contestant also cannot be taken for granted and would demand another decision stage. Note that not all possible incentive structures of the game are covered here and that the chosen game tree expansions are mainly introduced to illustrate corresponding shortcomings in the discussion of this choice task under uncertainty. When the specific structural component of a simultaneous choice is stressed in order for switching to be the dominant strategy, as if deciding before the reveal whether to switch or not, this likewise does not seem to properly represent the strategic situation in the game. If Monty always reveals a losing door, he is not a free agent in a strict economic sense (i.e. for game theory, an awkward definition of a social problem as one player against chance). Furthermore, the experimental results of increasing switching decisions over repetitions might just as well result from experimenter demand or a reconsidering effect, and improving behavior over repetitions does not necessarily imply learning the underlying odds.
Still, the Monty Hall game illustrates the clash between statistical thinking and observed choice behavior. Taking this discrepancy seriously calls for descriptive models that can cope with the complexity of the problem. Even different standard representations help to illustrate the problem. Game theory provides a formalization of choices in social settings that captures the strategic dependencies between players. The exercise of representing the choice situation in different ways should sharpen the understanding of the diversity of the problem, and it illustrates how the representation of a choice problem can theoretically lead to distinct outcomes. Which expansions are useful for improving the general understanding of the problem can only be answered empirically. The expansions provided for the Monty Hall problem clearly need to be investigated experimentally. The theoretical approach here is meant to stress the importance of developing sound foundations for experimental investigations and to help understand the behavioral facets in social settings. Behavior can be manifold. Formalizing, and thereby clearly defining, the decision problem at hand is important in all social sciences, and teaching conditional probabilities and aspects of game theory serves as a nice illustrative example here.
Sometimes the initial intuition can be right. Usually the audience of the Monty Hall show perceives changing doors as more risky under unknown probabilities. This can be seen as a kind of uncertainty avoidance (similar to the Allais paradox) by people simply playing safe. For the Monty game, uncertainty avoidance has been investigated as anticipated regret (Gilovich et al. 1995) or as a minimax strategy (Schuller 2012), and not switching doors does not need another explanatory heuristic. If a person changes his/her initial choice, this behavior demands distributional assumptions about Monty's behavior, preferences for prolonging the game, or some form of forward induction with specific social preferences. Social situations can be rather complex, but they can also be grasped by various theoretical concepts. Grasping the statistical dependencies within the Monty Hall game is representative of the understanding of various decision problems in the social sciences.
Funding Open Access funding enabled and organized by Projekt DEAL.

Conflicts of interest No conflicts to report.
Availability of data and material Not applicable.

Code availability Not applicable.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.