Savage vs. Anscombe-Aumann: an experimental investigation of ambiguity frameworks

The Savage and the Anscombe–Aumann frameworks are the two most popular approaches used when modeling ambiguity. The former is more flexible, but the latter is often preferred for its simplicity. We conduct an experiment where subjects place bets on the joint outcome of an ambiguous urn and a fair coin. We document that more than a third of our subjects make choices that are incompatible with Anscombe–Aumann for any preferences, while the Savage framework is flexible enough to account for subjects’ behaviors.


Introduction
The Savage (1954) and the Anscombe and Aumann (1963) frameworks are the two most popular approaches to modeling ambiguity. The latter is a two-stage model in which acts are maps from states to objective lotteries over consequences. It is often preferred for its simplicity, but the Savage model provides more flexibility. Gilboa and Schmeidler (1989) and Schmeidler (1989) used the Anscombe and Aumann approach as the basis for their seminal contributions to ambiguity theory. Eichberger and Kelsey (1996) show that, for standard ambiguity models such as Choquet-expected utility (CEU) and maxmin expected utility, ambiguity aversion implies a strict preference for randomization in the Anscombe-Aumann framework. They also show that the same need not hold in the Savage framework. Eichberger and Kelsey (1996) argue against the plausibility of a general preference for randomization but also acknowledge the need for further experiments on this question.¹
We implement an experiment in which some choices are inconsistent with ambiguity models based on the preference framework of Anscombe and Aumann (1963). We show that these choices can be consistent with a Savage framework using, e.g., a CEU model as in Eichberger and Kelsey (1996). In the experiment, subjects choose one of six options, each of which depends on the outcomes of a coin flip and a draw from an ambiguous two-color urn. Two of the six options are clearly ambiguous acts, and two more are clearly risky acts. The last two options would be considered risky acts in the Anscombe-Aumann framework but ambiguous acts in the Savage framework. By manipulating the payoffs of the various acts, we create a dominance relationship among the four acts that are risky in the Anscombe-Aumann framework. We find that dominated acts are nonetheless chosen by more than a third of subjects.
The same subject choices can be explained with ambiguity models using the Savage framework, where the dominance relationship does not necessarily hold.
The two acts that highlight the differences between the two frameworks involve ambiguity hedging (see Roomets 2014, and Oechssler et al. 2019). These acts are akin to betting on one color when a coin flip comes up heads and on the other color when it comes up tails. In the Anscombe-Aumann framework, subjects choosing such a combination exploit the complementarity of the probabilities of the two colors of balls in the urn to arrive at a believed 50:50 chance of winning the bet. In the Savage framework, such complementarity need not be assumed: subjects may believe that the probabilities of the two colors depend on the coin flip. Hence, a subject who considers an act that bets on blue when the coin shows heads and on yellow when the coin shows tails could believe both that blue is unlikely when the coin shows heads and that yellow is unlikely when the coin shows tails. Therefore, while the hedge acts represent risk in the Anscombe-Aumann framework, the same acts represent ambiguity in the Savage framework.
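To make the contrast concrete, the sketch below computes the winning probability of the "hedge by" act (blue if heads, yellow if tails) under both readings. It is a minimal illustration under stated assumptions, not the paper's model: the Anscombe-Aumann case applies one believed urn composition regardless of the coin, while the Savage case lets the believed share of blue balls differ by coin side and, as a simple stand-in for ambiguity aversion, takes the worst case over all such beliefs (a maxmin-style evaluation rather than the CEU model used later). The names `win_prob` and `HEDGE_BY` are hypothetical.

```python
from fractions import Fraction
from itertools import product

N_BALLS = 24  # urn size in the design; any number of blue balls from 0 to 24 is possible

# States are (ball, coin) pairs; "hedge by" bets on blue if heads, yellow if tails.
HEDGE_BY = {("b", "H"), ("y", "T")}

def win_prob(event, blue_if_heads, blue_if_tails):
    """Probability of `event` for a fair coin when the believed share of blue
    balls may differ depending on the coin side (assumed belief structure)."""
    half = Fraction(1, 2)
    joint = {
        ("b", "H"): half * blue_if_heads,
        ("y", "H"): half * (1 - blue_if_heads),
        ("b", "T"): half * blue_if_tails,
        ("y", "T"): half * (1 - blue_if_tails),
    }
    return sum(joint[state] for state in event)

shares = [Fraction(n, N_BALLS) for n in range(N_BALLS + 1)]

# Anscombe-Aumann reading: one urn composition, whatever the coin shows.
aa_values = {win_prob(HEDGE_BY, s, s) for s in shares}

# Savage reading (illustrative worst case): the color may depend on the coin.
savage_worst = min(win_prob(HEDGE_BY, h, t) for h, t in product(shares, shares))

print(aa_values)     # {Fraction(1, 2)}
print(savage_worst)  # 0
```

Under the Anscombe-Aumann reading every composition yields exactly 1/2, so the hedge is a purely risky bet; once the believed color may depend on the coin, the worst case falls to 0, the same as for a single-color bet.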
While it may seem that we are pitting one framework against the other in a fair fight, we caution readers that the way we designed the choices leaves Savage mostly out of harm's way while placing Anscombe and Aumann in jeopardy. Some may point out that the flexibility of the Savage framework is what keeps it out of the fray, and that this flexibility should be considered an advantage. We cannot disagree, but we leave discussions of the relative flexibility of the frameworks to more theoretical papers. As a fundamentally experimental endeavor, this paper should be viewed primarily as a test of the Anscombe-Aumann framework. Our results do not support the Anscombe-Aumann framework in this context; this is our main finding and contribution. It is, of course, interesting that the Savage framework could have explained our subjects' behavior when the Anscombe-Aumann framework could not. However, this should not be considered direct support for the Savage framework, as there was no way it could have failed in our experimental setting.²

Experimental design
The experiment consisted of a single incentivized task,³ followed by an unincentivized questionnaire. Subjects had to choose one of six acts that depended on the outcome of a fair coin and the outcome of a draw from an Ellsberg urn.⁴ The urn contained 24 blue and yellow balls in a composition that was unknown to subjects. Subjects were told that any combination from 0 blue balls (and 24 yellow balls) to 24 blue balls (and 0 yellow balls) was possible. Payoffs were chosen to break ties for subjects who considered some or all states equally likely and to create the aforementioned dominance relationship within the Anscombe-Aumann framework. In treatment A, subjects chose from the six acts listed in Table 1. In treatment B, the payoffs of $21 and $22 were interchanged, while all other design aspects were kept constant. Interchanging the payoffs in this way helps us identify the proportion of subjects who chose an option because it had the highest potential payoff.
In the experiment, the acts were labeled neutrally "Option A" through "Option F" and were presented in a random order. Here, we have given them names that highlight their nature. The "heads" act, for example, wins if the coin shows heads, regardless of the ball draw. The "hedge yb" act wins if the ball drawn is yellow and the coin shows "heads" or if the ball drawn is blue and the coin shows "tails".
² Here, one should also mention the intriguing thought experiments of Machina (2009, 2014) and the experiment of L'Haridon and Placido (2010), which make complementary but different points to our paper. These papers point out problems with CEU that are independent of whether the Savage or the Anscombe-Aumann framework is being used.
³ Having several tasks with some probabilistic or fixed payment rule would run the risk of confounding ambiguity with hedging motives or with attitudes towards compound lotteries (see, e.g., Halevy 2007).
⁴ In the actual experiment, we used a non-transparent bag and blue and yellow marbles. For expositional reasons, we employ the more customary urns and balls in the text.
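Table 1 is not reproduced here, so the following sketch encodes the six acts as maps from (ball, coin) states to dollar payoffs for both treatments and checks which acts offer the highest winning payoff. It assumes, consistent with the consequence set {0, 20, 21, 22} used in the Hypothesis section, that a losing bet pays $0; the helper names `make_acts` and `top_acts` are hypothetical.

```python
# States as (ball, coin) pairs, in the order s1 = bH, s2 = yH, s3 = bT, s4 = yT.
STATES = [("b", "H"), ("y", "H"), ("b", "T"), ("y", "T")]

# Winning states of each act, using the descriptive names from the text.
WINNING_STATES = {
    "heads": {("b", "H"), ("y", "H")},
    "tails": {("b", "T"), ("y", "T")},
    "blue": {("b", "H"), ("b", "T")},
    "yellow": {("y", "H"), ("y", "T")},
    "hedge by": {("b", "H"), ("y", "T")},  # blue if heads, yellow if tails
    "hedge yb": {("y", "H"), ("b", "T")},  # yellow if heads, blue if tails
}

def make_acts(treatment):
    """Return {act name: {state: payoff}} for treatment "A" or "B".

    Assumes a losing bet pays $0; treatment B swaps the $21 and $22 prizes.
    """
    hedge_prize, color_prize = (22, 21) if treatment == "A" else (21, 22)
    prize = {"heads": 20, "tails": 20,
             "blue": color_prize, "yellow": color_prize,
             "hedge by": hedge_prize, "hedge yb": hedge_prize}
    return {name: {s: prize[name] if s in wins else 0 for s in STATES}
            for name, wins in WINNING_STATES.items()}

def top_acts(treatment):
    """Acts offering the highest winning payoff in a treatment."""
    acts = make_acts(treatment)
    best = max(max(payoffs.values()) for payoffs in acts.values())
    return sorted(name for name, payoffs in acts.items()
                  if max(payoffs.values()) == best)

print(top_acts("A"))  # ['hedge by', 'hedge yb']
print(top_acts("B"))  # ['blue', 'yellow']
```

A subject who simply chases the largest prize would thus pick a hedge act in treatment A and a color act in treatment B, which is the pattern the payoff swap is meant to expose.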
At the end of the experiment, subject volunteers drew a ball from the urn and tossed the fair coin. Importantly, the ball was drawn first (and shown to subjects), and then the coin was tossed.⁵ This timing was explained in the instructions.
After the acts were chosen, but before the random variables were determined, subjects filled out a questionnaire. The questionnaire included unincentivized questions about how subjects chose their bet in the elicitation task, a hypothetical three-color Ellsberg experiment, demographics, a hypothetical two-color Ellsberg urn, and beliefs about the random variables in the elicitation task (see the appendix for the questionnaire).
Experiments were conducted using pen and paper at the Economic Science Laboratory at the University of Arizona. Subjects were students at the university. There were 93 subjects in treatment A (57% female) and 31 subjects in treatment B (48% female). The experiment took roughly 30 min, and subjects received an average of $19.91, including a $10 show-up fee. Decisions and payments were made privately (with respect to other subjects).
Instructions (see Appendix) were distributed on paper and read aloud at the beginning of the experiment. Urns were on display during the entire experiment, so that subjects could be certain that the urns' contents could not be manipulated. Subjects were allowed to verify the urns' contents after the experiment, and some did.

Hypothesis
The two standard approaches to modeling uncertainty, the Anscombe and Aumann (1963) and the Savage (1954) frameworks, differ in the way they model a randomization device such as a fair coin (see, e.g., Eichberger and Kelsey 1996, or Klibanoff 2001). In the Savage framework, the outcomes of a randomization device must be modeled explicitly as part of the description of a state. The state space is the Cartesian product $S = U \times R$, where $U = \{b, y\}$ is the outcome of the draw from the urn (ambiguous) and $R = \{H, T\}$ is the outcome of a fair coin flip (objective randomization device). Hence, e.g., $s_1 = bH$ denotes the state where the drawn ball was blue and the coin flip produced heads. Thus, in our experiment, we have the state space $S = \{s_1, \ldots, s_4\}$, as listed in Table 1, and a finite set of consequences $X = \{0, 20, 21, 22\}$. An act is a map $f: S \to X$, and preferences are defined as binary relations on $\mathcal{F}$, the set of all acts. There were six acts in the experiment, as listed in Table 1. Figure 1 illustrates the three types of acts available: the "hedge" acts (by, yb), the "color" acts (bb, yy), and the "coin" acts (h, t). The tree on the left shows the "hedge by" act, the tree in the center shows the act "blue", and the tree on the right shows the act "heads".⁶
In the Anscombe-Aumann framework, randomization devices are incorporated into the consequence space. The state space consists only of $S_{AA} = \{b, y\}$. Consequences are all simple lotteries (probability distributions) on $X$, denoted by $\Delta(X)$. Acts in the Anscombe-Aumann world are maps $f: S_{AA} \to \Delta(X)$ and are listed in Table 2.
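To illustrate how the two frameworks represent the same act, the sketch below converts a Savage act (a map from the four states to payoffs) into an Anscombe-Aumann act by integrating out the fair coin, so that each ball color is mapped to a lottery over $X$. It is a minimal sketch assuming treatment A payoffs and a $0 payoff for a losing bet; `to_anscombe_aumann` is a hypothetical helper, and lotteries are represented as dicts from payoff to probability.

```python
from collections import Counter
from fractions import Fraction

HALF = Fraction(1, 2)  # probability of each side of the fair coin

def to_anscombe_aumann(savage_act):
    """Turn a Savage act {(ball, coin): payoff} into an Anscombe-Aumann act
    {ball: lottery}, where each lottery is a dict {payoff: probability}
    obtained by integrating out the fair coin."""
    aa_act = {}
    for ball in ("b", "y"):
        lottery = Counter()
        for coin in ("H", "T"):
            lottery[savage_act[(ball, coin)]] += HALF
        aa_act[ball] = dict(lottery)
    return aa_act

# Treatment A payoffs for three of the six acts (losing pays 0).
hedge_by = {("b", "H"): 22, ("y", "H"): 0, ("b", "T"): 0, ("y", "T"): 22}
heads = {("b", "H"): 20, ("y", "H"): 20, ("b", "T"): 0, ("y", "T"): 0}
blue = {("b", "H"): 21, ("y", "H"): 0, ("b", "T"): 21, ("y", "T"): 0}

for name, act in [("hedge by", hedge_by), ("heads", heads), ("blue", blue)]:
    print(name, to_anscombe_aumann(act))
```

Both "hedge by" and "heads" map each ball color to the same 50:50 lottery, i.e., they are constant (purely risky) acts in Anscombe-Aumann terms, whereas "blue" still depends on the ball drawn.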
The crucial thing to note is that in an Anscombe and Aumann framework, both the "hedge" acts and the "coin" acts yield objective 50:50 lotteries, whichever ball is drawn. However, the hedge acts yield lotteries that pay $22 ($21 in treatment B) when successful, while the coin acts pay only $20. Since the hedge lottery first-order stochastically dominates the coin lottery, any decision-maker should strictly prefer either of the hedge acts to the coin acts.⁷
Hypothesis: In the Anscombe-Aumann framework, no decision-maker should choose a coin act in either of the treatments.
This hypothesis need not hold in a Savage framework (see Eichberger and Kelsey 1996). To construct a counterexample, consider a Choquet-expected utility (CEU) maximizer with a linear utility function $u$ and the following capacity $v(\cdot)$:
$$M \subseteq N \Rightarrow v(M) \le v(N), \qquad v(\emptyset) = 0, \qquad v(S) = 1.$$
Following Eichberger and Kelsey (1996, Assumption 3.1), we assume that the capacity on $S$ respects the probability of the coin flip for events that depend exclusively on the outcome of the coin flip. Under this assumption, $v(\{s_1, s_2\}) = v(\{s_3, s_4\}) = 0.5$ and, therefore, coin acts are not ambiguous. Now, suppose that $v(\{s_i\}) = 0.1$ for all $i$, $v(\{s_1, s_3\}) = v(\{s_1, s_4\}) = v(\{s_2, s_3\}) = v(\{s_2, s_4\}) = 0.2$, and $v(\{s_1, s_2, s_3\}) = v(\{s_1, s_2, s_4\}) = v(\{s_1, s_3, s_4\}) = v(\{s_2, s_3, s_4\}) = 0.6$. Each act pays a single prize on one event and $0$ otherwise, so with $u(x) = x$,
$$\int u(h)\,dv = \int u(t)\,dv = 0.5 \cdot u(20) = 10 > 4.4 = 0.2 \cdot u(22) \ge \int u(f)\,dv$$
for all non-coin acts $f$.⁸ Thus, a CEU maximizer need not satisfy the above hypothesis.
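The counterexample can be checked numerically. The sketch below computes Choquet expected utility on the four-state space with the capacity values stated above, $u(x) = x$, and treatment A payoffs (a losing bet assumed to pay $0); exact fractions avoid rounding noise. The helper names `v`, `choquet`, and `bet` are hypothetical, and the 0.6 value on three-state events never affects these bets because it is always multiplied by a zero payoff.

```python
from fractions import Fraction as F

STATES = ("s1", "s2", "s3", "s4")            # s1 = bH, s2 = yH, s3 = bT, s4 = yT
COIN_EVENTS = ({"s1", "s2"}, {"s3", "s4"})   # heads, tails

def v(event):
    """Capacity from the counterexample: 0.1 on singletons, 0.5 on the two
    coin events, 0.2 on the other pairs, 0.6 on three-state events."""
    event = set(event)
    if len(event) == 2:
        return F(1, 2) if event in COIN_EVENTS else F(1, 5)
    return {0: F(0), 1: F(1, 10), 3: F(3, 5), 4: F(1)}[len(event)]

def choquet(act, cap, u=lambda x: x):
    """Choquet expected utility: sum_i u(x_(i)) * [cap(A_i) - cap(A_(i-1))],
    where A_i holds the i states with the highest utility."""
    ranked = sorted(act, key=lambda s: u(act[s]), reverse=True)
    total, prev = F(0), F(0)
    for i, state in enumerate(ranked, start=1):
        current = cap(ranked[:i])
        total += u(act[state]) * (current - prev)
        prev = current
    return total

def bet(prize, winning_states):
    """Act paying `prize` on `winning_states` and 0 elsewhere."""
    return {s: prize if s in winning_states else 0 for s in STATES}

ACTS_A = {  # treatment A payoffs
    "heads": bet(20, {"s1", "s2"}), "tails": bet(20, {"s3", "s4"}),
    "blue": bet(21, {"s1", "s3"}), "yellow": bet(21, {"s2", "s4"}),
    "hedge by": bet(22, {"s1", "s4"}), "hedge yb": bet(22, {"s2", "s3"}),
}

for name, act in ACTS_A.items():
    print(name, choquet(act, v))
```

The coin acts evaluate to 10, the color acts to 21/5, and the hedge acts to 22/5, so this CEU maximizer strictly prefers a coin act even though the hedge acts dominate it in the Anscombe-Aumann sense.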

E-capacities
In fact, the above capacity is an example of the parametric capacity model of Eichberger and Kelsey (1999).⁹ Their model offers a tractable way to incorporate an exogenous probability distribution into subjective (ambiguous) beliefs. For $r \in R$, let $E_r := U \times \{r\}$ be the event referring to the outcome of the coin flip, with known probability $p(E_r) = \tfrac{1}{2}$ (i.e., $E_H = \{s_1, s_2\}$ and $E_T = \{s_3, s_4\}$). An agent has information-consistent probabilities $p(s)$ if $\sum_{s \in E_r} p(s) = p(E_r)$ for all $r \in R$. An example is the uniform probability distribution $p$ on $S$ with $p(s) = \tfrac{1}{4}$ for all $s \in S$. The agent is confident that $p(E_H) = p(E_T) = \tfrac{1}{2}$ describes the likelihoods of a fair coin. However, he is less confident about the probabilities of the states within $E_H$ and $E_T$. That is, the agent distorts the probabilities of states by his degrees of confidence $q_h \in [0, 1]$ and $q_t \in [0, 1]$, which may vary across the known-probability events $E_H$ and $E_T$ (possibly $q_h \neq q_t$). (Equivalently, $1 - q_h$ and $1 - q_t$ measure the perceived ambiguity of the states.) Formally, the EK capacity $m^{EK}$ on $2^S$ is defined as follows. For each $A \in 2^S$:
$$m^{EK}(A) = q_h\,p(A \cap E_H) + (1 - q_h)\,p(E_H)\,b_H(A) + q_t\,p(A \cap E_T) + (1 - q_t)\,p(E_T)\,b_T(A),$$
where $b_r(A) = 1$ if $E_r \subseteq A$ and $b_r(A) = 0$ otherwise.
When the degree of confidence is constant ($q_h = q_t = q$), the capacity is called the Ellsberg capacity. When $q = 1$, the EK capacity coincides with the probability measure $p$ on $S$. Notice that the EK capacity is convex, reflecting ambiguity aversion. For instance, $q_h = q_t = 0.4$ with uniform $p$ reproduces the capacity of the counterexample above.
Consider a Choquet-expected utility preference with respect to an EK capacity, a strictly monotonic utility function $u$ normalized so that $u(0) = 0$, and $p(s) = \tfrac{1}{4}$ for all $s \in S$. Note that $p(A \cap E_r) = \tfrac{1}{4}$ and $b_r(A) = 0$ for each $r \in R$ and all $A \in \{\{s_1, s_3\}, \{s_1, s_4\}, \{s_2, s_3\}, \{s_2, s_4\}\}$, so that $m^{EK}(A) = (q_h + q_t)/4$ for each such $A$, while $m^{EK}(E_H) = m^{EK}(E_T) = \tfrac{1}{2}$. Then the agent prefers a coin act to any non-coin act $f$ as long as $q_h + q_t < 2\,u(20)/u(22)$, since
$$\int u(h)\,dm^{EK} = \tfrac{1}{2}\,u(20) > \tfrac{q_h + q_t}{4}\,u(22) \ge \int u(f)\,dm^{EK}.$$
If $q_h = q_t = q$, the preference for a coin act holds for any $q < u(20)/u(22)$.
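A short sketch can confirm both claims of this section under explicit assumptions: the E-capacity is implemented in the form $m(A) = \sum_r [q_r\,p(A \cap E_r) + (1 - q_r)\,p(E_r)\,b_r(A)]$ with $b_r(A) = 1$ if $E_r \subseteq A$ and 0 otherwise (our reading of the Eichberger and Kelsey 1999 model, consistent with the properties stated above), $p$ is uniform, and $u(x) = x$ so that $u(0) = 0$. The names `ek_capacity` and `prefers_coin` are hypothetical.

```python
from fractions import Fraction as F
from itertools import combinations

STATES = ("s1", "s2", "s3", "s4")              # s1 = bH, s2 = yH, s3 = bT, s4 = yT
E = {"H": {"s1", "s2"}, "T": {"s3", "s4"}}     # coin events E_H and E_T
P = {s: F(1, 4) for s in STATES}               # uniform p(s) = 1/4
MIXED_PAIRS = [{"s1", "s3"}, {"s1", "s4"}, {"s2", "s3"}, {"s2", "s4"}]

def ek_capacity(event, q):
    """m(A) = sum_r q_r p(A & E_r) + (1 - q_r) p(E_r) b_r(A), with
    b_r(A) = 1 if E_r is a subset of A, else 0. q maps "H"/"T" to confidence."""
    event = set(event)
    total = F(0)
    for r, e_r in E.items():
        p_inter = sum((P[s] for s in event & e_r), F(0))
        p_e_r = sum((P[s] for s in e_r), F(0))
        b_r = 1 if e_r <= event else 0
        total += q[r] * p_inter + (1 - q[r]) * p_e_r * b_r
    return total

def prefers_coin(q, u=lambda x: x):
    """True if a coin act beats every non-coin act (prizes at most 22, u(0) = 0).
    Each bet pays one prize on one event, so its CEU is u(prize) * m(event)."""
    coin_value = u(20) * ek_capacity(E["H"], q)
    best_other = max(u(22) * ek_capacity(a, q) for a in MIXED_PAIRS)
    return coin_value > best_other

# With q_h = q_t = 0.4, list the capacity values by event size.
q = {"H": F(2, 5), "T": F(2, 5)}
for size in range(len(STATES) + 1):
    values = {ek_capacity(ev, q) for ev in combinations(STATES, size)}
    print(size, sorted(values))
```

The closed-form condition $q_h + q_t < 2\,u(20)/u(22)$ becomes $q_h + q_t < 20/11$ for $u(x) = x$, and `prefers_coin` agrees with it on any grid of confidence levels one cares to check.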

Results
Subject decisions in our experiment are presented in Table 3. The left-hand side shows how many subjects chose each act, while the right-hand side combines acts of the same type and reports the percentage of subjects choosing each type of act. The most important thing to notice is that there are far more coin-act choices than our main hypothesis allows. In fact, coin acts were the most popular choice when the data from both treatments are combined. Statistically, this is a clear rejection of our main hypothesis.
Table 3 Decision results by treatment
However, this hypothesis is very strict in that a single coin act would suffice to reject it. It is therefore worth considering whether coin acts could plausibly be explained as mistakes. If coin acts resulted from mistakes by subjects who were otherwise consistent with the Anscombe-Aumann framework, this would mean that (by a conservative estimate) around 1/3 of subjects made mistakes in our experiment. However, it is more reasonable to assume that mistakes are randomly distributed over the choices subjects did not intend to make. Suppose that a share $k$ of subjects make mistakes and deviate from their preferred act, and that a subject who makes a mistake chooses one of the remaining five acts with equal probability. Since a coin act should theoretically never be a preferred act, the two coin acts are always among the five remaining acts, so an erring subject chooses a coin act with probability 2/5. Thus, to reproduce the share of coin acts of about one-third in the data, the share of mistakes $k$ must solve $\tfrac{1}{3} = k \cdot \tfrac{2}{5}$, i.e., $k = \tfrac{5}{6}$. We believe it is unlikely that 5/6 of our subjects made mistakes when indicating their preferred act, so we view our results as a strong rejection of our main hypothesis, even when allowing for some measurement error. Furthermore, results from the questionnaire, discussed in Sect. 4.1, reject the notion that subjects were choosing at random.
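The mistake calculation is simple enough to check directly. This is a minimal sketch assuming, as in the text, that no subject prefers a coin act and that an erring subject picks uniformly among the five acts other than the preferred one; the function names are hypothetical.

```python
from fractions import Fraction

N_ACTS = 6
N_COIN_ACTS = 2
# Probability that an erring subject lands on a coin act: 2 of the 5 other acts.
P_COIN_GIVEN_MISTAKE = Fraction(N_COIN_ACTS, N_ACTS - 1)

def coin_share(k):
    """Expected share of coin-act choices when a share k of subjects err."""
    return k * P_COIN_GIVEN_MISTAKE

def mistakes_needed(observed_coin_share):
    """Share of erring subjects needed to produce an observed coin-act share."""
    k = observed_coin_share / P_COIN_GIVEN_MISTAKE
    if k > 1:
        raise ValueError("coin share exceeds what random mistakes can produce")
    return k

print(mistakes_needed(Fraction(1, 3)))  # 5/6
print(coin_share(1))                    # 2/5 (upper bound when every subject errs)
```

The second line makes the ceiling explicit: under this mistake model, even universal error could not push the coin-act share above 2/5.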

Who chose the coin acts?
While our main hypothesis and results concern the proportion of subjects who chose the various acts, we can also use the questionnaire data to help explain why certain acts were chosen. For example, we look at what might have led subjects to choose a coin act, which is inconsistent with the Anscombe-Aumann framework. For each type of act, we estimate a linear probability model with a dependent variable equal to 1 if the subject bet on that type of act and 0 otherwise.¹⁰ As explanatory variables, we use ambiguity attitude as measured separately by the hypothetical two- and three-color Ellsberg urn questions in the questionnaire. We also use data from a written explanation of the original incentivized decision, which we asked for in the questionnaire.¹¹ To translate subjects' free-format written explanations into a usable format, we employed three additional student coders, who were asked to read through the questionnaire responses and identify whether certain topics were discussed. The topics included the relative "risk/safety"¹² and "known/unknown likelihood" of the different options, the idea that all options are equally likely, the relative payoffs of different options, and others.¹³ The student coders entered a "1" if a topic was discussed and a "0" otherwise.
¹⁰ Logit and probit models yield similar conclusions.
¹¹ We asked subjects the following question immediately after they chose their incentivized bets and gave them a full page to respond: "What was your thought process when you made your decision?"
¹² This codes statements like "option A seemed riskier than option B" or "the safest choice was option D".
¹³ A full list of topics and the instructions given to the student coders is available as an appendix. Coders had access to the experimenters while working to ask clarifying questions about the topics, but the experimenters declined to answer questions about how to code specific responses.
The three codings were averaged to create our final measure of the topics discussed, which we use in our regressions.¹⁴ Since the payoffs of the various acts differ across treatments, we also include a treatment dummy and a treatment dummy interacted with the questionnaire coding related to payoff comparisons. Subjects choosing based on the highest potential payoff would have chosen a hedge act in treatment A but a color act in treatment B. The interaction term, therefore, allows for a reasonable interpretation of the "relative payoff" coding across treatments, particularly in the color and hedge columns.¹⁵ Finally, we also include a gender dummy, since the gender composition of the two treatments varied slightly.
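The coding and regression steps can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' estimation code: three coders' 0/1 codes for one topic are averaged per subject, and a linear probability model is fit by ordinary least squares with an intercept, a treatment-B dummy, the averaged relative-payoff coding, its interaction with the treatment dummy, and a gender dummy (ambiguity-attitude and other topic regressors would enter as extra columns). Standard errors are omitted, and all function names are hypothetical. As in footnote 15, the treatment-B effect of the payoff coding is the sum of the main and interaction coefficients.

```python
from fractions import Fraction

def average_codings(codings):
    """codings: one list of 0/1 codes per coder (same subject order).
    Returns the per-subject average used as the regressor."""
    return [Fraction(sum(codes), len(codes)) for codes in zip(*codings)]

def unanimity_rate(codings):
    """Share of subjects on whom all coders gave the same code."""
    per_subject = list(zip(*codings))
    return Fraction(sum(len(set(c)) == 1 for c in per_subject), len(per_subject))

def design_row(treatment_b, rel_payoff, female, extra=()):
    """Intercept, treatment dummy, payoff coding, payoff x treatment, gender, extras."""
    return [1, treatment_b, rel_payoff, rel_payoff * treatment_b, female, *extra]

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y, solved by
    Gauss-Jordan elimination in exact arithmetic. Assumes X has full column rank."""
    k = len(X[0])
    aug = [[sum(Fraction(row[i]) * row[j] for row in X) for j in range(k)]
           + [sum(Fraction(row[i]) * yi for row, yi in zip(X, y))]
           for i in range(k)]
    for c in range(k):
        pivot = next(r for r in range(c, k) if aug[r][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        aug[c] = [x / aug[c][c] for x in aug[c]]  # scale pivot row to 1
        for r in range(k):
            if r != c and aug[r][c] != 0:  # eliminate column c elsewhere
                aug[r] = [a - aug[r][c] * b for a, b in zip(aug[r], aug[c])]
    return [aug[r][k] for r in range(k)]

def payoff_effects(coefs):
    """Effect of the payoff coding in treatment A and in treatment B."""
    return coefs[2], coefs[2] + coefs[3]

# Hypothetical codes from three coders for four subjects:
print(average_codings([[1, 0, 1, 0], [1, 1, 1, 0], [1, 0, 0, 0]]))
```

Exact fractions keep the toy solver free of rounding issues; with real data one would normally use a statistics package that also reports (robust) standard errors.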
Regression results are available in Table 4. We find that coin acts tended to be chosen by subjects who expressed ambiguity-averse preferences in the hypothetical three-color urn question,¹⁶ who cited the relative risk and/or the relatively known likelihoods of the various outcomes, and who did not comment on all outcomes being equally likely. Discussion of relative payoffs is negatively correlated with choosing coin acts (which had the lowest expected payoff), but not significantly so.
As one might suspect, choosing a color act was negatively correlated with the ambiguity aversion measures (both the three-color and the two-color measure).¹⁷ Choosing a color act was also negatively correlated with discussion of payoff differences in treatment A, when the hedge acts had the highest winning payoff. In treatment B, when the color acts had the highest winning payoff, this effect is cancelled out.
Choosing a hedge act seemed to be preferred by subjects who discussed all options being equally likely and who focused on payoff differences. This led to much more frequent hedge-act choices in treatment A, where the hedge acts had the highest winning payoff. Contrary to what theory would predict, choosing a hedge act did not appear to be particularly related to ambiguity aversion.
It is also worth pointing out that the systematic and intuitively consistent differences highlighted in this section provide evidence that subjects both understood the options they were given in the main experiment and made their choices based on intuitively reasonable criteria. Indeed, it is unlikely that subjects made their decisions randomly due to either misunderstanding or indifference.
¹⁴ The codings of the three coders were unanimous about 92% of the time.
¹⁵ The coefficient on "Relative Payoffs" can be interpreted as the effect in treatment A, while the sum of the coefficients on "Relative Payoffs" and "Relative Payoff × Treatment" can be interpreted as the effect in treatment B.
¹⁶ Preferences with respect to the two-color urn were positively correlated with coin acts, but this relationship was not significant.
¹⁷ The two-color measure is only marginally significant. This may be because ambiguity-averse subjects can always be misclassified if they have asymmetric priors.
Table 4 notes: Adjusted R² 0.170, 0.267, 0.349. *, **, *** Significant at the 10%, 5%, and 1% level, respectively. Standard errors in parentheses.

Conclusion
Based on our hypothesis, subjects in our experiment should not have chosen coin acts according to the Anscombe-Aumann framework. However, more than one-third of our subjects did. Given that coin acts made up exactly one-third of the options available to subjects, attributing these choices to measurement error would imply that practically all subjects erred in their selection or were indifferent between options (despite the payoff asymmetry). The latter seems particularly unlikely given the questionnaire results, which reveal a sensible pattern of preferences: ambiguity-averse subjects chose the coin acts more often than ambiguity-neutral or ambiguity-loving subjects. Therefore, we are left to conclude that subjects expressed a meaningful preference for the coin acts, contradicting our hypothesis. Many subjects, it seems, did not view both the coin and hedge acts as 50/50 propositions, or, if they did, some other factor affected their preferences that was not modeled. Either way, models using the Anscombe-Aumann framework were unable to correctly explain a large portion of subject decisions in our setting. While it may seem that we are left endorsing the Savage framework, we stop short of such an endorsement. Although we find no violations in the choice data, since any choice is plausibly supported in the Savage framework, neither do we find positive support for the framework in subjects' written explanations of their act choices. When asked to explain the act they chose, fewer than 5% of subjects were coded as having discussed the combined outcomes that make up the state space in the Savage framework. Instead, subjects tended to refer to the coin flip and the ball draw independently. Of course, subjects need not express the particulars of a framework in writing to employ that framework in their decision-making.
Therefore, we see our results as neutral with respect to the Savage framework, and leave the door open to the possibility that subjects adhere to a framework which we failed to consider in this paper.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Funding Open Access funding enabled and organized by Projekt DEAL.