Resolving Rabin’s paradox

We present a theoretical model of Rabin’s famous calibration paradox that resolves confusions in the literature and that makes it possible to identify the causes of the paradox. Using suitable experimental stimuli, we show that the paradox truly violates expected utility and that it is caused by reference dependence. Rabin already showed that utility curvature alone cannot explain his paradox. We, more strongly, do not find any contribution of utility curvature to the explanation of the paradox. We find no contribution of probability weighting either. We conclude that Rabin’s paradox underscores the importance of reference dependence.


Introduction
Imagine that you turn down a 50-50 gamble of losing $10 or gaining $11, and you happen to be an expected utility maximizer. Then you will find yourself (absurdly) turning down any 50-50 gamble where you may lose $100, no matter how large the amount you stand to win. This was Rabin's (2000) paradox, which demonstrated how an innocuous preference has a surprising implication that strongly challenges the empirical validity of expected utility.
Rabin's paradox, abbreviated RP henceforth, is a thought experiment designed for the purpose of thinking through the consequences of expected utility theory. It demonstrates that these consequences are absurd. On one level, RP shows a problem with the logic of expected utility theory. Yet, on another more important level, the paradox reflects a peculiarity of human behavior: the way that people are both risk averse "in the small" and "in the large." The question of why people behave this way has not been definitively answered. While Rabin's thought experiment is useful, it must be distinguished from a real experiment, which is the only way to discover empirically a plausible answer as to why people behave the way they do. Such an answer is of fundamental importance to economics as a science that makes predictions about human behavior. Therefore, in this paper, we do not treat Rabin's seeming contradiction as a technical problem, but rather as a psychological one to be solved empirically.
Rabin's thought-provoking paradox at first led to theoretical discussions about whether it truly violates expected utility and, if so, what might explain this violation. Rabin suggested that his paradox may provide an argument not only against expected utility but, more generally, against reference independence and thus against all traditional decision models. Several authors (referenced later) tried to rescue reference independence by suggesting other theoretical explanations, such as probability weighting, disappointment aversion, background risks, or utility of income. The main purpose of our paper is to resolve RP empirically. We show that his suggestion is right, and reference dependence explains his paradox. Other deviations from expected utility, while useful in many contexts, do not contribute empirically to explaining RP.
The theoretical debate of RP was complicated by differences in terminology: (a) Rubinstein (2006) suggested that the term "expected utility" incorporate reference dependence; (b) utility of income was an alternative term for reference dependence (see Figure 1). Wakker (2010 pp. 244-245) reviewed early debates.
Our §VII gives recent references and further details. The abundance of theoretical debates and semantic confusions has been a barrier to the resolution of RP. Now, 17 years after its appearance, RP has turned into a classic and its meaning should be settled, theoretically and empirically. We will introduce a theoretical model that can disentangle the various potential causes of RP, and then the experimental stimuli that allow us to identify the real cause.
Cox et al. (2013), Csvd hereafter, were the first to provide empirical evidence of the assumed preference patterns in RP. They also provided theoretical results showing exactly when Rabin's calibration paradox refutes various reference-independent theories, including expected utility. Thus, they were the first to conclusively show that RP is a genuine violation of expected utility. However, they did not identify the causes of RP. Our study does so. Rabin (2000) already showed that utility curvature cannot completely explain RP. We show that utility curvature does not play any empirical role at all. Several authors showed that other deviations from expected utility, primarily probability weighting, may explain RP theoretically, although these deviations have their own problems. Csvd's data did not provide conclusive evidence on probability weighting, and their formal and empirical analyses (of the traditional RP) did not involve reference dependence. We show that probability weighting, like utility curvature, does not play any empirical role at all in explaining RP, and neither do other reference-independent deviations from expected utility. Rabin (2000) conjectured that loss aversion, necessarily involving reference dependence, is the main cause:
"Indeed, what is empirically the most firmly established feature of risk preferences, loss aversion, is a departure from expected-utility theory that provides a direct explanation for modest-scale risk aversion. Loss aversion says that people are significantly more averse to losses relative to the status quo than they are attracted by gains, and more generally that people's utilities are determined by changes in wealth rather than absolute levels." (p. 1288)
Other authors also suggested loss aversion as an explanation (Csvd p. 307; Lindsay 2013; Park 2016; Wakker 2010 p. 244), but no analysis to date formalized or tested this conjecture. We do so by incorporating reference dependence in our theoretical model and by empirical tests. Thus we can settle the case. We thus also confirm that utility of income explains RP, in agreement with suggestions by Cox and Sadiraj (2006) and others. We have thus demonstrated that RP shows a genuine deviation from basic classical economic principles, providing one of the strongest arguments for the modern behavioral approach to economics.

Notation and Definitions
We consider only two-outcome prospects. By $\alpha_p\beta$ we denote a prospect yielding outcome $\alpha$ with probability $p$ and outcome $\beta$ with probability $1 - p$.
Outcomes are money amounts. In reference-independent models, outcomes refer to final wealth and are denoted in bold by Greek letters or real numbers. The initial wealth, which is the final wealth level when subjects enter the laboratory in our experiment, is denoted $\mathbf{0}$, as has been customary in classical reference-independent models. It is fixed throughout the analysis and experiment.
By ≽ we denote a preference relation over prospects. A utility function $U$ maps outcomes to the reals and is strictly increasing and continuous. The expected utility (EU) of a prospect $\alpha_p\beta$ is $pU(\alpha) + (1 - p)U(\beta)$. Expected utility holds if there exists a utility function such that preferences maximize EU.
[Footnote, continued] … restricting the small-scale risk aversion choices and the background risks assumed. Their footnote 8 points out that the empirical measurement of Neilson's weighting function remains as a problem. We will solve this problem.
We next define the most general theory considered in this paper, prospect theory (Tversky and Kahneman 1992), and then specify other theories as special cases. Prospect theory assumes that for every choice situation subjects perceive a particular final wealth level as their reference point.
Commonly, the reference point is the status quo, but it can change within the analysis, for instance due to different framings. This is the crucial difference between the reference point and initial wealth, which is fixed throughout the analysis. … (2)
The parameters (utility, the probability weighting functions $w^+$ for gains and $w^-$ for losses, and the loss aversion parameter $\lambda$) can in principle depend on the reference point. However, they will be stable under small changes of the reference point such as in our experiment, and we therefore assume that they are independent of it. The loss aversion parameter can be incorporated into utility by leaving utility unchanged for outcomes ≥ 0 and multiplying it by $\lambda$ for outcomes ≤ 0.
The resulting utility function will typically have a kink at 0. We usually denote the reference point as a subscript of the preference symbol rather than of the outcomes. If the reference point has been specified, we may therefore write … instead of …. Utility of income is the special case where there is no probability weighting, i.e., $w^+(p) = w^-(p) = p$. Thus, it generalizes expected utility by incorporating reference dependence, maintaining expected utility given a fixed reference point.
We now turn to reference-independent special cases of PT. The first special case we consider is rank-dependent utility (RDU). It assumes $w^+(p) = 1 - w^-(1 - p) = w(p)$ and $\lambda = 1$ (so that utility with and without the loss aversion factor coincide). The main restriction is that, following EU, RDU assumes reference independence: outcomes are described in terms of final wealth. This can be formalized by assuming that the reference point is fixed at $\mathbf{0}$. We get, for $\alpha \geq \beta$, $RDU(\alpha_p\beta) = w(p)U(\alpha) + (1 - w(p))U(\beta)$. Probability weighting under RDU is sign-independent. For gains we have $w(p) = w^+(p)$ but for losses we have a dual $w(p) = 1 - w^-(1 - p)$. EU is the special case where $w^+(p) = w^-(p) = w(p) = p$. Then the weighting functions are identical to their duals.
For two-outcome prospects as used in our experiment, nearly all existing reference- and sign-independent nonexpected utility theories are special cases of RDU and, consequently, of PT (Wakker 2010 §7.11). Such theories include the reference-independent version of original prospect theory (Kahneman and Tversky 1979), the second-most cited paper in economics (Coupé 2003), and disappointment aversion theory (Gul 1991). Hence, the analysis of this paper covers all risk theories that are popular today.
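To illustrate how the theories of this section nest, the following Python sketch evaluates a two-outcome prospect under the standard two-outcome form of prospect theory (Tversky and Kahneman 1992). It is a sketch under stated assumptions rather than code from our study: the function name `pt_value`, its argument names, the linear default utility, and the loss aversion value 2.25 are illustrative choices. Outcomes are changes with respect to the reference point.

```python
def pt_value(x, p, y, utility=lambda z: z, w_plus=lambda q: q,
             w_minus=lambda q: q, lam=1.0):
    """Prospect-theory value of the prospect x_p y (x with probability p, else y).

    All names and defaults are illustrative assumptions. Outcomes are changes
    with respect to the reference point, utility must satisfy utility(0) == 0,
    w_plus / w_minus weight probabilities of gains / losses, lam is loss aversion.
    """
    if x < y:                      # rank the outcomes so that x is the better one
        x, y, p = y, x, 1.0 - p
    if y >= 0:                     # pure gains: the best outcome is weighted by w_plus
        wp = w_plus(p)
        return wp * utility(x) + (1.0 - wp) * utility(y)
    if x <= 0:                     # pure losses: the worst outcome is weighted by w_minus
        wm = w_minus(1.0 - p)
        return lam * ((1.0 - wm) * utility(x) + wm * utility(y))
    # mixed prospect: gain weighted by w_plus(p), loss by w_minus(1 - p) times lam
    return w_plus(p) * utility(x) + lam * w_minus(1.0 - p) * utility(y)


# Rabin's basic prospect 11_0.5(-10), compared with the status quo (value 0).
eu_like = pt_value(11, 0.5, -10)                # no weighting, lam = 1: EU with linear utility
loss_averse = pt_value(11, 0.5, -10, lam=2.25)  # utility of income with loss aversion
print(eu_like, loss_averse)
```

With these illustrative numbers the prospect has a positive value without loss aversion and a negative value with it, so only the loss-averse evaluation rejects it. Passing the same weighting function for gains and its dual for losses gives the RDU special case.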

Reference-Dependent versus Reference-Independent Modeling
Although the formalization of reference-dependence defined in the preceding section has been used in many contexts, it has not yet been used to analyze RP, probably because of the controversial discussions of this paradox.
This section shows how, using this formalization, we can identify and isolate potential causes of the paradox. Figure 1 displays the choices in RP.
Rabin assumed that people reject a 50-50 prospect of winning 11 or losing 10 (Fig. 1a: basic (final-wealth) preference). With the natural status quo of 0, this assumption is empirically plausible for different subjects at different wealth levels; that is, in a "between"-subject sense. It then is also plausible in a "within"-subject sense, i.e., for one subject at different wealth levels. For instance, if for a given subject in our experiment, the basic preference holds for most subjects €11 richer than her, then it is likely to also hold for this subject if she were €11 richer. We call this argument the between-within argument. This way, Rabin's claims can be confirmed without implementing, experimentally …
[Figure 1. The choices in RP (panels 1a, 1b, 1c, 1d1, 1d2). Legend: chance nodes; e.g., Fig. 1a displays the preference $11_{0.5}(-10) \preccurlyeq 0$. Bold symbols: final wealth. 11, −10, 0: changes with respect to reference points. The subscript of ≼ is the reference point. Brace label: "Reference-dependent models".]
… equating them also with the wealth-change preference in Fig. 1b. This is indicated by the brace in the figure below these three figures. It explains EU's "between-within" move from the basic preference to the wealth-change preference. Such a move, via the equivalence between Fig. 1d1 and Fig. 1d2, leads to highly risk averse preferences that cannot be accommodated by EU.
Theoretically, many explanations of the RP have been considered. Under theories that maintain reference independence, one potential cause is that not only utility curvature but also probability weighting contributes to risk aversion (for instance under RDU). Under other theories, such as prospect theory, reference dependence is a potential cause. Then people treat reference-change preferences and outcome-change preferences differently. Then the move from Fig. 1d1 to …
We used a brace below Figs 1a and 1c to indicate that reference-independent theories do not distinguish between these two figures, similarly as they do not distinguish between Figs 1b, 1d1, and 1d2. In particular, background risks play no role if they are incorporated into the reference point as in Fig. 1d1 rather than in outcomes as in Fig. 1d2. The impossibility of distinguishing between figures above one brace has hampered the debates in the literature using reference-independent theories.

Rabin's Paradox as a Violation of Expected Utility
Because framing is central to the resolution of RP, we discuss the different frames that constitute our experimental stimuli jointly with our theoretical analyses. The stimuli were devised based on our theoretical predictions, which is why we present the stimuli and predictions successively.
We use the framing in Figure 2 to test Rabin's basic preference (Figs 1a and 1c). We use an accept-reject ("Yes-No" in the stimuli) formulation because this leads to the most reference dependence and loss aversion (Ert and Erev 2013), and gives the strongest possible test of classical theories. Our prediction, in agreement with common views on risk attitudes (Tversky and Kahneman 1992) and Csvd's findings, is:
PREDICTION 1. A strong majority will reject (choose "no") in Figure 2.
IMPLICATION. Expected utility with concave utility is falsified.
EXPLANATION. As explained in §II, if the prediction holds true, then the preferences in Fig. 1d1 are also plausible and, hence, the preferences in …
[Figure 2. Accept-reject framing of the basic preference; question shown to subjects: "Would you play the following prospect?"]
… would imply rejection of the prospect 0.5 (− ) if the wealth-change preferences (Fig. 1b) hold for all ∈ [− , ] (Rabin 2000 p. 1282). This is absurd. It therefore entails a violation of expected utility. Factors other than utility curvature are needed to explain the rejection in Figure 2.
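The calibration step behind this explanation can be sketched numerically. The Python code below assumes an expected utility maximizer with concave utility who rejects the 50-50 prospect "lose l, gain g" at every wealth level, with l < g ≤ 2l, and uses the two bounds that concavity then implies (our reading of the argument in Rabin 2000): successive gain increments of size g shrink by at least the factor l/g, and successive loss blocks of size 2l grow by at least the factor g/l. The function names and the choice of amounts are illustrative assumptions.

```python
def loss_lower_bound(l, g, k):
    """Lower bound on U(w) - U(w - 2*k*l), in units of r = U(w) - U(w - l)."""
    return 2.0 * sum((g / l) ** (i - 1) for i in range(1, k + 1))


def gain_upper_bound(l, g, m):
    """Upper bound on U(w + m*g) - U(w), in the same units r."""
    return sum((l / g) ** i for i in range(m))


def largest_rejected_gain(l, g, loss):
    """Largest gain m*g for which the bounds imply that a 50-50 prospect of
    losing `loss` or gaining m*g is rejected (float('inf') if every gain is)."""
    if not (0 < l < g <= 2 * l):
        raise ValueError("the bounds used here assume 0 < l < g <= 2*l")
    k = int(loss // (2 * l))            # number of full loss blocks of size 2*l
    lower = loss_lower_bound(l, g, k)
    cap = g / (g - l)                   # limit of the gain bound as m grows
    if lower >= cap:                    # even an unbounded gain cannot compensate
        return float("inf")
    m = 0
    while gain_upper_bound(l, g, m + 1) <= lower:
        m += 1
    return m * g


print(largest_rejected_gain(10, 11, 40))
print(largest_rejected_gain(10, 11, 100))
```

For l = 10 and g = 11 the loss bound for a loss of 100 exceeds the cap on the utility of any gain, so the second call returns infinity: no gain, however large, makes the prospect acceptable.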

Nonexpected Utility Theories as Failed Attempts to Preserve Reference Independence
The main attempt to save reference independence from RP came from explanations based on probability weighting, the other component in prospect theory to deviate from expected utility. That is, RDU was used to explain RP.
Crucial for Rabin's calibration in §III is that the weight of the gain 11 is the same as the weight of the loss −10. To achieve these equal weights under RDU, for each subject we measured the probability $r$ such that … (7)
We then offered the prospect $11_r(-10)$ to each subject, where $r$ was their individual value measured in Eq. 7. This gives the desired equal weighting of outcomes under RDU. [10] The offered prospect was more favorable than Rabin's prospect if $r > 0.5$, which was the typical case. Figure 3 displays the framing used for a subject with $r = 0.63$. The crucial point here is to use a framing that induces the right reference point and loss aversion. For this purpose we again use the accept-reject framing. [11] Hence we have:
[Figure 3. Accept-reject framing with the question "Would you play the following prospect?", outcomes €11 and −€10, and answer options including "No, I don't."]
PREDICTION 3. A majority will reject ("No") in Figure 3.
IMPLICATION. RDU fails as an explanation of Rabin's paradox.
EXPLANATION. Under RDU with linear or slightly concave utility, subjects should accept the prospect offered, contrary to Prediction 3. This shows that RDU's correction for probability weighting does not remove all risk aversion. Neilson (2001) showed that utility curvature cannot explain the remaining risk aversion by deriving utility calibration paradoxes for RDU. [12] There must be factors beyond RDU.
On our domain of two-outcome prospects, nearly all reference-independent nonexpected utility theories agree with RDU (see end of §I). Hence, none of those theories can explain RP either. We therefore turn to reference-dependent theories in the next section, where we will also allow probability weighting to be different for gains and losses, which is empirically desirable. Our experiment will later show that probability weighting plays no empirical role in RP.
[10] The condition in Footnote 14 of Csvd is now satisfied and, according to their Corollary 1.1, calibration implications for utility are possible.
[11] Previously, one of us missed this point when he did not distinguish between the reference changes used in our experiment and the outcome changes he had in mind (Wakker 2010 p. 245 2nd para). Such confusions are likely to happen if authors think too much in terms of traditional reference-independent models.
To avoid misunderstanding, we clarify here that our study does not claim that probability weighting would be unimportant. Many studies have demonstrated its importance (Barseghyan et al. 2013; Fehr-Duda and Epper 2012; Tversky and Kahneman 1992; Wakker 2010). We claim only that probability weighting plays no role in RP. To further illustrate our point, consider an alternative paradox, similar to RP and with similar calibration implications for utility. It could be constructed if subjects had preferences $21_{0.5}0 \preccurlyeq 10$ at all or many wealth levels, while perceiving all outcomes as gains. Then loss aversion could play no role and probability weighting would drive the paradox. We will in fact test this preference later (Fig. 5b) and find that it may exist, but is considerably weaker than with Rabin's stimuli. Our only claim about probability weighting is that for the focus of this paper, RP, probability weighting plays no role. This claim is not our main purpose, but only serves as an intermediate tool for what is our main and positive purpose: to show the importance of reference dependence.
[12] Our choice of $r$ rules out the theoretical possibility discussed in §III for the second Indian group in Csvd that strong probability weighting could still explain the risk aversion.
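The logic of the RDU correction in this section can be sketched as follows. In the experiment the probability $r$ of Eq. 7 was measured from each subject's choices; the Python sketch below instead assumes, purely for illustration, a parametric weighting function (the Tversky and Kahneman 1992 form with a hypothetical parameter 0.61) and solves for the probability that receives decision weight 0.5. All function names are assumptions of this sketch.

```python
def tk_weight(p, gamma=0.61):
    """Tversky-Kahneman (1992) weighting function; gamma = 0.61 is a hypothetical value."""
    return p ** gamma / (p ** gamma + (1.0 - p) ** gamma) ** (1.0 / gamma)


def solve_equal_weight_probability(w, tol=1e-9):
    """Bisection for the probability r with w(r) = 0.5, assuming w is increasing on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if w(mid) < 0.5:
            lo = mid        # weight still below one half: r lies above mid
        else:
            hi = mid        # weight at or above one half: r lies at or below mid
    return 0.5 * (lo + hi)


def rdu_value(gain, r, loss, w):
    """RDU value of the prospect gain_r(loss) with linear utility and gain > loss."""
    return w(r) * gain + (1.0 - w(r)) * loss


r = solve_equal_weight_probability(tk_weight)
value = rdu_value(11, r, -10, tk_weight)
print(round(r, 3), round(value, 3))
```

Under this assumed weighting function the solved probability exceeds 0.5, and the corrected prospect has a positive RDU value with linear utility, so RDU predicts acceptance. Rejection of the corrected prospect is therefore evidence of factors beyond RDU.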

Rabin's Paradox
Many studies have confirmed reference and sign dependence, entailing violations of RDU, although there continue to be debates (Isoni, Loomes, and Sugden 2011; Plott and Zeiler 2005). Sign dependence means that risk attitudes are different for losses than for gains. Whereas probability weighting is mostly pessimistic for gains, with prevailing underweighting of the probabilities of best outcomes, for losses the opposite holds, with prevailing optimism and underweighting of the probabilities of worst outcomes. This is called reflection and it falsifies RDU. It also implies that the correction for probability weighting under RDU in Figure 3 is not correct. To obtain Rabin's calibration argument for utility, which involves the same decision weights for the two outcomes, we should, according to prospect theory, measure for each subject the probability such that … (8)
Details are in the Appendix. Because RDU is a special case of prospect theory, it predicts that this probability equals $r$. Under RDU, Eq. 8 can be used as an alternative way to find the required $r$ of Eq. 7. However, based on the common findings of reflection we predict: …
Loss aversion thus leads to strong risk aversion and can readily explain the preference in Figure 3 and the strong preferences in Figures 2 and 4 for any plausible probability weighting and utility curvature. Outside of prospect theory, deviations from expected utility proposed in the literature usually have not considered sign dependence. For our stimuli they mostly agree with RDU. Thus, they concern Prediction 3 in the preceding section and were discussed there.
To obtain direct support for reference dependence, we tested the reference-change and outcome-change preferences. In Fig. 5b, the outcome-change preference cannot be formulated as an accept-reject decision and was formulated as a binary choice. To have a clean test of reference dependence, we therefore also framed the reference-change question in Fig. 5a as a binary choice. This change in framing will probably reduce loss aversion and, hence, risk aversion somewhat. To make the framings and procedures as similar as possible, we also added the prior endowment of €1 in Fig. 5b, which by normative standards should be negligible. Finally, we used the probabilities of Eq. 8 instead of 0.5 to neutralize probability weighting and focus on reference dependence. By Prediction 4 these probabilities will not have a systematic effect on risk aversion and Figs. 5a and 5b also test Figs. 1d1 versus 1d2.
The two figures differ only in the way that final outcomes are split into reference point and change with respect to reference point. Our analysis is based on the assumption that: (a) the reference point in Fig. 5a has the additional payment incorporated; (b) accordingly, the outcome −€10 in Fig. 5a is perceived as a loss; (c) in Fig. 5b, the status quo of €0 is the reference point so that no losses are perceived. Our assumption is the most common one for reference points and for ways to induce them in experiments (de Martino et al. 2006; Fehr-Duda et al. 2010; Kühberger 1998; Tversky and Kahneman 1992). It is crucial for the common incentivization of losses with prior endowments (Vieider et al. 2015), and for endowment effects such as underlying WTP-WTA discrepancies (Sayman and Öncüler 2005). In the well-known model of Köszegi and Rabin (2006), future expectations serve as reference points, but only if choices have been anticipated sufficiently far ahead in time, and not if they come as a surprise. In our experiment, subjects did not know beforehand what the choices would be.
[Figure 5a includes the statement: "If this question is selected to be played out for real, you will get an additional payment of €11 in your bank account." Figure 5b includes the same statement with an additional payment of €1.]
Our assumption will, of course, not hold for all subjects, and several subjects will perceive various other reference points, such as the sure outcome €10 depicted in Fig. 5b. It suffices that our assumption holds for most subjects.
In Fig. 5b, loss aversion does not play a role for most subjects and, therefore, risk aversion will be lower. Yet risk aversion can still be expected because of probability weighting, which is pessimistic for gains. Most subjects will take Fig. 5a as Fig. 1d1, and they will be as strongly risk averse as in the basic preferences in Figure 2. Some subjects will integrate payments and take Fig. 5a as Fig. 1d2, which reduces risk aversion. We summarize our claims:
PREDICTION 6. A majority of subjects will reject (choose the sure Prospect B) in Figs. 5a and 5b, but fewer than in Figure 2, and the fewest in Fig. 5b.
IMPLICATION. The difference in risk aversion between Figs. 5a and 5b falsifies reference independence. 
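The contrast between the two framings can be made concrete with a small Python sketch. It assumes the payoffs described in the text (an endowment of €11 with changes +11 and −10 against a sure 0 in Fig. 5a; an endowment of €1 with changes +21 and 0 against a sure +10 in Fig. 5b). For simplicity it uses probability 0.5 instead of the individually measured probabilities of Eq. 8, linear utility, no probability weighting, and a hypothetical loss aversion parameter of 2.25; all names are assumptions of this sketch.

```python
LAMBDA = 2.25  # hypothetical loss aversion parameter


def value_of_change(change, lam=LAMBDA):
    """Linear utility of a change w.r.t. the reference point, with losses scaled by lam."""
    return change if change >= 0 else lam * change


def prospect_value(outcomes_and_probs):
    """Probability-weighted sum of values of changes (no probability weighting)."""
    return sum(p * value_of_change(x) for x, p in outcomes_and_probs)


# Each frame splits the same final wealth into an endowment (absorbed into the
# reference point) and changes relative to that reference point.
frames = {
    "5a": {"endowment": 11, "risky": [(11, 0.5), (-10, 0.5)], "sure": [(0, 1.0)]},
    "5b": {"endowment": 1, "risky": [(21, 0.5), (0, 0.5)], "sure": [(10, 1.0)]},
}


def final_wealth(frame, option):
    """Final wealth levels of an option: endowment plus each change."""
    return sorted(frame["endowment"] + x for x, _ in frame[option])


def rejects_risky(frame):
    """True if the sure option has the higher reference-dependent value."""
    return prospect_value(frame["risky"]) < prospect_value(frame["sure"])


for name, frame in frames.items():
    print(name, final_wealth(frame, "risky"), final_wealth(frame, "sure"),
          rejects_risky(frame))
```

Both frames yield identical final wealth for each option, so any reference-independent theory must treat them alike, whereas the loss-averse evaluation rejects the risky prospect only in the Fig. 5a framing. Because the sketch omits probability weighting, it does not reproduce the residual risk aversion expected in Fig. 5b.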

Our Experimental Findings
Subjects: N = 77 students (29 female; average age 22) from Erasmus University Rotterdam participated, in four sessions. Most were finance bachelor students.
Incentives: Each subject received a €10 participation fee. In addition, we randomly (by bingo machine) selected two subjects in each session and for each played out one of their randomly selected choices for real consequences. The selections were implemented in public by a volunteer. The payoff was paid immediately after the experiment. The experiment lasted about 45 minutes and the average payment per subject was €15.70.

Procedure:
The experiment was computerized. Subjects sat in cubicles to avoid interactions. They could ask questions at any time during the experiment.
Training questions familiarized subjects with the stimuli. Subjects could only start after they had correctly answered two comprehension questions.
Stimuli: Probabilities were generated by throwing two 10-sided dice. Details are in the Online Appendix. We first measured the probability $r$ (Eq. 7). Then we asked the two accept-reject questions of Figures 2 and 3, followed by the measurement of the probability of Eq. 8. We finally asked the accept-reject question of Figure 4 and the two questions of Figs. 5a and 5b, with the order of these three questions counterbalanced.
As a byproduct in the measurement of $r$, we also measured utility. We found linear utility, which is plausible for the moderate amounts in our experiment.
Thus, whereas Implication 1 shows that utility curvature cannot entirely explain RP, we find that it does not contribute to explaining RP at all.
…
PREDICTION 6 [reference- versus outcome-change preference]: 78% rejected in Fig. 5a (p-value < 0.001; binomial test), and 62% rejected in Fig. 5b (p-value = 0.08; binomial test). The latter is smaller than the former (p-value = 0.04; McNemar test).
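For readers who wish to see how tests of this kind are computed, the following Python sketch implements an exact two-sided binomial test against a 50% null and an exact McNemar test on discordant pairs, using only the standard library. The counts in the example are assumptions chosen for illustration (the text reports percentages, not raw counts, and the discordant counts are invented), so the sketch is not meant to reproduce the reported p-values.

```python
from math import comb


def binomial_two_sided(k, n):
    """Exact two-sided binomial test of k successes in n trials against p = 0.5."""
    lower = sum(comb(n, i) for i in range(0, k + 1)) / 2 ** n   # P(X <= k)
    upper = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n   # P(X >= k)
    return min(1.0, 2.0 * min(lower, upper))


def mcnemar_exact(b, c):
    """Exact McNemar test: binomial test on the two discordant counts b and c."""
    return binomial_two_sided(min(b, c), b + c)


# Hypothetical counts: 60 rejections among 77 subjects (roughly 78%).
p_majority = binomial_two_sided(60, 77)
# Hypothetical discordant pairs: 16 subjects reject only in one frame, 4 only in the other.
p_paired = mcnemar_exact(16, 4)
print(p_majority, p_paired)
```

The binomial test asks whether a majority differs from an even split, whereas the McNemar test uses only subjects who answered the two framings differently, which suits a within-subject comparison such as Fig. 5a versus Fig. 5b.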

Discussion of Experimental Details
Our experiment involved some adaptive (chained) stimuli, where answers given to some questions affected later stimuli, for instance regarding the probabilities in Figures 3 and 4. It was practically impossible for subjects to see through this procedure. Further, even if the procedure were seen through, it would be practically impossible to then also see if and how manipulation could be beneficial. Hence, manipulation is, in the terminology of Bardsley et al. (2010 pp. 265, 285), only a theoretical possibility but is practically impossible.
Counterbalancing is commonly used to avoid order effects, but can complicate a design for subjects and the analyses done after, and can increase noise. Hence, it is used only to avoid the major risks of order effects. We felt that Figs 5a and 5b were most vulnerable here. We therefore counterbalanced their presentation, combined with Figure 4. For the other stimuli, we saw no concrete reason to expect biases due to order effects, and we did not involve them in counterbalancing. We could also have avoided order effects by using a between-subject design, rather than the within-subject design as used. The pros and cons of these two designs have often been debated (Camerer 1989 p. 85): a between-subject design avoids order effects but a within-subject design gives more statistical power and can test more hypotheses. In our case, there were many practical difficulties for a between-subject design. If it had been embedded in sessions with other experiments, then those other experiments could have induced spillover effects similar to the order effects to be avoided. If we had implemented a between-subject design in isolation in, then, necessarily short experiments, the payoff per subject's time unit would have exceeded the upper bound imposed in our labs to avoid negative externalities for other experiments.

Preceding Literature
Markowitz (1952) was among the first to propose reference dependence. Other early works include Shackle (1949 Ch. 2 on sign-dependence) and Edwards (1954 p. 395 & p. 405). Edwards later influenced the young Tversky. Arrow (1951 p. 432) discussed reference dependence, pointing out that it plays no role when outcomes refer to final wealth, and criticizing it for this reason. An early appearance of loss aversion is in Robertson (1915 p. 135). Markowitz did not incorporate probability weighting and made empirically invalid conjectures about utility curvature. Prospect theory corrected these points and was the first reference-dependent theory that could work empirically. Wakker (2010 pp. 244-245) surveyed early discussions of RP. Since then, Johansson-Stenman (2010) presented a theoretical analysis of RP for life-time consumption, Barseghyan et al. (2013 pp. 2526-2527) discussed an explanation based on probability weighting, and Golman and Loewenstein (2015) suggested a cognitive model to explain it. Csvd investigated RP systematically, following up on their theoretical analysis in Cox and Sadiraj (2006). Csvd were the first to confirm RP empirically and establish it as another falsification of expected utility.
They also provided a detailed theoretical analysis under RDU (their Eq. NL-1), with probability weighting as the deviation from EU. Outcomes were taken reference-independent, in terms of final wealth; i.e., they were changes w.r.t. the wealth level upon entering the lab. Csvd pointed out that RDU is a special case of prospect theory (fixed reference point; sign-independent probability weighting), so that this special case of PT is also covered by their analysis.
Csvd provided theorems that exactly identify the utility functions and probability weighting functions that lead to Rabin's calibration paradoxes under RDU for various potential empirical preferences. They thus showed exactly what more is needed to analyze the role of probability weighting in future studies. We followed up on their results. In particular, we measured and corrected for probability weighting in RDU to find out to what extent it accommodates RP empirically.
In their experiments, Csvd used large outcomes, incentivized through an arrangement with a casino with small but positive probabilities of actual implementation. For 41 German students they found majority preferences … Thus they overwhelmingly confirmed preferences as in Fig. 1d2 for a wide enough range of wealth levels to imply RP for expected utility and thus establish it as a genuine empirical violation.
… part of the answer to this general question. As regards normative implications, there is wide, though not universal, agreement that reference dependence, taken as a framing effect, is irrational, and that it is more irrational than probability weighting. Probability weighting only violates the von Neumann-Morgenstern independence axiom as in Allais' paradox. Such violations are considered to be rational by Machina (1982) and many others. Hence, RP provides a more serious deviation from classical rationality assumptions than previously thought. This conclusion is supported by Benjamin, Brown, and Shapiro (2013), who found a negative relation between RP choices and cognitive ability. Proper behavioral risk models are therefore warranted to analyze and predict the behavioral consequences of human risk attitudes (Dohmen et al. 2011).
Rabin's (2000) paradox is one of the most famous paradoxes in the modern economic literature. It is commonly, although not universally, accepted as negative evidence against classical expected utility (Kahneman 2003 p. 164). Its cause had not yet been identified, so that no positive inference had been derived yet. We identify this cause and provide a positive inference: RP proves that we need reference-dependent generalizations of classical models, and it does so more strongly than any other paradox did before. Other deviations from expected utility do not contribute to explaining Rabin's paradox. This confirms that utility of income does explain the paradox.

Appendix. Measurement of and
We derived all indifferences in our experiment from choices through bisection procedures (Online Appendix). To measure the probability in Eq. 7 and obtain an estimate of utility curvature, we iteratively elicited four indifferences, 0.5~− 1 0.5 ( = 1, … ,4), where we chose = 3, = 16, and 0 = 25. 0.58, and 0.63. By a Friedman test their differences are significant (p-value = 0.045), which can be taken as a rejection of RDU. For we took the average of these three .
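The general idea of deriving an indifference from choices by bisection can be sketched as follows. This is a generic illustration, not the procedure documented in the Online Appendix: the function name, the number of steps, and the simulated respondent with a switching probability of 0.63 are all assumptions.

```python
def bisect_indifference(prefers_risky, lo=0.0, hi=1.0, steps=7):
    """Narrow down the probability at which a respondent is indifferent.

    prefers_risky(p) should return True if the respondent chooses the risky
    prospect when its better outcome has probability p. The sketch assumes
    that choices are monotone in p (hypothetical interface).
    """
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if prefers_risky(mid):
            hi = mid      # risky prospect chosen: indifference lies at or below mid
        else:
            lo = mid      # sure option chosen: indifference lies above mid
    return 0.5 * (lo + hi)


def simulated_respondent(p):
    """Hypothetical respondent who switches to the risky prospect above 0.63."""
    return p > 0.63


estimate = bisect_indifference(simulated_respondent)
print(estimate)
```

Each choice halves the interval that contains the indifference point, so seven choices locate it within an interval of width 1/128; the midpoint of that interval is taken as the estimate.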