Risk and rationality: The relative importance of probability weighting and choice set dependence

The literature suggests that probability weighting and choice set dependence influence risky choices. However, their relative importance remains an open question. We present a joint test that uses binary choices between lotteries provoking Common Consequence and Common Ratio Allais Paradoxes and manipulates their joint payoff distribution. We show non-parametrically that probability weighting and choice set dependence both play a role at describing aggregate choices. To parsimoniously account for heterogeneity, we also estimate a structural model using a finite mixture approach. The model uncovers substantial heterogeneity and classifies subjects into three types: 38% Prospect Theory types whose choices are predominantly driven by probability weighting, 34% Salience Theory types whose choices are predominantly driven by choice set dependence, and 28% Expected Utility Theory types. The model predicts type-specific differences in the frequency of preference reversals out-of-sample, i.e., in choices with a different context than the ones used for estimating the model. Moreover, the out-of-sample predictions indicate that the choice context shapes the influence of choice set dependence.


Introduction
The past decades of economic research on choice under risk have revealed systematic violations of expected utility theory (EUT;von Neumann & Morgenstern, 1953). As exposed in the famous Allais Paradoxes, subjects frequently violate EUT's independence axiom as they exhibit both risk loving and risk averse behavior (Allais, 1953). For example, many individuals are risk loving when buying state lottery tickets and risk averse when buying damage insurance (Cicchetti & Dubin, 1994;Forrest et al., 2002;Garrett & Sobel, 1999;Sydnor, 2010). Moreover, subjects often reverse their choice when they have to choose between two lotteries or evaluate them in isolation (Lichtenstein & Slovic, 1971;Lindman, 1971). Some of these preference reversals violate EUT's transitivity axiom (Cox & Epstein, 1989;Loomes et al., 1991). These and other systematic violations of EUT have spurred the development of various alternative decision theories.
These alternative decision theories fit into two major classes. The first major class is based on probability weighting and postulates that subjects systematically overweight small probabilities and underweight large probabilites. The most prominent example is Prospect Theory (Kahneman & Tversky, 1979), subsequently generalized to Cumulative Prospect Theory (CPT; Tversky & Kahneman, 1992). CPT is the best-fitting model for aggregate choices in this class (Starmer, 2000;Wakker, 2010). 1 According to CPT, subjects are risk loving when buying a state lottery ticket because they overweight the small probability of winning and, conversely, risk averse when buying damage insurance because they underweight the large probability of not suffering any damage. However, CPT fails to explain preference reversals, since subjects always attach the same value to lotteries, regardless whether they have to choose among them or evaluate them in isolation.
The other major class of decision theories postulates that the evaluation of lotteries is choice set dependent. 2 Prominent members of this class are Salience Theory (ST; Bordalo et al., 2012b) and Regret Theory (RT; Loomes & Sugden, 1982). 3 We focus on ST as the primary example of a choice set dependent theory because it is becoming the main contender to CPT (Dertwinkel-Kalt & Koster, 2020). According to ST, individuals focus their limited attention on states of the world with large payoff differences between the alternatives. Hence, a lottery's value is choice set dependent as the weight attached to a state depends on the payoffs of the alternatives in that state. ST can also explain why individuals are often both risk loving and risk averse at the same time. They buy state lottery tickets because they overweight the state where they win the big prize due to the large payoff difference between buying the ticket and winning versus not buying the ticket; at the same time, they buy damage insurance, because they overweight the state in which the damage occurs due to the large payoff difference between being insured and uninsured in that particular state. In contrast to theories based on probability weighting, choice set dependent theories are able to describe preference reversals as they allow for violations of the transitivity axiom.
In this paper, we perform a joint test of the relative importance of probability weighting and choice set dependence. This test is relevant in various applications. On the one hand, ST can naturally explain several behavioral phenomena in consumer choice -such as the endowment effect - (Bordalo et al., 2012a(Bordalo et al., , 2013bDertwinkel-Kalt et al., 2017), the counter-cyclicality of risk premia on financial markets (Bordalo et al., 2013a), and how legally irrelevant information affects judicial decisions (Bordalo et al., 2015). But on the other hand, and in contrast to CPT, ST can describe the Allais Paradox only in certain choice sets.
We conduct a laboratory experiment to perform the joint test. Subjects face a series of binary choices between lotteries provoking three versions of the Allais Paradox. To discriminate between the different decision theories, we manipulate the joint payoff distribution of the lotteries. That is, every subject faces the lotteries of each binary choice twice. In one case, the lotteries' payoffs are independent of each other, while in the other, they depend on each other. This manipulation affects the joint payoff distribution of the lotteries but leaves their marginal payoff distributions unchanged. If risky choices are driven by probability weighting, the predicted frequency of Allais Paradoxes is the same, as subjects evaluate each lottery in isolation and focus exclusively on its marginal payoff distribution. Hence, theories based on probability weighting, such as CPT, can explain the Allais Paradox regardless of whether lotteries' payoffs are independent or dependent. However, if risky choices are driven by choice set dependence, the predicted frequency of Allais Paradoxes is positive with independent payoffs and zero with dependent payoffs. Thus, choice set dependent theories, such as ST, cannot explain Allais Paradoxes when payoffs depend on each other. We can also control for EUT preferences, as EUT can never explain the Allais Paradox.
To obtain our first main result, we analyze the importance of probability weighting and choice set dependence non-parametrically at the aggregate level, i.e., at the level of a representative decision maker. At the aggregate level, both choice set dependence and probability weighting play a role. Probability weighting plays a role, because Allais Paradoxes occur regardless whether lotteries' payoffs are independent or dependent. However, choice set dependence plays a role too, because Allais Paradoxes occur more than twice as often when lotteries' payoffs are independent than when they are dependent. This result holds for all three versions of the Allais Paradox.
As a next step, we estimate a structural model which follows a finite mixture approach. That is, it assumes that there exist three types of subjects: CPT-types whose behavior is mostly driven by probability weighting, ST-types whose behavior is primarily driven by choice set dependence, and EUT-types. The model characterizes each of these types by its relative size and average type-specific parameters. Moreover, it provides a classification of every subject into the type that best fits her choices. The finite mixture approach has three advantages. First, it allows us to take heterogeneity in risk preferences parsimoniously into account Conte et al., 2011), as it does not require estimating individual-specific parameters (Harless & Camerer, 1994;Hey & Orme, 1994;Starmer, 2000). Second, it can inform us about the specification best suited for representing each major class of decision models. Third, it yields type-specific preference parameters which can be used to calibrate theoretical models and make quantitative predictions about behavior across various choice contexts.
The structural model yields the second main result. It uncovers vast heterogeneity in subjects' risk preferences and classifies them into 38% CPT-types, 34% ST-types, and 28% EUT-types. The result shows that probability weighting and choice set dependence both play a similarly important role in describing the non-EUT-types' choices. It also highlights that the mix of types can be decisive for understanding and predicting aggregate behavior in applied contexts -such as consumer, investor, and judicial choice.
Finally, we use out-of-sample predictions to test the structural model's power to predict behavior across choice contexts. This is a crucial test since one can only use a structural model to predict behavior across contexts if the estimated preferences are stable. To perform the out-of-sample predictions, we expose subjects to additional lotteries that may trigger preference reversals. Subjects always first choose between two of these additional lotteries and, later, evaluate each of them in isolation. By analyzing the frequency of preference reversals in these additional lotteries, we can assess the model's power to predict behavior in choices with a different context than the ones we use for estimating the model.
The out-of-sample-predictions provide the third main result. The ST-types exhibit roughly 1.5 times more preference reversals than the CPT-and EUT-types, confirming that the ST-types' choices are mostly driven by choice set dependence. The structural model also predicts quantitative differences in the average frequency of preference reversals accurately across the three types. Moreover, the predictions suggest that choice set dependence is stronger when subjects trade off a sure amount against a lottery than when they have to choose between two lotteries.
The paper contributes to the literature that tests the performance of the major classes of decision theories. This literature has tested probability weighting and choice set dependence separately and found support for both. On the one hand, there is considerable evidence suggesting that risky choices depend on outcome probabilities irrespective of the choice set (for examples, see Camerer & Ho, 1994;Fehr-Duda & Epper, 2012;Kahneman & Tversky, 1979;Loomes & Segal, 1994;Starmer, 2000). On the other hand, the literature also recognizes that risky choices depend on the choice set. Early studies, qualitatively testing the predictions of RT, find "juxtaposition effects", i.e., that systematic manipulations of the lotteries' joint payoff distribution that leave their marginal distributions unchanged can affect subjects' choices (Battalio et al., 1990;Harless, 1992;Loomes & Sugden, 1987;Loomes, 1988;Starmer & Sugden, 1989). More recent studies, testing ST with the same paradigm, confirm the role of choice set dependence in non-incentivized choices on Mturk (Bordalo et al., 2012b), in choices between a lottery and a sure amount (Booth & Nolen, 2012), and in choices varying the lotteries' positive skewness (Dertwinkel-Kalt & Koster, 2020). The most closely related and complementary study to ours is by Frydman and Mormann (2018). They find that the frequency of Allais Paradoxes varies with three levels of dependency between the lotteries' payoffs and that the evaluation of lotteries changes if one adds an additional decoy lottery which subjects can see but not choose. They also estimate a structural model of ST separately for each subject and find that only half of their subjects exhibit ST's local thinking. This suggests that there is substantial heterogeneity and the different classes of decision theories should be tested jointly. Our paper contributes to this literature, as it is one of the first to perform such a joint test and assess the relative importance of probability weighting and choice set dependence in a heterogeneous population. 4 The paper also contributes to the strand of literature that uses finite mixture models to classify subjects into types. This literature has focused mostly on discriminating EUT from non-EUT preferences in a single choice context Conte et al., 2011;Fehr-Duda et al., 2010;Santos-Pinto et al., 2015). 5 Our second main result enhances this strand of literature by uncovering the relative importance of probability weighting and choice set dependence within the group of non-EUT subjects.
Finally, the paper performs out-of-sample predictions, allowing it to establish a link between the aforementioned papers about Allais Paradoxes and the literature on preference reversals (Cox & Epstein, 1989;Grether & Plott, 1979;Lichtenstein & Slovic, 1971;Lindman, 1971;Loomes et al., 1991;Pommerehne et al., 1982;Reilly, 1982). The out-of-sample predictions confirm that both phenomena result from choice set dependence. Moreover, the paper shows that a parsimonious structural model has predictive power across different choice contexts -which is essential for modeling and predicting subjects' behavior across different applications of choice under risk.
The paper has the following structure. Section 2 explains the strategy for discriminating between the different decision theories. Section 3 introduces the experimental design. Section 4 presents the non-parametric results at the aggregate level, while Sect. 5 discusses the structural model, its results, and its power to predict preference reversals in a different choice context. Finally, Sect. 6 concludes.

Discriminating between decision theories
This section describes our empirical strategy for discriminating between EUT, probability weighting, and choice set dependence. We focus on the two most descriptive behavioral theories, i.e., CPT representing probability weighting and ST 4 A concurrent study by Königsheim et al. (2019), uses finite mixture models to classify decisions instead of subjects. Despite this difference in methodology and the use of a substantially smaller set of lottery choices, this study also finds evidence for both CPT and ST, as well as a similar degree of local thinking in ST-decisions. 5 Harrison and Rutström (2009) also apply finite mixture models in order to distinguish EUT from non-EUT behavior. However, they classify decisions instead of subjects. Other studies have also used finite mixture models to analyze strategic decision making in various domains (for examples see Bruhin et al., 2019;El-Gamal & Grether, 1995;Fischbacher et al., 2013;Houser & Winter, 2004). representing choice set dependence. The empirical strategy (i) relies on a series of binary choices between lotteries that may trigger Common Consequence and Common Ratio Allais Paradoxes and (ii) manipulates the choice set by making the two lotteries' payoffs either independent or dependent of each other.
We explain the empirical strategy with the following binary choice between lotteries X and Y, taken from Kahneman and Tversky (1979), which may trigger the Common Consequence Allais Paradox: 6 Note that the two lotteries have a common consequence, i.e., a payoff z which occurs with probability p 2 in both lotteries. In this example, the Common Consequence Allais Paradox refers to the robust empirical finding that if z = 2400 , most individuals prefer Y over X, whereas if z = 0 , most individuals prefer X over Y.
Next, we show that EUT can never describe the Allais Paradox, CPT can always describe it, and ST can only describe the Allais Paradox when the payoffs of the two lotteries are independent but not when they are dependent.

EUT
According to EUT, the decision maker evaluates any lottery L with non-negative payoffs x = (x 1 , … , x J ) and associated probabilities p = (p 1 , … , p J ) as: where v is an increasing utility function over monetary payoffs with v(0) = 0. 7 Note that the value V EUT (L) only depends on the attributes of lottery L and not on the attributes of the other lotteries in the choice set. EUT cannot explain the Common Consequence Allais Paradox since, when comparing the values of the two lotteries V EUT (X) and V EUT (Y) , the term involving the common consequence, p 2 v(z) , cancels out. Hence, the decision maker's choice between X and Y does not depend on the value of the common consequence z.

CPT
According to CPT, the decision maker ranks the non-negative monetary payoffs of any lottery L such that x 1 ≥ … ≥ x J and evaluates the lottery as: where j is the decision weight attached to the value of payoff x j . As in EUT, the value V CPT (L) only depends on the attributes of lottery L, i.e., the decision maker evaluates the lottery in isolation. The decision weights are given by: where p k is payoff x k 's probability and w is the probability weighting function. Typically, the probability weighting function in CPT exhibits three properties (Fehr-Duda & Epper, 2012;Kahneman & Tversky, 1979;Prelec, 1998;Wakker, 2010): 1. Strictly increasing and satisfying w(0) = 0 and w(1) = 1 . This ensures that decision weights are non-negative and sum to one. 2. Inverse S-shape. The probability weighting function is concave for small probabilities and convex for large probabilities. This property ensures the decision maker overweights small probabilities and underweights large probabilities. It is necessary for CPT to be able to explain the Common Consequence Allais Paradox, as explained further below. 3. Subproportionality. For the probabilities 1 ≥ q > p > 0 and the scaling factor 0 < < 1 , the inequality w(q) w(p) > w( q) w( p) holds. Subproportionality is needed for CPT to be able to explain the Common Ratio Allais Paradox, as shown in Appendix A.
We now explain how CPT can describe the Common Consequence Allais Paradox in the choice between lotteries X and Y. When z = 2400 , the choice is: In this case, the decision maker tends to prefer Y over X. Due to the decision maker's tendency to overestimate small probabilities and underestimate large probabilities, the decision weight attached to the lowest payoff of X, 1 − w(0.99) , is larger than its objective probability p 3 = 0.01 , which renders X unattractive.
In contrast, when z = 0 , the choice is: In this case, the decision maker tends to prefer X over Y. Now, the decision weights of the two lotteries' highest payoffs, w(0.33) and w(0.34), are very close and, therefore, the decision is driven by the difference in utilities between v(2500) and v(2400) rather than the difference in probabilities. In sum, CPT can always explain the Allais Paradox because the decision weights depend non-linearly on the marginal payoff distribution of the lottery under consideration, which remains unchanged regardless whether the lotteries' payoffs are independent or dependent.

ST
According to ST, cognitive limitations cause the decision maker to be a local thinker who focuses her attention on states of the world in which one payoff stands out relative to the payoffs of the alternative. The decision maker overweights these salient states relative to the others. As the salience of a state directly depends on the payoffs of the alternative, a lottery's value is choice set dependent and -in contrast to EUT and CPT -lotteries are no longer evaluated in isolation.
Formally, if the decision maker has to choose between two lotteries L 1 and L 2 , she ranks each possible state s ∈ {1, … , S} according to its salience (x 1 s , x 2 s ) , where x 1 s and x 2 s are the payoffs of L 1 and L 2 , respectively, in state s. The salience function satisfies four properties: Ordering implies that states with bigger differences in payoffs are more salient. 2. Diminishing Sensitivity. For any > 0 , (x 1 s , x 2 s ) > (x 1 s + , x 2 s + ) . Diminishing sensitivity implies that, for states with a given difference in payoffs, salience diminishes the further away from zero the difference in payoffs is. 3. Symmetry: (x 1 s , x 2 s ) = (x 2 s , x 1 s ) . Symmetry implies that permutations of payoffs between lotteries leave the salience of a state unchanged. 4. Zero Contrast. For two states s and s where x 1 s = x 2 s and x 1 s ≠ x 2 s , (x 1 s , x 2 s ) < (x 1 s , x 2 s ) . Zero contrast implies that if two lotteries offer the same payoff in a particular state, this state is the least salient.
The decision weight of each state s depends on the state's salience-rank, r s ∈ {1, … , S} with lower values being associated with higher salience: where p s is the probability that state s is realized, and 0 < ≤ 1 is the decision maker's degree of local thinking. For = 1 , the decision maker weights states by their objective probabilities, whereas, for < 1 , the decision maker is a local thinker and overweights salient states. This yields the following values for lotteries L 1 and L 2 : Journal of Risk and Uncertainty (2022) 65:139-184 and Note that the value of each lottery depends on both lotteries in the choice set {L 1 , L 2 }. We now explain how ST can describe the Common Consequence Allais Paradox in the choice between lotteries X and Y when their payoffs are independent of each other. When z = 2400 , there are three states of the world which rank in salience as follows: (0, 2400) > (2500, 2400) > (2400, 2400) . The decision maker prefers Using v(0) = 0 and the decision weights given by Eq. (2), the condition for preferring Y over X becomes: Intuitively, lottery X provides the lowest payoff in the most salient state which makes lottery Y relatively attractive despite having a lower expected payoff. Hence, when the common consequence is z = 2400 and the degree of local thinking is severe enough, the decision maker prefers Y over X.
Now, lottery X provides the highest payoff in the most salient state. Hence, when the common consequence is z = 0 and the degree of local thinking is severe enough, the decision maker prefers X over Y.
We now turn to the case in which the two lotteries' payoffs are dependent of each other. In that case, ST can no longer describe the Common Consequence Allais Paradox. When the two lotteries' payoffs are dependent of each other, there are just the following three states of the world: The ranking in terms of salience of these three states, (0, 2400) > (2500, 2400) > (z, z) , is independent of the common consequence z. Hence, regardless of the common consequence, the decision maker tends to prefer Y over X, and the Common Consequence Allais Paradox can no longer be described by ST when the lotteries' payoffs are dependent of each other.
In sum, ST can explain the Allais Paradox only when the lotteries' payoffs are independent but not when they are dependent of each other. This is because decision weights depend on the joint payoff distribution of the two lotteries in the choice set, which changes when we manipulate the lotteries' joint payoff distribution. Table 1 summarizes the empirical strategy to discriminate between EUT, probability weighting, and choice set dependence. EUT can never explain the Allais Paradox. In contrast, probability weighting -represented by CPT -can explain the Allais paradox regardless whether the lotteries' payoffs are independent or dependent. Finally, choice set dependence -represented by ST -can explain the Allais paradox only when the lotteries' payoffs are independent but not when they are dependent of each other.

Experimental design
This section presents the experimental design which consists of two parts. In the main part, subjects make choices that may trigger three versions of the Allais Paradox: the classical and a generalized version of the Common Consequence Allais Paradox as well as the Common Ratio Allais Paradox. Based on these choices, we discriminate between EUT, probability weighting, as well as choice set dependence, and classify subjects into EUT-, CPT-, and ST-types, respectively. In the additional part, subjects make choices that could lead to preference reversals which allow us to validate our results in a different choice context.

Main part
We now present the main part of the experiment. First, we explain how we construct the series of binary choices. Subsequently, we describe the two formats which we use to present the binary choices to the subjects.

Choices between lotteries
Every subject goes through two blocks of binary choices between lotteries that may trigger the Allais Paradoxes. Both blocks feature the same lotteries, except that in one block the lotteries' payoffs are independent while in the other they are dependent.
The binary choices within each block feature lotteries that vary systematically in payoffs and probabilities. This systematic variation not only allows us to estimate the parameters of each decision theory in the structural model but also ensures that our results are not driven by a particular set of lotteries. To avoid order effects, we randomize (i) the order of the binary choices within each of the two blocks across subjects and (ii) counterbalance the order of the two blocks across subjects.
The binary choices that may trigger the classical and the generalized version of the Common Consequence Allais Paradox are based on a 3 × 3 × 2 design. The design uses the following three different payoff levels: Varying the payoffs across these three levels while keeping probabilities constant identifies the curvature of the utility function, v. Similarly, the design features three different probability distributions, p = (p 1 , p 2 , p 3 ) , over the lotteries' payoffs: Varying the probability distributions while keeping the lotteries' payoffs constant identifies the shape of probability weighting function, w, in CPT and the degree of local thinking, , in ST. Finally, the design uses the following levels of the common consequence, z, to trigger the two versions of the Common Consequence Allais Paradox: 1. z = x 3 , i.e., the common consequence is equal to the lowest payoff of lottery X.
In this case, lotteries X and Y offer two payoffs each. 2. z = y 1 , i.e., the common consequence is equal to the first payoff of lottery Y. In this case, lottery X offers three payoffs and lottery Y is a sure amount. 3. z is different from any other payoff of the two lotteries and slightly below the first payoff of lottery Y. 8 In this case, lottery X offers three payoffs and lottery Y offers two payoffs.
The first two levels of the common consequence, 1 and 2a, trigger the classical version of the Common Consequence Allais Paradox, as described in the previous section. The first and the third levels, 1 and 2b, trigger a generalized version of the Common Consequence Allais Paradox. The advantage of this generalized version is that the lottery Y does not degenerate into a sure amount which could lead to a specific certainty effect. However, the disadvantage of this generalized version is that, if lottery payoffs are independent, subjects have to consider 2 × 3 = 6 possible states of the world resulting in higher cognitive load. To expose subjects to an even broader variety of decision situations, the design also includes binary choices that may trigger the Common Ratio Allais Paradox. These choices are based on a similar 3 × 3 × 2 design, as shown in Appendix B. To provoke the Common Ratio Allais Paradox, the design scales down probability levels but keeps the lotteries' payoffs unchanged. Moreover, as before, it manipulates the lotteries' joint payoff distribution to discriminate between the different classes of decision theories. While CPT can describe the Common Ratio Allais Paradox regardless of the lotteries' joint payoff distribution, ST can describe it only when payoffs are independent but not when they are dependent (see Appendix A for details). The mechanism in ST works as follows: when payoffs are independent, the decision maker's evaluation of the lotteries depends on the salience of the states as well as their objective probabilities. However, when payoffs are dependent, her evaluation no longer depends on the objective probabilities. This mechanism is arguably Probability Distribution 1: p = (0.33, 0.66, 0.01) Probability Distribution 2: p = (0.30, 0.65, 0.05) Probability Distribution 3: p = (0.25, 0.60, 0.15).
more subtle than the one behind the Common Consequence Allais Paradox, as the lotteries' payoffs in each binary choice remain unchanged.

Presentation format
We present the binary choices between lotteries in two formats, the "canonical presentation" and the "states of the world presentation", exposing half of the subjects to either of them. The two formats differ in the way they present the choices between lotteries with independent payoffs to the subjects. In the canonical presentation, as shown by the screenshot in Panel (a) of Fig. 1, the lotteries X and Y are presented side by side as separate lotteries with independent payoff distributions. In the states of the world presentation, as shown by the screenshot in Panel (b) of Fig. 1, the lotteries are presented in a table displaying their joint payoff distribution. For choices between lotteries with dependent payoffs, the two presentation formats are identical and display the lotteries' joint payoff distribution.
The two presentation formats have distinct advantages and disadvantages. The main advantages of the canonical presentation are that it emphasizes the difference between lotteries with independent vs. dependent payoffs and that subjects are probably more used to the canonical presentation of lotteries with independent payoffs. However, the main disadvantage of the canonical presentation is that between the two blocks not only the lotteries' joint payoff distribution changes but also their visual presentation. In contrast, the states of the world presentation keeps the visual presentation constant across the two blocks, but presents lotteries with independent payoffs in an unfamiliar way. Ideally, our results should remain valid under both presentation formats.

Additional part
To validate the classification of subjects into types in a different choice context, we perform out-of-sample predictions about the frequency of preference reversals. To trigger preference reversals we first expose subjects to six binary choices between additional lotteries and, subsequently, let them evaluate these lotteries in isolation by stating their certainty equivalent. We added the six binary choices to the main part of the experiment but used these choices neither for estimating the subject's preferences nor for classifying them into types.
Each of the six binary choices consists of a relatively safe lottery X with a low payoff-variance and a more risky lottery Ỹ with high payoff-variance. The two lotteries have the following format: with a scaling factor t ∈ {2, 4, 16} . All six binary choices can be found in Appendix C. As Bordalo et al. (2012b) discuss in detail, subjects tend to prefer the relatively safe lottery X over the risky lottery Ỹ in a pairwise choice but, at the same time, indicate a higher certainty equivalent for Ỹ than for X when evaluating the lotteries in isolation. Bordalo et al. (2012b) also explain that ST can describe these so called preference reversals due to the change in the choice set while EUT and CPT can never describe them. 9 Section 2 of the Online Appendix derives for each of the six binary choices the conditions under which ST describes a preference reversal.
To elicit the certainty equivalents in the additional part of the experiment, we present each of the lotteries L ∈ {X,Ỹ} in a choice menu in which the subject has to indicate whether she prefers the lottery or a certain payoff z r . Figure 2 shows an example fo such a choice menu. The certain payoff increases from the lottery's lowest payoff, z 1 = 0 , to its highest payoff z 21 in 21 equal increments. The point where the subject switches from preferring the certain payoff to preferring the lottery allows us to approximate the certainty equivalent by CE(L k ) = (z k + z k+1 )∕2 for k ∈ {1, … , 20}. 10 We randomize the order in which we elicit the certainty equivalents of the additional lotteries across subjects. Moreover, since the six binary choices between the Fig. 2 Elicitation of certainty equivalents in the additional part of the experiment. This screenshot shows an example of the choice menu we used for eliciting the subjects' certainty equivalents, when they had to evaluate lotteries in isolation during the additional part of the experiment. The example is translated from French. Payoff are reported in points with 100 points corresponding to 1 Swiss Franc or 1.04 USD additional lotteries appear in the main part of the experiment, subjects should not recall the additional lotteries when stating their certainty equivalents. 11 By comparing the binary choices between the additional lotteries and their certainty equivalents, we can detect the number of preference reversals of each subject. Since there are six binary choices, each subject can exhibit between 0 and 6 preference reversals.

Number of choices
Subjects in the canonical presentation go through a total of 93 binary choices, while subjects in the states of the world presentation go through 84 binary choices. The number of binary choices differs between the presentation formats since the 9 binary choices designed for triggering the Common Consequence Allais Paradox in which lottery X has three payoffs and lottery Y is a sure amount look identical regardless whether the lotteries' payoffs are independent or dependent. Table 4 in Appendix D decomposes the number of choices in each presentation format. Regardless of the presentation format, each subject also evaluates 9 lotteries in isolation during the additional part of the experiment.

Implementation in the lab and incentives
We conducted the experiment in the computer lab at the University of Lausanne (LABEX) using an application based on PHP and MySQL. Most subjects were students of the University of Lausanne and the École Polytechnique Fédérale de Lausanne (EPFL), recruited via ORSEE (Greiner, 2015). The experiment consisted of 14 sessions with 283 subjects in total.
To incentivize subjects' choices in both parts of the experiment, we applied the prior incentive system (Johnson et al., 2021). This avoids violations of isolation, which may otherwise arise with a random incentive system, as pointed out by Holt (1986). In each part, every subject had to draw a sealed envelope from an urn before making any choices. The envelope contained one of the choices the subject was going to make in that part and which later was used for payment. At the very end of the experiment, the subject went to another room where she opened the envelopes together with an assistant. To determine her payment, which she received in cash at the end, she rolled two dice if the choice in the corresponding envelope involved two lotteries with independent payoffs and only one die if the choice involved two lotteries with dependent payoffs.
At the beginning of the experiment, subjects received general instructions informing them about the structure of the experiment, their anonymity, the show up fee, and the conversion rate of points into Swiss Francs. 12 At the beginning of each part, subjects received additional printed instructions. These additional instructions comprised a description of the choices and the payment procedure for that part. They carefully described the difference between lotteries with independent and dependent payoffs. They also explained that, at the end of the experiment, the subject will open the envelope to determine her payoff and roll one one or two dice, depending on whether the lotteries' payoffs in the payment relevant choice are independent or dependent. The instructions also contained several comprehension questions whose answers the assistants verified before subjects could begin. The additional instructions differed depending on whether a subject was exposed to the canonical presentation or the states of the world presentation. All instructions were written in French. English translations are available in Sect. 8 of the Online Appendix.
After making their choices, but before determining and receiving their payments, subjects filled in a demographic questionnaire, completed a short version of the Big 5 personality questionnaire, and a cognitive ability test with 12 questions based on Raven's matrices. The instructions were shown on screen at the beginning of each task. The cognitive ability test was also incentivized and subjects received 50 points per correct answer. 13 Each subject received a show-up fee of 10 Swiss Francs. Total earnings varied between 12.00 and 142.50 Swiss Francs with a mean of 57.66 and a standard deviation of 26.39 Swiss Francs. Each session lasted approximately 90 minutes.

Non-parametric results
In this section, we present the non-parametric results. We start by summarizing the systematic patterns in the frequency of Allais Paradoxes before discussing whether they can be described by EUT, CPT, and ST. Figure 3 shows the average frequency of Allais Paradoxes relative to their maximum possible number separately for lotteries with independent and dependent payoffs. Panel (a) exhibits the frequency of Allais Paradoxes in the expected direction, that is, the direction predicted by CPT and ST. Regardless of the presentation format, Allais Paradoxes in the expected direction occur often with both independent and dependent payoffs. However, they are substantially more frequent with independent payoffs than with dependent payoffs. For example, for both presentation 12 Payoffs were shown in points. 100 points corresponded to one Swiss Franc. At the time of the experiment, one Swiss Franc corresponded to roughly 1.04 USD. 13 We find that the classification of subjects into types is neither related to their individual characteristics nor to their average decision time. Results and a brief discussion of this finding are available in Sect. 3 of the Online Appendix. formats combined, the frequency of Allais Paradoxes in the expected direction is 28.3% with independent payoffs and 16.9% with dependent payoffs.
Panel (b) exhibits the frequency of Allais Paradoxes in the inverse direction, that is, the direction none of the theories can explain. Regardless of the presentation format, Allais Paradoxes in the inverse direction not only are much less frequent than those in the expected direction but also occur with the same frequency across The two bars on the left pool the choices from subjects exposed to the canonical presentation with those from subjects exposed to the states of the world presentation. The two bars in the middle and on the right separate the choices by presentation format independent and dependent payoffs. 14 Given that neither theory can describe these Allais Paradoxes in the inverse direction and given that their frequency is constant across presentation formats as well as across independent and dependent payoffs, we interpret them as the result of decision noise. This interpretation is in line with the literature which acknowledges the existence and relevance of decision noise (e.g. Hey, 2005).
Panel (c) exhibits the difference in the relative frequency of Allais Paradoxes in the expected and in the inverse directions. Under the assumption that the level of decision noise is the same in both directions, we can interpret this difference as the frequency of Allais Paradoxes net of decision noise. These net frequencies confirm that, regardless of the presentation format, Allais Paradoxes occur often and are more than twice as frequent with independent than with dependent payoffs. More specifically, the ratio of Allais Paradoxes between independent and dependent payoffs is 2.127 for both presentation formats combined, 2.755 for the canonical presentation, and 1.648 for the states of the world presentation.
We now discuss which of the three theories is able to describe the above patterns. EUT fails to describe the patterns as it never predicts any Allais Paradoxes and, thus, their net frequencies should always be zero. CPT and ST can each describe some but not all of the above patterns. While CPT can describe the occurrence of Allais Paradoxes for both independent and dependent payoffs, it cannot describe that their net frequency is higher with independent payoffs than with dependent payoffs. In contrast, ST can describe that Allais Paradoxes are more frequent with independent payoffs than with dependent payoffs. However, it cannot describe the occurrence of Allais Paradoxes with dependent payoffs. In sum, none of the three theories alone can explain all of the above patterns in the aggregate frequency of Allais Paradoxes. However, CPT and ST each describe some of the patterns and, thus, both of them play a role. This non-parametric evidence yields our first main result.
Result 1 Probability weighting and choice set dependence both play a role in explaining aggregate choices.
1. Subjects exhibit a high frequency of Allais Paradoxes no matter if payoffs are independent or dependent. 2. However, subjects exhibit a higher frequency of Allais Paradoxes with independent than with dependent payoffs. 3. The result is robust across the three versions of the Allais Paradox and the two presentation formats.  1 for details). The bars on the right show the net frequency of the Common Ratio Allais Paradox. Allais Paradoxes occur often and are more frequent with independent than with dependent payoffs. However, the difference in the net frequencies between independent and dependent payoffs is less pronounced for the Common Ratio than for the Common Consequence Allais Paradox. This is probably because the mechanism in ST behind the Common Ratio Allais Paradox is more subtle than the one behind the Common Consequence Allais Paradox, as mentioned earlier in Sect. 3.1.1. 15 Result 1 is also robust across the two presentation formats. This robustness allows us to addresses two concerns about the relevance of choice set dependence.
First, it confirms the results by Loomes and Sugden (1987), Loomes (1988), and Starmer and Sugden (1989) who found juxtaposition effects in common ratio lottery choices using the states of the world presentation format. It also rules out the concern that by making the common consequence more obvious, the states of the world presentation influences the frequency of Allais Paradoxes (Birnbaum, 2004 Table 7 in Appendix F presents an even more detailed look at the frequency of choices of lotteries X and Y, disaggregated by independent and dependent payoffs as well as by each version of the Allais Paradox. As explained in Sect. 2, opting for Y in the first choice and for X in the second corresponds to the expected direction of the Allais Paradox. Birnbaum et al., 2017;Keller, 1985;Leland, 2010) or decision times (for details, see Appendix A). Second, the robustness across presentation formats confirms that manipulating the lotteries' joint payoff distribution affects the frequency of Allais Paradoxes primarily through choice set dependence and not through event-splitting effects. An event-splitting occurs when an event with a given payoff is split into sub-events that offer the event's payoff in multiple states of the world. Earlier papers raised the concern that event-splitting could influence subjects' choices if splitting an event into sub-events increases its weight (Humphrey, 1995;Starmer & Sugden, 1993). If such event-splitting effects played a role in our data, their strength would depend on the presentation format: they would be stronger in the states of the world presentation where the number of sub-states in which a payoff appears is particularly prominent. However, our first main result holds across the two presentation formats in our data. Thus, we conclude that event-splitting effects play no significant role in our results (for details, see Appendix E.2).
Next, we analyze the distribution of the net frequency of Allais Paradoxes to get a first glimpse at the potential heterogeneity that may be behind Result 1. Figure 5 depicts the corresponding histograms separately for lotteries with independent and dependent payoffs. Not surprisingly, the distribution for lotteries with independent payoffs is located to the right of the distribution for lotteries with dependent payoffs. However, interestingly, both distributions appear to be bimodal. They both exhibit one mode at the lowest bin, corresponding to a net frequency of Allais Paradoxes between 0 and 5%, and another mode at a bin corresponding to a higher net frequency. This multimodality suggests that Result 1 may be driven by considerable heterogeneity in subjects' risk preferences. In particular, the choices of some subjects may be predominantly influenced by probability weighting whereas the choices of others may be primarily driven by choice set dependence. There may also exist a minority of EUT-subjects who display no or only few Allais Paradoxes. We examine this possibility with the structural model which we present in the next section.

Structural model
In this section, we discuss the set-up and the results of the structural model. We also validate the model's power to predict preference reversals in a different choice context.

Set-up
The structural model is based on a finite mixture model (see McLachlan & Peel, 2000, for an overview) and uses a random utility approach for discrete choices (McFadden, 1981). It discriminates between subjects whose preferences are best described by EUT, subjects whose preferences display probability weighting and are best described by CPT, and subjects whose preferences display choice set dependence and are best described by ST. Controlling for the presence of EUT subjects is important, as the behavior of a minority of our subjects may still be best described by EUT, as previously found by other studies Conte et al., 2011).

Random utility approach
The random utility approach allows the structural model to explicitly take decision noise into account. Consider a subject i ∈ {1, … , N} whose preferences are best described by decision model M in the set of decision models M = {EUT, CPT, ST} . She prefers lottery X g over Y g in binary choice g ∈ {1, … , G} when the random utility of choosing X g , V M (X g , M ) + X , is higher than the random utility of choosing Y g , V M (Y g , M ) + Y . The random errors, X and Y , are realizations of an extreme value 1 distribution with scale parameter 1∕ M , and the vector M comprises decision model M's preference parameters. This implies that the probability of subject i choosing X g , i.e., C ig = X , is given by: The parameter M governs the choice sensitivity with respect to differences in the lotteries' deterministic value. If M is 0, the subject chooses each lottery with probability 50% regardless of the deterministic value it provides. If M is arbitrarily large, the probability of choosing the lottery with the higher deterministic value approaches 1. Subject i's contribution to the density function of the random utility model corresponds to the product of the choice probabilities over all G binary decisions, i.e., where I(C ig = X) is 1 if subject i chooses lottery X g and 0 otherwise.

Finite mixture model
Since risk preferences may be heterogeneous, we do not directly observe which model best describes subject i's preferences. In other words, we do not know ex-ante whether subject i is an EUT-, CPT-, or ST-type. Hence, we have to weight i's typespecific density contributions by the corresponding ex-ante probabilities of typemembership, M , in order to obtain her contribution to the likelihood of the finite mixture model: where the vector Ψ = ( EUT , CPT , ST , EUT , CPT , ST , EUT , CPT ) comprises all parameters that need to be estimated, and ST = 1 − EUT − CPT . 16 Note that the ex-ante probabilities of type-membership are the same across all subjects and correspond to the relative sizes of the types in the population. Once we estimated the parameters of the finite mixture model, we can classify each subject into the type she most likely belongs to, given her choices and the estimated parameters Ψ . To do so, we apply Bayes' rule and obtain subject i's individual ex-post probabilities of type-membership: Based on these individual ex-post probabilities of type-membership, we can also assess the ambiguity in the classification of subjects into types. If the finite mixture model classifies subjects cleanly into types, most iM should be either close to 0 or to 1. In contrast, if the finite mixture model fails to come up with a clean classification of subjects into distinct types, many iM will be in the vicinity of 1/3.
. 16 Since i's likelihood contribution is highly non-linear, we apply the expectation maximization (EM) algorithm to obtain the model's maximum likelihood estimates Ψ (Dempster et al., 1977). The EM algorithm proceeds iteratively in two steps: In the E-step, it computes the individual ex-post probabilities of type-membership given the actual fit of the model (see equation (7)). In the subsequent M-step, it updates the fit of the model by using the previously computed ex-post probabilities to maximize each types' log likelihood contribution separately. To ensure that we find the global maximum of the log likelihood function despite the EM algorithm's property to converge towards the nearest maximum, we apply it over a large grid of start values.

Specification of functional forms
To keep the model parsimonious and yet flexible in fitting the data, we specify the following functional forms. In all three decision models, we use a power specification for the utility function v, i.e., which has a convenient interpretation, since measures v's concavity. Specifying the same utility function across all three models puts them on equal footing regarding their flexibility to encompass potential non-linearities in utility. Moreover, this specification turned out to be a neat compromise between parsimony and goodness of fit (Stott, 2006). In CPT, we follow the proposal by Prelec (1998) and specify the probability weighting function as: where 0 < measures likelihood sensitivity and reflects the shape of the probability weighting function. When = 1 , w is linear in probabilities. When gets closer to zero, w becomes more inversely S-shaped. When gets larger than one, w becomes more S-shaped. This specification of the probability weighting function satisfies the three properties discussed in Sect. 2.2. We also tested the two-parameter version of Prelec's probability weighting function. However, as the second parameter measuring the function's net index of convexity is estimated to be almost 1, results remain virtually unchanged (see Appendix G). Hence, we opt for the one-parameter version to keep the total number of parameters the same for CPT and ST. In ST, the decision weights depend on the degree of local thinking 0 < ≤ 1 which we estimate based on the functional form given in Eq.
(2). In all binary choices we use for triggering Allais Paradoxes, the salience ranking of the states of the world is fully determined by ordering, diminishing sensitivity, symmetry, and zero contrast (Sect. 1 of the Online Appendix shows this for every binary choice we use). Hence, we do not need to specify a particular salience function.

Parameter recovery and discriminatory power
We conduct a series of Monte Carlo Simulations to assess the structural model's power to recover a wide range of parameters and discriminate between the three preference types. The Monte Carlo Simulations also allow us to test the structural model's robustness against potential serial correlation in the subjects' errors. In these simulations, we impose a vector of true parameters which we use to simulate the subjects' choices in each type. Subsequently, we try to recover these true parameters and the subjects' individual type-membership by estimating the structural model on the simulated choices. This allows us to calculate the potential bias in the estimated parameters, their overall precision in terms of Mean Squared Errors, and the fraction of correctly classified subjects. Each simulation is based on 1,000 simulation runs. Section 4 of the Online Appendix discusses the set-up and the results of these Monte Carlo Simulations in detail. The simulations reveal that the structural model's power to recover a wide range of parameters and discriminate between the different types is remarkably high. In particular, the structural model provides unbiased and precise estimates even in a situation where discriminating between EUT-, CPT-, and ST-types is extremely hard -that is, when the simulated subjects have an identical utility function and the same choice sensitivity across the three types and only differ slightly in their degrees of likelihood sensitivity and local thinking. In such a situation, the structural model still classifies the vast majority of simulated subjects into the correct type. We suppose that the model's ability to detect even slight behavioral differences between the types is mainly because, instead of estimating at the individual level, the model efficiently exploits the choices of all subjects simultaneously. This allows it to recover the true parameters with high precision as individual noise averages out. 17

Robustness against serially correlated errors
Moreover, the simulations confirm the structural model's robustness against potential serial correlation in the subjects' errors. This robustness is an important feature, since we cannot rule out that the errors subjects make when evaluating lotteries are serially correlated across the binary choices. However, the simulations reveal that the estimated parameters remain unbiased and precise, even if the subjects' errors, X and Y , follow an AR(1) process with a high degree of serial correlation of = 0.6 . Also, the vast majority of subjects remains classified in the correct type. The reason for this robustness against serially correlated errors is likely because we randomized the order of choices across subjects. Since the finite mixture model exploits the choices of all subjects simultaneously, their random order causes the impact of serial correlation to average out. Furthermore, to ensure that inferences about the estimated parameters are valid even with serially correlated errors, we report clusterrobust standard errors.

Robustness against heterogeneity within types
Finally, the simulations reveal that the structural model is robust against heterogeneity within types. This robustness is important as there is likely heterogeneity not only between but also within the three types. The simulations reveal that the estimated parameters remain representative for the average preferences in each type when the subjects' individual parametersi and i or i , respectively -are heterogeneous 1 3 and drawn from a uniform distribution within each type. The intuition is that the structural model discriminates between the types primarily based on the frequency of Allais Paradoxes and estimates parameters that are representative for each type. Thus, random individual deviations within a type form these representative parameters cancel each other out.
In sum, the simulations highlight that the experimental choices contain rich information about subjects' risk preferences and type-membership. By taking the choices of all subjects simultaneously into account, the structural model exploits this information efficiently and is robust against potential serial correlation in the errors and heterogeneity within the types.

Structural model results
We now present and interpret the results of the structural model. When classifying subjects into types using their ex-post probabilities of type-membership, we obtain a clean classification of subjects into 80 EUT-types, 108 CPT-types, and 95 ST-types. Most of the ex-post probabilities of individual type-membership are either close to 0 or 1, confirming that almost all subjects can be unambiguously classified into one of these three types. Figure 6 shows histograms with the ex-post probabilities of type-membership. Table 2 exhibits the type-specific parameter estimates of the finite mixture model. The results show that there is substantial heterogeneity in subjects' risk preferences. The choices of 28.4% of subjects are best described by EUT, the choices of 37.9% by CPT, and those of the remaining 33.7% by ST. This classification confirms Result 1 obtained non-parametrically at the aggregate level. The majority of subjects is best described by either CPT or ST, while -consistent with previous evidence Conte et al., 2011) -only a minority is best described by EUT.
On average, the 80 EUT-types display an almost linear utility function which makes them essentially risk neutral. 18 Although the estimated concavity of ̂= 0.080 is

Probability of Being an EUT−Type
Wi,EUT  (7)). The resulting classification of subjects into types is clean as for nearly all subjects these ex-post probabilities of type-membership are either close to 0 or 1 statistically significant, it is negligible in economic magnitude. Moreover, among the three types, the EUT-types exhibit the highest level of decision noise which translates into a relatively low estimated choice sensitivity.
The 108 CPT-types exhibit, on average, a concave utility function with ̂= 0.572 and a strongly inverse S-shaped probability weighting function with ̂= 0.469 . This confirms that the CPT-types' choices are strongly influenced by probability weighting. With these parameter estimates, the average CPT-type displays the Common Consequence Allais Paradox discussed in the motivating example in Sect. 2.
The 95 ST-types display, on average, a strongly concave utility function with ̂= 0.870 and a seemingly low but statistically significant degree of local thinking corresponding to ̂= 0.924 . Note that, although the average ST-type's degree of Table 2 Type-specific parameter estimates of the finite mixture model Subject cluster-robust standard errors are reported in parentheses and based on 1,000 bootstrap replications. Significantly different from 0 (1) at the 1% level: * * * ( ••• ); at the 5% level: * * ( •• ) a The relative group sizes are not tested against zero, since under the null hypothesis that a type's relative size is zero, the preference parameters are meaningless. Consequently, the test statistic would exhibit an unknown distribution (for a more detailed discussion see (McLachlan & Peel, 2000). The distribution of the test statistic could be bootstrapped under the null hypothesis. However, since the estimates of the relative group sizes are sufficiently far from the edges of the parameter space, this is not done here b Subjects are assigned to the best-fitting model according to their ex-post probabilities of type-membership (see Eq. (7)). The number of assigned subjects is not tested against zero, for the same reason as the relative group sizes are not tested against zero c Choices are predicted by using the subjects' classification into types and by calculating the lotteries' values, V M (X g ,̂M) and V M (Y g ,̂M) , for the type-specific parameter estimates ̂M local thinking appears to be low, she still exhibits the Common Consequence Allais Paradox discussed in the motivating example in Sect. 2. The reason is that with a strongly concave utility function, even a low degree of local thinking is sufficient to generate the Common Consequence Allais Paradox. 19 Next, we analyze how well the structural model fits the subjects' choices compared to aggregate models that neglect any heterogeneity and assume a representative decision maker. The estimation results of the aggregate models can be found in Appendix G. The Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) both indicate that the structural model fits the subjects' choices considerably better than any of the aggregate models. Moreover, when we use the subjects' classification into types and the type-specific parameter estimates to predict choices in-sample, the structural model yields 75.0% correct predictions overall, that is, 75.0% of the predicted choices coincide with the subjects' empirical choices. This more than the 50% correct predictions we would expect from purely random behavior and -not surprisingly due to the finite mixture model's higher number of parameters -also more than any of the aggregate models. The percentage of correctly predicted choices per type confirms that the ST-and CPT-types' choices are less noisy than the EUT-types' choices: while the model correctly predicts 83.0% of the ST-types' and 75.9% of the CPT-types' choices, the percentage of correctly predicted choices is just 64.1% for the EUT-types. In sum, the analysis reveals that the structural model's parsimonious way of taking heterogeneity into account leads to a superior fit compared to the aggregate models. 20 Finally, we also analyze how the structural model's fit compares to models with only two types -i.e., finite mixture models (i) with only EUT-and CPT-types, (ii) with only EUT-and ST-types, and (iii) with only CPT-and ST-types. Detailed results are available in Sect. 6 of the Online Appendix. Overall, the structural model with three types fits the data much better than each of the models with two types. Two results from the models with two types stand out. First, in the model with only EUT-and CPT-types, the relative sizes of the types and their parameters are in line with previous studies discriminating between EUT-and CPT-types Conte et al., 2011). Second, in the two models featuring ST-types, the share of ST-types is 44% in the model mixing them with EUT-types and 38% in the model mixing them with CPT-types. Moreover -as in the model with three types -the STtypes' degree of local thinking is mild with an estimated above 0.9 and their utility function is strongly concave with an estimated above 0.8.
Overall, the structural estimations and the subjects' type-specific behavior yield our second main result.
Result 2 There is vast heterogeneity in the subjects' risk preferences. Subjects are parsimoniously classified into 28% EUT-types, 38% CPT-types, and 34% ST-types according to the decision theory best describing their behavior.
Finally, we carried out robustness checks to ensure that this result does not depend on a particular specification of our model. They reveal that the result remains virtually unchanged if we use a Fechner-type error directly affecting subjects' choices instead of the random utility approach. However, the random utility approach yields a superior fit of the structural model. Similarly, modeling choice set dependence with ST yields a superior fit compared to modeling it with RT, the other major choice set dependent theory. 21

Predictions of preference reversals
Next, we assess how well the structural model predicts preference reversals in the additional part of the experiment (see Sect. 3.2). These out-of-sample predictions across choice contexts represent a particularly stringent test of the structural model for two reasons. First, the model needs to predict behavior in choices that differ in payoffs and probabilities from the ones it was estimated on. Second, the model also needs to predict behavior across choice contexts, i.e., in choices that trigger preference reversals instead of Allais Paradoxes.
We start by describing and interpreting the net frequencies of preference reversals for each of the three types. Subsequently, we use the structural model's random utility approach to make quantitative predictions about these net frequencies. Comparing the empirical and the predicted net frequencies of preference reversals will reveal the aspects of behavior the structural model predicts well and potential other aspects which the model does not capture. Such other aspects of behavior which our structural model does not capture would be particularly interesting, as they may provide hints about the instability of certain preference components across choice contexts. Figure 7 shows the average net frequency of preference reversals for each of the three preference types, i.e., the difference in the frequencies of preference reversals in the expected and in the inverse direction. It reveals that with a net frequency of 43.3% the ST-types exhibit substantially more preference reversals than the EUT-and the CPT-types whose net frequencies are just 32.6% and 25.8%, respectively. Moreover, the EUT-and CPT-types net frequencies of preference reversals are not significantly different. This evidence is in line with our expectation that choice set dependence mainly drives the ST-types' choices and generates their preference reversals. However, the positive net frequencies of preference reversals of the EUT-and CPT-types indicate that choice set dependence might influence their choices too, although to a lesser extent than the ST-types' choices.

Quantitative predictions
We now use the structural model's random utility approach to make quantitative predictions about the frequency of preference reversals for each preference type. We start by predicting the probability that a subject belonging to type M with estimated parameters ̂M and ̂M indicates a higher certainty equivalent for lottery X than for lottery Ỹ when she evaluates the two lotteries separately in a choice menu (see Fig. 2). First, we predict for each of the two lotteries L ∈ {X,Ỹ} and for each of the 21 rows r of the corresponding choice menu the probability that the subject prefers the lottery over the sure amount z r , z 1 < … < z 21 :  Table 3. The numbers in parentheses indicate the number of subjects in each of the three types. 34 of the 283 subjects (12.0%) are excluded from the analysis because they exhibit more than one switch-point in at least one of the choice menus used for eliciting the certainty equivalents. Exhibiting more than one switch-point is independent of type-membership ( 2 -test of independence: p-value = 0.534) Journal of Risk and Uncertainty (2022) 65:139-184 Second, by assuming a unique switch-point, we use these predicted probabilities to infer the probability distribution over the k ∈ {1, … , 20} possible certainty equivalents for each lottery. 22 The predicted probability that the certainty equivalent CE(L) = (z k + z k+1 )∕2 ≡z k , corresponds to: Since we assume a unique switch-point, we need to normalize these predicted prob- to obtain a proper probability distribution which sums up to one. Third, by combining the probability distributions over the possible certainty equivalents of the two lotteries, we obtain the joint probability distribution over the 20 × 20 = 400 states in which either the certainty equivalent of lottery X or the one of lottery Ỹ is higher. Knowing this joint probability distribution allows us to predict the probability that the subject indicates a higher certainty equivalent for X than Ỹ . Subsequently, we evaluate Eq. (5) to predict the probability that the subject will choose X over Ỹ in the pairwise choice. By applying this procedure to all 6 choices of the additional part (see Appendix C), we can predict the type-specific frequencies of preference reversals in the expected and inverse directions using the structural model and the estimated parameters ̂M and ̂M . Finally, the predicted net frequency of preference reversals is the difference between the predicted frequency of preference reversals in the expected direction and those in the inverse direction. Table 3 compares the empirical to the predicted net frequencies of preference reversals using OLS regressions. We start by interpreting the estimated coefficients of the regression in the first column, which uses the type-specific parameters ̂M and ̂M to predict the subjects' net frequencies of preference reversals. The coefficient on the predicted net frequencies of preference reversals is 0.907 and not significantly different from one. This indicates that the structural model captures the behavioral differences between the types remarkably well as, on average, a given change in the predicted frequencies of preference reversals translates nearly one to one into a change in the corresponding empirical frequencies. In other words, the structural model predicts the average differences in the empirical net frequencies of preference reversals between the three types almost perfectly.
However, the estimated intercept is positive, revealing that the structural model consistently underestimates the frequency of preference reversals across all three types by 27.4 percentage points. This can be seen when we visualize the type-specific predictions in Fig. 7: as indicated by the black dots, the predicted net frequencies of preference reversals are consistently too low across all three types. Moreover, as 22 Assuming a unique switch-point is consistent with our approach of excluding the 34 (12.0%) subjects who switched multiple times. indicated by the white circles, when we adjust the predicted net frequencies by the estimated intercept of 0.274, they all match the empirical frequencies almost perfectly and fall well within the 95% confidence intervals.
This evidence suggests not only that choice set dependence plays a role across all three types but also that its influence is stronger in the choices of the additional part than in the choices of the main part. We hypothesize that this could be because the influence of choice set dependence may be shaped by the choice context. More specifically, in the additional part, subjects fill out choice menus that always offer choices between a lottery with two payoffs and a series of sure amounts. This specific choice context may shift the subjects' focus of attention towards differences in payoffs and, thus, may inflate the role of choice set dependence. In contrast, in main part, subjects always face binary choices between two lotteries with up to three payoffs. This choice context may shift the subjects' focus of attention towards differences in probabilities and, thus, may dampen the influence of choice set dependence. Overall, the evidence gained from the out-of-sample predictions suggests that exploring how the choice context shapes the role of choice set dependence is an important avenue for future research.
Next, we interpret the fraction of the variance in the empirical net frequencies of preference reversals which the structural model manages to predict. At first glance, the fraction of the predicted variance is disappointingly low with an R 2 of just 0.054. The low R 2 indicates that the structural model is not well suited for predicting individual net frequencies of preference reversals since there is apparently a considerable amount of heterogeneity within each of the three preference types. However, as discussed above, the structural model predicts the average differences in the net frequencies of preference reversals between the types remarkably well and, in fact, the R 2 of the corresponding regression amounts to 0.936. Finally, we investigate whether the heterogeneity within each of the three preference types results from systematic differences in individual preferences or rather from noise. To do so, we take the finite mixture model's classification of subjects into types and estimate the parameters of the corresponding decision models separately for each subject i. This yields a distinct set of parameter estimates for every subject, ̂M i and ̂M i , which we use for predicting individual-specific net frequencies of preference reversals. If the heterogeneity within the types results mainly from systematic differences in individual preferences, the predictions based on the individual-specific estimates would pick up these differences and, thus, would exhibit a superior out-ofsample performance than the predictions based on the type-specific estimates. In contrast, if the heterogeneity within the types results mainly from noise, which randomly changes across the two parts of the experiment, the predictions based on the individual-specific estimates would pick up this random noise and, thus, their out-of-sample performance would fall short of the predictions based on the type-specific estimates.
The second column of Table 3 reveals that the performance of the predictions based on the individual-specific estimates falls short of those based on the more parsimonious type-specific estimates in all relevant dimensions. First, the estimated coefficient of the predictions based on the individual-specific estimates (0.149) is far below one, indicating that they severely underestimate differences in the net frequencies of preference reversals across types. With an intercept of 0.297, they also consistently underestimate the level of preference reversals. Second, and even more striking, the predictions based on the individual-specific estimates explain a fraction of just R 2 = 0.025 of the variance in the empirical net frequencies of preference reversals -much less than the predictions based on the more parsimonious typespecific estimates. They are also worse at predicting the average differences across types as the corresponding R 2 is just 0.122. Overall, these results indicate not only that the individual heterogeneity within the preference types primarily results from noise but also that, despite their parsimony, the structural model's type-specific estimates pick up most of the relevant heterogeneity across the types.
The analysis of the structural model's power to predict preference reversals yields the third main result.

Result 3
The structural model has power to predict type-specific behavioral differences across choice contexts.
1. Subjects classified as ST-types exhibit significantly more preference reversals than the other subjects. 2. The structural model accurately predicts the quantitative differences in the average frequencies of preference reversals across types. 3. Due to their parsimony, the structural model's type-specific parameter estimates outperform noisy individual estimates.
Furthermore, the predictions of preference reversals suggest that the choice context shapes the relative importance of choice set dependence.

Conclusion
The paper assesses the relative importance of probability weighting and choice set dependence both non-parametrically and with a structural model. This represents one of the first joint tests of the two most descriptive behavioral theories of choice under risk. There are three main results. First, for aggregate choices, both choice set dependence and probability weighting matter. This result does not rely on specific functional forms and is robust across the three versions of the Allais Paradox as well as across the two presentation formats. Second, there is substantial heterogeneity in risk preferences which can be parsimoniously characterized by three types: 38% CPT-types, 34% ST-types, and 28% EUT-types. Finally, this classification of subjects is valid in a different choice context, as the subjects classified as ST-types exhibit significantly more preference reversals than their peers.
These results are directly relevant for the literature that aims at identifying the main behavioral drivers of risky choices. Knowing about the relative importance of probability weighting and choice set dependence could inspire new decision theories taking both concepts into account and lead to better predictions in various domains of risk taking behavior, such as investment, asset pricing, insurance, and health behavior.
The conclusions also open up avenues for future research. First, our methodology could be used to study how the relative importance of probability weighting and choice set dependence varies with educational background, cognitive ability, and other socio economic characteristics in the general population. Studying this variation in the general population could lead to new explanations for the observed variation in socio-economic outcomes as the different types may fall prey to distinct behavioral traps during their lives. Second, to improve the decision models' predictive power, it would be important to explore how the choice context shapes the role of choice set dependence. Knowing how the choice context influences the role of choice set dependence could lead to more accurate predictions in other domains such as consumer, investor, and judicial choice.

Appendix A. Common ratio Allais Paradox
We now use an example of two lotteries, X and Y, that may induce the Common Ratio Allais Paradox: In this example, the Common Ratio Allais Paradox refers to the empirical finding that if p is high most individuals prefer Y over X, whereas if p is scaled down by a factor 0 < < 1 individuals prefer X over Y for a sufficiently small .

A.1 EUT
EUT cannot describe the Common Ratio Allais Paradox in the above example. The decision maker evaluates lottery X as V EUT (X) = p v(6000) . The decision maker chooses lottery X over Y if: Hence, the choice does not depend on the value of the probability p.

A.2 CPT
CPT can describe the Common Ratio Allais Paradox in the above example. The decision maker prefers lottery Y over X if: Note that when p is scaled down by the factor , the right hand side of the above inequality remains unchanged, while the left hand side decreases due to the probability weighting function's subproportionality, i.e., w(q) w(p) > w( q) w( p) . Hence, for a sufficiently low the sign of the above inequality may change, and the decision maker prefers X to Y and exhibits the Common Ratio Allais Paradox.

A.3 ST
ST can describe the Common Ratio Allais Paradox in the above example when the two lotteries' payoffs are independent. In this case, there are four states of the world which rank in salience as follows: (6000, 0) > (0, 3000) > (6000, 3000) > (0, 0) . Hence, the decision maker evaluates lottery X as: and Using v(0) = 0 and the decision weights given by Eq.
(2), the decision maker prefers Y over X when: Note that when p is scaled down, the right hand side of the above inequality remains unchanged, while the left hand side decreases. Hence, for a sufficiently low the sign of the above inequality may change, and the decision maker prefers X to Y and exhibits the Common Ratio Allais Paradox. However, when the payoffs of the two lotteries depend of each other, ST can no longer describe the Common Ration Allais Paradox. In this case, there are just three states of the world: The ranking in terms of salience of these three states is as follows: (0, 3000) > (6000, 3000) > (0, 0) . Hence, the decision maker evaluates lottery X as: and evaluates lottery Y as Using v(0) = 0 and the decision weights given by Eq. (2), the decision maker prefers X over Y when: Hence, regardless of the value of p, the decision maker always prefers X over Y when the above inequality holds, and otherwise always prefers Y over X. Consequently, the .
The first three binary choices are similar to the ones stated in Bordalo et al. (2012b). The last three binary choices are based on Payoff Level 1 of the 3 × 3 × 2 design used for generating choices that may trigger the Common Ratio Allais Paradox (see Appendix B).  Table 4 Number of binary choices by presentation format and type of Allais Paradox a Three of the 3 × 3 × 2 = 18 binary choices to trigger the Common Ratio Allais Paradox were used to make out-of-sample predictions of preference reversals. These three binary choices were left out in the calculation of the frequencies of Allais Paradoxes and the structural estimations (see Appendices B and C) b In the states of the world presentation, the nine binary choices where lottery X has three possible payoffs and lottery Y is a sure amount look identical regardless whether the lotteries' payoffs are independent or dependent. Since we did not want to present the same choices twice, subjects exposed to the states of the world presentation had to go through nine binary choices less than those exposed to the canonical presentation

Appendix E. Comparison between the two presentation formats
This section of compares the subjects' choice patterns across the two presentation formats to rule out two concerns regarding the importance of choice set dependence.

E.1 Prominence of common consequence
The first concern is that the states of the world presentation makes the common consequence more prominent, which could influence the number of Allais Paradoxes and the subjects' decision times. When comparing the frequencies of Allais Paradoxes across the two presentation formats, we find one statistically significant but small difference. Table 5 exhibits all differences in the frequencies of Allais Paradoxes. With independent payoffs, the frequency in the expected direction is 4.8 percentage points higher in the canonical presentation than in the states of the world presentation (t-test: p-value = 0.007). However, this difference is much smaller than in Birnbaum et al. (2017) who argue that the states of the world presentation makes the common consequence more prominent and, thus, may lower the frequency of Allais Paradoxes. Perhaps the difference is small in our case because, when presenting the choices to the subjects, Birnbaum et al. (2017) place the common consequence always in the first column while we place it in a random column. This random placement may lower the common consequence's prominence. With dependent payoffs, the frequency of Allais Paradoxes in the expected direction is 1.4 percentage points lower in the canonical presentation than in the states of the world presentation (t-test: p-value = 0.313). In the inverse direction, it is 1.4 percentage points lower in the canonical presentation than in states of the world presentation with independent payoffs (t-test: p-value = 0.169), and 0.8 percentage points higher with dependent payoffs (t-test: p-value = 0.379).
In terms of average decision times, subjects exposed to the canonical presentation needed 12.03 seconds per binary choice, while those exposed to the states of the world presentation needed 14.74 seconds. The difference of 2.71 seconds is significant (t-test: p-value < 0.001) and may reflect the higher complexity of the states of the world presentation. Subjects also required more time with independent than with dependent payoffs: 0.48 seconds more in the canonical presentation (t-test: p-value = 0.034) and 6.50 seconds in the states of the world presentation (t-test: p-value < 0.001).

E.2 Event-splitting effects
The second concern is that the change in the frequency of Allais Paradoxes in response to a manipulation of the lotteries' joint payoff distribution may be driven by event-splitting effects rather than choice set dependence. Event-splitting occurs when, depending on the lotteries' joint payoff distribution, some payoffs appear in multiple states of the world. This might affect subjects' choices if they weight payoffs by the number of states in which they appear.
If event-splitting effects play a role in our data, they should be particularly strong in the states of the world presentation of choices provoking the generalized version of the Common Consequence Allais Paradox. For instance, the states of the world presentation of the choice in Eq. (1) with a common consequence z = 0 and independent payoffs is while with dependent payoffs it is We see that lottery X offers the highest payoff, 2500, in two states with independent payoffs but in just one state with dependent payoffs. This may render X more attractive with independent payoffs than with dependent payoffs. In addition, lottery Y offers the lowest payoff, 0, in two states with independent payoffs but in just one state with dependent payoffs. This may render Y more attractive with dependent payoffs than with independent payoffs. In contrast, in the canonical presentation, event-splitting effects are absent as the two lotteries are presented side by side as two separate options when payoffs are independent.
To test whether event-splitting effects play a role in our data, we compare the subjects' choice patterns between the two presentation formats. To maximize the chance of finding event-splitting effects, we only look at the choices provoking the generalized version of the Common Consequence Allais Paradox where these effects should be strongest. However, as Table 6 reveals, the choice patterns are virtually identical across the two presentation formats. Hence, we conclude that, in our data, the differences in the frequencies of Allais Paradoxes in response to manipulations of the lotteries' joint payoff distribution are driven by choice set dependence and not by event-splitting effects.  Table 8 reveals that, at the aggregate level, all decision models fit the subjects' choices considerably worse than the finite mixture model (Table 2) which accounts for heterogeneity in a parsimonious way. Compared to the estimations at the aggregate level, the finite mixture model not only achieves a higher log likelihood but also lower values of the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC).  Subject cluster-robust standard errors are reported in parentheses. Significantly different from 0 (1) at the 1% level: * * * ( ••• ); at the 5% level: * * ( •• ); at the 10% level: * ( • ) a CPT2 is a specification also based on Cumulative Prospect Theory but uses the more flexible, twoparameter version of the probability weighting function by Prelec (1998): w(p) = exp(− (− ln(p)) ) , where is the net index of concavity b Choices are predicted by calculating the lotteries' values, V M (X g ,̂M) and V M (Y g ,̂M) , for the parameter estimates ̂M of the corresponding model Moreover, the alternative specification of Cumulative Prospect Theory, CPT2, using the more flexible, two-parameter version of Prelec's probability weighting function exhibits only a negligibly better fit than the baseline specification of CPT. This is because the estimated net index of concavity, ̂= 0.962 , is very close to one. Thus, we opt for the baseline specification of CPT, as it exhibits the same number of parameters as ST.
Funding Open access funding provided by University of Lausanne.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.