1 Introduction

The past decades of economic research on choice under risk have revealed systematic violations of expected utility theory (EUT; von Neumann & Morgenstern, 1953). As exposed in the famous Allais Paradoxes, subjects frequently violate EUT’s independence axiom as they exhibit both risk loving and risk averse behavior (Allais, 1953). For example, many individuals are risk loving when buying state lottery tickets and risk averse when buying damage insurance (Cicchetti & Dubin, 1994; Forrest et al., 2002; Garrett & Sobel, 1999; Sydnor, 2010). Moreover, subjects often reverse their choice when they have to choose between two lotteries or evaluate them in isolation (Lichtenstein & Slovic, 1971; Lindman, 1971). Some of these preference reversals violate EUT’s transitivity axiom (Cox & Epstein, 1989; Loomes et al., 1991). These and other systematic violations of EUT have spurred the development of various alternative decision theories.

These alternative decision theories fit into two major classes. The first major class is based on probability weighting and postulates that subjects systematically overweight small probabilities and underweight large probabilites. The most prominent example is Prospect Theory (Kahneman & Tversky, 1979), subsequently generalized to Cumulative Prospect Theory (CPT; Tversky & Kahneman, 1992). CPT is the best-fitting model for aggregate choices in this class (Starmer, 2000; Wakker, 2010).Footnote 1 According to CPT, subjects are risk loving when buying a state lottery ticket because they overweight the small probability of winning and, conversely, risk averse when buying damage insurance because they underweight the large probability of not suffering any damage. However, CPT fails to explain preference reversals, since subjects always attach the same value to lotteries, regardless whether they have to choose among them or evaluate them in isolation.

The other major class of decision theories postulates that the evaluation of lotteries is choice set dependent.Footnote 2 Prominent members of this class are Salience Theory (ST; Bordalo et al., 2012b) and Regret Theory (RT; Loomes & Sugden, 1982).Footnote 3 We focus on ST as the primary example of a choice set dependent theory because it is becoming the main contender to CPT (Dertwinkel-Kalt & Koster, 2020). According to ST, individuals focus their limited attention on states of the world with large payoff differences between the alternatives. Hence, a lottery’s value is choice set dependent as the weight attached to a state depends on the payoffs of the alternatives in that state. ST can also explain why individuals are often both risk loving and risk averse at the same time. They buy state lottery tickets because they overweight the state where they win the big prize due to the large payoff difference between buying the ticket and winning versus not buying the ticket; at the same time, they buy damage insurance, because they overweight the state in which the damage occurs due to the large payoff difference between being insured and uninsured in that particular state. In contrast to theories based on probability weighting, choice set dependent theories are able to describe preference reversals as they allow for violations of the transitivity axiom.

In this paper, we perform a joint test of the relative importance of probability weighting and choice set dependence. This test is relevant in various applications. On the one hand, ST can naturally explain several behavioral phenomena in consumer choice – such as the endowment effect – (Bordalo et al., 2012a, 2013b; Dertwinkel-Kalt et al., 2017), the counter-cyclicality of risk premia on financial markets (Bordalo et al., 2013a), and how legally irrelevant information affects judicial decisions (Bordalo et al., 2015). But on the other hand, and in contrast to CPT, ST can describe the Allais Paradox only in certain choice sets.

We conduct a laboratory experiment to perform the joint test. Subjects face a series of binary choices between lotteries provoking three versions of the Allais Paradox. To discriminate between the different decision theories, we manipulate the joint payoff distribution of the lotteries. That is, every subject faces the lotteries of each binary choice twice. In one case, the lotteries’ payoffs are independent of each other, while in the other, they depend on each other. This manipulation affects the joint payoff distribution of the lotteries but leaves their marginal payoff distributions unchanged. If risky choices are driven by probability weighting, the predicted frequency of Allais Paradoxes is the same, as subjects evaluate each lottery in isolation and focus exclusively on its marginal payoff distribution. Hence, theories based on probability weighting, such as CPT, can explain the Allais Paradox regardless of whether lotteries’ payoffs are independent or dependent. However, if risky choices are driven by choice set dependence, the predicted frequency of Allais Paradoxes is positive with independent payoffs and zero with dependent payoffs. Thus, choice set dependent theories, such as ST, cannot explain Allais Paradoxes when payoffs depend on each other. We can also control for EUT preferences, as EUT can never explain the Allais Paradox.

To obtain our first main result, we analyze the importance of probability weighting and choice set dependence non-parametrically at the aggregate level, i.e., at the level of a representative decision maker. At the aggregate level, both choice set dependence and probability weighting play a role. Probability weighting plays a role, because Allais Paradoxes occur regardless whether lotteries’ payoffs are independent or dependent. However, choice set dependence plays a role too, because Allais Paradoxes occur more than twice as often when lotteries’ payoffs are independent than when they are dependent. This result holds for all three versions of the Allais Paradox.

As a next step, we estimate a structural model which follows a finite mixture approach. That is, it assumes that there exist three types of subjects: CPT-types whose behavior is mostly driven by probability weighting, ST-types whose behavior is primarily driven by choice set dependence, and EUT-types. The model characterizes each of these types by its relative size and average type-specific parameters. Moreover, it provides a classification of every subject into the type that best fits her choices. The finite mixture approach has three advantages. First, it allows us to take heterogeneity in risk preferences parsimoniously into account (Bruhin et al., 2010; Conte et al., 2011), as it does not require estimating individual-specific parameters (Harless & Camerer, 1994; Hey & Orme, 1994; Starmer, 2000). Second, it can inform us about the specification best suited for representing each major class of decision models. Third, it yields type-specific preference parameters which can be used to calibrate theoretical models and make quantitative predictions about behavior across various choice contexts.

The structural model yields the second main result. It uncovers vast heterogeneity in subjects’ risk preferences and classifies them into 38% CPT-types, 34% ST-types, and 28% EUT-types. The result shows that probability weighting and choice set dependence both play a similarly important role in describing the non-EUT-types’ choices. It also highlights that the mix of types can be decisive for understanding and predicting aggregate behavior in applied contexts – such as consumer, investor, and judicial choice.

Finally, we use out-of-sample predictions to test the structural model’s power to predict behavior across choice contexts. This is a crucial test since one can only use a structural model to predict behavior across contexts if the estimated preferences are stable. To perform the out-of-sample predictions, we expose subjects to additional lotteries that may trigger preference reversals. Subjects always first choose between two of these additional lotteries and, later, evaluate each of them in isolation. By analyzing the frequency of preference reversals in these additional lotteries, we can assess the model’s power to predict behavior in choices with a different context than the ones we use for estimating the model.

The out-of-sample-predictions provide the third main result. The ST-types exhibit roughly 1.5 times more preference reversals than the CPT- and EUT-types, confirming that the ST-types’ choices are mostly driven by choice set dependence. The structural model also predicts quantitative differences in the average frequency of preference reversals accurately across the three types. Moreover, the predictions suggest that choice set dependence is stronger when subjects trade off a sure amount against a lottery than when they have to choose between two lotteries.

The paper contributes to the literature that tests the performance of the major classes of decision theories. This literature has tested probability weighting and choice set dependence separately and found support for both. On the one hand, there is considerable evidence suggesting that risky choices depend on outcome probabilities irrespective of the choice set (for examples, see Camerer & Ho, 1994; Fehr-Duda & Epper, 2012; Kahneman & Tversky, 1979; Loomes & Segal, 1994; Starmer, 2000). On the other hand, the literature also recognizes that risky choices depend on the choice set. Early studies, qualitatively testing the predictions of RT, find “juxtaposition effects”, i.e., that systematic manipulations of the lotteries’ joint payoff distribution that leave their marginal distributions unchanged can affect subjects’ choices (Battalio et al., 1990; Harless, 1992; Loomes & Sugden, 1987; Loomes, 1988; Starmer & Sugden, 1989). More recent studies, testing ST with the same paradigm, confirm the role of choice set dependence in non-incentivized choices on Mturk (Bordalo et al., 2012b), in choices between a lottery and a sure amount (Booth & Nolen, 2012), and in choices varying the lotteries’ positive skewness (Dertwinkel-Kalt & Koster, 2020). The most closely related and complementary study to ours is by Frydman and Mormann (2018). They find that the frequency of Allais Paradoxes varies with three levels of dependency between the lotteries’ payoffs and that the evaluation of lotteries changes if one adds an additional decoy lottery which subjects can see but not choose. They also estimate a structural model of ST separately for each subject and find that only half of their subjects exhibit ST’s local thinking. This suggests that there is substantial heterogeneity and the different classes of decision theories should be tested jointly. Our paper contributes to this literature, as it is one of the first to perform such a joint test and assess the relative importance of probability weighting and choice set dependence in a heterogeneous population.Footnote 4

The paper also contributes to the strand of literature that uses finite mixture models to classify subjects into types. This literature has focused mostly on discriminating EUT from non-EUT preferences in a single choice context (Bruhin et al., 2010; Conte et al., 2011; Fehr-Duda et al., 2010; Santos-Pinto et al., 2015).Footnote 5 Our second main result enhances this strand of literature by uncovering the relative importance of probability weighting and choice set dependence within the group of non-EUT subjects.

Finally, the paper performs out-of-sample predictions, allowing it to establish a link between the aforementioned papers about Allais Paradoxes and the literature on preference reversals (Cox & Epstein, 1989; Grether & Plott, 1979; Lichtenstein & Slovic, 1971; Lindman, 1971; Loomes et al., 1991; Pommerehne et al., 1982; Reilly, 1982). The out-of-sample predictions confirm that both phenomena result from choice set dependence. Moreover, the paper shows that a parsimonious structural model has predictive power across different choice contexts – which is essential for modeling and predicting subjects’ behavior across different applications of choice under risk.

The paper has the following structure. Section 2 explains the strategy for discriminating between the different decision theories. Section 3 introduces the experimental design. Section 4 presents the non-parametric results at the aggregate level, while Sect. 5 discusses the structural model, its results, and its power to predict preference reversals in a different choice context. Finally, Sect. 6 concludes.

2 Discriminating between decision theories

This section describes our empirical strategy for discriminating between EUT, probability weighting, and choice set dependence. We focus on the two most descriptive behavioral theories, i.e., CPT representing probability weighting and ST representing choice set dependence. The empirical strategy (i) relies on a series of binary choices between lotteries that may trigger Common Consequence and Common Ratio Allais Paradoxes and (ii) manipulates the choice set by making the two lotteries’ payoffs either independent or dependent of each other.

We explain the empirical strategy with the following binary choice between lotteries X and Y, taken from Kahneman and Tversky (1979), which may trigger the Common Consequence Allais Paradox:Footnote 6

$$\begin{aligned} X = \left\{ \begin{array}{rr} 2500 &{} p_1=0.33 \\ z &{} p_2=0.66 \\ 0 &{} p_3=0.01 \\ \end{array} \right. \qquad \text {vs.} \qquad Y = \left\{ \begin{array}{rr} 2400 &{} p_1+p_3=0.34 \\ z &{} p_2=0.66 \\ \end{array} \right. \end{aligned}.$$
(1)

Note that the two lotteries have a common consequence, i.e., a payoff z which occurs with probability \(p_2\) in both lotteries. In this example, the Common Consequence Allais Paradox refers to the robust empirical finding that if \(z=2400\), most individuals prefer Y over X, whereas if \(z=0\), most individuals prefer X over Y.

Next, we show that EUT can never describe the Allais Paradox, CPT can always describe it, and ST can only describe the Allais Paradox when the payoffs of the two lotteries are independent but not when they are dependent.

2.1 EUT

According to EUT, the decision maker evaluates any lottery L with non-negative payoffs \(x=(x_1, \ldots , x_J)\) and associated probabilities \(p=(p_1, \ldots , p_J)\) as:

$$\begin{aligned} V^{EUT}(L) = \sum _{j=1}^{J} {p_j \, v(x_j)} \,, \end{aligned}$$

where v is an increasing utility function over monetary payoffs with \(v(0)=0\).Footnote 7 Note that the value \(V^{EUT}(L)\) only depends on the attributes of lottery L and not on the attributes of the other lotteries in the choice set. EUT cannot explain the Common Consequence Allais Paradox since, when comparing the values of the two lotteries \(V^{EUT}(X)\) and \(V^{EUT}(Y)\), the term involving the common consequence, \(p_2 \, v(z)\), cancels out. Hence, the decision maker’s choice between X and Y does not depend on the value of the common consequence z.

2.2 CPT

According to CPT, the decision maker ranks the non-negative monetary payoffs of any lottery L such that \(x_1 \ge \ldots \ge x_J\) and evaluates the lottery as:

$$\begin{aligned} V^{CPT}(L) = \sum _{j=1}^{J} {\omega _j^{CPT} (p) \, v(x_j)} \,, \end{aligned}$$

where \(\omega _j\) is the decision weight attached to the value of payoff \(x_j\). As in EUT, the value \(V^{CPT}(L)\) only depends on the attributes of lottery L, i.e., the decision maker evaluates the lottery in isolation. The decision weights are given by:

$$\begin{aligned} \omega _j^{CPT} (p) = \left\{ \begin{array}{ll} w(p_1) - w(0) &{} \text{ for }\;\;j=1 \\ w\left( \sum _{k=1}^{j}{p_k}\right) - w\left( \sum _{k=1}^{j-1}{p_k}\right) &{} \text{ for }\;\;2\le j\le J-1 \\ w(1) - w \left( \sum _{k=1}^{J-1}{p_k}\right) &{} \text{ for }\;\;j = J \end{array} \right. \,, \end{aligned}$$

where \(p_k\) is payoff \(x_k\)’s probability and w is the probability weighting function. Typically, the probability weighting function in CPT exhibits three properties (Fehr-Duda & Epper, 2012; Kahneman & Tversky, 1979; Prelec, 1998; Wakker, 2010):

  1. 1.

    Strictly increasing and satisfying \(w(0)=0\) and \(w(1)=1\). This ensures that decision weights are non-negative and sum to one.

  2. 2.

    Inverse S-shape. The probability weighting function is concave for small probabilities and convex for large probabilities. This property ensures the decision maker overweights small probabilities and underweights large probabilities. It is necessary for CPT to be able to explain the Common Consequence Allais Paradox, as explained further below.

  3. 3.

    Subproportionality. For the probabilities \(1\ge q> p > 0\) and the scaling factor \(0< \lambda < 1\), the inequality \(\frac{w(q)}{w(p)}>\frac{w(\lambda q)}{w(\lambda p)}\) holds. Subproportionality is needed for CPT to be able to explain the Common Ratio Allais Paradox, as shown in Appendix A.

We now explain how CPT can describe the Common Consequence Allais Paradox in the choice between lotteries X and Y. When \(z=2400\), the choice is:

$$\begin{aligned} X = \left\{ \begin{array}{rr} 2500 &{} p_1=0.33 \\ 2400 &{} p_2=0.66 \\ 0 &{} p_3=0.01 \\ \end{array} \right. \qquad \text{ vs. } \qquad Y = 2400 \,. \end{aligned}$$

In this case, the decision maker tends to prefer Y over X. Due to the decision maker’s tendency to overestimate small probabilities and underestimate large probabilities, the decision weight attached to the lowest payoff of X, \(1 - w(0.99)\), is larger than its objective probability \(p_3=0.01\), which renders X unattractive.

In contrast, when \(z=0\), the choice is:

$$\begin{aligned} X = \left\{ \begin{array}{rr} 2500 &{} p_1=0.33 \\ 0 &{} p_2+p_3=0.67 \\ \end{array} \right. \qquad \text{ vs. } \qquad Y = \left\{ \begin{array}{rr} 2400 &{} p_1+p_3=0.34 \\ 0 &{} p_2=0.66 \\ \end{array} \right. \,. \end{aligned}$$

In this case, the decision maker tends to prefer X over Y. Now, the decision weights of the two lotteries’ highest payoffs, w(0.33) and w(0.34), are very close and, therefore, the decision is driven by the difference in utilities between v(2500) and v(2400) rather than the difference in probabilities.

In sum, CPT can always explain the Allais Paradox because the decision weights depend non-linearly on the marginal payoff distribution of the lottery under consideration, which remains unchanged regardless whether the lotteries’ payoffs are independent or dependent.

2.3 ST

According to ST, cognitive limitations cause the decision maker to be a local thinker who focuses her attention on states of the world in which one payoff stands out relative to the payoffs of the alternative. The decision maker overweights these salient states relative to the others. As the salience of a state directly depends on the payoffs of the alternative, a lottery’s value is choice set dependent and – in contrast to EUT and CPT – lotteries are no longer evaluated in isolation.

Formally, if the decision maker has to choose between two lotteries \(L^1\) and \(L^2\), she ranks each possible state \(s \in \{1, \ldots , S\}\) according to its salience \(\sigma (x^1_s, x^2_s)\), where \(x^1_s\) and \(x^2_s\) are the payoffs of \(L^1\) and \(L^2\), respectively, in state s. The salience function \(\sigma\) satisfies four properties:

  1. 1.

    Ordering. For two states s and \(\tilde{s}\), we have that if \([x_s^{\min }, x_s^{\max }]\) is a subset of \([x_{\tilde{s}}^{\min }, x_{\tilde{s}}^{\max }]\), then \(\sigma (x_{\tilde{s}}^1,x_{\tilde{s}}^2) > \sigma (x_s^1, x_s^2)\). Ordering implies that states with bigger differences in payoffs are more salient.

  2. 2.

    Diminishing Sensitivity. For any \(\epsilon > 0\), \(\sigma (x_s^1, x_s^2) > \sigma (x_s^1 + \epsilon , x_s^2 + \epsilon )\). Diminishing sensitivity implies that, for states with a given difference in payoffs, salience diminishes the further away from zero the difference in payoffs is.

  3. 3.

    Symmetry: \(\sigma (x^1_s, x^2_s) = \sigma (x^2_s, x^1_s)\). Symmetry implies that permutations of payoffs between lotteries leave the salience of a state unchanged.

  4. 4.

    Zero Contrast. For two states s and \(\tilde{s}\) where \(x^1_s = x^2_s\) and \(x^1_{\tilde{s}} \ne x^2_{\tilde{s}}\), \(\sigma (x^1_s, x^2_s) < \sigma (x^1_{\tilde{s}}, x^2_{\tilde{s}})\). Zero contrast implies that if two lotteries offer the same payoff in a particular state, this state is the least salient.

The decision weight of each state s depends on the state’s salience-rank, \(r_s \in \{1,\ldots , S\}\) with lower values being associated with higher salience:

$$\begin{aligned} \omega ^{ST}_s (x^1, x^2) = p_s \, \frac{\delta ^{r_s}}{\sum _{m \in S}{\delta ^{r_m} \, p_m}}\,, \end{aligned}$$
(2)

where \(p_s\) is the probability that state s is realized, and \(0<\delta \le 1\) is the decision maker’s degree of local thinking. For \(\delta =1\), the decision maker weights states by their objective probabilities, whereas, for \(\delta < 1\), the decision maker is a local thinker and overweights salient states. This yields the following values for lotteries \(L^1\) and \(L^2\):

$$\begin{aligned} V^{ST}(L^1 ) = \sum _{s=1}^{S} {\omega ^{ST}_s (x^1, x^2) \, v(x^1_s)} \end{aligned}$$

and

$$\begin{aligned} V^{ST}(L^2 ) = \sum _{s=1}^{S} {\omega ^{ST}_s (x^1, x^2) \, v(x^2_s)} \,. \end{aligned}$$

Note that the value of each lottery depends on both lotteries in the choice set \(\{L^1,L^2\}\).

We now explain how ST can describe the Common Consequence Allais Paradox in the choice between lotteries X and Y when their payoffs are independent of each other. When \(z=2400\), there are three states of the world which rank in salience as follows: \(\sigma (0,2400)> \sigma (2500,2400) > \sigma (2400,2400)\). The decision maker prefers lottery Y over X if \(V^{ST}(Y ) > V^{ST}(X )\), where:

$$\begin{aligned} V^{ST}(Y ) = v(2400) \,, \end{aligned}$$

and

$$\begin{aligned} V^{ST}(X ) &= \omega _2^{ST}(2500,2400)\, v(2500) \\&+ \omega _3^{ST}(2400,2400)\, v(2400) \\&+ \omega _1^{ST}(0,2400)\, v(0) \,. \end{aligned}$$

Using \(v(0)=0\) and the decision weights given by Eq. (2), the condition for preferring Y over X becomes:

$$\begin{aligned} \delta < \frac{0.01}{0.33} \, \frac{v(2400)}{v(2500)-v(2400)} \,. \end{aligned}$$
(3)

Intuitively, lottery X provides the lowest payoff in the most salient state which makes lottery Y relatively attractive despite having a lower expected payoff. Hence, when the common consequence is \(z=2400\) and the degree of local thinking is severe enough, the decision maker prefers Y over X.

In contrast, when \(z=0\), there are four states of the world which rank in salience as follows: \(\sigma (2500,0)>\sigma (0,2400)>\sigma (2500,2400)>\sigma (0,0)\). The decision maker prefers lottery X over Y if \(V^{ST}(X) > V^{ST}(Y)\), where:

$$\begin{aligned} V^{ST}(X) &= \left[ \omega _1^{ST}(2500,0) + \omega _3^{ST}(2500,2400) \right] \, v(2500) \\&+ \left[ \omega _2^{ST}(0,2400) + \omega _4^{ST}(0,0) \right] \, v(0)\,, \end{aligned}$$

and

$$\begin{aligned} V^{ST}(Y) &= \left[ \omega _2^{ST}(0,2400) + \omega _3^{ST}(2500,2400) \right] \, v(2400) \\&+ \left[ \omega _1^{ST}(2500,0) + \omega _4^{ST}(0,0) \right] \, v(0)\,. \end{aligned}$$

Using \(v(0)=0\) and the decision weights given by Eq. (2), the decision maker prefers X over Y when:

$$\begin{aligned} \nonumber (0.33) \, (0.66) \, v(2500) - \delta \, (0.67) \, (0.34) \, v(2400)&\\ + \delta ^2 \,(0.33) \, (0.34) \, \left[ v(2500) -v(2400) \right]> & {} \;0 \,. \end{aligned}$$
(4)

Now, lottery X provides the highest payoff in the most salient state. Hence, when the common consequence is \(z=0\) and the degree of local thinking is severe enough, the decision maker prefers X over Y.

We now turn to the case in which the two lotteries’ payoffs are dependent of each other. In that case, ST can no longer describe the Common Consequence Allais Paradox. When the two lotteries’ payoffs are dependent of each other, there are just the following three states of the world:

figure a

The ranking in terms of salience of these three states, \(\sigma (0,2400)> \sigma (2500,2400) > \sigma (z,z)\), is independent of the common consequence z. Hence, regardless of the common consequence, the decision maker tends to prefer Y over X, and the Common Consequence Allais Paradox can no longer be described by ST when the lotteries’ payoffs are dependent of each other.

In sum, ST can explain the Allais Paradox only when the lotteries’ payoffs are independent but not when they are dependent of each other. This is because decision weights depend on the joint payoff distribution of the two lotteries in the choice set, which changes when we manipulate the lotteries’ joint payoff distribution.

2.4 Empirical strategy

Table 1 summarizes the empirical strategy to discriminate between EUT, probability weighting, and choice set dependence. EUT can never explain the Allais Paradox. In contrast, probability weighting – represented by CPT – can explain the Allais paradox regardless whether the lotteries’ payoffs are independent or dependent. Finally, choice set dependence – represented by ST – can explain the Allais paradox only when the lotteries’ payoffs are independent but not when they are dependent of each other.

Table 1 When can the Allais Paradox occur?

3 Experimental design

This section presents the experimental design which consists of two parts. In the main part, subjects make choices that may trigger three versions of the Allais Paradox: the classical and a generalized version of the Common Consequence Allais Paradox as well as the Common Ratio Allais Paradox. Based on these choices, we discriminate between EUT, probability weighting, as well as choice set dependence, and classify subjects into EUT-, CPT-, and ST-types, respectively. In the additional part, subjects make choices that could lead to preference reversals which allow us to validate our results in a different choice context.

3.1 Main part

We now present the main part of the experiment. First, we explain how we construct the series of binary choices. Subsequently, we describe the two formats which we use to present the binary choices to the subjects.

3.1.1 Choices between lotteries

Every subject goes through two blocks of binary choices between lotteries that may trigger the Allais Paradoxes. Both blocks feature the same lotteries, except that in one block the lotteries’ payoffs are independent while in the other they are dependent.

The binary choices within each block feature lotteries that vary systematically in payoffs and probabilities. This systematic variation not only allows us to estimate the parameters of each decision theory in the structural model but also ensures that our results are not driven by a particular set of lotteries. To avoid order effects, we randomize (i) the order of the binary choices within each of the two blocks across subjects and (ii) counterbalance the order of the two blocks across subjects.

The binary choices that may trigger the classical and the generalized version of the Common Consequence Allais Paradox are based on a \(3\times 3\times 2\) design. The design uses the following three different payoff levels:

$$\begin{aligned} \begin{array}{l} \text {Payoff Level 1:} \qquad X = \left\{ \begin{array}{rr} 2500 &{} p_1 \\ z &{} p_2 \\ 0 &{} p_3 \\ \end{array} \right. \qquad \text {vs.} \qquad Y = \left\{ \begin{array}{rr} 2400 &{} p_1+p_3 \\ z &{} p_2 \\ \end{array} \right. \\ \text {Payoff Level 2:} \qquad X = \left\{ \begin{array}{rr} 5000 &{} p_1 \\ z &{} p_2 \\ 0 &{} p_3 \\ \end{array} \right. \qquad \text {vs.} \qquad Y = \left\{ \begin{array}{rr} 4800 &{} p_1+p_3 \\ z &{} p_2 \\ \end{array} \right. \\ \text {Payoff Level 3:} \qquad X = \left\{ \begin{array}{rr} 3000 &{} p_1 \\ z &{} p_2 \\ 500 &{} p_3 \\ \end{array} \right. \qquad \text {vs.} \qquad Y = \left\{ \begin{array}{rr} 2600 &{} p_1+p_3 \\ z &{} p_2 \\ \end{array} \right. \end{array}. \end{aligned}$$

Varying the payoffs across these three levels while keeping probabilities constant identifies the curvature of the utility function, v. Similarly, the design features three different probability distributions, \(p=(p_1,p_2,p_3)\), over the lotteries’ payoffs:

$$\begin{aligned} \begin{array}{ll} \text {Probability Distribution 1:} &{} p=(0.33, 0.66, 0.01) \\ \text {Probability Distribution 2:} &{} p=(0.30, 0.65, 0.05) \\ \text {Probability Distribution 3:} &{} p=(0.25, 0.60, 0.15). \end{array} \end{aligned}$$

Varying the probability distributions while keeping the lotteries’ payoffs constant identifies the shape of probability weighting function, w, in CPT and the degree of local thinking, \(\delta\), in ST. Finally, the design uses the following levels of the common consequence, z, to trigger the two versions of the Common Consequence Allais Paradox:

  1. 1.

    \(z=x_3\), i.e., the common consequence is equal to the lowest payoff of lottery X. In this case, lotteries X and Y offer two payoffs each.

  2. 2.

    \(z=y_1\), i.e., the common consequence is equal to the first payoff of lottery Y. In this case, lottery X offers three payoffs and lottery Y is a sure amount.

  3. 3.

    z is different from any other payoff of the two lotteries and slightly below the first payoff of lottery Y.Footnote 8 In this case, lottery X offers three payoffs and lottery Y offers two payoffs.

The first two levels of the common consequence, 1 and 2a, trigger the classical version of the Common Consequence Allais Paradox, as described in the previous section. The first and the third levels, 1 and 2b, trigger a generalized version of the Common Consequence Allais Paradox. The advantage of this generalized version is that the lottery Y does not degenerate into a sure amount which could lead to a specific certainty effect. However, the disadvantage of this generalized version is that, if lottery payoffs are independent, subjects have to consider \(2 \times 3 = 6\) possible states of the world resulting in higher cognitive load.

To expose subjects to an even broader variety of decision situations, the design also includes binary choices that may trigger the Common Ratio Allais Paradox. These choices are based on a similar \(3 \times 3 \times 2\) design, as shown in Appendix B. To provoke the Common Ratio Allais Paradox, the design scales down probability levels but keeps the lotteries’ payoffs unchanged. Moreover, as before, it manipulates the lotteries’ joint payoff distribution to discriminate between the different classes of decision theories. While CPT can describe the Common Ratio Allais Paradox regardless of the lotteries’ joint payoff distribution, ST can describe it only when payoffs are independent but not when they are dependent (see Appendix A for details). The mechanism in ST works as follows: when payoffs are independent, the decision maker’s evaluation of the lotteries depends on the salience of the states as well as their objective probabilities. However, when payoffs are dependent, her evaluation no longer depends on the objective probabilities. This mechanism is arguably more subtle than the one behind the Common Consequence Allais Paradox, as the lotteries’ payoffs in each binary choice remain unchanged.

3.1.2 Presentation format

Fig. 1
figure 1

Presentation of a lottery with independent payoffs (translated from French)

We present the binary choices between lotteries in two formats, the “canonical presentation” and the “states of the world presentation”, exposing half of the subjects to either of them. The two formats differ in the way they present the choices between lotteries with independent payoffs to the subjects. In the canonical presentation, as shown by the screenshot in Panel (a) of Fig. 1, the lotteries X and Y are presented side by side as separate lotteries with independent payoff distributions. In the states of the world presentation, as shown by the screenshot in Panel (b) of Fig. 1, the lotteries are presented in a table displaying their joint payoff distribution. For choices between lotteries with dependent payoffs, the two presentation formats are identical and display the lotteries’ joint payoff distribution.

The two presentation formats have distinct advantages and disadvantages. The main advantages of the canonical presentation are that it emphasizes the difference between lotteries with independent vs. dependent payoffs and that subjects are probably more used to the canonical presentation of lotteries with independent payoffs. However, the main disadvantage of the canonical presentation is that between the two blocks not only the lotteries’ joint payoff distribution changes but also their visual presentation. In contrast, the states of the world presentation keeps the visual presentation constant across the two blocks, but presents lotteries with independent payoffs in an unfamiliar way. Ideally, our results should remain valid under both presentation formats.

3.2 Additional part

To validate the classification of subjects into types in a different choice context, we perform out-of-sample predictions about the frequency of preference reversals. To trigger preference reversals we first expose subjects to six binary choices between additional lotteries and, subsequently, let them evaluate these lotteries in isolation by stating their certainty equivalent. We added the six binary choices to the main part of the experiment but used these choices neither for estimating the subject’s preferences nor for classifying them into types.

Each of the six binary choices consists of a relatively safe lottery \(\tilde{X}\) with a low payoff-variance and a more risky lottery \(\tilde{Y}\) with high payoff-variance. The two lotteries have the following format:

$$\begin{aligned} \tilde{X} = \left\{ \begin{array}{rr} x &{} p \\ 0 &{} 1-p \\ \end{array} \right. \qquad \text {vs.} \qquad \tilde{Y} = \left\{ \begin{array}{rr} t x &{} p/t \\ 0 &{} 1-p/t \\ \end{array} \right. \,, \end{aligned}$$

with a scaling factor \(t\in \{2,4,16\}\). All six binary choices can be found in Appendix C. As Bordalo et al. (2012b) discuss in detail, subjects tend to prefer the relatively safe lottery \(\tilde{X}\) over the risky lottery \(\tilde{Y}\) in a pairwise choice but, at the same time, indicate a higher certainty equivalent for \(\tilde{Y}\) than for \(\tilde{X}\) when evaluating the lotteries in isolation. Bordalo et al. (2012b) also explain that ST can describe these so called preference reversals due to the change in the choice set while EUT and CPT can never describe them.Footnote 9 Section 2 of the Online Appendix derives for each of the six binary choices the conditions under which ST describes a preference reversal.

Fig. 2
figure 2

Elicitation of certainty equivalents in the additional part of the experiment. This screenshot shows an example of the choice menu we used for eliciting the subjects’ certainty equivalents, when they had to evaluate lotteries in isolation during the additional part of the experiment. The example is translated from French. Payoff are reported in points with 100 points corresponding to 1 Swiss Franc or 1.04 USD

To elicit the certainty equivalents in the additional part of the experiment, we present each of the lotteries \(\tilde{L} \in \{\tilde{X}, \tilde{Y}\}\) in a choice menu in which the subject has to indicate whether she prefers the lottery or a certain payoff \(z_r\). Figure 2 shows an example fo such a choice menu. The certain payoff increases from the lottery’s lowest payoff, \(z_1=0\), to its highest payoff \(z_{21}\) in 21 equal increments. The point where the subject switches from preferring the certain payoff to preferring the lottery allows us to approximate the certainty equivalent by \(CE(\tilde{L}_{k}) = (z_{k} + z_{k+1}) / 2\) for \(k \in \{1,\ldots ,20\}\).Footnote 10

We randomize the order in which we elicit the certainty equivalents of the additional lotteries across subjects. Moreover, since the six binary choices between the additional lotteries appear in the main part of the experiment, subjects should not recall the additional lotteries when stating their certainty equivalents.Footnote 11

By comparing the binary choices between the additional lotteries and their certainty equivalents, we can detect the number of preference reversals of each subject. Since there are six binary choices, each subject can exhibit between 0 and 6 preference reversals.

3.3 Number of choices

Subjects in the canonical presentation go through a total of 93 binary choices, while subjects in the states of the world presentation go through 84 binary choices. The number of binary choices differs between the presentation formats since the 9 binary choices designed for triggering the Common Consequence Allais Paradox in which lottery X has three payoffs and lottery Y is a sure amount look identical regardless whether the lotteries’ payoffs are independent or dependent. Table 4 in Appendix D decomposes the number of choices in each presentation format. Regardless of the presentation format, each subject also evaluates 9 lotteries in isolation during the additional part of the experiment.

3.4 Implementation in the lab and incentives

We conducted the experiment in the computer lab at the University of Lausanne (LABEX) using an application based on PHP and MySQL. Most subjects were students of the University of Lausanne and the École Polytechnique Fédérale de Lausanne (EPFL), recruited via ORSEE (Greiner, 2015). The experiment consisted of 14 sessions with 283 subjects in total.

To incentivize subjects’ choices in both parts of the experiment, we applied the prior incentive system (Johnson et al., 2021). This avoids violations of isolation, which may otherwise arise with a random incentive system, as pointed out by Holt (1986). In each part, every subject had to draw a sealed envelope from an urn before making any choices. The envelope contained one of the choices the subject was going to make in that part and which later was used for payment. At the very end of the experiment, the subject went to another room where she opened the envelopes together with an assistant. To determine her payment, which she received in cash at the end, she rolled two dice if the choice in the corresponding envelope involved two lotteries with independent payoffs and only one die if the choice involved two lotteries with dependent payoffs.

At the beginning of the experiment, subjects received general instructions informing them about the structure of the experiment, their anonymity, the show up fee, and the conversion rate of points into Swiss Francs.Footnote 12 At the beginning of each part, subjects received additional printed instructions. These additional instructions comprised a description of the choices and the payment procedure for that part. They carefully described the difference between lotteries with independent and dependent payoffs. They also explained that, at the end of the experiment, the subject will open the envelope to determine her payoff and roll one one or two dice, depending on whether the lotteries’ payoffs in the payment relevant choice are independent or dependent. The instructions also contained several comprehension questions whose answers the assistants verified before subjects could begin. The additional instructions differed depending on whether a subject was exposed to the canonical presentation or the states of the world presentation. All instructions were written in French. English translations are available in Sect. 8 of the Online Appendix.

After making their choices, but before determining and receiving their payments, subjects filled in a demographic questionnaire, completed a short version of the Big 5 personality questionnaire, and a cognitive ability test with 12 questions based on Raven’s matrices. The instructions were shown on screen at the beginning of each task. The cognitive ability test was also incentivized and subjects received 50 points per correct answer.Footnote 13

Each subject received a show-up fee of 10 Swiss Francs. Total earnings varied between 12.00 and 142.50 Swiss Francs with a mean of 57.66 and a standard deviation of 26.39 Swiss Francs. Each session lasted approximately 90 minutes.

4 Non-parametric results

In this section, we present the non-parametric results. We start by summarizing the systematic patterns in the frequency of Allais Paradoxes before discussing whether they can be described by EUT, CPT, and ST.

Fig. 3
figure 3

Relative frequency of Allais Paradoxes. The figure shows the average frequency of Allais Paradoxes relative to their maximum possible number for lotteries with independent and dependent payoffs. Panel (a) depicts the relative frequency of Allais Paradoxes in the expected direction. Panel (b) shows the relative frequency of Allais Paradoxes in the inverse direction and reflects noise. Panel (c) shows the difference between the relative frequencies of Allais Paradoxes in the expected and inverse directions, i.e., net of noise. The two bars on the left pool the choices from subjects exposed to the canonical presentation with those from subjects exposed to the states of the world presentation. The two bars in the middle and on the right separate the choices by presentation format

Figure 3 shows the average frequency of Allais Paradoxes relative to their maximum possible number separately for lotteries with independent and dependent payoffs. Panel (a) exhibits the frequency of Allais Paradoxes in the expected direction, that is, the direction predicted by CPT and ST. Regardless of the presentation format, Allais Paradoxes in the expected direction occur often with both independent and dependent payoffs. However, they are substantially more frequent with independent payoffs than with dependent payoffs. For example, for both presentation formats combined, the frequency of Allais Paradoxes in the expected direction is 28.3% with independent payoffs and 16.9% with dependent payoffs.

Panel (b) exhibits the frequency of Allais Paradoxes in the inverse direction, that is, the direction none of the theories can explain. Regardless of the presentation format, Allais Paradoxes in the inverse direction not only are much less frequent than those in the expected direction but also occur with the same frequency across independent and dependent payoffs.Footnote 14 Given that neither theory can describe these Allais Paradoxes in the inverse direction and given that their frequency is constant across presentation formats as well as across independent and dependent payoffs, we interpret them as the result of decision noise. This interpretation is in line with the literature which acknowledges the existence and relevance of decision noise (e.g. Hey, 2005).

Panel (c) exhibits the difference in the relative frequency of Allais Paradoxes in the expected and in the inverse directions. Under the assumption that the level of decision noise is the same in both directions, we can interpret this difference as the frequency of Allais Paradoxes net of decision noise. These net frequencies confirm that, regardless of the presentation format, Allais Paradoxes occur often and are more than twice as frequent with independent than with dependent payoffs. More specifically, the ratio of Allais Paradoxes between independent and dependent payoffs is 2.127 for both presentation formats combined, 2.755 for the canonical presentation, and 1.648 for the states of the world presentation.

We now discuss which of the three theories is able to describe the above patterns. EUT fails to describe the patterns as it never predicts any Allais Paradoxes and, thus, their net frequencies should always be zero. CPT and ST can each describe some but not all of the above patterns. While CPT can describe the occurrence of Allais Paradoxes for both independent and dependent payoffs, it cannot describe that their net frequency is higher with independent payoffs than with dependent payoffs. In contrast, ST can describe that Allais Paradoxes are more frequent with independent payoffs than with dependent payoffs. However, it cannot describe the occurrence of Allais Paradoxes with dependent payoffs. In sum, none of the three theories alone can explain all of the above patterns in the aggregate frequency of Allais Paradoxes. However, CPT and ST each describe some of the patterns and, thus, both of them play a role. This non-parametric evidence yields our first main result.

Result 1

Probability weighting and choice set dependence both play a role in explaining aggregate choices.

  1. 1.

    Subjects exhibit a high frequency of Allais Paradoxes no matter if payoffs are independent or dependent.

  2. 2.

    However, subjects exhibit a higher frequency of Allais Paradoxes with independent than with dependent payoffs.

  3. 3.

    The result is robust across the three versions of the Allais Paradox and the two presentation formats.

Figure 4 illustrates that Result 1 holds across all three versions of the Allais Paradox. The bars on the left and in the middle show the net frequency of the classical version and the general version of the Common Consequence Allais Paradox, respectively (see Sect. 3.1.1 for details). The bars on the right show the net frequency of the Common Ratio Allais Paradox. Allais Paradoxes occur often and are more frequent with independent than with dependent payoffs. However, the difference in the net frequencies between independent and dependent payoffs is less pronounced for the Common Ratio than for the Common Consequence Allais Paradox. This is probably because the mechanism in ST behind the Common Ratio Allais Paradox is more subtle than the one behind the Common Consequence Allais Paradox, as mentioned earlier in Sect. 3.1.1.Footnote 15

Fig. 4
figure 4

Net frequency of each version of the Allais Paradox. The figure shows the net frequency of each of the three different versions of the Allais Paradox, separately for lotteries with independent and dependent payoffs. The two bars on the left show the net frequency of the classical version of the Common Consequence Allais Paradox (see Sect. 3.1.1, level of the common consequence: 1 vs. 2a). The two bars in the middle show the net frequency of the general version of the Common Consequence Allais Paradox (see Sect. 3.1.1, level of the common consequence: 1 vs. 2b). The two bars on the right show the net frequency of the Common Ratio Allais Paradox. Net frequency of Allais Paradoxes refers to the difference in the relative frequency of Allais Paradoxes in the expected and the inverse directions. Choices from both presentation formats are pooled together

Result 1 is also robust across the two presentation formats. This robustness allows us to addresses two concerns about the relevance of choice set dependence.

First, it confirms the results by Loomes and Sugden (1987), Loomes (1988), and Starmer and Sugden (1989) who found juxtaposition effects in common ratio lottery choices using the states of the world presentation format. It also rules out the concern that by making the common consequence more obvious, the states of the world presentation influences the frequency of Allais Paradoxes (Birnbaum, 2004; Birnbaum et al., 2017; Keller, 1985; Leland, 2010) or decision times (for details, see Appendix A).

Second, the robustness across presentation formats confirms that manipulating the lotteries’ joint payoff distribution affects the frequency of Allais Paradoxes primarily through choice set dependence and not through event-splitting effects. An event-splitting occurs when an event with a given payoff is split into sub-events that offer the event’s payoff in multiple states of the world. Earlier papers raised the concern that event-splitting could influence subjects’ choices if splitting an event into sub-events increases its weight (Humphrey, 1995; Starmer & Sugden, 1993). If such event-splitting effects played a role in our data, their strength would depend on the presentation format: they would be stronger in the states of the world presentation where the number of sub-states in which a payoff appears is particularly prominent. However, our first main result holds across the two presentation formats in our data. Thus, we conclude that event-splitting effects play no significant role in our results (for details, see Appendix E.2).

Fig. 5
figure 5

Distribution of the net frequency of Allais Paradoxes. The histograms show the distribution of the net frequency of Allais Paradoxes for independent and dependent lottery payoffs. Net frequency of Allais Paradoxes refers to the difference in the relative frequencies of Allais Paradoxes in the expected and the inverse directions. Choices from both presentation formats are pooled together

Next, we analyze the distribution of the net frequency of Allais Paradoxes to get a first glimpse at the potential heterogeneity that may be behind Result 1. Figure 5 depicts the corresponding histograms separately for lotteries with independent and dependent payoffs. Not surprisingly, the distribution for lotteries with independent payoffs is located to the right of the distribution for lotteries with dependent payoffs. However, interestingly, both distributions appear to be bimodal. They both exhibit one mode at the lowest bin, corresponding to a net frequency of Allais Paradoxes between 0 and 5%, and another mode at a bin corresponding to a higher net frequency. This multimodality suggests that Result 1 may be driven by considerable heterogeneity in subjects’ risk preferences. In particular, the choices of some subjects may be predominantly influenced by probability weighting whereas the choices of others may be primarily driven by choice set dependence. There may also exist a minority of EUT-subjects who display no or only few Allais Paradoxes. We examine this possibility with the structural model which we present in the next section.

5 Structural model

In this section, we discuss the set-up and the results of the structural model. We also validate the model’s power to predict preference reversals in a different choice context.

5.1 Set-up

The structural model is based on a finite mixture model (see McLachlan & Peel, 2000, for an overview) and uses a random utility approach for discrete choices (McFadden, 1981). It discriminates between subjects whose preferences are best described by EUT, subjects whose preferences display probability weighting and are best described by CPT, and subjects whose preferences display choice set dependence and are best described by ST. Controlling for the presence of EUT subjects is important, as the behavior of a minority of our subjects may still be best described by EUT, as previously found by other studies (Bruhin et al., 2010; Conte et al., 2011).

5.1.1 Random utility approach

The random utility approach allows the structural model to explicitly take decision noise into account. Consider a subject \(i \in \{1,\ldots ,N\}\) whose preferences are best described by decision model M in the set of decision models \(\mathcal {M} = \{EUT, CPT, ST\}\). She prefers lottery \(X_g\) over \(Y_g\) in binary choice \(g\in \{1,\ldots ,G\}\) when the random utility of choosing \(X_g\), \(V^M(X_g, \theta _M) + \epsilon _X\), is higher than the random utility of choosing \(Y_g\), \(V^M(Y_g, \theta _M) + \epsilon _Y\). The random errors, \(\epsilon _X\) and \(\epsilon _Y\), are realizations of an extreme value 1 distribution with scale parameter \(1/\sigma _M\), and the vector \(\theta _M\) comprises decision model M’s preference parameters. This implies that the probability of subject i choosing \(X_g\), i.e., \(C_{ig} = X\), is given by:

$$\begin{aligned} \begin{aligned} Pr(C_{ig} = X;\;\theta _M, \sigma _M)&= Pr [V^M(X_g, \theta _M) - V^M(Y_g, \theta _M) \ge \epsilon _Y - \epsilon _X]\\ {}&= \frac{\exp [\sigma _M \, V^M(X_g, \theta _M)]}{\exp [\sigma _M \, V^M(X_g, \theta _M)] + \exp [\sigma _M \, V^M(Y_g, \theta _M)]}\,. \end{aligned} \end{aligned}$$
(5)

The parameter \(\sigma _M\) governs the choice sensitivity with respect to differences in the lotteries’ deterministic value. If \(\sigma _M\) is 0, the subject chooses each lottery with probability 50% regardless of the deterministic value it provides. If \(\sigma _M\) is arbitrarily large, the probability of choosing the lottery with the higher deterministic value approaches 1.

Subject i’s contribution to the density function of the random utility model corresponds to the product of the choice probabilities over all G binary decisions, i.e.,

$$\begin{aligned} f_M (C_i; \theta _M, \sigma _M) = &\prod _{g=1}^{G}Pr(C_{ig} = X;\;\theta _M, \sigma _M)^{I(C_{ig}=X)} \, \\&Pr(C_{ig} = Y;\;\theta _M, \sigma _M)^{1-I(C_{ig}=X)}\,, \end{aligned}$$

where \(I(C_{ig}=X)\) is 1 if subject i chooses lottery \(X_g\) and 0 otherwise.

5.1.2 Finite mixture model

Since risk preferences may be heterogeneous, we do not directly observe which model best describes subject i’s preferences. In other words, we do not know ex-ante whether subject i is an EUT-, CPT-, or ST-type. Hence, we have to weight i’s type-specific density contributions by the corresponding ex-ante probabilities of type-membership, \(\pi _M\), in order to obtain her contribution to the likelihood of the finite mixture model:

$$\begin{aligned} \begin{aligned} \ell (\Psi ;C_i)&= \pi _{EUT}\, f_{EUT} (C_i;\;\theta _{EUT}, \sigma _{EUT}) \\&+ \pi _{CPT}\, f_{CPT} (C_i;\;\theta _{CPT}, \sigma _{CPT})\\&+ \, \pi _{ST} \, f_{ST} (C_i;\;\theta _{ST}, \sigma _{ST})\,, \end{aligned} \end{aligned}$$
(6)

where the vector \(\Psi = (\theta _{EUT}, \theta _{CPT}, \theta _{ST}, \sigma _{EUT}, \sigma _{CPT}, \sigma _{ST}, \pi _{EUT}, \pi _{CPT})\) comprises all parameters that need to be estimated, and \(\pi _{ST}=1-\pi _{EUT}-\pi _{CPT}\).Footnote 16 Note that the ex-ante probabilities of type-membership are the same across all subjects and correspond to the relative sizes of the types in the population.

Once we estimated the parameters of the finite mixture model, we can classify each subject into the type she most likely belongs to, given her choices and the estimated parameters \(\hat{\Psi }\). To do so, we apply Bayes’ rule and obtain subject i’s individual ex-post probabilities of type-membership:

$$\begin{aligned} \tau _{iM} = \frac{\hat{\pi }_{M}\, f_{M} (C_i;\;\hat{\theta }_{M}, \hat{\sigma }_{M})}{\sum _{m \in \mathcal {M}}{\hat{\pi }_{m}\, f_{m} (C_i;\;\hat{\theta }_{m}, \hat{\sigma }_{m})}}\,. \end{aligned}$$
(7)

Based on these individual ex-post probabilities of type-membership, we can also assess the ambiguity in the classification of subjects into types. If the finite mixture model classifies subjects cleanly into types, most \(\tau _{iM}\) should be either close to 0 or to 1. In contrast, if the finite mixture model fails to come up with a clean classification of subjects into distinct types, many \(\tau _{iM}\) will be in the vicinity of 1/3.

5.1.3 Specification of functional forms

To keep the model parsimonious and yet flexible in fitting the data, we specify the following functional forms. In all three decision models, we use a power specification for the utility function v, i.e.,

$$\begin{aligned} v(x) = \left\{ \begin{array}{ll} \frac{x^{1-\beta }}{1-\beta } &{} \text{ for }\;\beta \ne 1 \\ \ln x &{} \text{ for }\;\beta = 1 \end{array} \right. \,, \end{aligned}$$

which has a convenient interpretation, since \(\beta\) measures v’s concavity. Specifying the same utility function across all three models puts them on equal footing regarding their flexibility to encompass potential non-linearities in utility. Moreover, this specification turned out to be a neat compromise between parsimony and goodness of fit (Stott, 2006). In CPT, we follow the proposal by Prelec (1998) and specify the probability weighting function as:

$$\begin{aligned} w(p) = \exp (-(-\ln (p))^\alpha ) \,, \end{aligned}$$

where \(0<\alpha\) measures likelihood sensitivity and reflects the shape of the probability weighting function. When \(\alpha = 1\), w is linear in probabilities. When \(\alpha\) gets closer to zero, w becomes more inversely S-shaped. When \(\alpha\) gets larger than one, w becomes more S-shaped. This specification of the probability weighting function satisfies the three properties discussed in Sect. 2.2. We also tested the two-parameter version of Prelec’s probability weighting function. However, as the second parameter measuring the function’s net index of convexity is estimated to be almost 1, results remain virtually unchanged (see Appendix G). Hence, we opt for the one-parameter version to keep the total number of parameters the same for CPT and ST. In ST, the decision weights depend on the degree of local thinking \(0 < \delta \le 1\) which we estimate based on the functional form given in Eq. (2). In all binary choices we use for triggering Allais Paradoxes, the salience ranking of the states of the world is fully determined by ordering, diminishing sensitivity, symmetry, and zero contrast (Sect. 1 of the Online Appendix shows this for every binary choice we use). Hence, we do not need to specify a particular salience function.

5.2 Monte Carlo simulations

5.2.1 Parameter recovery and discriminatory power

We conduct a series of Monte Carlo Simulations to assess the structural model’s power to recover a wide range of parameters and discriminate between the three preference types. The Monte Carlo Simulations also allow us to test the structural model’s robustness against potential serial correlation in the subjects’ errors. In these simulations, we impose a vector of true parameters which we use to simulate the subjects’ choices in each type. Subsequently, we try to recover these true parameters and the subjects’ individual type-membership by estimating the structural model on the simulated choices. This allows us to calculate the potential bias in the estimated parameters, their overall precision in terms of Mean Squared Errors, and the fraction of correctly classified subjects. Each simulation is based on 1,000 simulation runs. Section 4 of the Online Appendix discusses the set-up and the results of these Monte Carlo Simulations in detail.

The simulations reveal that the structural model’s power to recover a wide range of parameters and discriminate between the different types is remarkably high. In particular, the structural model provides unbiased and precise estimates even in a situation where discriminating between EUT-, CPT-, and ST-types is extremely hard – that is, when the simulated subjects have an identical utility function and the same choice sensitivity across the three types and only differ slightly in their degrees of likelihood sensitivity and local thinking. In such a situation, the structural model still classifies the vast majority of simulated subjects into the correct type. We suppose that the model’s ability to detect even slight behavioral differences between the types is mainly because, instead of estimating at the individual level, the model efficiently exploits the choices of all subjects simultaneously. This allows it to recover the true parameters with high precision as individual noise averages out.Footnote 17

5.2.2 Robustness against serially correlated errors

Moreover, the simulations confirm the structural model’s robustness against potential serial correlation in the subjects’ errors. This robustness is an important feature, since we cannot rule out that the errors subjects make when evaluating lotteries are serially correlated across the binary choices. However, the simulations reveal that the estimated parameters remain unbiased and precise, even if the subjects’ errors, \(\epsilon _X\) and \(\epsilon _Y\), follow an AR(1) process with a high degree of serial correlation of \(\rho =0.6\). Also, the vast majority of subjects remains classified in the correct type. The reason for this robustness against serially correlated errors is likely because we randomized the order of choices across subjects. Since the finite mixture model exploits the choices of all subjects simultaneously, their random order causes the impact of serial correlation to average out. Furthermore, to ensure that inferences about the estimated parameters are valid even with serially correlated errors, we report cluster-robust standard errors.

5.2.3 Robustness against heterogeneity within types

Finally, the simulations reveal that the structural model is robust against heterogeneity within types. This robustness is important as there is likely heterogeneity not only between but also within the three types. The simulations reveal that the estimated parameters remain representative for the average preferences in each type when the subjects’ individual parameters – \(\beta _i\) and \(\alpha _i\) or \(\delta _i\), respectively – are heterogeneous and drawn from a uniform distribution within each type. The intuition is that the structural model discriminates between the types primarily based on the frequency of Allais Paradoxes and estimates parameters that are representative for each type. Thus, random individual deviations within a type form these representative parameters cancel each other out.

In sum, the simulations highlight that the experimental choices contain rich information about subjects’ risk preferences and type-membership. By taking the choices of all subjects simultaneously into account, the structural model exploits this information efficiently and is robust against potential serial correlation in the errors and heterogeneity within the types.

5.3 Structural model results

We now present and interpret the results of the structural model. When classifying subjects into types using their ex-post probabilities of type-membership, we obtain a clean classification of subjects into 80 EUT-types, 108 CPT-types, and 95 ST-types. Most of the ex-post probabilities of individual type-membership are either close to 0 or 1, confirming that almost all subjects can be unambiguously classified into one of these three types. Figure 6 shows histograms with the ex-post probabilities of type-membership.

Table 2 exhibits the type-specific parameter estimates of the finite mixture model. The results show that there is substantial heterogeneity in subjects’ risk preferences. The choices of 28.4% of subjects are best described by EUT, the choices of 37.9% by CPT, and those of the remaining 33.7% by ST. This classification confirms Result 1 obtained non-parametrically at the aggregate level. The majority of subjects is best described by either CPT or ST, while – consistent with previous evidence (Bruhin et al., 2010; Conte et al., 2011) – only a minority is best described by EUT.

Fig. 6
figure 6

Distribution of ex-post probabilities of type-membership. Each panel in the figure shows the distribution over all subjects of the individual ex-post probability, \(\tau _{iM}\), of belonging to the corresponding type \(M \in \{EUT, CPT, ST\}\) (see Eq. (7)). The resulting classification of subjects into types is clean as for nearly all subjects these ex-post probabilities of type-membership are either close to 0 or 1

On average, the 80 EUT-types display an almost linear utility function which makes them essentially risk neutral.Footnote 18 Although the estimated concavity of \(\hat{\beta } = 0.080\) is statistically significant, it is negligible in economic magnitude. Moreover, among the three types, the EUT-types exhibit the highest level of decision noise which translates into a relatively low estimated choice sensitivity.

Table 2 Type-specific parameter estimates of the finite mixture model

The 108 CPT-types exhibit, on average, a concave utility function with \(\hat{\beta } = 0.572\) and a strongly inverse S-shaped probability weighting function with \(\hat{\alpha } = 0.469\). This confirms that the CPT-types’ choices are strongly influenced by probability weighting. With these parameter estimates, the average CPT-type displays the Common Consequence Allais Paradox discussed in the motivating example in Sect. 2.

The 95 ST-types display, on average, a strongly concave utility function with \(\hat{\beta } = 0.870\) and a seemingly low but statistically significant degree of local thinking corresponding to \(\hat{\delta } = 0.924\). Note that, although the average ST-type’s degree of local thinking appears to be low, she still exhibits the Common Consequence Allais Paradox discussed in the motivating example in Sect. 2. The reason is that with a strongly concave utility function, even a low degree of local thinking is sufficient to generate the Common Consequence Allais Paradox.Footnote 19

Next, we analyze how well the structural model fits the subjects’ choices compared to aggregate models that neglect any heterogeneity and assume a representative decision maker. The estimation results of the aggregate models can be found in Appendix G. The Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) both indicate that the structural model fits the subjects’ choices considerably better than any of the aggregate models. Moreover, when we use the subjects’ classification into types and the type-specific parameter estimates to predict choices in-sample, the structural model yields 75.0% correct predictions overall, that is, 75.0% of the predicted choices coincide with the subjects’ empirical choices. This more than the 50% correct predictions we would expect from purely random behavior and – not surprisingly due to the finite mixture model’s higher number of parameters – also more than any of the aggregate models. The percentage of correctly predicted choices per type confirms that the ST- and CPT-types’ choices are less noisy than the EUT-types’ choices: while the model correctly predicts 83.0% of the ST-types’ and 75.9% of the CPT-types’ choices, the percentage of correctly predicted choices is just 64.1% for the EUT-types. In sum, the analysis reveals that the structural model’s parsimonious way of taking heterogeneity into account leads to a superior fit compared to the aggregate models.Footnote 20

Finally, we also analyze how the structural model’s fit compares to models with only two types – i.e., finite mixture models (i) with only EUT- and CPT-types, (ii) with only EUT- and ST-types, and (iii) with only CPT- and ST-types. Detailed results are available in Sect. 6 of the Online Appendix. Overall, the structural model with three types fits the data much better than each of the models with two types. Two results from the models with two types stand out. First, in the model with only EUT- and CPT-types, the relative sizes of the types and their parameters are in line with previous studies discriminating between EUT- and CPT-types (Bruhin et al., 2010; Conte et al., 2011). Second, in the two models featuring ST-types, the share of ST-types is 44% in the model mixing them with EUT-types and 38% in the model mixing them with CPT-types. Moreover – as in the model with three types – the ST-types’ degree of local thinking is mild with an estimated \(\delta\) above 0.9 and their utility function is strongly concave with an estimated \(\beta\) above 0.8.

Overall, the structural estimations and the subjects’ type-specific behavior yield our second main result.

Result 2

There is vast heterogeneity in the subjects’ risk preferences. Subjects are parsimoniously classified into 28% EUT-types, 38% CPT-types, and 34% ST-types according to the decision theory best describing their behavior.

Finally, we carried out robustness checks to ensure that this result does not depend on a particular specification of our model. They reveal that the result remains virtually unchanged if we use a Fechner-type error directly affecting subjects’ choices instead of the random utility approach. However, the random utility approach yields a superior fit of the structural model. Similarly, modeling choice set dependence with ST yields a superior fit compared to modeling it with RT, the other major choice set dependent theory.Footnote 21

5.4 Predictions of preference reversals

Next, we assess how well the structural model predicts preference reversals in the additional part of the experiment (see Sect. 3.2). These out-of-sample predictions across choice contexts represent a particularly stringent test of the structural model for two reasons. First, the model needs to predict behavior in choices that differ in payoffs and probabilities from the ones it was estimated on. Second, the model also needs to predict behavior across choice contexts, i.e., in choices that trigger preference reversals instead of Allais Paradoxes.

We start by describing and interpreting the net frequencies of preference reversals for each of the three types. Subsequently, we use the structural model’s random utility approach to make quantitative predictions about these net frequencies. Comparing the empirical and the predicted net frequencies of preference reversals will reveal the aspects of behavior the structural model predicts well and potential other aspects which the model does not capture. Such other aspects of behavior which our structural model does not capture would be particularly interesting, as they may provide hints about the instability of certain preference components across choice contexts.

5.4.1 Net frequencies of preference reversals across types

Figure 7 shows the average net frequency of preference reversals for each of the three preference types, i.e., the difference in the frequencies of preference reversals in the expected and in the inverse direction. It reveals that with a net frequency of 43.3% the ST-types exhibit substantially more preference reversals than the EUT- and the CPT-types whose net frequencies are just 32.6% and 25.8%, respectively. Moreover, the EUT- and CPT-types net frequencies of preference reversals are not significantly different. This evidence is in line with our expectation that choice set dependence mainly drives the ST-types’ choices and generates their preference reversals. However, the positive net frequencies of preference reversals of the EUT- and CPT-types indicate that choice set dependence might influence their choices too, although to a lesser extent than the ST-types’ choices.

Fig. 7
figure 7

Net frequency of preference reversals by type. The figure shows the net frequency of preference reversals by type for the choices of the additional part of the experiment (see Sect. 3.2). Net frequency of preference reversals refers to the difference in the relative frequencies of preference reversals in the expected and the inverse directions. The black dots indicate the predicted net frequencies for each type based on the structural model’s random utility approach (see Sect. 5.4.2). The white circles represent the predictions adjusted by the estimated intercept (0.274) as shown in the first column of Table 3. The numbers in parentheses indicate the number of subjects in each of the three types. 34 of the 283 subjects (12.0%) are excluded from the analysis because they exhibit more than one switch-point in at least one of the choice menus used for eliciting the certainty equivalents. Exhibiting more than one switch-point is independent of type-membership (\(\chi ^2\)-test of independence: p-value \(=\) 0.534)

5.4.2 Quantitative predictions

We now use the structural model’s random utility approach to make quantitative predictions about the frequency of preference reversals for each preference type. We start by predicting the probability that a subject belonging to type M with estimated parameters \(\hat{\theta }_M\) and \(\hat{\sigma }_M\) indicates a higher certainty equivalent for lottery \(\tilde{X}\) than for lottery \(\tilde{Y}\) when she evaluates the two lotteries separately in a choice menu (see Fig. 2). First, we predict for each of the two lotteries \(\tilde{L} \in \{\tilde{X}, \tilde{Y}\}\) and for each of the 21 rows r of the corresponding choice menu the probability that the subject prefers the lottery over the sure amount \(z_{r}\), \(z_{1}< \ldots < z_{21}\):

$$\begin{aligned} \hat{Pr}[V^M(\tilde{L})>v(z_r)] = \frac{\exp [\hat{\sigma }_M \, V^M(\tilde{L}, \hat{\theta }_M)]}{\exp [\hat{\sigma }_M \, V^M(\tilde{L}, \hat{\theta }_M)] + \exp [\hat{\sigma }_M \, V^M(z_{r}, \hat{\theta }_M)]}\,. \end{aligned}$$

Second, by assuming a unique switch-point, we use these predicted probabilities to infer the probability distribution over the \(k \in \{1, \ldots , 20\}\) possible certainty equivalents for each lottery.Footnote 22 The predicted probability that the certainty equivalent \(CE(\tilde{L}) = (z_{k} + z_{k+1}) / 2 \equiv \bar{z}_k\), corresponds to:

$$\begin{aligned} \tilde{Pr}[CE(\tilde{L})=\bar{z}_k] &= \prod _{r=1}^{k} \hat{Pr}[V^M(\tilde{L})>v(z_r)] \\&\times \prod _{r=k+1}^{21} \left( 1-\hat{Pr}[V^M(\tilde{L})>v(z_r)]\right) \, . \end{aligned}$$

Since we assume a unique switch-point, we need to normalize these predicted probabilities to \(\hat{Pr}[CE(\tilde{L})=\bar{z}_k] = \tilde{Pr}[CE(\tilde{L})=\bar{z}_k]/\sum _{m=1}^{20}\tilde{Pr}[CE(\tilde{L})=\bar{z}_m]\) to obtain a proper probability distribution which sums up to one. Third, by combining the probability distributions over the possible certainty equivalents of the two lotteries, we obtain the joint probability distribution over the \(20 \times 20 = 400\) states in which either the certainty equivalent of lottery \(\tilde{X}\) or the one of lottery \(\tilde{Y}\) is higher. Knowing this joint probability distribution allows us to predict the probability that the subject indicates a higher certainty equivalent for \(\tilde{X}\) than \(\tilde{Y}\). Subsequently, we evaluate Eq. (5) to predict the probability that the subject will choose \(\tilde{X}\) over \(\tilde{Y}\) in the pairwise choice. By applying this procedure to all 6 choices of the additional part (see Appendix C), we can predict the type-specific frequencies of preference reversals in the expected and inverse directions using the structural model and the estimated parameters \(\hat{\theta }_M\) and \(\hat{\sigma }_M\). Finally, the predicted net frequency of preference reversals is the difference between the predicted frequency of preference reversals in the expected direction and those in the inverse direction.

Table 3 compares the empirical to the predicted net frequencies of preference reversals using OLS regressions. We start by interpreting the estimated coefficients of the regression in the first column, which uses the type-specific parameters \(\hat{\theta }_M\) and \(\hat{\sigma }_M\) to predict the subjects’ net frequencies of preference reversals. The coefficient on the predicted net frequencies of preference reversals is 0.907 and not significantly different from one. This indicates that the structural model captures the behavioral differences between the types remarkably well as, on average, a given change in the predicted frequencies of preference reversals translates nearly one to one into a change in the corresponding empirical frequencies. In other words, the structural model predicts the average differences in the empirical net frequencies of preference reversals between the three types almost perfectly.

Table 3 OLS regressions of empirical on predicted net frequencies of preference reversals

However, the estimated intercept is positive, revealing that the structural model consistently underestimates the frequency of preference reversals across all three types by 27.4 percentage points. This can be seen when we visualize the type-specific predictions in Fig. 7: as indicated by the black dots, the predicted net frequencies of preference reversals are consistently too low across all three types. Moreover, as indicated by the white circles, when we adjust the predicted net frequencies by the estimated intercept of 0.274, they all match the empirical frequencies almost perfectly and fall well within the 95% confidence intervals.

This evidence suggests not only that choice set dependence plays a role across all three types but also that its influence is stronger in the choices of the additional part than in the choices of the main part. We hypothesize that this could be because the influence of choice set dependence may be shaped by the choice context. More specifically, in the additional part, subjects fill out choice menus that always offer choices between a lottery with two payoffs and a series of sure amounts. This specific choice context may shift the subjects’ focus of attention towards differences in payoffs and, thus, may inflate the role of choice set dependence. In contrast, in main part, subjects always face binary choices between two lotteries with up to three payoffs. This choice context may shift the subjects’ focus of attention towards differences in probabilities and, thus, may dampen the influence of choice set dependence. Overall, the evidence gained from the out-of-sample predictions suggests that exploring how the choice context shapes the role of choice set dependence is an important avenue for future research.

Next, we interpret the fraction of the variance in the empirical net frequencies of preference reversals which the structural model manages to predict. At first glance, the fraction of the predicted variance is disappointingly low with an R\(^2\) of just 0.054. The low R\(^2\) indicates that the structural model is not well suited for predicting individual net frequencies of preference reversals since there is apparently a considerable amount of heterogeneity within each of the three preference types. However, as discussed above, the structural model predicts the average differences in the net frequencies of preference reversals between the types remarkably well and, in fact, the R\(^2\) of the corresponding regression amounts to 0.936.

Finally, we investigate whether the heterogeneity within each of the three preference types results from systematic differences in individual preferences or rather from noise. To do so, we take the finite mixture model’s classification of subjects into types and estimate the parameters of the corresponding decision models separately for each subject i. This yields a distinct set of parameter estimates for every subject, \(\hat{\theta }_{Mi}\) and \(\hat{\sigma }_{Mi}\), which we use for predicting individual-specific net frequencies of preference reversals. If the heterogeneity within the types results mainly from systematic differences in individual preferences, the predictions based on the individual-specific estimates would pick up these differences and, thus, would exhibit a superior out-of-sample performance than the predictions based on the type-specific estimates. In contrast, if the heterogeneity within the types results mainly from noise, which randomly changes across the two parts of the experiment, the predictions based on the individual-specific estimates would pick up this random noise and, thus, their out-of-sample performance would fall short of the predictions based on the type-specific estimates.

The second column of Table 3 reveals that the performance of the predictions based on the individual-specific estimates falls short of those based on the more parsimonious type-specific estimates in all relevant dimensions. First, the estimated coefficient of the predictions based on the individual-specific estimates (0.149) is far below one, indicating that they severely underestimate differences in the net frequencies of preference reversals across types. With an intercept of 0.297, they also consistently underestimate the level of preference reversals. Second, and even more striking, the predictions based on the individual-specific estimates explain a fraction of just R\(^2=\) 0.025 of the variance in the empirical net frequencies of preference reversals – much less than the predictions based on the more parsimonious type-specific estimates. They are also worse at predicting the average differences across types as the corresponding R\(^2\) is just 0.122. Overall, these results indicate not only that the individual heterogeneity within the preference types primarily results from noise but also that, despite their parsimony, the structural model’s type-specific estimates pick up most of the relevant heterogeneity across the types.

The analysis of the structural model’s power to predict preference reversals yields the third main result.

Result 3

The structural model has power to predict type-specific behavioral differences across choice contexts.

  1. 1.

    Subjects classified as ST-types exhibit significantly more preference reversals than the other subjects.

  2. 2.

    The structural model accurately predicts the quantitative differences in the average frequencies of preference reversals across types.

  3. 3.

    Due to their parsimony, the structural model’s type-specific parameter estimates outperform noisy individual estimates.

Furthermore, the predictions of preference reversals suggest that the choice context shapes the relative importance of choice set dependence.

6 Conclusion

The paper assesses the relative importance of probability weighting and choice set dependence both non-parametrically and with a structural model. This represents one of the first joint tests of the two most descriptive behavioral theories of choice under risk.

There are three main results. First, for aggregate choices, both choice set dependence and probability weighting matter. This result does not rely on specific functional forms and is robust across the three versions of the Allais Paradox as well as across the two presentation formats. Second, there is substantial heterogeneity in risk preferences which can be parsimoniously characterized by three types: 38% CPT-types, 34% ST-types, and 28% EUT-types. Finally, this classification of subjects is valid in a different choice context, as the subjects classified as ST-types exhibit significantly more preference reversals than their peers.

These results are directly relevant for the literature that aims at identifying the main behavioral drivers of risky choices. Knowing about the relative importance of probability weighting and choice set dependence could inspire new decision theories taking both concepts into account and lead to better predictions in various domains of risk taking behavior, such as investment, asset pricing, insurance, and health behavior.

The conclusions also open up avenues for future research. First, our methodology could be used to study how the relative importance of probability weighting and choice set dependence varies with educational background, cognitive ability, and other socio economic characteristics in the general population. Studying this variation in the general population could lead to new explanations for the observed variation in socio-economic outcomes as the different types may fall prey to distinct behavioral traps during their lives. Second, to improve the decision models’ predictive power, it would be important to explore how the choice context shapes the role of choice set dependence. Knowing how the choice context influences the role of choice set dependence could lead to more accurate predictions in other domains such as consumer, investor, and judicial choice.