1 Introduction

The St. Petersburg Paradox challenged the notion that decision makers should choose the risky alternative that maximizes expected value (Bernoulli, 1738). In the nearly three centuries since Bernoulli, descriptive theories of decision-making under risk have advanced in several important respects. A family of modern descriptive theories of decision under risk departs from expected value maximization in three essential ways: the transformation of outcomes, the transformation of probabilities, and the composition rule that combines the two transformations. Prospect theory (Kahneman and Tversky 1979; Tversky and Kahneman 1992) has emerged as the frontrunner of these descriptive theories (see reviews of empirical evidence in Barberis (2013); Camerer (1992, 1995); Dhami (2016); Fox et al. (2015); Starmer (2000); Wu et al. (2004); for alternative viewpoints, see Birnbaum 2004, 2006, and Birnbaum 2008). In prospect theory, outcomes are transformed by an S-shaped value function, typically concave for gains, convex for losses, and steeper for losses than gains, whereas probabilities are transformed by an inverse S-shaped probability weighting function, most commonly concave for small probabilities and convex for medium and large probabilities (see also Edwards 1954; Preston and Baratta 1948).

A substantial empirical and theoretical literature has generally supported these two aspects of prospect theory. In contrast, the third piece, the composition rule, has received relatively little direct empirical attention (a review of relevant work appears later in this paper). Moreover, the choice of the composition rule is not trivial: original prospect theory (OPT; Kahneman and Tversky 1979) and its descendant, cumulative prospect theory (CPT; Tversky and Kahneman 1992), differ in how these two transformation functions are combined, and, consequently, in how gambles with three or more outcomes are valued.

This paper investigates the empirical merits and shortcomings of the two representations. We perform a test that exploits the following observation: CPT and OPT coincide for two-outcome gambles but differ for gambles with three or more outcomes. This linkage allows us to perform a novel test—for each of our experimental participants, we estimate prospect theory parameters for two-outcome gambles, a domain where the two models concur. We then apply these estimates to three-outcome gambles, a domain where the models diverge. As a result, we can test whether CPT and/or OPT fit three-outcome gamble data well and document the nature of systematic discrepancies between actual and predicted cash equivalents.

The paper proceeds as follows. We first present the two models and review previous empirical research. We then review one of our previous studies (Gonzalez and Wu 1999) in which we estimate the probability weighting function on two-outcome gambles (where the two models coincide). We supplement this analysis with previously unreported data from 37 participants. We then use three-outcome gambles as a holdout sample to test how the two models perform. At the aggregate level, both models perform well, but are biased in a predictable manner: CPT slightly under-predicts and OPT slightly over-predicts the cash equivalents of three-outcome gambles. This pattern, however, masks the considerable heterogeneity across types of gambles and across participants. A large part of this heterogeneity can be linked to differences in the curvature and elevation of the probability weighting function across individuals. The two models also perform differently depending on whether the worst outcome is zero or non-zero. Finally, we conclude with thoughts on how this analysis offers new insights about psychological processes underlying choice among complex gambles, as well as recommendations on which prospect theory to use for applications.

2 Prospect theories

2.1 Preliminaries

In this section, we review original prospect theory (OPT; Kahneman and Tversky 1979) and cumulative prospect theory (CPT; Tversky and Kahneman 1992; see also Luce and Fishburn 1991; Starmer and Sugden 1989; Wakker 1994, 2010; Wakker and Tversky 1993). Let \((p,x;q,y;1-p-q,z)\) denote a prospect that gives p chance at x, q chance at y, and \(1-p-q\) chance at z, where \(x>y\ge z\ge 0\).Footnote 1 We first present Kahneman and Tversky’s (1979) original formulation of OPT for two-outcome gambles, in which \(z=0\) and \(p+q=1\), so that the gamble is denoted (\(p,x;1-p,y\)) for short. The function \(V_O\) on the reals represents the preference between two gambles (i.e., \(\succeq \)), with the subscript O denoting “original”:

$$\begin{aligned} (p,x; 1-p,y)\succeq & {} (p^{\prime},x^{\prime};1-p^{\prime},y^{\prime}) \nonumber \\\Longleftrightarrow & {} \nonumber \\ \left( \begin{aligned} V_O(p,x;1-p,y)\\ = \\ \pi (p)v(x)+[1-\pi (p)]v(y) \end{aligned} \right)\ge & {} \left( \begin{aligned} V_O(p^{\prime},x^{\prime};1-p^{\prime},y^{\prime})\\ = \\ \pi (p^{\prime})v(x^{\prime})+[1-\pi (p^{\prime})]v(y^{\prime}) \end{aligned} \right) , \end{aligned}$$
(1)

where \(\pi :[0,1]\rightarrow [0,1]\) is a probability weighting function; \(v(\cdot )\) is a value function defined with respect to a reference point, with \(v(0)=0\); and \(V_O(\cdot )\) is the OPT functional. OPT requires that the decision weights attached to outcomes x and y be \(\pi (p)\) and \(1-\pi (p)\), and therefore, decision weights sum to 1 in this case. It is important to note that Representation (1) applies to two-outcome gambles with either \(y=0\) or \(y>0\). Below, we show that OPT’s representation for two-outcome gambles is identical to that of CPT.

We next turn to three-outcome gambles in which \(z=0\) and \(p+q<1\)

$$\begin{aligned} V_O(p,x;q,y;1-p-q,0)=\pi (p)v(x)+\pi (q)v(y). \end{aligned}$$
(2)

In this case, the decision weights attached to x and y are \(\pi (p)\) and \(\pi (q)\), respectively, with the weight attached to outcome 0 not requiring specification, because \(v(0)\,=\,0\), and, therefore, the weight becomes irrelevant.

Whereas Kahneman and Tversky (1979) only specified OPT for gambles with two non-zero outcomes, others have extended OPT in Representation (2) to the case of \(z > 0\) (see, e.g., Camerer and Ho 1994; Fennema and Wakker 1997):Footnote 2

$$\begin{aligned} V_O(p,x;q,y;1-p-q,z)=\pi (p)v(x)+\pi (q)v(y)+\pi (1-p-q)v(z). \end{aligned}$$
(3)

Both representations in (2) and (3) have been criticized, primarily because they permit violations of stochastic dominance (Fishburn 1978; Luce 2000; see also, Wakker 2010, Section 5.3).

Rank-dependent utility (RDU), proposed by Quiggin (1982; see also, Green & Jullien, 1988; Quiggin 1993; Segal 1989; Wakker 1989; Yaari 1987), avoided this problem. In turn, Tversky and Kahneman (1992) adopted RDU in their formulation of cumulative prospect theory. CPT consists of two separate RDU representations for losses and gains. Under this model, the value of \((p,x;q,y;1-p-q,z)\) is given by

$$\begin{aligned} V_C(p,x;q,y;1-p-q,z)=\pi (p)v(x)+[\pi (p+q)-\pi (p)]v(y)+[1-\pi (p+q)]v(z), \end{aligned}$$
(4)

where \(V_C(\cdot )\) is the CPT functional. In contrast to OPT, the decision weight for an outcome depends not only on the probability of that outcome, but also on the rank position of that outcome relative to other outcomes in the gamble. Stochastic dominance violations are prohibited under the CPT representation in (4).Footnote 3 Moreover, RDU, and hence CPT, also generalizes easily and naturally to an arbitrary number of outcomes, as well as from the domain of decision under risk to decision under uncertainty.
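The rank-dependent construction behind Representation (4) can be sketched for gains with any number of outcomes. The following is a minimal illustration, not the authors' code: the function names are hypothetical, outcomes are assumed to be sorted from best to worst, the weighting function is assumed to satisfy \(\pi (0)=0\) and \(\pi (1)=1\), and the default parameter values of the example weighting function are purely illustrative.

```python
import math
from typing import Callable, List, Sequence


def example_weighting(p: float, delta: float = 0.87, gamma: float = 0.70) -> float:
    """Illustrative inverse S-shaped weighting function (parameter values assumed)."""
    num = delta * p ** gamma
    return num / (num + (1.0 - p) ** gamma)


def rank_dependent_weights(
    probs_best_to_worst: Sequence[float],
    weighting: Callable[[float], float],
) -> List[float]:
    """Decision weights for gains, with outcomes ordered from best to worst.

    The i-th weight is pi(p_1 + ... + p_i) - pi(p_1 + ... + p_{i-1}).
    """
    weights = []
    cumulative = 0.0
    previous = 0.0  # assumes pi(0) = 0
    for p in probs_best_to_worst:
        # Clamp at 1 so rounding error cannot push the argument above 1.
        cumulative = min(cumulative + p, 1.0)
        current = weighting(cumulative)
        weights.append(current - previous)
        previous = current
    return weights


def rank_dependent_value(
    outcomes_best_to_worst: Sequence[float],
    probs_best_to_worst: Sequence[float],
    weighting: Callable[[float], float],
    value: Callable[[float], float],
) -> float:
    """Weighted sum of values using rank-dependent decision weights."""
    weights = rank_dependent_weights(probs_best_to_worst, weighting)
    return sum(w * value(x) for w, x in zip(weights, outcomes_best_to_worst))


if __name__ == "__main__":
    w = rank_dependent_weights([0.5, 0.4, 0.1], example_weighting)
    print(w, sum(w))
```

For three outcomes the weights reduce to \(\pi (p)\), \(\pi (p+q)-\pi (p)\), and \(1-\pi (p+q)\), and they always sum to 1 because the differences telescope.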

2.2 Psychological differences between OPT and CPT

Fennema and Wakker (1997; also Diecidue and Wakker 2001) noted that CPT was not merely a technical improvement over OPT that eliminated violations of stochastic dominance and generalized to an arbitrary number of outcomes. CPT and OPT offer different empirical content. Whereas OPT and CPT coincide for two-outcome prospects [Representations (1) and (4)], the two models differ in how prospects with three or more outcomes are valued.

Which of the two models values complex gambles most accurately? To answer this question, we first ask: when is the cash equivalent under OPT higher than under CPT, and when is the opposite true? Before we derive some predictions, we consider a typical probability weighting function, shown in Fig. 1. The weighting function is inverse S-shaped, concave for small probabilities, and convex for large probabilities (Wu and Gonzalez 1996), and exhibits lower subadditivity, \(\pi (p)>\pi (p+q)-\pi (q)\) for \(p+q<1\), as well as upper subadditivity, \(1-\pi (1-p) > \pi (1-q)-\pi (1-q-p)\) for \(1-q-p>0\) (Tversky and Wakker 1995). Lower and upper subadditivity hold empirically both at the level of aggregate data (Tversky and Fox 1995; see also, Abdellaoui 2000; Bleichrodt and Pinto 2000; Fox and Tversky 1998; Gonzalez and Wu 1999) and at the level of the individual subject (Abdellaoui 2000; Bleichrodt and Pinto 2000; Gonzalez and Wu 1999). Informally, lower subadditivity captures the boundary effect near zero, often called the possibility effect, while upper subadditivity captures the boundary effect near 1, often called the certainty effect. In addition, the probability weighting function intersects the identity line below \(p=0.5\), a property called subcertainty (Kahneman and Tversky 1979; Tversky and Kahneman 1992), which can be interpreted as consistent with the certainty effect being larger than the possibility effect (Tversky and Fox 1995), although there is substantial heterogeneity (e.g., Bruhin et al. 2010; Fehr-Duda et al. 2006; Gonzalez and Wu 1999; Murphy and ten Brincke 2018).

Fig. 1 A typical probability weighting function, \(\pi (\cdot )\). The weighting function is inverse S-shaped, concave for small probabilities, and convex for medium to large probabilities, and intersects the identity line below \(\frac{1}{2}\)

These typical properties of the weighting function, in turn, help us understand the ordering of differences in how CPT and OPT value three-outcome gambles. We start with \(z=0\). It is straightforward to show that OPT overvalues a gamble relative to CPT if \(\pi (p+q)<\pi (p)+\pi (q)\) for \(p+q<1\), that is, if the probability weighting function exhibits lower subadditivity.
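The step for \(z=0\) can be made explicit by subtracting Representation (4) from Representation (2), using \(v(0)=0\). The following short derivation relies only on the definitions given above:

$$\begin{aligned} V_O(p,x;q,y;1-p-q,0)-V_C(p,x;q,y;1-p-q,0)&=\pi (q)v(y)-[\pi (p+q)-\pi (p)]v(y)\\ &=[\pi (p)+\pi (q)-\pi (p+q)]\,v(y). \end{aligned}$$

Because \(v(y)>0\) for \(y>0\), the difference is positive exactly when \(\pi (p)+\pi (q)>\pi (p+q)\).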

The answer is more complex for \(z>0\). Comparing the OPT and CPT Representations, (3) and (4), respectively, yields \(V_O(p,x;q,y;1-p-q,z)>V_C(p,x;q,y;1-p-q,z)\) if \(\left[ \pi (p)+\pi (q)-\pi (p+q) \right] v(y) > \left[ 1-\pi (p+q) - \pi (1-p-q) \right] v(z)\). The standard qualitative restrictions on the weighting function do not indicate when this inequality holds, because \(\pi (p)+\pi (q)-\pi (p+q)>0\) if the weighting function is lower subadditive, but \(\pi (p+q)+\pi (1-p-q)<1\) if the weighting function is subcertain (i.e., \(\pi (p)+\pi (1-p)<1\)). However, it appears that, for typically observed weighting functions, OPT overvalues gambles with a positive worst outcome relative to CPT. We show this by considering the special case of the neo-additive probability weighting function or piecewise linear weighting function, \(\pi (p)=\alpha +(1-\alpha -\beta )p\), used by Kothiyal et al. (2014) and Tversky and Fox (1995), among others. For this function, \(\alpha \) is a measure of lower subadditivity (i.e., \(\pi (\epsilon ) \approx \alpha \) for \(\epsilon \) near 0), with \(\beta \) a measure of upper subadditivity (i.e., \(\pi (1-\epsilon ) \approx 1-\beta \)). Tversky and Fox (1995, Table 4) found that \(\beta \approx 2\alpha \), which is sufficient to produce overvaluation of OPT relative to CPT, because \(V_O(p,x;q,y;1-p-q,z) > V_C(p,x;q,y;1-p-q,z)\) if \(\alpha v(y)> (\beta -\alpha )v(z)\), or, since \(v(y)>v(z)\), if \(\beta \le 2 \alpha \).
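The neo-additive case can be checked term by term. The following derivation is a sketch that assumes interior probabilities (\(p>0\), \(q>0\), and \(p+q<1\)), so that the linear form \(\pi (p)=\alpha +(1-\alpha -\beta )p\) applies to every argument:

$$\begin{aligned} \pi (p)+\pi (q)-\pi (p+q)&=2\alpha +(1-\alpha -\beta )(p+q)-\alpha -(1-\alpha -\beta )(p+q)=\alpha ,\\ 1-\pi (p+q)-\pi (1-p-q)&=1-2\alpha -(1-\alpha -\beta )=\beta -\alpha . \end{aligned}$$

Substituting these two quantities into the general inequality gives \(\alpha v(y)>(\beta -\alpha )v(z)\). With \(\alpha >0\) and \(v(y)>v(z)\ge 0\), the condition \(\beta \le 2\alpha \) implies \(\alpha v(y)\ge (\beta -\alpha )v(y)\), and the right-hand side is strictly larger than \((\beta -\alpha )v(z)\) whenever \(\beta >\alpha \); when \(\beta \le \alpha \) the inequality holds trivially.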

Thus, commonly observed empirical properties of the weighting function indicate that OPT will typically overvalue a three-outcome gamble relative to CPT. It is still, however, unclear which of the two models is more appropriate for modeling gambles with more than two outcomes. To begin answering this question, we note that the two composition rules differ in an important psychological sense. Consider, for example, the gamble \((p,x;p,y;1-2p,0)\). OPT gives no precedence to the highest or second highest outcome, i.e., the decision weights for the outcomes are identical: \(\pi (p)\). In contrast, under CPT and a lower subadditive \(\pi (\cdot )\), the second highest outcome is “marginal” and receives less weight than the highest outcome, i.e., \(\pi (2p)-\pi (p)\) versus \(\pi (p)\). Thus, CPT generalizes the idea of diminishing sensitivity from merely being a distortion of probabilities (a property of the weighting function, see Fig. 1) to also affecting the weight attached to outcomes (a property of decision weights, Fennema and Wakker 1997). This highlights the interrelation between the choice of the weighting function and the choice of the composition rule. In other words, under CPT with probabilities held constant, extreme outcomes (highest and lowest) receive more weight relative to intermediate outcomes.

2.3 Empirical findings

Almost all of the standard violations of expected utility are consistent with both OPT and CPT. The common-ratio effect (Kahneman and Tversky 1979) involves only one non-zero outcome, and thus is explained identically by both models (Prelec 1998). The original common-consequence effect (Kahneman and Tversky 1979) involves at least three outcomes and, therefore, may lead to different predictions (Wu and Gonzalez 1998). However, the common-consequence form of the Allais Paradox and more generalized common-consequence effect violations can be explained by OPT and CPT equally well (Wu and Gonzalez 1996).Footnote 4

Goodness-of-fit tests paint a mixed picture. Wu and Gonzalez (1996) fit binary choice data collected to test common-consequence effects using parametric specifications for \(\pi (\cdot )\) and \(v(\cdot )\). OPT fits the aggregate data slightly better, but is outperformed by CPT on 3 of 5 ladders. Camerer and Ho (1994) fit OPT and CPT to several datasets and found that OPT fits the data slightly better in maximum likelihood tests. Of course, estimation and goodness-of-fit statistics confound tests of models with assumptions about parametric forms of \(\pi (\cdot )\) and \(v(\cdot )\) (e.g., Stott 2006). Fennema and Wakker (1997) found that CPT explained the data of Lopes (1993) better than OPT. Wakker (2003) showed that the mixed-gamble data of Levy and Levy (2002) could be explained by CPT but not OPT.

Several studies have employed direct tests. Starmer and Sugden (1993) documented that decision weights change depending on whether that outcome is received in a single state or multiple states, an effect inconsistent with rank-dependent models (see also Birnbaum 2008; Humphrey 1995). Wu (1994) documented violations of an axiom known as ordinal independence (Green & Jullien, 1988), thus providing one of the first counter-examples to RDU models. Diecidue et al. (2007) measured decision weights and provided evidence for rank dependence in the domain of decision-making under uncertainty. Wu et al. (2005) developed a critical test of the two models based on the probability trade-off consistency of Abdellaoui (2002). They observed that choices were consistent with OPT, but not CPT, for gambles that do not involve a certainty effect, but consistent with both models for gambles involving a certainty effect. Most recently, Bernheim and Sprenger (2020) found that prices elicited from an “equalizing reductions” elicitation task were insensitive to payoff rank, a finding inconsistent with rank-dependent models (however, see Abdellaoui et al. 2020).

3 Empirical study

In this section, we present a study designed to test how OPT and CPT predict three-outcome gamble cash equivalents. As we previously noted, OPT and CPT are identical for two-outcome gambles [see OPT and CPT representations (1) and (4), respectively]. We thus adopt the strategy of fitting parametric forms of the value and weighting function to two-outcome gamble cash equivalents. We then use these estimates to predict cash equivalents for three-outcome gambles. The first step of this procedure is presented in detail in Gonzalez and Wu (1999), so we provide only a brief summary here.

3.1 Method

We report data from 47 participants, all of whom followed an identical procedure. Data from the first 10 participants were reported in Gonzalez and Wu (1999). Analyses of the additional 37 participants have not been previously published. Each participant was paid $50 for participating in four 1-h sessions, plus an additional fee ($50 on average) received for playing out one of their choices using an incentive-compatible elicitation procedure (Becker et al. 1964). Cash equivalents were elicited via price lists, using a computer program similar to that used in Tversky and Kahneman (1992); details of this procedure are found in Gonzalez and Wu (1999).

The design consisted of 165 two-outcome gambles: 15 outcome levels crossed with 11 levels of probability associated with the maximum outcome (see Gonzalez and Wu 1999). Because all gambles offered non-negative outcomes, prospect theory codes all outcomes as gains. In addition to the two-outcome gambles, there were 22 gambles that had two non-zero outcomes and one zero outcome, and there were 15 gambles with three non-zero outcomes (see Table 2). Finally, 9 of the 165 two-outcome gambles were repeated.Footnote 5 Thus, the entire study consisted of 211 gambles.

3.2 Two-outcome gamble results

Median cash equivalents for each gamble are found in the Appendix, Table 5. We performed a nonlinear regression to estimate two-outcome cash equivalents, assuming a power value function, \(v(x)=x^\alpha \), and the “linear in log odds” probability weighting function, \(\pi (p) = \frac{\delta p ^ \gamma }{\delta p ^\gamma + (1-p)^\gamma }\) (see Gonzalez and Wu 1999 for justification of these functional forms). The parametric estimates, \(\alpha \), \(\delta \), and \(\gamma \), for all 47 participants, as well as the median data, appear in Table 1. The probability weighting and value functions for the median data are plotted in the top two panels of Fig. 2, while plots for Participants 1 through 10 are found in Gonzalez and Wu (1999) and the plots for all participants are shown in the Appendix, Figs. 5 and 6. Note that the median subject has a concave \(v(\cdot )\) and inverse S-shaped \(\pi (\cdot )\), a pattern exhibited by the majority of individuals. For the parametric specification used, \(\alpha <1\) indicates a concave value function, \(\gamma <1\) indicates an inverse S-shaped weighting function, and \(\delta <1\) indicates that the weighting function crosses the identity line for \(p<\frac{1}{2}\), i.e., \(\pi (\cdot )\) is subcertain. Across the 47 participants, \(\alpha <1\) for 33 participants (sign test, \(p<.001\)), \(\gamma <1\) for 39 participants (sign test, \(p<.001\)), while \(\delta <1\) for 30 participants (sign test, \(p=0.24\)). In addition, while there is substantial heterogeneity in curvature and in elevation of \(\pi (\cdot )\), \(\delta \) and \(\gamma \) are essentially independent (\(\rho =.07\), n.s.). Finally, the parameter estimates for participants 11 through 47 are significantly closer to 1 than for participants 1 through 10 previously analyzed in Gonzalez and Wu (1999).Footnote 6
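The role of \(\delta \) in this specification can be seen from a short calculation, offered here as a sketch that follows from the functional form alone. Taking log odds of \(\pi (p)\) for \(0<p<1\) gives a line in the log odds of p, which is the source of the name “linear in log odds”:

$$\begin{aligned} \ln \frac{\pi (p)}{1-\pi (p)}&=\ln \delta +\gamma \ln \frac{p}{1-p},\\ \pi \left( \tfrac{1}{2}\right)&=\frac{\delta }{\delta +1},\\ \ln \frac{\pi (p)}{1-\pi (p)}+\ln \frac{\pi (1-p)}{1-\pi (1-p)}&=2\ln \delta . \end{aligned}$$

The second line shows that \(\pi (\frac{1}{2})<\frac{1}{2}\) exactly when \(\delta <1\). The third line is negative exactly when \(\delta <1\), and a negative sum of log odds is equivalent to \(\pi (p)+\pi (1-p)<1\), which is the subcertainty condition.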

Table 1 Parameter estimates obtained from a standard nonlinear least squares regression with a power value function (\(v(x)=x^\alpha \)) and the linear in log odds weighting function (\(\pi (p)=\frac{\delta p^\gamma }{\delta p^\gamma + (1-p)^\gamma }\)). Values in parentheses are standard errors

3.3 Three-outcome gamble results

We apply the parametric estimates of \(\pi (\cdot )\) and \(v(\cdot )\) from Table 1 to three-outcome gambles (median cash equivalent data for each three-outcome gamble are presented in Table 2). Consider the gamble \((p,x;q,y;1-p-q,z)\). Using the OPT representation (3), the predicted cash equivalent (CE) under OPT, \({\widehat{CE}}_O\), is given by

$$\begin{aligned} {\widehat{CE}}_O=v^{-1}\left( \pi (p) v(x)+\pi (q)v(y)+\pi (1-p-q)v(z)\right) , \end{aligned}$$

where \(v(x)=x^{{\hat{\alpha }}}\), \(\pi (p)=\frac{{\hat{\delta }} p ^{{\hat{\gamma }}}}{{\hat{\delta }} p ^{{\hat{\gamma }}} + (1-p)^{{\hat{\gamma }}}}\), and \({\hat{\alpha }}\), \({\hat{\delta }}\), and \({\hat{\gamma }}\) are the parameter estimates from Table 1. Similarly, CPT predicts the following cash equivalent:

$$\begin{aligned} {\widehat{CE}}_C=v^{-1}(\pi (p) v(x)+[\pi (p+q)-\pi (p)]v(y)+[1-\pi (p+q)]v(z)). \end{aligned}$$

For each of the 47 participants as well as the median data, we then compare the predicted CE under both models (using the parameter estimates from the two-outcome gambles) with the actual CE.

Fig. 2 The panels show the estimated (a) probability weighting function and (b) value function for the median data, as well as the predicted cash equivalents under (c) OPT and under (d) CPT. Gambles with two non-zero outcomes (\(z=0\)) appear in red circles, whereas gambles with three non-zero outcomes (\(z>0\)) appear in blue squares

Table 2 Three-outcome gambles, \((p,x;q,y;1-p-q,z)\), used in the study. Cash equivalents for each gamble are also given, as well as cash equivalents under CPT and OPT, predicted from parameters fitted on two-outcome gambles

We illustrate this procedure with the prospect (.50, 400; .40, 200; .10, 0). The estimates for the median data (from Table 1) are \({\hat{\alpha }}=.75\), \({\hat{\delta }}=.87\), and \({\hat{\gamma }}=.70\), yielding \({\widehat{CE}}_C\)=$232 and \({\widehat{CE}}_O\)=$248, compared with the actual median CE of $245. For this example, CPT under-predicts the cash equivalent, whereas OPT over-predicts the cash equivalent.
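The illustration above can be reproduced with a few lines of code. This is a minimal sketch rather than the authors' estimation code: the function names are hypothetical, and the parameter values are the rounded median estimates quoted in the text, so the output should be expected to match the reported predictions only to within about a dollar.

```python
def value(x: float, alpha: float) -> float:
    """Power value function v(x) = x**alpha for gains."""
    return x ** alpha


def inverse_value(u: float, alpha: float) -> float:
    """Inverse of the power value function."""
    return u ** (1.0 / alpha)


def weight(p: float, delta: float, gamma: float) -> float:
    """Linear in log odds probability weighting function."""
    num = delta * p ** gamma
    return num / (num + (1.0 - p) ** gamma)


def ce_opt(p, x, q, y, z, alpha, delta, gamma):
    """Predicted cash equivalent under OPT, Representation (3)."""
    r = max(0.0, 1.0 - p - q)  # probability of the worst outcome
    total = (weight(p, delta, gamma) * value(x, alpha)
             + weight(q, delta, gamma) * value(y, alpha)
             + weight(r, delta, gamma) * value(z, alpha))
    return inverse_value(total, alpha)


def ce_cpt(p, x, q, y, z, alpha, delta, gamma):
    """Predicted cash equivalent under CPT, Representation (4)."""
    w_best = weight(p, delta, gamma)
    w_top_two = weight(min(p + q, 1.0), delta, gamma)
    total = (w_best * value(x, alpha)
             + (w_top_two - w_best) * value(y, alpha)
             + (1.0 - w_top_two) * value(z, alpha))
    return inverse_value(total, alpha)


if __name__ == "__main__":
    # Rounded median estimates quoted in the text.
    params = dict(alpha=0.75, delta=0.87, gamma=0.70)
    print("CPT:", round(ce_cpt(0.50, 400, 0.40, 200, 0, **params), 1))
    print("OPT:", round(ce_opt(0.50, 400, 0.40, 200, 0, **params), 1))
```

Because the worst outcome is zero here, the weight on \(z\) drops out of both models and the whole difference between the two predictions comes from the weight placed on the middle outcome.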

Figure 2 plots the predicted versus actual cash equivalents for both models using the median data. CPT is plotted in the bottom left panel and OPT is plotted in the bottom right panel. The pattern found in our illustration generalizes across the remaining 36 gambles (see Table 2). Both models predict the actual cash equivalents extremely well. For CPT, the mean absolute deviation (in percentage terms) between prediction and actual is \(6.8\%\) (interquartile range: \(2.8\%\) to \(8.5\%\)), compared to \(4.3\%\) for OPT (interquartile range: \(2.1\%\) to \(6.6\%\)). However, estimates differ from actual cash equivalents in the same systematic way illustrated by the example: OPT over-predicts 19 of the 37 gambles, and CPT under-predicts 32 of the 37 gambles. A similar finding emerges when we regress the predicted cash equivalents on the actual cash equivalents (fixing the constant to be zero). The estimated slopes indicate that CPT slightly under-predicts (\(\beta _C=.977\), \(p<.001\)) and OPT slightly over-predicts (\(\beta _O=1.027\), \(p<.001\)); the interaction effect testing the difference between the two slopes is statistically significant (\(p<.001\)).
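The zero-intercept regression slope used above has a closed form, sketched below under the assumption of ordinary least squares through the origin. The function name is hypothetical and the numbers in the usage lines are toy values chosen for illustration, not data from the study; the significance tests reported in the text are not reproduced here.

```python
from typing import Sequence


def slope_through_origin(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Least-squares slope of predicted on actual with the intercept fixed at 0.

    Minimizing sum((predicted - b * actual)**2) over b gives
    b = sum(actual * predicted) / sum(actual**2).
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    denominator = sum(a * a for a in actual)
    if denominator == 0:
        raise ValueError("actual values must not all be zero")
    return sum(a * p for a, p in zip(actual, predicted)) / denominator


if __name__ == "__main__":
    # Toy values only: predictions that run 10% above the actual values.
    toy_actual = [100.0, 200.0, 300.0]
    toy_predicted = [110.0, 220.0, 330.0]
    print(slope_through_origin(toy_actual, toy_predicted))
```

A slope above 1 corresponds to over-prediction on average and a slope below 1 to under-prediction, which is how \(\beta _C\) and \(\beta _O\) are read throughout this section.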

The relatively good fits at the aggregate level mask the considerable heterogeneity across types of gambles and participants. Figure 2 distinguishes between gambles with \(z=0\) and gambles with \(z>0\). For gambles with \(z=0\), OPT slightly under-predicts cash equivalents (\(\beta _O=0.980\), \(p=.07\)), whereas CPT significantly under-predicts cash equivalents (\(\beta _C=.910, p<.001\)). In contrast, for gambles with \(z>0\), OPT over-predicts (\(\beta _O=1.043,p<.001\)), whereas CPT is on average accurate (\(\beta _C=0.999\), \(p=0.85\)). Thus, for the median data, CPT is more accurate than OPT for \(z>0\) gambles, with OPT more accurate than CPT for \(z=0\) gambles. We discuss this finding in more detail in the next section.

We next turn to individual-level analysis of the 47 participants. Table 3 reports the degree of over- or under-prediction for each participant for the entire set of 37 gambles, as well as separately for the 22 \(z=0\) gambles and the 15 \(z>0\) gambles. (Plots for each of the 47 participants are found in the Appendix, Figs. 7 and 8.) We find the same general pattern observed for the median data at the level of individuals. CPT under-predicts cash equivalents (\(\beta _C<1\)) for 36 of 47 participants (sign test, \(p<.01\)), whereas OPT over-predicts cash equivalents (\(\beta _O>1\)) for 24 of 47 participants (sign test, n.s.). We also find that CPT under-predicts cash equivalents for 40 of 47 participants (sign test, \(p<.001\)) for the \(z=0\) gambles but only 27 of 47 participants for the \(z>0\) gambles (sign test, n.s.). OPT under-predicts cash equivalents for 23 of the 47 participants for the \(z=0\) gambles (sign test, n.s.) and over-predicts cash equivalents for 27 of the 47 participants for the \(z>0\) gambles (sign test, \(p=.047\)). OPT shows considerably more variation in the degree of over- or under-prediction than CPT (e.g., the interquartile range for \(\beta _O\) is .868 to 1.439, compared to .894 to 1.000 for \(\beta _C\)).

Table 3 Slopes of regression line for zero-intercept regressions (predicted CE vs. observed CE) for cumulative prospect theory (CPT) and original prospect theory (OPT) for \(z=0\) gambles and \(z>0\) gambles. Supercertain participants (\(\delta >1\)) appear in bold font. \(^{*}\), \(^{**}\), \(^{***}\) denote coefficients significantly below 1 (.05, .01, .001 level). \(^{+}\), \(^{++}\), \(^{+++}\) denote coefficients significantly greater than 1 (.05, .01, .001 level)

To understand what may be driving differences across individuals, in Fig. 3, we plot the \(\beta \)-coefficients for each individual (Table 3) and each model against the individual-level parameter estimates, \({\hat{\delta }}\), \({\hat{\gamma }}\), and \({\hat{\alpha }}\) (Table 1). The top panel shows a strong positive relationship between \({\hat{\delta }}\) and the \(\beta \)-coefficients for OPT (\(\rho =.76\), \(p<.001\)), leading to over-prediction (\(\beta _O>1\)) for all 17 subjects with \({\hat{\delta }}>1\). These participants have supercertain weighting functions, i.e., \(\pi (p) + \pi (1-p) > 1\), resulting in gambles for which OPT fits poorly. To illustrate, consider the prospect (.50, 200; .40, 100; .10, 0). Subject 2 has a supercertain weighting function with \({\hat{\delta }}=1.51\) (and \({\hat{\gamma }}=.65\) and \({\hat{\alpha }}=.23\)). These estimates yield \(\pi (.5)+\pi (.4)=1.14\), and hence, a predicted cash equivalent of 257 that exceeds the highest outcome 200, violating the standard condition that the CE of a gamble cannot exceed its best outcome. Note that we also see a significant negative correlation between \({\hat{\alpha }}\) and \(\beta _O\) (\(\rho =-0.50\), \(p<.001\)). However, since \(\alpha \) and \(\delta \) are highly correlated, \(\rho =-.51\), we conduct a multiple regression (see below) to analyze the effect of all parameters simultaneously.
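The Subject 2 illustration can be checked numerically. The sketch below uses hypothetical function names and the rounded estimates quoted in the text (\({\hat{\delta }}=1.51\), \({\hat{\gamma }}=.65\), \({\hat{\alpha }}=.23\)); because the exponent \(1/{\hat{\alpha }}\) is large, small rounding differences in the parameters can move the predicted cash equivalent by a few units.

```python
def weight(p: float, delta: float, gamma: float) -> float:
    """Linear in log odds probability weighting function."""
    num = delta * p ** gamma
    return num / (num + (1.0 - p) ** gamma)


def opt_ce_zero_worst(p, x, q, y, alpha, delta, gamma):
    """OPT cash equivalent for (p, x; q, y; 1-p-q, 0), Representation (2)."""
    total = (weight(p, delta, gamma) * x ** alpha
             + weight(q, delta, gamma) * y ** alpha)
    return total ** (1.0 / alpha)


if __name__ == "__main__":
    # Rounded estimates for Subject 2 as quoted in the text.
    alpha, delta, gamma = 0.23, 1.51, 0.65
    weight_sum = weight(0.5, delta, gamma) + weight(0.4, delta, gamma)
    ce = opt_ce_zero_worst(0.5, 200, 0.4, 100, alpha, delta, gamma)
    print("sum of OPT decision weights:", round(weight_sum, 2))
    print("OPT predicted CE:", round(ce), "versus best outcome 200")
```

Since the two OPT decision weights sum to more than 1, the weighted value exceeds \(v(200)\), which is why the predicted cash equivalent lands above the best outcome.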

Fig. 3 The panels plot the relation between the degree of under-prediction and over-prediction for CPT and OPT (\(\beta _C\) and \(\beta _O\)) and the prospect theory parameter estimates (\({\hat{\delta }}\), \({\hat{\gamma }}\), and \({\hat{\alpha }}\)). Color refers to whether \(\delta \le 1\) (green) or \(\delta >1\) (orange). Participant 22 (\(\delta =2.45\) and \(\beta _O=6.89\)) is an outlier and is omitted from the figure, but included in the analyses in the text

We also find a smaller, but still significant, positive correlation between \({\hat{\delta }}\) and \(\beta _C\) for CPT (\(\rho =.48\), \(p<.001\)), and a modest, but significant, negative correlation between \({\hat{\alpha }}\) and \(\beta _C\) (\(\rho =-0.30\), \(p=.04\)). To understand the independent effect of parameters on over- and under-prediction, we perform a multiple regression, regressing \(\beta _C\) and \(\beta _O\) on \({\hat{\alpha }}\), \({\hat{\delta }}\), and \({\hat{\gamma }}\). The regressions are found in Table 4, for the whole set of 37 gambles and separately for the \(z=0\) and \(z>0\) gambles. The regressions show a continued positive relationship between \({\hat{\delta }}\) and \(\beta _O\) for OPT, while also indicating no relationship between \({\hat{\alpha }}\) and \(\beta _O\), after controlling for \({\hat{\delta }}\).

Table 4 Regression coefficients (dependent variables, \(\beta _C\) and \(\beta _O\)). \(^{*}\), \(^{**}\), \(^{***}\) denote coefficients significantly different from 0 (.05, .01, .001 level)

Recall that CPT is on average accurate for the \(z>0\) gambles, with OPT accurate for gambles with \(z=0\). When \(z>0\) and CPT is most accurate, there is no significant relationship between the individual parameter estimates and the degree of over- or under-prediction for CPT. In contrast, for the \(z=0\) gambles, where OPT is most accurate, we find a significant relationship between all the parameters and \(\beta _C\). To understand this relationship, suppose that an individual satisfies OPT for gambles with \(z=0\) and that this individual has a relatively low \({\hat{\gamma }}=.5\). For the gamble (.5, 200; .40, 100; .10, 0), OPT, which is the correct model by assumption, gives the middle outcome 100 a decision weight of \(\pi (.4)=.32\), whereas the decision weight under CPT is \(\pi (.9)-\pi (.5)=.24\). Thus, CPT will tend to under-predict the cash equivalents most when \({\hat{\gamma }}\) is low and least when \({\hat{\gamma }}\) is close to 1.

Finally, we examine whether decision weights are rank-dependent, as required by CPT, or rank-independent, as required by OPT. To do so, we examine the 22 3-outcome gambles, \((p,x;q,y;1-p-q,0)\), in which \(z=0\). Consider the middle outcome y. We denote the decision weight for that outcome \(w(q;p)\). OPT assigns a decision weight of \(\pi (q)\) to y, while CPT assigns a decision weight of \(\pi (p+q)-\pi (p)\) to the same outcome. If \(\pi (\cdot )\) satisfies lower subadditivity, then a middle outcome with probability q will have a lower decision weight than a best outcome with probability q. In other words, CPT, with a typical probability weighting function, requires that extreme outcomes receive the greatest weight, holding probabilities constant (Fennema and Wakker 1997).

We use the estimates obtained from two-outcome gambles to infer a decision weight for the middle outcome y. We again illustrate this procedure with the prospect (.50, 400; .40, 200; .10, 0), which has a median CE of $245. The decision weight for the middle outcome, 200, is \(w(.40;.50) = \frac{v(CE)-\pi (.50)v(400)}{v(200)}\) (which we call the implied decision weight). Under OPT, \(w(.40;.50)=\pi (.40)\), while \(w(.40;.50)=\pi (.90)-\pi (.50)\) for CPT. For the parameter estimates for the median data, \({\hat{\alpha }}=.75\), \({\hat{\delta }}=.87\), and \({\hat{\gamma }}=.70\), we can compute three estimates for the decision weight of the middle outcome: the implied decision weight \(w(.40;.50)=.382\), the estimate of the OPT decision weight \(\pi (.4)=.396\), and the estimate of the CPT decision weight \(\pi (.90)-\pi (.50)=.336\). For this gamble, the implied decision weight exhibits modest rank dependence, \(w(.40;.50)<\pi (.40)\), but less than CPT requires, since \(w(.40;.50)>\pi (.90)-\pi (.50)\).
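The three decision-weight estimates in the median-data illustration can be computed directly. The following is a sketch with hypothetical function names, using the rounded parameter estimates quoted in the text, so the third decimal place may differ slightly from the reported figures.

```python
def weight(p: float, delta: float, gamma: float) -> float:
    """Linear in log odds probability weighting function."""
    num = delta * p ** gamma
    return num / (num + (1.0 - p) ** gamma)


def implied_middle_weight(ce, p, x, y, alpha, delta, gamma):
    """Implied weight of the middle outcome y in (p, x; q, y; 1-p-q, 0).

    Solves v(CE) = pi(p) * v(x) + w * v(y) for w, with v(x) = x**alpha.
    """
    return (ce ** alpha - weight(p, delta, gamma) * x ** alpha) / y ** alpha


if __name__ == "__main__":
    # Rounded median estimates and the median CE quoted in the text.
    alpha, delta, gamma = 0.75, 0.87, 0.70
    implied = implied_middle_weight(245, 0.50, 400, 200, alpha, delta, gamma)
    opt_weight = weight(0.40, delta, gamma)
    cpt_weight = weight(0.90, delta, gamma) - weight(0.50, delta, gamma)
    print("implied:", round(implied, 3))
    print("OPT:", round(opt_weight, 3))
    print("CPT:", round(cpt_weight, 3))
```

The implied weight falls between the CPT and OPT weights, which is the sense in which the example shows some rank dependence but less than CPT requires.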

We infer decision weights for the middle outcome for the remaining 21 3-outcome gambles with \(z=0\).Footnote 7 The results are shown in Fig. 4, with the same plot for each of the 47 participants found in the Appendix, Fig. 9. The inferred decision weights are always above those implied by CPT (sign test, \(p<.001\)), but also above those implied by OPT in 16 of 22 cases (sign test, \(p=.033\)). This result is inconsistent with rank-dependent decision weights, contrary to the findings of Diecidue et al. (2007), and roughly consistent with the findings of Bernheim and Sprenger (2020).

Fig. 4 Implied decision weights for middle outcome (y) for 22 3-outcome gambles with \(z=0\) for the median data. The tests are sorted by combinations of q, the probability of the middle outcome, and p, the probability of the highest outcome. For example, “.01 (.50)” bins the three tests in which \(q=.01\) and \(p=.50\). Red triangles denote implied decision weights calculated from estimated probability weighting and value function parameters. “C” and “O” indicate decision weights assuming CPT and OPT, respectively

4 Discussion

Of the three components of choice models (value function, probability weighting function, and composition rule), composition rules have received the least empirical attention. The empirical contest to date between OPT and CPT, the two most widely studied “nonlinear in probability” models, has produced mixed results. In this paper, we take a different approach from that of previous researchers. We estimate parameter values on two-outcome gambles, where the models coincide, and then apply these parameter estimates to a holdout sample of three-outcome gambles. We find that both models predict the median data extremely well. However, CPT slightly but systematically underestimates and OPT slightly but systematically overestimates three-outcome gamble cash equivalents. We document similar results using Bayesian hierarchical methods (e.g., Murphy and ten Brincke 2018; Nilsson et al. 2011) and when we employ heterogeneous error terms in a nonlinear mixed model regression (see Supplementary Information).
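To make the holdout logic concrete, the following sketch applies the median two-outcome parameter estimates to the illustrative gamble (.50, 400; .40, 200; .10, 0) and computes the cash equivalent each composition rule would predict. It covers only gambles whose lowest outcome is 0 and again assumes a power value function and a linear-in-log-odds weighting function; the functional forms and all function names are assumptions of this illustration, not a restatement of the estimation procedure.

```python
# Illustrative sketch only: functional forms and names are assumptions.

def value(x, alpha):
    """Power value function for gains (assumed form)."""
    return x ** alpha

def inverse_value(u, alpha):
    """Inverse of the power value function, mapping value back to money."""
    return u ** (1.0 / alpha)

def weight(p, delta, gamma):
    """Linear-in-log-odds probability weighting function (assumed form)."""
    num = delta * p ** gamma
    return num / (num + (1.0 - p) ** gamma)

def predict_ce(x, y, p, q, alpha, delta, gamma, rule):
    """Predicted CE of (p, x; q, y; 1-p-q, 0) with x > y > 0."""
    w_high = weight(p, delta, gamma)
    if rule == "OPT":
        # Separable weighting: the middle outcome gets pi(q).
        w_mid = weight(q, delta, gamma)
    elif rule == "CPT":
        # Rank-dependent weighting: pi(p + q) - pi(p).
        w_mid = weight(p + q, delta, gamma) - w_high
    else:
        raise ValueError(f"unknown composition rule: {rule}")
    total = w_high * value(x, alpha) + w_mid * value(y, alpha)
    return inverse_value(total, alpha)

# Median-data estimates from the two-outcome gambles, as quoted in the text.
alpha, delta, gamma = 0.75, 0.87, 0.70

ce_opt = predict_ce(400.0, 200.0, 0.50, 0.40, alpha, delta, gamma, "OPT")
ce_cpt = predict_ce(400.0, 200.0, 0.50, 0.40, alpha, delta, gamma, "CPT")
print(f"OPT prediction: {ce_opt:.1f}, CPT prediction: {ce_cpt:.1f}")
```

For this gamble the sketch yields an OPT prediction above, and a CPT prediction below, the median cash equivalent of $245, matching the direction of the discrepancies described above.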

Our aggregate-level findings mask considerable heterogeneity at the level of types of gambles and individual subjects. Our analyses showed that OPT overvalues cash equivalents relative to CPT. However, we also found that OPT was on average accurate for three-outcome gambles in which the lowest outcome was 0, whereas CPT was accurate when the lowest outcome was positive.

We offer a speculative hypothesis for this pattern: the lowest outcome frames how the middle outcome is viewed and hence how much weight is given to that outcome. When the lowest outcome is 0, the middle outcome is seen as one of the two positive outcomes that might result from this gamble: “If I win something, it will either be x or y.” In this case, neither of the two outcomes (x or y) appears to receive any precedence, consistent with the treatment under OPT. On the other hand, when the lowest outcome is positive, the evaluation of this gamble might differ considerably. The decision-maker might code this gamble in the following way: “I am going to win something. I could win as much as x or as little as z.” Under this coding, the middle outcome, y, will likely receive less weight, being viewed as “marginal” relative to the best and worst outcomes. Of course, this psychological process is naturally approximated by the rank-dependent composition rule used by CPT (Footnote 8).

Earlier, we discussed some psychological features of the two composition rules. CPT has the attractive feature of generalizing diminishing sensitivity from probabilities to outcomes, whereas OPT has the feature that outcomes that are distinctive from others receive more weight because of the subadditivity of \(\pi (\cdot )\). Our speculative hypothesis points to a constructive decision process in the tradition of Payne et al. (1993). It is possible that neither composition rule captures the decision-making process used by subjects for all gambles, but that both composition rules might approximate the process used to evaluate particular types of gambles. We suggest that researchers might consider the relation between psychological processes and algebraic models (e.g., Brandstatter et al. 2006; Johnson et al. 2006). Researchers might start with attention. Decision weights can be thought of as capturing the amount of attention devoted to each outcome. Such an interpretation of decision weights is not new. Many researchers, including Lopes and Oden (1999), Wakker (1990), and Weber (1994), have offered motivational accounts of decision weights as capturing the balance between security and potential, optimism and pessimism, and asymmetric loss functions. We suggest, in addition, that there may be a cognitive explanation involving attention. How a gamble is presented to the respondent may influence how much attention each outcome garners. For example, attention to middle outcomes might differ if a distributional presentation such as that used by Lopes and Oden (1999) is used rather than a verbal presentation such as that used in Tversky and Kahneman (1992), or if the worst outcome is non-zero, as in our study. Mouselab and other process tracing techniques offer promising ways of studying attention by measuring how long and how often subjects attend to particular outcomes and probabilities (Costa-Gomes et al. 2001; Johnson et al. 2002; Payne et al. 1993; Schulte-Mecklenbeck et al. 2017, 2019; see Murphy and ten Brincke 2018, for an example based on prospect theory).

Economists have been increasingly interested in fitting non-expected utility choice models to real-world data such as behavior from asset, insurance, and race track markets (e.g., Barberis 2013; Barberis and Huang 2008; Barseghyan et al. 2013; Camerer 2000; Chiappori et al. 2019; Dimmock et al. 2021; Gurevich et al. 2009; Jullien and Salanie 2000). The primary concern is whether these models organize empirical regularities better than the classical expected utility models. Although our analysis suggests that neither composition rule, CPT's or OPT's, works perfectly, goodness-of-fit measures indicate that both models do well in explaining three-outcome cash equivalent data and also explain three-outcome data significantly better than expected utility. Pragmatically, there are additional reasons to prefer CPT as a choice model for applications. Our investigations indicate that CPT is more robust to heterogeneity in individual preferences, or, more specifically, is well behaved for individuals with both subcertain and supercertain probability weighting functions. In addition, it is parsimonious (aggregate data are explained well using only one more parameter than expected utility; Gonzalez and Wu 1999), relatively tractable (using CPT is like using EU with transformed probabilities, although some equilibrium proofs can be rather difficult with transformed probabilities), and generalizes naturally from discrete to continuous probability distributions.