## Introduction

Expected utility theory is central to understanding decisions and games (von Neumann and Morgenstern 1944; Maynard Smith 1982). It is widely used in the analysis of decision-making processes in animals (Real and Caraco 1986; Stephens and Krebs 1986; Stephens et al. 2007) and humans (Raiffa 1968). In game theory, the values in a payoff matrix are equivalent to fitness units in animals (Maynard Smith 1982; Nowak 2006) or utility (economic gain) in humans (von Neumann and Morgenstern 1944). The basic principle of expected utility theory is simply the maximization of expected fitness (utility) of current energy reserves (wealth) for an animal (human) decision maker. This is based on the fact that, in humans, the ‘value’ of a \$10 bill when one’s wallet is empty is much greater than when one’s wallet is full of \$20 bills. Similarly, in animals, the value of one small piece of food when starved is much greater than that of one large piece when the stomach is full. This ‘value’ is called ‘utility’ in expected utility theory, but it is equivalent to Darwinian ‘fitness’ in animals. Although expected utility theory is well formulated mathematically (von Neumann and Morgenstern 1944; Maynard Smith 1982), the original version has been criticised because of a few serious inherent flaws and heretofore unresolved paradoxes (Friedman and Savage 1948; Markowitz 1952; Edwards 1961; Raiffa 1968; Allais and Hagen 1979; Machina 1987a). The best known, and perhaps the most important, problem is the Allais paradox (Allais and Hagen 1979). The Allais paradox classically involves inconsistencies inherent in human decisions, as follows: if given a pair of choices (“monetary gambles”), human subjects tend to prefer a sure payoff over a gamble, even if the expected payoff of the gamble is much higher than the sure payoff (the ‘sure thing principle,’ after Savage 1954).
However, if two gambles are offered, humans prefer the option with the highest payoff (expected utility), even if its probability is slightly diminished (Edwards 1961). That is, some decisions are more influenced by probability, others more by monetary rewards. No single utility function can describe adequately the switching behavior in preferences observed in the Allais paradox. Even though expected utility theory has greatly advanced our understanding of behavioral decisions and evolutionary game theory, it has been criticized heavily for its persistent difficulties, including the Allais paradox, reconciling the problem of gambling and insurance at multiple wealth levels, and that of simultaneous gambling and insurance (Friedman and Savage 1948; Markowitz 1952; Edwards 1961; Machina 1982, 1987a, b).

For example, the problem of gambling and insurance at multiple wealth levels is explained as follows. Since people at any wealth level engage in both gambling and insurance, the utility (fitness) function should have both convex and concave segments around the current wealth level (Friedman and Savage 1948). However, people at all wealth levels gamble to some extent while also paying insurance premiums, indicating that the utility function should shift around the current wealth level (Markowitz 1952). Thus, we cannot define a single utility function for wealth, as in the standard expected utility theory. The problem of simultaneous gambling and insurance is explained as follows. We cannot explain why people gamble and buy insurance simultaneously, since these behaviors are opposite: one is risk-taking and the other risk-averse (Machina 1987a, b). Among these persistent difficulties, the Allais paradox is widely considered the most intractable problem in expected utility theory (Allais and Hagen 1979; Kahneman and Tversky 1979; Machina 1987a).

Here, extending the dynamic utility function of Yoshimura et al. (2012), we develop the principle of dynamic utility in terms of the avoidance of bankruptcy in humans (death by starvation in animals). The novel components of dynamic utility are that: (1) utility is assumed to be conditional on a current state variable (body size or energy (fat) reserves in animals; wealth in humans); and (2) losses are weighted on utility more heavily than gains, so that death by starvation (bankruptcy in humans) is avoided. We apply the dynamic utility approach to the Allais paradox in human economic behavior. Our solution indicates that paradoxical switching behavior in preferences is a function of current states. We discuss the significance and implications of our findings for the fields of animal foraging behavior and human economic behavior.

The current paper is aimed in particular at the basic principle of optimality in sequential decision making in animals, including humans. Even though we emphasise solving the Allais paradox in humans, our solution also has important implications for understanding optimization schemata in animal behavior. The principal difference between these phenomena lies in the currency under optimization: in humans, gain accrues in economic terms (wealth); in animals, in body size (energy reserves).

## Example of the Allais paradox

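We now present an example of the Allais paradox for our numerical solution following Machina (1987a). Imagine your preferences between choices a and b, and between c and d. The payoffs and probabilities are those appearing in Eqs. (1)–(4): choice a pays \$1M with certainty; choice b pays \$5M with probability 0.10, \$1M with probability 0.89, and nothing with probability 0.01; choice c pays \$1M with probability 0.11 and nothing otherwise; and choice d pays \$5M with probability 0.10 and nothing otherwise.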

Let U = U(x) be the utility of wealth \$x. Then, the expected utility of each choice is

$$E(U_{a} ) = U(1M)$$
(1)
$$E(U_{b} ) = 0.10U(5M) + 0.89U(1M) + 0.01U(0)$$
(2)
$$E(U_{c} ) = 0.11U(1M) + 0.89U(0)$$
(3)
$$E(U_{d} ) = 0.10U(5M) + 0.90U(0)$$
(4)

where M denotes millions of dollars. If choice a is preferred over choice b, then the expected utility theory asserts that $$E(U_{a} ) > E(U_{b} )$$ (Eqs. 1, 2).

Now we apply the independence axiom of expected utility theory (von Neumann and Morgenstern 1944). A straightforward explanation of the independence axiom follows (Machina 1982). Suppose we have risky prospects A, B and C, and their combined choices X and Y, where X is the combined reward of A and C, with a p:(1 − p) chance of A or C, that is,

$$X = pA + (1 - p)C$$

while Y is that of B and C, with a p:(1 − p) chance of B or C, that is,

$$Y = pB + (1 - p)C$$

We have three options for both (A vs. B) and (X vs. Y), namely, (1) prefer A over B, (2) prefer B over A and (3) indifference between A and B (and likewise for X and Y). The independence axiom states that choice A is preferred (otherwise ignored or avoided) over choice B if and only if X is preferred over Y, such that $$A \succ B \Leftrightarrow X \succ Y$$. Here, we explain only case (1), but the latter two cases hold in the same manner (i.e., $$A \prec B \Leftrightarrow X \prec Y$$ and $$A\sim B \Leftrightarrow X\sim Y$$). This means that we can ignore the common component C in choices X and Y because its probability is the same in both, so that the comparison reduces to a choice between A and B. The literal meaning of the independence axiom implies that only differences among choices A, B and C need be compared.
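
The step from (X vs. Y) to (A vs. B) can be made explicit with a short worked derivation. It is a sketch that assumes preferences are represented by an expected utility E[U(·)] that is linear in probabilities, that X and Y are the p:(1 − p) mixtures of A or B with C described above, and that p > 0.

$$\begin{aligned} E[U(X)] - E[U(Y)] &= \{ pE[U(A)] + (1 - p)E[U(C)] \} - \{ pE[U(B)] + (1 - p)E[U(C)] \} \\ &= p\{ E[U(A)] - E[U(B)] \} \end{aligned}$$

The common term (1 − p)E[U(C)] cancels, so the sign of the difference between X and Y is the sign of the difference between A and B.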

Applying the independence axiom, we can subtract 0.89U(1M) from both sides of the inequality $$E(U_{a} ) > E(U_{b} )$$ (Eqs. 1, 2). Then we get:

$$0.11U(1M) > 0.10U(5M) + 0.01U(0)$$
(5)

Similarly, if choice d is preferred over choice c, then $$E(U_{d} ) > E(U_{c} )$$ (Eqs. 3, 4). We also apply the independence axiom to choices c and d. Subtracting 0.89U(0) from both sides, this likewise reduces to

$$0.10U(5M) + 0.01U(0) > 0.11U(1M)$$
(6)

The inequality in the second choice (Eq. 6) is exactly the opposite of that in the first choice (Eq. 5). The independence axiom of expected utility theory asserts that a preference for a over b is equivalent to a preference for c over d, and vice versa. However, experimental tests and psychological observations all suggest that humans tend to prefer a over b, but d over c (Machina 1987a, b). Thus, human behavior seems inconsistent with expected utility theory (specifically with the independence axiom). This kind of systematic violation of expected utility theory is often referred to as the Allais paradox.
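
The equivalence asserted by the independence axiom can be checked numerically. The following is a minimal sketch, not part of the original analysis: the three utility functions are arbitrary illustrative choices, and the names `CHOICES`, `expected_utility` and `preference_gaps` are hypothetical. For any fixed utility function U, the gap E(U_a) − E(U_b) equals the gap E(U_c) − E(U_d), so a single U cannot produce the pattern "a over b" together with "d over c".

```python
import math

M = 1_000_000  # one million dollars

# Each choice is a list of (probability, payoff) pairs, read off Eqs. (1)-(4).
CHOICES = {
    "a": [(1.00, 1 * M)],
    "b": [(0.10, 5 * M), (0.89, 1 * M), (0.01, 0)],
    "c": [(0.11, 1 * M), (0.89, 0)],
    "d": [(0.10, 5 * M), (0.90, 0)],
}

# Illustrative utility functions (assumptions, not taken from the text).
UTILITIES = {
    "linear": lambda x: float(x),
    "sqrt": math.sqrt,
    "log1p": math.log1p,
}


def expected_utility(choice, U):
    """Expected utility of a named choice under the utility function U."""
    return sum(p * U(x) for p, x in CHOICES[choice])


def preference_gaps(U):
    """Return (E(U_a) - E(U_b), E(U_c) - E(U_d)) for the utility function U."""
    gap_ab = expected_utility("a", U) - expected_utility("b", U)
    gap_cd = expected_utility("c", U) - expected_utility("d", U)
    return gap_ab, gap_cd


for name, U in UTILITIES.items():
    gap_ab, gap_cd = preference_gaps(U)
    print(f"{name:>6}: E(U_a)-E(U_b) = {gap_ab:.6g}, E(U_c)-E(U_d) = {gap_cd:.6g}")
```

Both gaps reduce to 0.11U(1M) − 0.10U(5M) − 0.01U(0), which is why they agree for every U tried.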

## Dynamic utility with discounting for bankruptcy

Suppose a human maximizes his lifetime wealth through a sequence of economically based decisions. We may replace human wealth by body size (the energy state or fat reserves) in animals. The wealth at time t, $$w_{t}$$, is assumed to be the non-negative state variable of the decision-maker for t = 0,…,T, where T ≫ 0 is a finite end time. We then assume that final wealth, $$w_{T}$$, represents the lifetime fitness of an individual. The maximization of lifetime fitness, $$w_{T}$$ (for t = 0,…,T), is then given by

$$\text{Maximize} :\,w_{T} = w_{0} r_{0} r_{1} r_{2} \ldots r_{T - 1} = w_{0} \prod\limits_{j = 0}^{T - 1} {r_{j} }$$
(7)

where $$r_{t}$$ (≥ 0) denotes the multiplicative growth rate of wealth at time t, such that $$w_{t+1} = r_{t} w_{t}$$. We here assume that the multiplicative growth rates $$r_{t}$$ (t = 0,…,T − 1) follow a stochastic process, such that the $$r_{t}$$ are independent non-negative random variables (non-negative i.r.vs.), i.e., $$r_{t} = r$$ (> 0) with probability p(r) for a probability distribution of {r}. We here choose the optimal set from many possible sets of random variables. We search for the rules of choice for the optimal probability distribution that gives the highest final wealth (or body size) $$w_{T}$$. Then the geometric mean G(r) of the growth rates in Eq. (7) becomes:

$$\text{Maximize} :G(r) = \prod\limits_{j = 0}^{t - 1} {r_{j}^{{\left( \frac{1}{t} \right)}} }$$
(8)

This geometric mean (Eq. 8) is the temporal average of the observed growth rates $$r_{t}$$ of wealth (body size) of an individual over time. In order to analyse the optimality of decision sequences, we replace $$r_{t}$$ with the corresponding probability distribution p = p(r) (r > 0; each observed $$r_{j}$$ has probability $$p(r_{j}) = 1/t$$), such that

$$\text{Maximize} :G(r) = \prod\limits_{{p(r)}} {r^{p(r)} } \quad \quad \quad \text{for}\quad \sum {p(r)} = 1$$
(9)
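
The relation between Eqs. (7), (8) and (9) can be illustrated with a small numerical sketch. The growth-rate sequence below is hypothetical and the function names are ours; the code only assumes that each of the t observed rates carries probability 1/t, as stated above.

```python
import math
from collections import Counter


def final_wealth(w0, rates):
    """Eq. (7): w_T = w_0 * r_0 * r_1 * ... * r_{T-1}."""
    return w0 * math.prod(rates)


def geometric_mean_observed(rates):
    """Eq. (8): product of r_j ** (1/t) over the t observed growth rates."""
    t = len(rates)
    return math.prod(r ** (1.0 / t) for r in rates)


def geometric_mean_distribution(dist):
    """Eq. (9): product of r ** p(r) over a distribution given as {r: p(r)}."""
    if not math.isclose(sum(dist.values()), 1.0):
        raise ValueError("probabilities must sum to 1")
    return math.prod(r ** p for r, p in dist.items())


# Hypothetical sequence of observed growth rates (illustrative only).
rates = [2.0, 0.5, 2.0, 0.5, 1.25, 1.25, 2.0, 0.5]
t = len(rates)

# Empirical distribution: each observation carries probability 1/t.
dist = {r: n / t for r, n in Counter(rates).items()}

w0 = 100.0
G_obs = geometric_mean_observed(rates)
G_dist = geometric_mean_distribution(dist)

print(f"G from Eq. (8): {G_obs:.6f}")
print(f"G from Eq. (9): {G_dist:.6f}")
print(f"w_T from Eq. (7): {final_wealth(w0, rates):.4f}")
print(f"w_0 * G**t:       {w0 * G_obs ** t:.4f}")
```

The two forms of G agree, and final wealth equals w_0 G^t, so maximizing G is equivalent to maximizing final wealth for a fixed number of decisions.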

We now introduce a criterion for the avoidance of bankruptcy in the maximization function of human wealth. The term “bankruptcy (starvation death)” here means that wealth (or energy reserves) is reduced to zero, i.e., $$w_{t} = 0$$. In the case of animals, it is death caused by starvation. The idea of a penalty function is that a decision maker always fears an ultimate irreversible loss of wealth (energy reserves). We wish simultaneously to (1) maximize $$w_{t}$$, and (2) minimize the possibility of bankruptcy (or irreversible losses).

We combine the above two optimization criteria by the following procedure. Let f(r) be a penalty function for negative growth rates, such that

$$\begin{aligned} f & = f(r)\quad \text{where}\quad f(r) = 0 \quad \text{for}\quad r \ge 1 \\ & \quad \quad \quad \quad \text{and}\quad f(r) > 0\quad \text{for}\quad r < 1 \end{aligned}$$
(10)

We then define dynamic optimization with discounts on negative growth, such that

$$\text{Maximize} :G_{\text{Total}} (r) = \prod\limits_{p(r)} {(r^{1 + f(r)} )^{p(r)} }$$
(11)

where $$G_{\text{Total}}(r)$$ is the combined optimization criterion for wealth maximization and the avoidance of negative growth. Equation (11) simply adds an additional weight f(r) on negative growth in the overall fitness function. The reason we introduce the penalty function (Eq. 10) in the current optimization scheme is that the geometric mean analysis by itself cannot include bankruptcy, since log(r) = −∞ if r = 0 (a case we do not assume). Therefore, we introduce a given penalty for any negative growth (r < 1), with no penalty for positive growth. Because of the inclusion of the penalty on negative growth, f(r) is discontinuous at r = 1. This penalty on negative growth should express the avoidance of bankruptcy to some extent. Taking the logarithm of Eq. (11), we get

$$\text{Maximize} :\log G_{\text{Total}} (r) = \sum\limits_{p(r)} {p(r)\{ (1 + f(r))\log r\} } = E\{ (1 + f(r))\log r\}$$
(12)

where E now refers to the commonly used arithmetic mean, in this case of [1 + f(r)]log r. This optimization criterion is simply expressed by utility theory (von Neumann and Morgenstern 1944). Namely, we simply define a utility function u(r):

$$u(r) = (1 + f(r))\log (r)$$
(13)

and maximize the expected utility, E[u(r)]. We now replace the growth rate r by gain g and current wealth w, such that r = (g + w)/w. Then we define the dynamic utility surface u(g;w) of gain g (decision variable) given current body size w (state variable), such that (Fig. 1):

$$u\left( {g;w} \right) = (1 + f(r))\log \left( {\frac{g + w}{w}} \right)$$
(14)

where we simply maximize the expected utility of gain, E[u(g;w)] given current wealth w. This optimization rule for any one-time decision based on dynamic utility surface includes both the criteria of final wealth maximization and the avoidance of bankruptcy at any single decision during the lifetime of a human or an animal.
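
A minimal sketch of the dynamic utility surface of Eq. (14) follows. It assumes, purely for illustration, a constant penalty f(r) = c for r < 1, which is one choice satisfying Eq. (10); the function names and the numerical values are hypothetical.

```python
import math


def constant_penalty(c):
    """Return f with f(r) = c for r < 1 and f(r) = 0 for r >= 1 (cf. Eq. 10)."""
    def f(r):
        return c if r < 1.0 else 0.0
    return f


def dynamic_utility(g, w, f):
    """Eq. (14): u(g; w) = (1 + f(r)) * log((g + w) / w), with r = (g + w) / w."""
    if w <= 0 or g <= -w:
        raise ValueError("requires w > 0 and g > -w (bankruptcy is excluded)")
    r = (g + w) / w
    return (1.0 + f(r)) * math.log(r)


f3 = constant_penalty(3.0)  # illustrative penalty constant

# The same gain or loss of 1,000 evaluated at two hypothetical wealth states.
for w in (2_000.0, 100_000.0):
    gain = dynamic_utility(+1_000.0, w, f3)
    loss = dynamic_utility(-1_000.0, w, f3)
    print(f"w = {w:>9,.0f}: u(+1000; w) = {gain:+.5f}, u(-1000; w) = {loss:+.5f}")
```

In this sketch the same gain is worth less at the higher wealth state, and a loss always outweighs a gain of equal size, which are the two properties the utility surface is meant to capture.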

## Solution for the Allais paradox

Here, we present a numerical solution for the Allais paradox, applying the dynamic utility function (Eq. 14). Although we present only a numerical solution, the underlying theory is purely analytical; that is, the Allais paradox is the inevitable outcome for some people as long as all the conditions are met.

We set a constant penalty on all negative growth, i.e., f(r) = c (constant) for −w < g < 0. Practically, the penalty should depend on the value of negative growths. For example, a large negative value incurs a large penalty, and a small negative value, a small penalty. We also analyze the case of proportional penalty (see Supplementary material). With a constant penalty, the dynamic utility of gain (Eq. 14) becomes:

$$u\left( {g;w} \right) = \log \left( {\frac{g + w}{w}} \right) \quad \text{for} \, g \geq 0$$
$$u\left( {g;w} \right) = (1 + c)\log \left( {\frac{g + w}{w}} \right)\quad \text{for} \, -w < g < 0 \,$$
(15)

To calculate the Allais paradox numerically, we set the current wealth state w = 10,000 and the penalty constant c = 3 in Eq. (15). Now we apply Eq. (15) to the example of the Allais paradox (Eqs. 1–4). First, consider the choice between c and d. Introducing Eq. (15) into Eqs. (3) and (4), and noting that the zero-payoff terms vanish because u(0;w) = log 1 = 0, the dynamic expected utilities of choices c and d, $$E(u_{c})$$ and $$E(u_{d})$$, are expressed as, respectively:

$$E(u_{c} ) = 0.11\log \left( {\frac{1M + w}{w}} \right) = 0.507663$$
(16)
$$E(u_{d} ) = 0.10\log \left( {\frac{5M + w}{w}} \right) = 0.621661$$
(17)

where M stands for a million dollars and w = 10,000. Since $$E (u_{d} )> E (u_{c} )$$, the result is a predicted preference for d over c. Note that the expected utilities of choices c and d do not depend on the penalty constant c.
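
The two values in Eqs. (16) and (17) can be reproduced with a few lines of code. This is a sketch using natural logarithms; the helper name `u_gain` is ours, and only the gain branch of Eq. (15) is needed because neither choice involves a loss.

```python
import math

M = 1_000_000  # one million dollars
w = 10_000     # current wealth state used in the text


def u_gain(g, w):
    """Eq. (15), branch g >= 0: u(g; w) = log((g + w) / w); no penalty on gains."""
    if g < 0:
        raise ValueError("this helper covers non-negative gains only")
    return math.log((g + w) / w)


# Expected utilities of choices c and d; the zero-payoff terms contribute u(0; w) = 0.
E_c = 0.11 * u_gain(1 * M, w) + 0.89 * u_gain(0, w)
E_d = 0.10 * u_gain(5 * M, w) + 0.90 * u_gain(0, w)

print(f"E(u_c) = {E_c:.6f}")  # 0.507663
print(f"E(u_d) = {E_d:.6f}")  # 0.621661
print("d preferred over c:", E_d > E_c)
```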

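Now consider the choice between a and b. Since choice a implies no uncertainty, a decision maker thinks of it as having been chosen already. This imaginary choice shifts the current wealth state to $$w' = 1M + w$$. Measured as gains relative to $$w'$$, the two choices are rewritten as a′ and b′ (cf. Eqs. 18, 19): choice a′ yields a gain of 0 with certainty, while choice b′ yields a gain of \$4M with probability 0.10, a gain of 0 with probability 0.89, and a loss of \$1M with probability 0.01.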

In reality, this involves a single choice of the offer b′ or otherwise. The expected utilities of choices a′ and b′ are similarly calculated from Eq. (15):

$$E(u_{{a^{'} }} ) = u(0;w^{'} ) = 0$$
(18)
$$E(u_{{b^{'} }} ) = 0.10u(4M;w^{'} ) + 0.89u(0;w^{'} ) + 0.01u( - 1M;w^{'} ) = - 0.0244563$$
(19)

Because $$E(u_{a'} ) > E(u_{b'} )$$, the predicted preference is now for a′ over b′. Therefore, the original preference becomes a over b, and the Allais paradox is resolved.
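
The shifted-reference calculation of Eqs. (18) and (19) can be sketched in the same way. The function names are hypothetical; the code assumes natural logarithms, the constant penalty of Eq. (15), and the shifted wealth state w′ = 1M + w described above.

```python
import math

M = 1_000_000  # one million dollars


def u(g, w, c):
    """Eq. (15): log((g + w) / w), weighted by (1 + c) when the gain g is negative."""
    if w <= 0 or g <= -w:
        raise ValueError("requires w > 0 and g > -w")
    weight = 1.0 + c if g < 0 else 1.0
    return weight * math.log((g + w) / w)


def shifted_choice_utilities(w, c):
    """Return (E(u_a'), E(u_b')) evaluated at the shifted wealth w' = 1M + w."""
    w_shift = 1 * M + w
    E_a = u(0, w_shift, c)
    E_b = (0.10 * u(4 * M, w_shift, c)
           + 0.89 * u(0, w_shift, c)
           + 0.01 * u(-1 * M, w_shift, c))
    return E_a, E_b


E_a, E_b = shifted_choice_utilities(w=10_000, c=3)
print(f"E(u_a') = {E_a:.7f}")  # 0.0000000
print(f"E(u_b') = {E_b:.7f}")  # -0.0244563
print("a' preferred over b':", E_a > E_b)
```

The small probability of falling back by \$1M carries the weight (1 + c), which is what tips the balance toward the sure option in this example.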

Our results depend on the current wealth state w (Fig. 2). We denote w = $$x_{1}$$ when $$E(u_{d} ) = E(u_{c} )$$ and w = $$x_{2}$$ when $$E(u_{a'} ) = E(u_{b'} )$$. In the current example, the two boundaries are approximately $$x_{1} = 0.1024$$ and $$x_{2} = 18{,}922$$.

If $$0 < w < x_{1} = 0.1024$$, then the preferences are a over b and c over d. This represents a pure conservative strategy that can only rarely be achieved. If $$x_{1} = 0.1024 < w < x_{2} = 18{,}922$$, the preferences are a over b and d over c, exhibiting the Allais paradox. Finally, if $$w > x_{2} = 18{,}922$$, the preferences are b over a and d over c, resulting in risk-taking strategies. This suggests that extremely rich people are less likely to show the Allais paradox than poor people. It also agrees with common observations that the preference reversal between choices a and b is not strict, but that the preference d over c is almost invariant.

We should also note that the range of the Allais paradox depends on the penalty constant c (Fig. 2b). Even though the lower boundary $$x_{1}$$ is constant and independent of the penalty, the higher boundary $$x_{2}$$ depends on the penalty constant c. When c is large, $$x_{2}$$ becomes large, resulting in the larger middle classes exhibiting the Allais paradox (Fig. 2b).
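
The two boundaries can be located numerically. The sketch below is one possible way to do so and is not the authors' procedure: it applies plain bisection on a logarithmic scale to the differences in expected utility, with hypothetical function names and search brackets chosen by us.

```python
import math

M = 1_000_000  # one million dollars


def gap_dc(w):
    """E(u_d) - E(u_c) from Eqs. (16), (17); positive means d is preferred."""
    return 0.10 * math.log((5 * M + w) / w) - 0.11 * math.log((1 * M + w) / w)


def gap_ba(w, c):
    """E(u_b') - E(u_a') from Eqs. (18), (19); positive means b is preferred."""
    w_shift = 1 * M + w
    return (0.10 * math.log((4 * M + w_shift) / w_shift)
            + 0.01 * (1.0 + c) * math.log(w / w_shift))


def bisect_log(func, lo, hi, iters=200):
    """Bisection on a log scale; assumes func(lo) < 0 < func(hi)."""
    if not (func(lo) < 0.0 < func(hi)):
        raise ValueError("bracket must satisfy func(lo) < 0 < func(hi)")
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint
        if func(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def upper_boundary(c):
    """Wealth x_2 at which E(u_a') = E(u_b') for penalty constant c."""
    return bisect_log(lambda w: gap_ba(w, c), 1e-6, 1e9)


x1 = bisect_log(gap_dc, 1e-6, 1e4)
x2 = upper_boundary(3)
print(f"x1 = {x1:.4f}")  # approximately 0.1024
print(f"x2 = {x2:.0f}")  # approximately 18922

# The upper boundary for a few illustrative penalty constants.
x2_by_c = {c: upper_boundary(c) for c in (1, 3, 5)}
for c, value in x2_by_c.items():
    print(f"c = {c}: x2 = {value:,.0f}")
```

In this sketch the lower boundary does not involve c at all, while the upper boundary grows with c, in line with the dependence on the penalty described above.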

## Discussion

Our numerical solution to the Allais paradox comprises three critical components: (1) a shift in reference points, (2) a penalty on negative growth, and (3) a wealth-dependent utility function. It is important to note that the Allais paradox requires all three conditions, and that the first two conditions are not unique to our model. We should also note that the penalty is more likely to be proportional to the value of negative growth functions, since it is correlated with the probability of bankruptcy (for the case of proportional penalty, see supplementary information).

The first condition, a shift in reference points, has been suggested (Markowitz 1952) and tested empirically in the context of problems (Hershey and Schoemaker 1980). This means that a utility function is based on a gain (or loss) given the current state of wealth, rather than the overall wealth (von Neumann and Morgenstern 1944; Friedman and Savage 1948). In foraging behavior of animals, this means simply a dependence on the current state; this concept is well established in studies of risk-sensitive foraging (Stephens and Krebs 1986). For example, animals tend to remain in a safe nest until hungry (Lima et al. 1985; Ito et al. 2013). Here, the value of food is low when an animal is satiated, but high when the animal is starved. Mathematically, this principle, current-state dependence, agrees with the basic principle of (stochastic) dynamic modeling, e.g., the optimality principle of dynamic programming (Bellman 1957; Mangel and Clark 1988).

The second condition, a penalty on negative growth, is often assumed in expected utility theory (Markowitz 1991) and applied in prospect theory (Kahneman and Tversky 1979). In the absence of a penalty on negative growth [f(r) = 0], preferences cannot be reversed in our model. This suggests that the Allais paradox is a phenomenon related to the avoidance of bankruptcy in human behavior and the avoidance of starvation in foraging behavior (Lima 1985; Ito et al. 2013).
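
The role of the penalty can be seen in a short worked derivation, given here as a sketch under the constant-penalty form of Eq. (15) and the shifted reference point w′ = 1M + w used in the numerical solution. Inside the formulas, c denotes the penalty constant, while the subscripts c and d denote choices.

$$\begin{aligned} E(u_{d}) - E(u_{c}) &= 0.10\log\left(\frac{5M + w}{w}\right) - 0.11\log\left(\frac{1M + w}{w}\right) \\ &= 0.10\log\left(\frac{5M + w}{1M + w}\right) - 0.01\log\left(\frac{1M + w}{w}\right) \\ E(u_{b'}) - E(u_{a'}) &= 0.10\log\left(\frac{5M + w}{1M + w}\right) - 0.01(1 + c)\log\left(\frac{1M + w}{w}\right) \\ &= \{ E(u_{d}) - E(u_{c}) \} - 0.01\,c\,\log\left(\frac{1M + w}{w}\right) \end{aligned}$$

With c = 0 the two differences are identical, so b′ is preferred over a′ exactly when d is preferred over c, and no reversal can occur. Any c > 0 lowers the second difference and opens a range of w in which d is preferred over c while a′ is preferred over b′.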

The unique contribution of our model lies in the third condition: the (current) wealth-dependent utility function. A utility function in expected utility theory has often been characterized as ad hoc, since its shape can only be deduced by logic or estimated from empirical observations (Friedman and Savage 1952; Caraco 1980; Real and Caraco 1986; Yoshimura and Shields 1987; Mangel and Clark 1988). Furthermore, previous solutions for the Allais paradox weighted probability values in a nonlinear fashion (Edwards 1961; Kahneman and Tversky 1979; Machina 1987a, b; Kadane 1992). Such modifications inevitably introduce subjectivity and inconsistency in the systems under analysis (Fishburn 1988). Therefore, the previous solutions have been criticized as descriptive models, while the expected utility theory is called a normative model (Fishburn 1988).

The proposed wealth-dependent utility function is an analytical model, since it is derived analytically from optimization criteria. In our solution, we adopt a constant penalty on negative growth, irrespective of the size of the loss, for the sake of simplicity of calculation. However, the larger the loss, the more humans tend to avoid it. Therefore, the wealth-dependent utility function is more reasonable if the penalty is proportional to negative growth. Nevertheless, the result with a proportional penalty is qualitatively the same as in the constant case (see Supplementary material for detailed derivations). We should note that this concept is derived from the maximization of the final states, such as body size. Thus, it is equivalent to the traditional average weight maximization in foraging behavior (Stephens and Krebs 1986) and dynamic modeling in behavioral decisions (Mangel and Clark 1988).

An important aspect of our proposed solution is that expected utility theory is a static model (Yoshimura et al. 2012). This implies that game theory and the well-known concept of the evolutionarily stable strategy (ESS) are only valid in the context of decisions arrived at singly. From the very existence of the Allais paradox, many economists and applied mathematicians suggest that standard expected utility theory may be flawed (e.g., Edwards 1961; Allais and Hagen 1979; Kahneman and Tversky 1979; Machina 1982, 1987a, b). However, we should insist that the standard expected utility theory is a valid theory if it is applied as a static model dealing with one-time decisions. It is not the theory of optimization, but preference. As a static model, a utility function can be defined as a function of overall wealth only (Friedman and Savage 1948; Markowitz 1952). The only problem is that it is applied to sequential decisions that cause various problems, such as the Allais paradox, and the problem of simultaneous gambling and insurance (see Machina 1982, 1987b). In our model, by taking account of the timing (state) of decisions, we define utility in terms of gain and loss, that is, a dynamic utility surface. With these modifications, the theory can accommodate the Allais paradox and generate testable predictions about the effects of wealth on decision making. Note that dynamic programming is another method that can account for dynamic decision-making or sequential decisions (Mangel and Clark 1988).

Our model deals not only with foraging in animals but also economic behavior of humans; in this regard, we assume no difference between human and animal behavior (Wilson 1975, 1979; Barash 1979). The optimality of human behavior is often regarded as ‘rational’; that is, each person seeks to maximize his gain (Hardin 1968), rendering it equivalent to adaptive behavior in animals (Harsanyi 1977; Hogarth and Reder 1986). Because the traditional expected utility theory is static, its previous applications to animal and human behavior lack the optimality criterion (Friedman and Savage 1952; Caraco 1980; Real and Caraco 1986; Yoshimura and Shields 1987). We here develop the optimality criterion for behavioral decisions in both animals and humans, and derive the utility (fitness) surface, u(g;w) (see also Yoshimura et al. 2012).

Our dynamic utility model is mathematically quite similar to the concept of geometric mean fitness developed in the study of adaptation in stochastic environments (Lewontin and Cohen 1969; Yoshimura and Clark 1991; Yoshimura et al. 2009). However, our model deals only with the sequential decisions of an individual over its lifetime, unlike the population growth rates considered in treatments of geometric mean fitness. In other words, geometric mean fitness applies at the population level over many generations, while dynamic utility applies at the individual level within a single generation. To consider long-time population growth functions incorporating life-time decisions, we could actually combine both models (Yoshimura and Clark 1991).

In summary, the current dynamic utility theory encompasses the optimization of energy reserves (body size) in animals or economic wealth in humans during sequential decisions. We therefore expect that foraging (feeding) behavioral patterns of animals should likewise exhibit the Allais paradox against the risk of starvation. Our theoretical results indicate that an animal with low reserves is less willing to take risks than a satiated animal, just as preferences in our numerical example shift from conservative to risk-taking as wealth w increases; hence risk aversion is a fundamental feature of animal behavior.

Most importantly, the fitness function derives from both current energy gain and current body size. Therefore, the fitness function itself changes along with the current body size of an animal (Eq. 14), unlike standard expected utility theory, in which the fitness of an animal is a universal function of body size, irrespective of the current body size (Caraco 1980; Real and Caraco 1986). The Allais paradox is the first demonstration (experiment) showing that the fitness function is indeed state (body size)-dependent. Our current theory of dynamic utility can thus be used to evaluate lifetime fitness based on the optimality of an animal’s behavioral decisions.