Introduction

Environmental uncertainty and risk are critical factors modulating behavioral sequences in animals. Extensive empirical studies of foraging behavior demonstrate that animal behavior is highly risk-sensitive (Real and Caraco 1986; Stephens and Krebs 1986; Stephens et al. 2007; Houston et al. 2011). Mathematical approaches frequently used in traditional studies of foraging behavior include (1) the mean-variance or variance-discount method (Stephens 1981; Stephens and Charnov 1982; Stephens et al. 2007) and (2) expected utility theory (Caraco 1980; Real 1980a, 1980b). These two methods are highly effective for characterizing and comparing decision-making processes under conditions of uncertainty and have often been applied to the interpretation of behavior in experimental settings. However, they are not easily applied to elucidating sequences of behavioral decisions, i.e., dynamic decision-making.

Optimizing behavior under the constraints of uncertainty and risk often depends on the precise sequence of behavioral decisions, i.e., on dynamic decision-making. Mathematical approaches to optimizing dynamic decision-making include dynamic programming (DP) (Bellman 1957) and stochastic optimal control (Pontryagin et al. 1962). Since its introduction to behavioral ecology, dynamic programming has been extensively applied to optimality analyses of behavior and life history strategies (Houston and McNamara 1982; Mangel and Clark 1986, 1988; Houston et al. 1988). Moreover, dynamic programming has proved useful for evaluating lifetime reproductive success (Darwinian fitness) in the development of individual-based models (Judson 1994).

In contrast, stochastic control theory has only rarely been applied to behavioral and evolutionary ecology (see e.g., Katz 1974; Oaten 1977; Oster and Wilson 1978). This is mainly because of the difficulty and intractability of Pontryagin’s maximum principle, the core principle of stochastic control theory (Stephens and Krebs 1986; Mangel and Clark 1988). Further, the applicability of this approach to behavioral biology is rather limited owing to its complexity and the numerous, possibly unrealistic, assumptions inherent in it.

Apart from some additional assumptions of continuous variables (Mangel and Clark 1988), stochastic control theory is mathematically equivalent to dynamic programming, but the latter is more powerful and flexible when applied to dynamic decision-making in behavioral and life history analyses (Mangel and Clark 1986; Houston et al. 1988). The essence of dynamic programming is Bellman's principle of optimality (Bellman 1957, 1961), which states that, in a multistage decision process, any remaining portion of an optimal sequence of choices must itself be optimal for the state from which it starts.

In spite of the wide applicability and usefulness of dynamic programming in behavioral and evolutionary ecology, the method has a weak point: the difficulty of interpreting its numerical results. In dynamic programming, the output always takes the form of numerical tables summarizing dynamic changes of animal behavioral states, so characterizing optimal behavior is not straightforward. Great care must therefore be exercised in interpreting the numerical values representing optimal behavioral sequences. In some complicated cases, such interpretation is impossible or nearly so, because of complex model settings. Despite these limitations, dynamic programming has remained the most useful approach for exploring dynamic problems in behavioral and evolutionary ecology, since stochastic control theory is much more limited in its applicability (Mangel and Clark 1988).

Here we develop an analytical model of dynamic decision-making under uncertainty and risk by applying mathematical techniques developed in prior studies of geometric mean fitness (Yoshimura and Clark 1991), within the framework of dynamic utility maximization (Strotz 1955). We argue that this is the first method to characterize properties of fitness in a dynamic sense, in which the fitness function (optimization criterion) changes dynamically in time. In this paper, the mathematical technique developed for geometric mean fitness in stochastic environments (Yoshimura and Clark 1991) is applied to the optimization of sequential decisions by an individual animal. We should note that the mathematics we develop for behavioral decision-making operates at the individual level within a single generation, and is highly distinct from that of geometric mean fitness approximating population growth over many generations. For example, the risk of predation of an individual within one generation is different from that of stochastic variation over many generations (Ito et al. 2013).

We consider the optimization of daily sequential decisions from birth to death of an individual and evaluate its lifetime reproductive success. We assess the optimality of this stochastic decision process by applying Bellman's principle of optimality used in dynamic programming (Bellman 1957, 1961), and we solve the optimal decision sequence analytically. The result shows that the fitness function (optimal decision criterion) depends on the animal's current state. Finally, we discuss the implications of the result, especially for game theory.

Theory

Suppose that a juvenile animal grows every day until adulthood. Body size, w_t, is the non-negative state variable of the animal (decision-maker) at time t for t = 0,…,T, where T ≫ 0 is a finite end time (until death). We may regard body size as a proxy for the energy state or fat reserves of an animal. Note that T corresponds to the time of reproduction in semelparous animals, e.g., anadromous salmon. For simplicity, we assume that the final body size, w_T, represents the potential fitness (lifetime reproductive success of an individual animal).

Let r_t (≥ 0) denote the multiplicative growth rate of body size at time t, such that w_{t+1} = r_t w_t. Then the body size, w_t, is expressed:

$$ w_{t} = w_{0} r_{0} r_{1} r_{2} \ldots r_{t - 1} = w_{0} \prod\limits_{j = 0}^{t - 1} {r_{j} } $$
(1)

Note that, at each time t, the decision-maker chooses an option that results in a growth rate r_t, where r_t is drawn from the probability distribution of growth rates associated with that option. The growth rate r_t (t = 0,…,T − 1) is then a stochastic process, and the decision-maker can optimize this process by choosing an option at each time point. We here evaluate the optimality of sequential decisions over t = 0,…,(T − 1) (a decision sequence totaling T decisions). The geometric mean G(r) of the body-size growth function of an individual in Eq. (1) then becomes:

$$ G(r) = \prod\limits_{j = 0}^{t - 1} {r_{j}^{{\left( \frac{1}{t} \right)}} } $$
(2)

The above Eqs. (1, 2) are similar to those of geometric mean fitness at the population level (Yoshimura and Clark 1991), but differ by incorporating multiple decisions of an individual over its lifetime (at the individual level).
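For concreteness, Eqs. (1)–(2) and the identity w_t = w_0 G(r)^t that follows from them can be checked with a short numerical sketch. The Python code below is purely illustrative: the two-point daily growth lottery and the helper names (simulate_growth, geometric_mean) are our own assumptions, not part of the model.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def simulate_growth(w0, r_values):
    """Body size under multiplicative growth, Eq. (1): w_t = w_0 * prod(r_j)."""
    return w0 * np.prod(r_values)

def geometric_mean(r_values):
    """Per-step geometric mean growth rate, Eq. (2)."""
    t = len(r_values)
    return float(np.prod(np.asarray(r_values) ** (1.0 / t)))

# One hypothetical option: each day r = 0.8 or r = 1.3 with equal probability.
r = rng.choice([0.8, 1.3], size=100)

w0 = 1.0
w_t = simulate_growth(w0, r)
G = geometric_mean(r)

# w_t = w_0 * G(r)**t, so the two computations agree up to rounding error.
print(w_t, w0 * G ** len(r))
```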

We assume that the multiplicative growth factors r_t (t = 0, 1, 2,…, T − 1) of the body size of an animal are independent and identically distributed random variables (i.i.d.). Here Eq. (2) is the average of the observed growth factors r_t evaluated over time t. In order to analyze the optimality of decision sequences, we replace r_t with the corresponding probability distribution p = p(r) (i.e., each observed value r_j receives probability p(r_j) = 1/t), such that

$$ G\left( r \right) = \prod {r^{p(r)} } $$
(3)

Note that this equation involves a probability distribution, which can be interpreted as an ensemble average estimated from a large number of trials with many individuals. Now the body size w_t at time t (Eq. (1)) is expressed:

$$ w_{t} = w_{0} {\left( {\prod {r^{p(r)} } } \right)^t} = w_{0} G{(r)^t} $$
(4)

Then we consider the optimization of the body size of an individual until time t, such that

$$ {\text{Maximize}}:w_{t} $$
(5)

for all the decisions over j = 0,…,(t − 1). In this paper we call this estimate of reproductive success the potential fitness (of an individual animal). Maximizing w_T (i.e., t = T) yields the optimization of body size over an individual's lifetime. This framework is identical to the principle of optimality in dynamic programming (Bellman 1957, 1961). From Eq. (4), maximizing the potential fitness (Eq. 5) becomes simply: max{w_t} = w_0{max G(r)}^t. The maximization of the geometric mean potential fitness G(r) is then achieved by maximizing log{G(r)}, since the logarithm is a monotonically increasing function. Thus maximizing potential fitness w_t (Eq. 5) is equivalent to maximizing log{G(r)}, such that

$$ {\text{Maximize}}:\,\,\log G(r) = \frac{1}{t}\sum\limits_{j = 0}^{t - 1} {\log r_{j} } = \sum\limits_{r} {p(r)\log r} = E\{ \log r\} , $$
(6)

where E now refers to the commonly used arithmetic mean, in this case of log r. This expression ties in nicely with utility theory (von Neumann and Morgenstern 1947). Namely, we simply define a utility function u(r):

$$ u\left( r \right) = \log \left( r \right) $$
(7)

and then express our evolutionary hypothesis as 'natural selection maximizes the expected utility E(u(r)) of growth rate r.' Since log r is concave downwards everywhere, we reach the immediate prediction that utility maximization under conditions of uncertainty is, in principle, risk averse (Mangel and Clark 1988; Yoshimura and Clark 1991).
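This prediction can be illustrated numerically. The sketch below compares two hypothetical growth lotteries with the same arithmetic mean growth rate but different variances; the specific values are our own illustrative assumptions, not taken from any experiment.

```python
import numpy as np

# Two hypothetical growth lotteries with the SAME arithmetic mean (1.1)
# but different variances; the numbers are illustrative assumptions.
safe  = {"r": np.array([1.05, 1.15]), "p": np.array([0.5, 0.5])}
risky = {"r": np.array([0.60, 1.60]), "p": np.array([0.5, 0.5])}

def expected_log_utility(option):
    """E{log r} of Eq. (6): expected dynamic utility of a growth lottery."""
    return float(np.sum(option["p"] * np.log(option["r"])))

for name, opt in (("safe", safe), ("risky", risky)):
    mean_r = float(np.sum(opt["p"] * opt["r"]))
    print(name, "E{r} =", mean_r, "E{log r} =", round(expected_log_utility(opt), 3))

# Both lotteries have E{r} = 1.1, but E{log r} is about 0.094 for the safe
# option and about -0.020 for the risky one, so the concave log utility
# prefers the safe option: risk aversion. A negative E{log r} means the
# risky option would shrink body size in the long run (G(r) < 1).
```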

Finally, the growth rate r of body size can be replaced by the weight gain g and the current body size w at any time. Since r_t = w_{t+1}/w_t and w_{t+1} = (gain at time t) + w_t, we get

$$ r = \frac{g + w}{w} $$
(8)

We then define the dynamic utility surface u(g;w) of gain g (decision variable) given current body size w (state variable), such that:

$$ u\left( {g;w} \right) = \log \left( {\frac{g + w}{w}} \right) $$
(9)

and the principle of dynamic utility (DU) optimization is to maximize the expected utility (potential fitness) E(u(g; w)) of weight gain g, given current body size w (Fig. 1). Thus, given a current state w, the static solution of dynamic utility reduces to a simple form of expected utility theory with a logarithmic utility function of gain g given w. This result demonstrates that the potential fitness function itself (the optimization criterion or utility function) depends on the current state w of the animal, i.e., u = u(g; w).

Fig. 1 Dynamic utility function, u = u(g; w). a u plotted against g; b u plotted against (g + w). Body sizes are w = 1 (solid line), 2 (dashed line), 3 (dotted line)
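The curves in Fig. 1 follow directly from Eq. (9). The short sketch below tabulates u(g; w) for the three body sizes shown in the figure; the grid of gains is our own illustrative choice.

```python
import numpy as np

def dynamic_utility(g, w):
    """Dynamic utility of Eq. (9): u(g; w) = log((g + w) / w)."""
    return np.log((g + w) / w)

gains = np.array([0.5, 1.0, 2.0, 4.0])  # illustrative grid of gains
for w in (1.0, 2.0, 3.0):               # body sizes as in Fig. 1
    print(f"w = {w}:", np.round(dynamic_utility(gains, w), 3))

# The same absolute gain g yields a smaller utility increment the larger
# the current body size w: the utility function itself depends on the state.
```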

Discussion

Previously, fitness has been considered a universal function of a single variable w (body size, food, energy, or fat reserves), e.g., the utility function u = u(w) (Caraco 1980; Real 1980a, b). However, it has been pointed out in studies of human economic behavior that the shape of the utility function should depend on the current wealth state (Friedman and Savage 1948; Markowitz 1952). Our analysis demonstrates that the shape of the fitness function (the potential fitness of an individual) likewise depends on two variables: a state variable (e.g., current body size) and a decision variable (current gain), i.e., u = u(g; w) (Eq. 9, Fig. 1a). This indicates that potential fitness functions can vary among individual animals as long as their current states (e.g., body size) vary as well (Fig. 1b).

The current result does not agree with the properties of the utility function in a payoff matrix under game theory (e.g., Maynard Smith 1989). For example, consider the payoff matrix of the famous hawk-dove game (Fig. 2). When a dove plays against a dove, they divide the fitness reward, each receiving V/2. Suppose the body size of one dove (dove 1) is ten times that of the other dove (dove 2), i.e., w_1 = 10w_2. Under dynamic utility (Eq. 9), equal fitness values (V/2) imply equal growth rates, so each dove's gain must be proportional to its body size: g_1 = 10g_2. This implies that the larger dove will receive proportionately more food than the smaller one. For example, if there is 110 g of food in total, dove 1 gets 100 g, while dove 2 gets 10 g. This contrasts with the peaceful, equal division of rewards (55 g each) among doves, which holds only if all doves have equal body size. Moreover, once the game is played among hawks and doves, the body sizes of the players diverge. The weight w of a dove playing against a dove becomes w + V/2, whereas that of a dove playing against a hawk remains w. The hawk body sizes vary in similar fashion: a hawk playing against a hawk becomes w + (V − C)/2, while a hawk playing against a dove becomes w + V. Thus, once the game has begun, the current body sizes of the players become highly variable. In the payoff matrix derived from traditional utility theory, the current w does not affect the utility function, because of the third axiom (the so-called independence axiom) of expected utility theory (von Neumann and Morgenstern 1947). Our result thus shows that the third axiom is not valid for dynamic behavior in principle. We therefore conclude that expected utility theory is a static model.
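This dove-dove division can be verified numerically: if both doves are to receive the same dynamic utility, Eq. (9) forces their gains to be proportional to current body size. The following sketch implements this illustrative calculation (the helper equal_utility_split is our own construction).

```python
import numpy as np

def equal_utility_split(total_gain, w1, w2):
    """Split a total gain so both players receive equal dynamic utility
    (Eq. 9). Equal log((g + w) / w) implies g1/w1 = g2/w2, i.e., gains
    proportional to current body size."""
    g1 = total_gain * w1 / (w1 + w2)
    g2 = total_gain * w2 / (w1 + w2)
    return g1, g2

w2 = 1.0
w1 = 10.0 * w2  # dove 1 is ten times the size of dove 2 (illustrative units)
g1, g2 = equal_utility_split(110.0, w1, w2)
print(g1, g2)   # 100.0 and 10.0 grams

# Both doves obtain the same utility increment, log(11):
print(np.log((g1 + w1) / w1), np.log((g2 + w2) / w2))
```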

Fig. 2 Typical payoff matrix of a hawk-dove game. Dynamic utility indicates that each payoff should not be a utility, but an absolute gain

In reality, for an animal player with a full stomach, the reward V has close to zero utility, but for a starved animal it has considerably higher utility, irrespective of whether the players are hawks or doves. It follows that the reward V is the actual gain (food or energy gain in animals, money in humans), rather than its utility. This means that the reward is the same for all players in the payoff matrix, but the utility of a payoff depends on the current state (body size or energy reserve) of the player.

Dynamic utility is useful in characterizing behavioral dynamics because its outcome is analytical. It thus overcomes the limitation of dynamic programming, which generates only numerical output. The analytical solution of dynamic utility derives from the assumption that the multiplicative growth rate r is a stochastic process. However, analyses of dynamic utility cannot accommodate the individual environmental parameters that dynamic programming (or the equivalent stochastic control) can incorporate.

Our analysis proves that dynamic optimality (the optimization criterion) always depends on the current state of the animal. This indicates that mean-variance methods and expected utility theory are static models, referring to a decision by an individual at a single point in time (Mangel and Clark 1988; Houston et al. 2011). Dynamic utility is a new addition to the two existing dynamic optimization methods for analyzing animal behavior: dynamic programming and stochastic control.

Most empirical studies of foraging suggest that animals exhibit risk-averse behavior (Caraco and Chasin 1984; Yoshimura and Shields 1987; Ito et al. 2013). Note that the derived dynamic utility function (the logarithm of gains) implies diminishing returns on gains. This means that optimal behavioral decisions are universally risk-averse. However, risk-prone behavior has also been observed in empirical studies of a variety of animal species, including insects (Moses and Sih 1998), fish (Sih 1994), squirrels (Bowers and Breland 1994), chimpanzees (Gilby and Wrangham 2007), and humans (Codding et al. 2011). The theory of dynamic utility is an important and basic principle of dynamic decision-making and merits further theoretical and empirical investigation.