Cautious stochastic choice, optimal stopping and deliberate randomization

We study cautious stochastic choice (CSC) agents facing optimal timing decisions in a dynamic setting. In an expected utility setting, the optimal strategy is always a threshold strategy—to stop/sell the first time the price process exits an interval. In contrast, we show that in the CSC setting, where the agent has a family of utility functions and is concerned with the worst case certainty equivalent, the optimal strategy may be of non-threshold form and may involve randomization. We provide some carefully constructed examples, including one where we can solve explicitly for the optimal stopping rule and show it is a non-trivial mixture of threshold strategies. Our model is consistent with recent experimental evidence in dynamic setups whereby individuals do not play cut-off or threshold strategies.


Introduction
It is well recognized that individual decision making is not fully captured by expected utility theory, and many non-expected utility theories have been developed with the aim of providing a better fit to observed behavior. Many of these alternative theories have been well studied in a static setting, but recently there has been much interest in studying non-expected utility preferences in dynamic settings which describe timing problems arising in real-world decisions. Examples of such timing decisions include when to stop gambling in a casino, when to sell a stock, when to exercise an option and when to stop searching and accept a job offer. Theoretical work in this vein includes Ebert and Strack (2015) and, in experimental settings, Oprea et al. (2009). Our paper considers agents who face optimal timing decisions in a dynamic setting and who exhibit cautious stochastic choice (CSC). The CSC agent is unsure which utility function to use from a family of possibilities and applies caution by valuing each prospect at its worst-case certainty equivalent. In our optimal stopping setup, a CSC agent may have an optimal strategy which is not of threshold form and may involve randomization. We demonstrate this through a series of example models. The dynamic CSC model gives predictions which are consistent with recent experimental evidence in dynamic setups whereby individuals do not play cut-off or threshold strategies (Strack and Viefers 2021; Fischbacher et al. 2017).
In this paper, we build upon the theory of cautious stochastic choice (Cerreia-Vioglio et al. 2015; see also Maccheroni 2002) to develop a continuous-time optimal stopping model with CSC preferences. Cerreia-Vioglio et al. (2015) develop a theory of CSC in a static decision-making setting. The agent aims to select a best lottery from a given set. Under CSC the agent has a family of possible utility functions in mind. For a given lottery, and for each utility, the agent computes the certainty equivalent. The agent then values the lottery via the worst-case certainty equivalent. Finally, the agent chooses the lottery which maximizes this value. Since CSC does not satisfy the quasi-convexity property, agents may benefit from mixing (see Cerreia-Vioglio et al. 2019).
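The benefit of mixing under CSC can be seen in a small numerical sketch. The two kinked-linear utilities and two binary lotteries below are our own illustrative choices (not the example in the paper's appendix): each pure lottery is poor for one utility in the family, while a mixture props up the worst case.

```python
import numpy as np

# Illustrative static CSC instance (a toy of our own, not the paper's example):
# kinked-linear utilities u_m(w) = k*w below m and alpha*m at or above m.
def make_utility(m, k=1.0, alpha=2.0):
    def u(w):
        return k * w if w < m else alpha * m
    def u_inv(v):
        return v / k  # valid for v <= k*m, which holds for all values arising below
    return u, u_inv

def csc_value(lottery, utilities):
    """Worst-case certainty equivalent of a lottery {outcome: probability}."""
    return min(u_inv(sum(p * u(w) for w, p in lottery.items()))
               for u, u_inv in utilities)

def mix(lot_a, lot_b, lam):
    """Probabilistic mixture lam*A + (1 - lam)*B of two lotteries."""
    out = {}
    for lot, weight in ((lot_a, lam), (lot_b, 1.0 - lam)):
        for w, p in lot.items():
            out[w] = out.get(w, 0.0) + weight * p
    return out

utilities = [make_utility(1.0), make_utility(4.0)]
A = {0.0: 0.9, 1.0: 0.1}      # modest prize, moderate chance
B = {0.0: 0.975, 4.0: 0.025}  # large prize, small chance

va, vb = csc_value(A, utilities), csc_value(B, utilities)
best_lam, best_v = max(
    ((lam, csc_value(mix(A, B, lam), utilities)) for lam in np.linspace(0, 1, 1001)),
    key=lambda t: t[1])
print(va, vb, best_lam, best_v)  # mixing strictly improves on both pure lotteries
```

Here the worst-case certainty equivalents of the pure lotteries are 0.1 and 0.05, while mixing with weight 0.6 on A lifts the worst case to 0.14.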
In this paper we focus on an asset sale problem and consider a continuous-time model in which the price process is given by a one-dimensional time-homogeneous diffusion. If the agent were an expected utility maximizer, it is well known that the optimal stopping rule is given by the first exit time of the price process from an interval, i.e. a pure threshold strategy (see Karni and Safra 1990 in discrete time). We formulate an optimal stopping problem with CSC as follows. The agent has a family of utility functions and, for a given stopping rule (in an appropriate class of admissible strategies) and each utility, computes the certainty equivalent. The worst case is then taken over utilities. The goal is to find the stopping rule which maximizes the worst-case certainty equivalent value.
Under optimal stopping models with a law invariance property, but without quasi-convexity, it is known that it is sufficient to search over stopping rules of a particular form: those of randomized threshold form (Henderson et al. 2018b). CSC falls under these assumptions. However, this result does not say that once quasi-convexity fails, pure thresholds cannot still be optimal. It might still be the case that a pure threshold is preferred to randomization (regardless of how randomization is implemented), and we need to study particular models, such as CSC, to explore this further. A contribution of this paper is to construct relevant examples for our CSC setting where randomization is in fact used. We provide both some realistic models and a stylized model.
We first consider two models where the asset price follows exponential Brownian motion and demonstrate that the optimal strategy is not a pure threshold. In the first model, the family of utilities are S-shaped and reference level dependent. There are many such utilities as the agent is unsure of the strength of their loss aversion, risk aversion and risk seeking and the value of their reference level. Our model extends trading models of Kyle et al. (2006), Barberis and Xiong (2012), Henderson (2012), and Ingersoll and Jin (2013) to use the cautious approach, with a worst case over many utilities, and shows that pure price thresholds are no longer optimal sale triggers. An implication of this model is that using a cautious approach with S-shaped reference dependent utilities may lead to non-trivial strategies, akin to the already known results for prospect theory (Henderson et al. 2018a). However, as shown by Ebert and Strack (2015), naive prospect theory agents continue to gamble indefinitely whilst Duraj (2019) (also Huang et al. 2020) demonstrates CSC agents do not suffer from this somewhat extreme behaviour.
Our second model uses a family of concave utility functions and highlights that the CSC approach can lead to non-trivial strategies, even for a set of concave functions. This pair of models show CSC agents do randomize in realistic continuous-time optimal stopping settings. By also considering a stylized but tractable example, we can actually calculate the optimal stopping rule and show that it is a non-trivial mixture of threshold strategies. Furthermore, some of our ideas used in the proofs of calculating the optimal stopping rule may be useful in other settings.
In contrast to the behavior of an EU agent, our CSC agent does not only use pure threshold strategies and instead may prefer mixed or randomized strategies: the CSC agent is deliberately randomizing. We now describe the body of experimental evidence which is consistent with our theoretical model. An important finding in experimental studies of individual decision making is the phenomenon of stochastic or random choice. When subjects are asked to choose from the same set of options many times, they are inconsistent in their choices. Patterns of stochastic choice were first recorded by Tversky (1969) and many studies have replicated, explored and extended his results (see Agranov and Ortoleva 2017 for recent findings and a comprehensive overview, and, amongst others, Dwenger et al. 2018; Hey and Orme 1994; Feldman and Rehbeck 2020; Permana 2020). In particular, the recent studies of Agranov and Ortoleva (2017, 2020) and Dwenger et al. (2018) interpret their experimental results as suggesting the main force is a deliberate desire of participants to randomize. Much of this evidence is gathered in static settings. Recently, researchers have studied dynamic settings which can better reflect the real decision-making situations individuals face in economics and finance (e.g. Oprea et al. 2009). Strack and Viefers (2021) conduct an experiment in a sophisticated asset selling task.
They present evidence that players do not play cut-off or threshold strategies over gains: they do not behave time-consistently within rounds 75% of the time, and visit the same price level three times on average before stopping at it. In their study of the impact of automatic selling devices on experimental trading behavior, Fischbacher et al. (2017) find that participants tend to set any upper limit further away from the current price than any lower limit, and use the upper limit less frequently.
Our CSC model can also be viewed as contributing a new dynamic optimal stopping model to the wider literature on stochastic choice modelling. CSC falls into the class of stochastic models postulating that stochasticity is a deliberate choice of the agent.¹ Deliberate randomization (Machina 1985) emerges in non-EU settings such as prospect theory (see Wakker 1994 in a static setting, and Henderson et al. 2017 and He et al. 2017 in dynamic setups). There are fewer models capturing the phenomenon of stochastic choice in the dynamic setting of a stopping problem. Strack and Viefers (2021) combine random utility with regret preferences in a stopping context. Henderson et al. (2017) and He et al. (2017) show randomized strategies are optimal in a stopping model with prospect theory preferences. The largest class of stopping models is the bounded rationality drift diffusion models (DDM), of which the work of Fudenberg et al. (2018) is a recent example.
The paper is organised as follows. Section 2 presents the optimal stopping models: both the classical EU model and our CSC optimal stopping model. Section 3 describes and solves two models with S-shaped reference-dependent or concave families of utilities. A stylized example is given in Sect. 4. We defer supplementary material and proofs to the Appendices. Appendix A outlines the CSC model in its original static setup (Cerreia-Vioglio et al. 2015) and demonstrates that mixing may be beneficial. Further results and proofs on optimal stopping under EU are in Appendix B. Proofs for the stylized and generalized example are in Appendices C and D. Appendix E provides some insights on discounting in the CSC optimal stopping model.

The optimal stopping models
Optimal stopping theory has been influential in several areas of economics. In finance, the sale and purchase of stocks and the pricing of American options are classical stopping problems (McKean 1965;Merton 1973). Following McDonald and Siegel (1986) the optimal timing of irreversible investments and market entry decisions are modelled as stopping problems (Dixit and Pindyck 1994). In labour economics, Stigler (1962) and McCall (1970) established job search as a stopping task.
We first establish notation and review the theory for the optimal liquidation of an asset in the classical setting of a maximizer of expected utility. For J an interval subset of R, let F↑_J be the set of increasing functions F : J → R. If Z is a stochastic process and S is a class of stopping times, then let Q_Z(S) = {L(Z_τ); τ ∈ S}, where L(Z_τ) denotes the law of Z_τ. Let δ_z be the point mass at z.
We work on a filtered probability space (Ω, F, F = {F_t}_{t≥0}, P). Let Y = (Y_t)_{t≥0} be an (F, P)-stochastic process on this probability space. Let I_Y be the state space of Y and let Ī_Y be the closure of I_Y. We suppose that Y is a regular, time-homogeneous diffusion with initial value Y_0 = y which lies in the interior of I_Y. Further, we suppose that lim_{t↑∞} Y_t exists. A sufficient condition for this is Assumption 1 below.
Throughout, Y may represent the price process of a stock in the market, the accumulation of returns from an investment project, or the accumulation of an agent's wealth when gambling or trading, to give a few possibilities. We give further details on some of these interpretations at the close of Sect. 3.1.

Optimal stopping under expected utility
Let U be an increasing utility function, U ∈ F↑_{Ī_Y}. For a maximizer of expected utility the objective is to find the certainty equivalent
C^EU(S) = sup_{τ∈S} U^{-1}(E_y[U(Y_τ)])    (1)
over a suitable class S of stopping times. We introduce three classes of stopping times:
• T, the class of all stopping times;
• T_T, the class of (pure) threshold stopping times;
• T_R, the class of randomized threshold stopping times.
Note that T_T ⊂ T_R ⊂ T. The set of pure threshold stopping times includes stopping immediately and can be written as
T_T = ⋃ {τ^Y_{β,γ}},    (2)
where τ^Y_{β,γ} = inf{u ≥ 0 : Y_u ∉ (β, γ)} and the union is taken over (β, γ) in an appropriate set D. In order to be able to define the set of randomized threshold stopping times T_R we suppose that F_0 is rich enough to support any probability measure η on D, and that the dynamics of Y are independent of a random variable with law η. Then we define a randomized stopping time τ_η by
τ_η = τ^Y_{B,Γ}, where (B, Γ) is F_0-measurable and has law η.    (3)
Often, the best way to solve (1) is via a change of scale. Let s be a strictly increasing function such that X = s(Y) is a local martingale.² Then U(Y_τ) = g(X_τ) where g = U ∘ s^{-1}, and (1) can be rewritten as
C^EU(S) = sup_{τ∈S} U^{-1}(E_x[g(X_τ)]),    (4)
where x = s(y). Since the scale function s is fixed, in finding the optimal stopping rule it is sufficient to consider sup_{τ∈S} g^{-1}(E_x[g(X_τ)]). We do not make a concavity assumption on U. Monotonicity is preserved under the transformation U → g, but in general concavity is not. Indeed, if g is concave then typically stopping immediately (τ = 0) is optimal.
The state space of X is I_X = s(I_Y). Then Ī_X = s(Ī_Y). If I_X is not bounded below, then for any level γ in the interior of I_X with γ ≥ x the first hitting time H^X_γ = inf{u ≥ 0 : X_u = γ} is finite almost surely, and C^EU(T) = sup_{γ∈I_X, γ≥x} U^{-1}(g(γ)) = sup{γ : γ ∈ I_Y} = max{γ : γ ∈ Ī_Y}. We want to exclude this degenerate case. Hence we make the following assumption:

Assumption 1 I_X is bounded below.

Then, without loss of generality we may assume that the lower limit of I_X is zero. Any accessible boundary point for X is absorbing.
The upper limit of I_X may be finite or infinite. Note that since X is a non-negative local martingale, lim_{t↑∞} X_t exists and hence lim_{t↑∞} Y_t exists. We do not exclude τ such that P(τ = ∞) > 0, and on the set {τ = ∞} we define X_τ = lim_{t↑∞} X_t. This is why we want to consider Ī_X as well as I_X. Then T is the set of all stopping times, and not just finite stopping times.

Example 1 Suppose Y is geometric Brownian motion with dynamics dY_t = μY_t dt + σY_t dW_t, and set ψ = 1 − 2μ/σ². Provided ψ ≠ 0 we have s(z) = sgn(ψ) z^ψ. (If ψ = 0 then s(z) = ln z is the scale function.) If ψ ≤ 0 then s(0) = −∞. This is equivalent to 2μ ≥ σ², in which case Y hits arbitrarily high price levels with probability one and the optimal stopping problem is degenerate. If ψ > 0 then s(γ) = γ^ψ and X = Y^ψ is a non-negative local martingale.

Hence T_T has the alternative representation T_T = {τ^X_{β,γ} : (β, γ) ∈ D_X} for an appropriate set D_X, where τ^X_{β,γ} = inf{u ≥ 0 : X_u ∉ (β, γ)}. Then D in (2) and (3) is given by the image of D_X under s^{-1} applied coordinatewise. T_R can also be rewritten as the set of stopping times of the form inf{u ≥ 0 : X_u ∉ (B, Γ)} where (B, Γ) is F_0-measurable and has law η.

Note that the certainty equivalent depends only on the law of X_τ. The following result is classical. (In discrete time, see Karni and Safra (1990) and Strack and Viefers (2021); in mathematical finance, see Dayanik and Karatzas (2003); for a textbook treatment, see Chapter 4 of Peskir and Shiryaev (2006).)

Proposition 1 (1) C^EU(T_T) = C^EU(T_R) = C^EU(T).

Corollary 1 In trying to find the optimal stopping rule in the classical (single utility) case it is sufficient to restrict attention to pure threshold strategies of the form τ = τ^Y_{a,b}.
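As a numerical sanity check on Example 1 (with illustrative values of μ and σ), the scale function should be annihilated by the generator of Y, L f(z) = μ z f′(z) + ½σ²z²f″(z), while a generic power is not:

```python
# Check that s(z) = z**psi with psi = 1 - 2*mu/sigma**2 solves the scale ODE
# mu*z*f'(z) + 0.5*sigma**2*z**2*f''(z) = 0 for GBM (illustrative parameters).
def generator_on_power(p, z, mu, sigma):
    fp = p * z ** (p - 1)            # f'(z) for f(z) = z**p
    fpp = p * (p - 1) * z ** (p - 2)  # f''(z)
    return mu * z * fp + 0.5 * sigma ** 2 * z ** 2 * fpp

mu, sigma = 0.05, 0.4
psi = 1 - 2 * mu / sigma ** 2  # = 0.375 > 0 here, since 2*mu < sigma**2
for z in (0.5, 1.0, 3.0):
    assert abs(generator_on_power(psi, z, mu, sigma)) < 1e-12  # scale function
    assert abs(generator_on_power(1.5, z, mu, sigma)) > 1e-3   # generic power fails
print(psi)  # 0.375
```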
One approach to proving Proposition 1 is to show first that the problem can be recast as one involving the process in natural scale X , and then that the problem of maximizing over stopping times can be recast as a maximization over distributions. In particular, we see from (1) or (4) that the certainty equivalent depends on τ only through the law of the stopped process. Hence, instead of searching over stopping rules we can search over laws of the stopped process instead. In terms of maximizing expected utility of the stopped process, it can be shown that the optimal law places mass on at most two points. Such a distribution can be achieved using a pure threshold rule. This explains why C EU (T T ) = C EU (T ) and the more general result of the first part of the Proposition follows since clearly C EU (T T ) ≤ C EU (T R ) ≤ C EU (T ).
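The reduction to two-point laws is easy to exercise numerically. In the sketch below (a hypothetical payoff g in natural scale, not one taken from the paper), the rule τ^X_{0,γ} delivers the two-point law placing mass x/γ at γ and the rest at 0, so the whole problem collapses to a one-dimensional search over the upper threshold γ:

```python
import numpy as np

# Hypothetical payoff in natural scale (an assumption for illustration):
# convex near zero, concave further out, so waiting has value.
def g(w):
    return 1.0 - np.exp(-w ** 2)

def g_inv(v):
    return np.sqrt(-np.log(1.0 - v))

x = 0.2  # initial value of X

def ce_pure_threshold(gamma):
    """CE (in X units) of the exit rule tau = inf{t : X_t not in (0, gamma)}.
    Optional stopping gives the two-point law: X_tau = gamma w.p. x/gamma, else 0."""
    return g_inv((x / gamma) * g(gamma))  # uses g(0) = 0

gammas = np.linspace(x, 20.0, 20000)
ces = np.array([ce_pure_threshold(gm) for gm in gammas])
best = gammas[np.argmax(ces)]
print(best, ces.max())  # interior optimum: waiting strictly beats stopping at once
```

For a single utility, the optimal γ maximizes g(γ)/γ, which is why a pure threshold (a two-point law) attains the supremum in this classical case.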

Optimal stopping under cautious stochastic choice
Our goal in this section is to develop an optimal stopping model with CSC. Let Y be a time-homogeneous diffusion with state space I_Y. Let W_Y ⊆ F↑_{Ī_Y} be a set of increasing utility functions. The goal is to find sup_{τ∈S} inf_{u∈W_Y} u^{-1}(E_y[u(Y_τ)]). As in the classical, single-utility setting, it is often convenient to work with the process X in natural scale rather than Y. We set W_X = {u ∘ s^{-1} : u ∈ W_Y}. For a fixed stopping time τ and a fixed utility u ∈ W_Y we define the certainty equivalent
C_τ(u) = u^{-1}(E_y[u(Y_τ)]).    (5)
Once we have minimized over utilities, the value function for a single stopping time τ is V_τ = inf_{u∈W_Y} C_τ(u). Under CSC the optimal stopping problem is to find
V(S) = sup_{τ∈S} V_τ,    (6)
where S is a set of stopping times. Since V_τ depends on the stopping time only through the law of the stopped process, we may equivalently search over the laws of the stopped process, with τ* ∈ argmax_{τ∈S} V_τ an optimal stopping rule. In particular, we want to consider S = T, S = T_R and S = T_T. Note that the agent solves the problem based upon what is optimal given current information, and under the assumption that they commit to this optimal strategy. This is a natural starting point and was also the approach taken by Henderson et al. (2018a), where a stopping problem for a prospect theory agent who can pre-commit was solved.
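A minimal numerical sketch of V_τ (the utility family and parameters are illustrative choices of ours): stopping immediately always yields V = y, while for an all-concave family in natural scale a threshold rule is valued below y, consistent with the earlier remark that for concave g stopping immediately is typically optimal.

```python
# Illustrative sketch of the CSC value V_tau of a single stopping rule.
# The CRRA-style family below is an assumption for illustration only.
y = 1.0                      # initial value; Y is assumed already in natural scale
exponents = [0.3, 0.5, 0.9]  # utility family u_a(z) = z**a

def v_tau(law, family):
    """Worst-case certainty equivalent of a stopped law {value: probability}."""
    ces = []
    for a in family:
        eu = sum(p * z ** a for z, p in law.items())
        ces.append(eu ** (1.0 / a))
    return min(ces)

stop_now = {y: 1.0}                            # tau = 0: V equals y exactly
exit_0_2 = {2.0: y / 2.0, 0.0: 1.0 - y / 2.0}  # exit law of (0, 2): P(hit 2) = y/2

print(v_tau(stop_now, exponents), v_tau(exit_0_2, exponents))
```

In this concave family the most risk-averse utility (exponent 0.3) is the binding one for the threshold rule.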
We want to solve (6). Henderson et al. (2018b) study a class of stopping problems where the value associated to a stopping rule depends upon the law of the stopped process. Their result states that under a law invariance property, we have V (T R ) = V (T ). The law invariance property holds for our CSC setting. Hence we know that it is sufficient to look for optimal strategies of randomized threshold form (we do not need to look beyond the class S = T R ). However, we only know that V (T T ) ≤ V (T R ) and so we can only say that a pure threshold strategy may not be optimal.
Our contribution here is to show that we can indeed find models where pure thresholds are not optimal, and we demonstrate for those examples that the agent can do better by randomization. Unlike in the classical case (see Proposition 1(1)), we may have a strict inequality V(T_T) < V(T_R) = V(T).

Two realistic models
In this section we develop two realistic models which are based on either S-shaped reference dependent utilities or on concave utilities.

A model with S-shaped reference dependent utilities
The first model is based on the S-shaped reference-dependent preferences found in prospect theory (Tversky and Kahneman 1992). Utility is defined over gains and losses relative to a reference point, rather than over final wealth, an idea proposed by Markowitz (1952). The utility function exhibits concavity in the domain of gains and convexity in the domain of losses, and the function is steeper for losses than for gains, a feature known as loss aversion. In our CSC model, there are many such utilities as the agent is unsure of the strength of their loss aversion, risk aversion and risk seeking and the value of their reference level.
In recent years, there have been a number of optimal stopping models for asset sales and trading which utilise the S-shaped utility, beginning with Kyle et al. (2006) and continued by Barberis and Xiong (2012), Henderson (2012), and Ingersoll and Jin (2013). These papers employ a single S-shaped utility to represent investor preferences and ask questions such as: when does an investor sell an asset? How do risk aversion, risk seeking and loss aversion impact on their sale decision? These models also seek to provide an explanation for the disposition effect observed in empirical trading data (Odean 1998) whereby investors have a higher propensity to take gains over losses. Each of the models in Kyle et al. (2006), Barberis and Xiong (2012), Henderson (2012), and Ingersoll and Jin (2013) leads to the derivation of explicit price thresholds at which an investor would sell the asset. That is, for preferences which can be represented by a single S-shaped utility, the optimal strategy is of pure threshold form.
Our model extends those of Kyle et al. (2006), Barberis and Xiong (2012), Henderson (2012), and Ingersoll and Jin (2013) to use the cautious approach, with a worst case over many utilities, and shows that pure thresholds are no longer optimal.
Suppose Y follows geometric Brownian motion. The agent has a family of S-shaped reference-dependent utilities W_Y = {u_i : 1 ≤ i ≤ N} of the form
u_i(y) = (y − R_i)^{δ_i} for y ≥ R_i;  u_i(y) = −κ_i (R_i − y)^{δ_i} for y < R_i,    (7)
where for each i, 1 − δ_i ∈ (0, 1) represents the coefficient of risk aversion/risk seeking, R_i > 0 is the reference level and κ_i ≥ 1 is the loss aversion parameter, introducing an asymmetry. Such piecewise power functions are the specification proposed by Tversky and Kahneman (1992). Our problem is to find the CSC value corresponding to the asset sale problem:
V(S) = sup_{τ∈S} min_{1≤i≤N} u_i^{-1}(E_y[u_i(Y_τ)]).    (8)
If N = 1, we have a single S-shaped utility and this recovers (special cases of) the stock trading models of Barberis and Xiong (2012), Henderson (2012), and Ingersoll and Jin (2013). For N = 1, under the utility specification in (7), and with Y following geometric Brownian motion, the optimal pure threshold strategy may be derived explicitly.
Define g_i = u_i ∘ s^{-1}, so that
g_i(w) = u_i(w^{1/ψ}),    (9)
and set W_X = {g_i : 1 ≤ i ≤ N}. By an immediate extension of the arguments leading to (4) we have
V(S) = sup_{τ∈S} min_{1≤i≤N} u_i^{-1}(E_x[g_i(X_τ)]),
and hence in the search for the optimal stopping rule it is sufficient to consider the problem in natural scale for X and W_X. Families of functions W_Y (as in (7)) and W_X (with g_i given in (9)) are shown in Fig. 1, for the parameters ψ = 1/2 for the price process, N = 3 and {(δ_i, R_i, κ_i)} = {(0.15, 1, 2), (0.1, 2, 2), (0.08, 3, 2)} for the utility functions. Here we are representing a situation where an agent is unsure of their level of risk aversion/risk seeking parameters and an appropriate reference level. They fix their level of loss aversion at a value of 2, which is around the level estimated in Tversky and Kahneman (1992). Note that certainty equivalents are invariant under affine transformations of the utilities. We may therefore work with linearly transformed functions g̃_i, designed so that g̃_i(0) = 0 and g̃_i(x̄) = 1 for all i. Then the functions g̃_i are of comparable sizes over the region [0, x̄], and we expect that over the relevant range g̃_i^{-1} does not depend greatly on i. The transformed family of functions is plotted in Fig. 2.
Consider first the certainty equivalent from using a pure threshold strategy τ^X_{0,γ} = inf{t : X_t ∉ (0, γ)} for γ > x = 0.2. The certainty equivalents associated with the utilities (u_i)_{i=1,2,3} as a function of the upper threshold γ are plotted in Fig. 3. We see from the figure that the best pure threshold strategy uses an upper threshold of approximately 2.75 and yields a CSC certainty equivalent of 0.7263. We also recover the best pure threshold for each separate utility (u_i)_{i=1,2,3}; these can be derived in closed form by the method in Henderson (2012), and can be seen on the figure to be very close to the reference levels R_i, i = 1, 2, 3.

Fig. 3 The certainty equivalent value under a pure threshold strategy τ^X_{0,γ} = inf{t : X_t ∉ (0, γ)} as a function of the upper threshold γ, for γ > X_0 = x = 0.2 and the family of S-shaped utility functions u_i defined in (7). The best pure threshold strategy uses an upper threshold of about 2.75 and gives a CSC certainty equivalent of 0.7263, as marked on the figure. Parameters used are ψ = 1/2 for the price process, N = 3 and {(δ_i, R_i, κ_i)} = {(0.15, 1, 2), (0.1, 2, 2), (0.08, 3, 2)} for the utility functions.

Now suppose we are allowed to search for the best mixed threshold strategy based on two upper thresholds (with the lower threshold set to zero). Note, we are not claiming to find the optimal strategy here, but are simply demonstrating that we can do better than pure thresholds. Figure 4 shows the highest CSC value (as the mixture parameter varies) for a given pair of upper thresholds. Figure 5 shows how much probability mass is assigned to the smaller of the two upper thresholds. The best strategy is to assign probability mass 0.75, 0.25 to thresholds 1.1, 3.1 respectively, giving a CSC value of 0.8368. From Fig. 5 we see that for other pairs of thresholds it is optimal to place all the weight on a single threshold, but for the optimal pair of thresholds the optimal strategy is a proper mixture. It follows that the best randomized strategy is strictly better than any pure threshold strategy (since even a mixture over a pair of upper thresholds does better).

Fig. 4 CSC value using the optimal mixture for a given pair of upper threshold levels, where X_0 = x = 0.2 and the family of S-shaped utility functions u_i defined in (7) is used. The best pair of upper thresholds is (1.1, 3.1), giving a CSC certainty equivalent of 0.8368.

Fig. 5 Probability mass assigned to the smaller of the two upper thresholds, where X_0 = x = 0.2. When both upper thresholds are large, it is optimal not to use a mixture and only to stop at the smaller of the upper thresholds; when both upper thresholds are small, it is again optimal not to use a mixture and only to stop at the larger of the upper thresholds. When the smaller upper threshold is in the range 1-3, it is optimal to use a mixed strategy, with most of the mixture distribution on the smaller of the two upper thresholds. The optimal mixture places probability mass 0.75 on threshold 1.1 and 0.25 on threshold 3.1.

Let us now consider a mixture which involves at most three upper thresholds. We find that in this restricted class, the optimal randomized strategy assigns probability mass 0.76, 0.11, 0.13 to thresholds 1.1, 2.1, 3.1 respectively and gives a CSC value of 0.8425. Again, we see an improvement as we allow for mixtures over a larger number of thresholds. However, the benefit from adding more upper thresholds is diminishing, and the improvement in the CSC value from allowing mixed strategies which randomize over 4 upper thresholds is negligible. The results of randomization among upper thresholds for the family of S-shaped utility functions (in Fig. 1) are summarized in Table 1.
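The mechanics of this search can be sketched in code. The normalization below (a Tversky–Kahneman-style piecewise power family with parameters (δ_i, R_i, κ_i)) is our assumption, and no transformed functions g̃_i are used, so the numbers will not reproduce the values 0.7263 and 0.8368 quoted above; the sketch only demonstrates the grid search over pure and two-threshold mixed strategies.

```python
import numpy as np

# Assumed piecewise power family (an illustration; the exact normalization of
# the paper's specification and figures is not reproduced here).
PARAMS = [(0.15, 1.0, 2.0), (0.10, 2.0, 2.0), (0.08, 3.0, 2.0)]  # (delta, R, kappa)
PSI, X0 = 0.5, 0.2

def u(y, delta, R, kappa):
    return (y - R) ** delta if y >= R else -kappa * (R - y) ** delta

def u_inv(v, delta, R, kappa):
    return R + v ** (1 / delta) if v >= 0 else R - (-v / kappa) ** (1 / delta)

def csc_value(thresholds, probs):
    """Worst-case CE (in Y units) of a mixture of upper X-thresholds, lower at 0."""
    worst = np.inf
    for delta, R, kappa in PARAMS:
        eu, at_zero = 0.0, 1.0
        for gm, p in zip(thresholds, probs):
            hit = p * X0 / gm  # chance this mixture component reaches gm
            at_zero -= hit
            eu += hit * u(gm ** (1 / PSI), delta, R, kappa)  # Y = X**(1/psi)
        eu += at_zero * u(0.0, delta, R, kappa)
        worst = min(worst, u_inv(eu, delta, R, kappa))
    return worst

grid = np.linspace(0.3, 5.0, 48)
best_pure = max(csc_value([gm], [1.0]) for gm in grid)
best_mixed = max(csc_value([g1, g2], [p, 1.0 - p])
                 for g1 in grid for g2 in grid if g1 < g2
                 for p in np.linspace(0.0, 1.0, 21))
print(best_pure, best_mixed)  # the mixture search can only improve on pure thresholds
```

Since the pure thresholds are included in the mixture grid (p ∈ {0, 1}), the mixed search can never do worse; whether it does strictly better depends on the family, which is exactly the point of the examples in this section.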
Note the model of this section could be adapted for option payoffs, with applications to (financial) American options (McKean 1965; Merton 1973) and to real options (Dixit and Pindyck 1994; McDonald and Siegel 1986). In a financial options setting, a fixed parameter K represents the strike price of the option. In a real options setting, Y represents the accumulation of returns from an investment project and a fixed parameter K is the fixed cost of investment. The CSC value in (8) would become
V(S) = sup_{τ∈S} min_{1≤i≤N} u_i^{-1}(E_y[u_i((Y_τ − K)^+)]).
In each of these applications, our results imply that option holders taking a cautious approach (as defined by CSC) may use randomized strategies when exercising their options, rather than a simple "exercise when the stock price breaches a particular threshold" approach. Similarly, in corporate finance applications to real options, we may see more complex investment timing behaviour than that predicted by standard risk-neutral models.

A model based on concave utilities
In the previous model we used a family of S-shaped reference-dependent utility functions. In this section we build a model using concave utility functions, constructed from the sum of a power utility function and an exponential utility. 3 As in Sect. 3.1, suppose Y is geometric Brownian motion with scale function s(z) = z^ψ for ψ ∈ (0, 1). For γ, κ, φ non-negative constants, define f = f_{γ,κ,φ} : R_+ → R_+ as such a sum, with φ the power exponent and γ, κ the exponential parameters. Then f(0) = 0 and, provided φ < 1, f is concave. Set g = f ∘ s^{-1}, so that g_{γ,κ,φ}(w) = f_{γ,κ,φ}(w^{1/ψ}). Provided ψ < φ, g is convex for small values of w and concave for larger values. We will thus assume ψ < φ < 1.
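The exact formula for f is not reproduced above, so the sketch below adopts one hypothetical "power plus exponential" candidate, f(y) = y^φ + κ(1 − e^{−γy}) (an assumption, purely for illustration), and checks the convex-then-concave shape of g numerically on a relevant range:

```python
import numpy as np

# Hypothetical candidate for f_{gamma,kappa,phi} (an assumption; not the
# paper's exact formula): a concave power piece plus a CARA piece.
gamma_, kappa_, phi_, psi_ = 1.0, 5.0, 0.8, 0.5  # note psi_ < phi_ < 1

def f(y):
    return y ** phi_ + kappa_ * (1.0 - np.exp(-gamma_ * y))

def g(w):
    return f(w ** (1.0 / psi_))  # natural scale: Y = X**(1/psi)

def second_diff(w, h=1e-4):
    """Central second difference, a numerical proxy for g''(w)."""
    return (g(w + h) - 2.0 * g(w) + g(w - h)) / h ** 2

print(second_diff(0.1), second_diff(1.5))  # positive (convex), then negative (concave)
```

Because utilities concave in Y need not map to concave payoffs in natural scale, stopping immediately need not be optimal, which is what makes this concave family interesting.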
Computing certainty equivalents for pure threshold strategies as in Sect. 3.1, we find that the best pure threshold strategy uses an upper threshold of approximately 22.68 and yields a CSC certainty equivalent of 0.6215. If we now search for the best randomization over two upper thresholds, we find that the best strategy is to assign probability mass 0.56, 0.44 to thresholds 3.84, 187.42 respectively, and that this gives a CSC value of 0.6373. Again the best randomized strategy is strictly better than any pure threshold strategy. However, allowing randomization over three upper thresholds brings only negligible further benefits. The results of randomization among upper thresholds for the family of concave utilities (in Fig. 6) are summarized in Table 2.

A stylized but explicitly solved example
Our goal in this section is to give an example for which we can prove that the optimal stopping rule is not a pure threshold strategy. Instead there is an optimal stopping rule which is a non-trivial mixture of threshold stopping rules. The example is highly stylized, and deliberately simple, and this allows us to give a full and complete solution, i.e. we are able to solve for the optimal mixed threshold rule. Crucially, the characteristic features are shared with some realistic, non-stylized examples, in particular the S-shaped utility model of Sect. 3.1.
We work with a process Y which is already in natural scale, and a family of payoff functions {u_m}_{m∈M}. The process Y is assumed to be bounded below (without loss of generality by zero) and unbounded above, to be a local martingale and to have initial value y > 0. Then Y is a supermartingale. The canonical example is Brownian motion started at y and absorbed at zero, Y_t = W_{t∧H^W_0}, where H^W_0 = inf{u ≥ 0 : W_u = 0} is the first hitting time of zero by W. Alternatively, we may consider Y to be geometric Brownian motion with zero drift. For constants 0 < k < α and a set of levels M, the payoff functions take the form
u_m(w) = kw for w < m;  u_m(w) = αm for w ≥ m.    (16)
The goal in this section is to give an example for which V(T_T) < V(T). Hence, there is no pure threshold strategy which is optimal within the class of all stopping rules. Figure 8 illustrates the family of utilities described here. Note that the results generalize to utility functions which replace u_m(w) = αm with u_m(w) = J(m) for w ≥ m in (16), where J is a strictly increasing function with J(m) > km. We will consider this more general case in Sect. 4.4.

Pure threshold strategies
Our first result is that in the stylized example there is no pure threshold strategy which outperforms the trivial strategy of stopping immediately. Note that since Y is a supermartingale and since u_m(z) ≤ αz, we have for all τ ∈ T
E_y[u_m(Y_τ)] ≤ α E_y[Y_τ] ≤ α y.
Recall τ_{β,γ} = inf{s : Y_s ∉ (β, γ)}. For m ∈ M and 0 ≤ β ≤ y ≤ γ, let G^m_{β,γ} be the expected utility associated with the stopping time τ_{β,γ} and the utility function u_m, and let C^m_{β,γ} be the certainty equivalent:
G^m_{β,γ} = E_y[u_m(Y_{τ_{β,γ}})],  C^m_{β,γ} = u_m^{-1}(G^m_{β,γ}).    (18)
Note that for each m ∈ M, G^m_{β,γ} and C^m_{β,γ} are non-increasing in γ for γ ≥ m^*. Also, for each m ∈ M and γ ≤ m_*, G^m_{β,γ} and C^m_{β,γ} are non-increasing in β for 0 ≤ β ≤ y. The following theorem follows from the fact that in our stylized example V(T_T) = y. This result is proved in Appendix C. From the perspective of the worst agent, any pure threshold strategy can only generate at best a certainty equivalent which is the same as the certainty equivalent from selling the asset immediately.
Theorem 1 In our stylized example no pure threshold strategy outperforms stopping immediately.

Improvement with randomization between two upper thresholds
The goal of this section is to show that there are mixtures of threshold strategies which outperform the best pure threshold strategies. In addition we will develop some intuition which we can use to motivate the derivation of the optimal randomized strategy.
The remarks after (18) suggest that it is not sensible to use upper thresholds above m^*, and that it is sufficient to consider only lower thresholds which are set to zero. (This result is proved in Lemma 4 in Appendix C.) In this section we consider stopping rules which are a mixture of τ_{0,γ} and τ_{0,ℓ} for m_* ≤ γ < ℓ ≤ m^*. If τ is this mixed stopping rule and ℓ < m^* then u_{m^*}(Y_τ) = kY_τ and the certainty equivalent is equal to y. So, if we hope to outperform pure threshold rules we must take ℓ = m^*.
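The improvement from a two-threshold mixture can be exhibited numerically. The sketch below uses our own illustrative parameters (k = 1, α = 2, M = [10, 20] so that αm_* = km^*, and y = 1) with the stylized family u_m(w) = kw for w < m and u_m(w) = αm for w ≥ m, and exact exit probabilities y/γ for the rules τ_{0,γ}:

```python
# Illustrative parameters (our assumptions, not values from the text):
# k = 1, alpha = 2, M = [10, 20] so alpha*m_lower = k*m_upper, and y = 1.
K, ALPHA, M_LO, M_HI, Y0 = 1.0, 2.0, 10.0, 20.0, 1.0

def u(m, w):
    # Stylized utility: linear with slope K below the kink at m, flat above.
    return K * w if w < m else ALPHA * m

def ce(m, expected_utility):
    # Inverse of u_m on the relevant range of expected utilities.
    v = expected_utility
    return v / K if v < K * m else m

def worst_ce_of_mixture(weights):
    # weights: dict {gamma: prob}; tau_{0,gamma} stops at gamma w.p. Y0/gamma,
    # otherwise Y is absorbed at 0.  Returns the worst-case CE over m in M.
    ms = [M_LO + 0.5 * j for j in range(int((M_HI - M_LO) / 0.5) + 1)]
    worst = float('inf')
    for m in ms:
        eu = sum(p * (Y0 / g) * u(m, g) for g, p in weights.items())
        worst = min(worst, ce(m, eu))
    return worst

# Every pure threshold does no better than stopping at once (CE = y = 1) ...
pure_best = max(worst_ce_of_mixture({g: 1.0})
                for g in [10 + 0.5 * j for j in range(41)])   # gamma in [10, 30]
# ... but a 50/50 mixture of tau_{0,15} and tau_{0,20} strictly beats it.
mix_value = worst_ce_of_mixture({15.0: 0.5, 20.0: 0.5})       # = 7/6 > 1
```

With these numbers the best pure threshold achieves worst-case certainty equivalent exactly y = 1, while the 50/50 mixture of the upper thresholds 15 and m^* = 20 achieves 7/6, in line with Theorem 2.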

Hence, V(T^2_0) > V(T_T) and a fortiori V(T) > V(T_T).
Theorem 2 In our stylized example the best strategy outperforms the best pure threshold strategy.
In addition to the above result, we can learn something from our analysis about the optimal mixture of thresholds. First, we expect that there must be a positive probability that we take an upper threshold of m^*, else the certainty equivalent associated with u_{m^*} is y. Second, by considering the problem for finite mixtures of upper thresholds, we expect that the certainty equivalent associated with u_m should be constant over m.

The optimal solution
Let T^0_R be the subset of T_R such that the lower threshold in the randomization mixture is always at zero, and the upper threshold is in M. Then V(T^0_R) = V(T_R). Thus, in the stylized example, when considering optimal mixtures of threshold strategies it is sufficient to restrict attention to mixtures in which the lower threshold is always zero and the upper threshold is contained in M. We can calculate the optimal mixed threshold rule. The proofs of the proposition and theorem are given in Appendix C.

Theorem 3 Suppose η̂ ∈ P(M) is a mixture of a point mass at m^* of size θ^* and an absolutely continuous measure on M.
Then an optimal strategy is to take a randomized strategy with mixture distribution η̃ where η̃({0}, dγ) = η̂(dγ); the corresponding value function can then be computed explicitly. It is worth highlighting here that the optimal stopping rule is not unique: although in Theorem 3 we find the optimal mixed threshold rule, there are other stopping times which are also optimal. In other words, suppose τ ∈ T_R is a randomized threshold rule (which is not a pure threshold rule): then there exist other stopping times τ′ ∈ T for which L(X_{τ′}) = L(X_τ), or equivalently L(Y_{τ′}) = L(Y_τ).

A generalized example
Fix L > 0 and suppose R ∈ (L, ∞). Let J : [L, R] → R be a continuously differentiable function with J(z) > z for z ∈ [L, R], and let K : [L, R] → R be the largest increasing function such that K ≤ J. Suppose J(L) = K(L) ≥ R and that the set {x : K(x) = J(x)} is the union of a finite set of intervals; we write {x : K(x) = J(x)} = ∪^N_{i=1}[ℓ_i, r_i]. Let Y be Brownian motion started at y ∈ (0, L), and absorbed at 0. Consider the problem of finding the optimal CSC stopping rule for the family {u_α}_{α∈[L,R]}. Note that for any α ∈ [L, R] and any stopping time τ, the expected utility E[u_α(Y_τ)] lies in a range over which u_α^{-1} acts as the identity. Hence the u^{-1} may be omitted in the definition of the Cautious Stochastic Utility in this example.

Theorem 4 Let θ be given by
and let ρ : [L, R] → R_+ be given by (22). Let η̂ ∈ P([L, R]) be the probability measure with density ρ on [L, R] and a point mass of size θ at R. Then an optimal strategy is to take a randomized threshold strategy with mixture distribution η̃ where η̃({0}, dγ) = η̂(dγ). We prove the theorem in Appendix D. Note that if J is not strictly increasing then K is constant over intervals and η̂ does not charge such intervals. The reason for this is that the corresponding u_α strictly dominate other utilities u_β and so are never the worst-case utilities. For this reason they are not relevant in the CSC formulation. In the proof in the appendix, the utility functions are divided into two classes. For elements of the first class, the certainty equivalent is never smallest, and these utilities do not affect the CSC value. However, all elements of the second class are important, and we find the optimal strategy by making sure that the certainty equivalent is constant across utilities in this class, at least for the optimal mixed threshold stopping rule.

Further methodological remarks
At an abstract level our problem is to find sup_{x∈A} inf_{y∈B} D(x, y), where the spaces A and B may be different in character. In our setting A is a set of stopping times, B is a set of utilities, and D involves an expectation, but more generally games of this form arise in many applications where there are two entities with competing objectives. These include Dynkin games (Dynkin 1969), where both A and B are sets of stopping times, robust option pricing (Hobson 2011; Touzi 2014), and related ideas of Hansen and Sargent (2008). Dynkin games have appeared in various settings in economics, including wars of attrition (Hendricks et al. 1988), pre-emption games (Fudenberg and Tirole 1991), duels (see the survey by Radzik and Raghavan 1994), and the pricing of options and callable convertible bonds (Grenadier 1996; Kifer 2000; Sirbu and Shreve 2006; Bielecki et al. 2008).
Define v = sup_{x∈A} inf_{y∈B} D(x, y). In some circumstances one way in which v may be determined is to find a saddlepoint, i.e. to find x^* ∈ A and y^* ∈ B such that D(x, y^*) ≤ D(x^*, y^*) ≤ D(x^*, y) for all x ∈ A and y ∈ B. Hence (x^*, y^*) is optimal and v = D(x^*, y^*).
However, in our case there is no saddlepoint, sup_{x∈A} inf_{y∈B} D(x, y) < inf_{y∈B} sup_{x∈A} D(x, y), and we need a different approach. The approach which we outline here may have wider applicability.
Suppose we can find x^* ∈ A and B̃ ⊆ B satisfying two requirements: (i) D(x^*, y) ≤ D(x^*, ỹ) for all y ∈ B̃ and ỹ ∈ B; (ii) no choice x ∈ A leads to a uniformly higher value of D on B̃ than any other choice. The first requirement says that for the candidate optimizer x^*, D(x^*, y) is smaller on B̃ than off B̃. Note that the first requirement is easier to satisfy if B̃ is small, whereas the second is easier to satisfy if B̃ is large. Then, taking both y, ỹ ∈ B̃ in (i), we conclude that D(x^*, y) = D(x^*, ỹ) and hence D(x^*, ·) is constant on B̃. Then for any non-empty subset B̂ of B̃, inf_{y∈B̂} D(x^*, y) = inf_{y∈B} D(x^*, y). Meanwhile, with x replaced by x^* in (ii), for each x ∈ A there exists a non-empty subset B̃_{x,x^*} of B̃ such that D(x, y) < D(x^*, y) on B̃_{x,x^*}. Then, for each x ∈ A, inf_{y∈B} D(x, y) ≤ inf_{y∈B̃_{x,x^*}} D(x, y) ≤ inf_{y∈B̃_{x,x^*}} D(x^*, y) = inf_{y∈B} D(x^*, y). Hence sup_{x∈A} inf_{y∈B} D(x, y) = inf_{y∈B} D(x^*, y) = D(x^*, y^*) where y^* is any element of B̃, and we have identified both the problem value and an optimizer.
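The argument of this paragraph can be summarized as a standalone claim; the conditions (i)–(ii) below are our paraphrase of the two requirements described in the text:

```latex
\textbf{Claim.}\ Suppose $x^*\in A$ and $\tilde B\subseteq B$ satisfy
(i)~$D(x^*,y)\le D(x^*,\tilde y)$ for all $y\in\tilde B$, $\tilde y\in B$, and
(ii)~for each $x\in A$ there exists $y_x\in\tilde B$ with
$D(x,y_x)\le D(x^*,y_x)$.
By (i), $D(x^*,\cdot)$ is constant on $\tilde B$, equal to
$c:=\inf_{y\in B}D(x^*,y)$.  For any $x\in A$,
\[
\inf_{y\in B} D(x,y)\;\le\; D(x,y_x)\;\le\; D(x^*,y_x)\;=\;c,
\]
and hence
\[
v=\sup_{x\in A}\inf_{y\in B}D(x,y)=\inf_{y\in B}D(x^*,y)=D(x^*,y^*)
\qquad\text{for any }y^*\in\tilde B.
\]
```

This is the template that Proposition 3 instantiates, with A a space of (randomized) stopping rules and B a family of utilities.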

Concluding remarks
This paper considers agents who exhibit cautious stochastic choice (CSC) and who face optimal timing or stopping decisions in a dynamic setting. We build on the seminal work on CSC in a static setting by Cerreia-Vioglio et al. (2015) and provide a continuous-time optimal stopping model under CSC. In our dynamic setup, the value associated with a stopping rule is not quasi-convex and hence we cannot necessarily expect there to be a pure threshold rule which is optimal. Despite this observation, it is quite a challenge to find examples where it can be clearly demonstrated that the optimal stopping rule is a non-trivial mixture of threshold strategies. This paper has taken up this challenge and provides, first, realistic models under reference-dependent or concave families of utility functions under which pure threshold strategies are not optimal, and, second, a stylized, tractable example in which the optimal stopping rule and value can be constructed explicitly. Our predictions are in line with recent experimental evidence in dynamic settings whereby individuals do not play cut-off or threshold strategies (Strack and Viefers 2021; Fischbacher et al. 2017).
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

A Cautious stochastic choice in a static setting

The certainty equivalent of q with respect to u is defined as C^u_q = u^{-1}(E_q(u)). Under the CSC paradigm of Cerreia-Vioglio et al. (2015) [see also earlier work of Maccheroni (2002)] the agent chooses a best lottery from Q by displaying cautious behavior: the evaluation of any given lottery q ∈ Q is determined by V_q = min_{u∈W} C^u_q; the optimal strategy is to choose the lottery q̂ ∈ Q which maximizes V_q. This involves both minimization and maximization steps. Note that typically I, Q and W are taken to be compact so that the optimizers exist.
Now we want to allow the agent to mix over lotteries. Let co(Q) denote the convex hull of Q. Then ρ = ρ_λ ∈ co(Q) represents a compound lottery obtained through a randomization λ over the lotteries in Q. If λ is a discrete distribution over q_i ∈ Q we have ρ_λ = Σ_i λ_i q_i; more generally, ρ_λ = ∫_Q λ(dq) q is a measure on I given by ρ_λ(dx) = ∫_Q λ(dq) q(dx). For a lottery ρ_λ ∈ co(Q) we can define the expected utility of u with respect to ρ_λ by E_{ρ_λ}(u) = ∫_I u(x) ρ_λ(dx) = ∫_Q λ(dq) E_q(u), and then the certainty equivalent of ρ_λ with respect to u is C^u_{ρ_λ} = u^{-1}(E_{ρ_λ}(u)). An optimal randomized lottery is then one which maximizes min_{u∈W} C^u_{ρ_λ} over co(Q). In this static setting, Cerreia-Vioglio et al. (2019) show that mixing over two lotteries may improve the worst case certainty equivalent.
Suppose Q = {p, q} and W = {u, v}. If the worst-case utilities differ across the two lotteries — say C^u_p < C^v_p while C^v_q < C^u_q — then a linear combination of p and q can be better than either of them in terms of the smallest certainty equivalent. To see this, note that for λ ∈ (0, 1) the certainty equivalents C^u_{ρ_λ} and C^v_{ρ_λ} of the mixture ρ_λ = λp + (1 − λ)q vary continuously with λ and, near the endpoints, move in opposite directions; at an appropriately chosen interior λ the smallest certainty equivalent min(C^u_{ρ_λ}, C^v_{ρ_λ}) then exceeds both min(C^u_p, C^v_p) and min(C^u_q, C^v_q). It follows that in a static setting it can be optimal to take a mixed strategy.
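A concrete static instance can be built from the stylized family of kinked utilities used in the body of the paper. The numbers below are our own illustration (utilities u_m(w) = w for w < m and 2m for w ≥ m, with kinks at 10 and 18, and two risky lotteries): each pure lottery has worst-case certainty equivalent 1, while the 50/50 mixture achieves 7/6.

```python
# Hedged numerical illustration (our own numbers, in the spirit of the
# stylized family): u_m(w) = w for w < m, = 2m for w >= m.
def u(m, w):
    return w if w < m else 2 * m

def ce(m, eu):
    # Inverse of u_m on the relevant range of expected utilities.
    return eu if eu < m else m

p = {0.0: 0.95, 20.0: 0.05}            # one risky lottery
q = {0.0: 1 - 1 / 15, 15.0: 1 / 15}    # another risky lottery

def worst_ce(lottery, kinks=(10.0, 18.0)):
    # Worst-case certainty equivalent over the two utilities W = {u_10, u_18}.
    return min(ce(m, sum(pr * u(m, x) for x, pr in lottery.items()))
               for m in kinks)

def mix(lmbda, a, b):
    # Compound lottery lmbda * a + (1 - lmbda) * b.
    out = {}
    for x, pr in a.items():
        out[x] = out.get(x, 0.0) + lmbda * pr
    for x, pr in b.items():
        out[x] = out.get(x, 0.0) + (1 - lmbda) * pr
    return out

v_p, v_q = worst_ce(p), worst_ce(q)    # both equal 1
v_mix = worst_ce(mix(0.5, p, q))       # 7/6 > 1: mixing strictly helps
```

Here u_10 is the worst case for q and u_18-type considerations bind for p, so the mixture hedges across the two utilities, exactly the mechanism described above.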

B Further results on optimal stopping in the classical case
Under Assumption 1 the process X is a non-negative local martingale, and hence a supermartingale. Further, for any stopping time τ, E_x[X_τ] ≤ x. If I_X is bounded above then X is a martingale and E_x[X_τ] = x, but for many examples I_X = (0, ∞) or [0, ∞) and then there exist τ for which E_x[X_τ] < x. Recall that Q_X(S) is the set of possible laws of the stopped X-process, over stopping times in S.

Lemma 1 1. If I_X is unbounded then Q_X(T) = {ν ∈ P(Ī_X) : ∫_{Ī_X} z ν(dz) ≤ x}; if I_X is bounded then Q_X(T) = {ν ∈ P(Ī_X) : ∫_{Ī_X} z ν(dz) = x}.
Proof 1. The inclusion Q_X(T) ⊆ {ν ∈ P(Ī_X) : ∫_{Ī_X} z ν(dz) ≤ x} follows from the remarks before the statement of the Lemma. The fact that we have equality follows from the fact that, by Skorokhod's Embedding Theorem, any ν ∈ P([0, ∞)) with ∫ z ν(dz) ≤ x can be obtained as the law of X_τ for a stopping time τ; see Pedersen and Peskir (2001) or Cox and Hobson (2004). The case of bounded I_X has a similar proof.
2. This is immediate from the definition of pure threshold rules.
Lemma 1 characterizes the sets Q_X(S) for various sets S. However, the sets T, T_T and T_R do not depend on whether we consider stopping times for the process X or Y. Hence Q_Y(S) = {η^s : η ∈ Q_X(S)} where, by definition, η^s(A) = η(s(A)).

Proof of Proposition 1
This proposition is standard but we provide a short proof which will have parallels to our method in the CSC case. The results will follow if we can show that the value of the concave majorant is attained in the limit by threshold rules. Otherwise, there exist a_n → a_x and b_n → b_x such that g(a_n) → g^{cv}(a_x) and g(b_n) → g^{cv}(b_x).

C Proofs and auxiliary results for the stylized example
Proof of Theorem 1 The result follows immediately from the following lemma.

Lemma 2 For all 0 ≤ β ≤ y ≤ γ we have inf_{m∈M} C^m_{β,γ} ≤ y, and hence sup_{0≤β≤y≤γ} inf_{m∈M} C^m_{β,γ} = y.
Proof If γ ∈ [y, m * ) then C m β,γ = y for all m ∈ M. If γ ≥ m * then using the fact that C m β,γ is increasing in m for m ≤ γ and αm * = km * , Finally, if γ ∈ [m * , m * ) then inf m∈M C m β,γ ≤ C m * β,γ = y. The first statement of the lemma follows from consideration of the three possible cases. The second statement follows from the first, given that for all m, C m β,y = y.
Next we record some useful properties of G^m_{β,γ} and C^m_{β,γ} which follow immediately from the definitions in (18).
Let η be a probability measure on [0, y) × [y, ∞]. We can define a randomized stopping time τ = τ_η by generating a random variable Θ = (Θ_β, Θ_γ) with law η on [0, y) × [y, ∞] and setting τ = τ_{Θ_β,Θ_γ}. Then we define G^m_η = ∫ G^m_{β,γ} η(dβ, dγ) to be the expected utility from using the randomized stopping rule τ_η, and C^m_η = u_m^{-1}(G^m_η) to be the corresponding certainty equivalent. Then for any η ∈ P([0, y) × [y, ∞]), writing η̂ = P(η), the certainty equivalents under η̂ dominate those under η. Corollary 2 shows that V(T^0_R) = V(T_R), and that it is sufficient to consider only threshold strategies in the mixture with lower bound at 0 and upper bound in M. The fact that V(T_R) = V(T) follows from Henderson et al. (2018b) and Q(T_R) = Q(T).
Our calculation of the optimal strategy in the CSC setting is based on the following general proposition. Let Z be a set and let N be a measurable space. Let D : Z × N → R be a map and set D_*(z) = inf_{n∈N} D(z, n) and D^* = sup_{z∈Z} D_*(z).
Proposition 3 Suppose there exist Z_0 ⊆ Z, N_0 ⊆ N, z^* ∈ Z_0, ν ∈ P(N_0), a family (h_n)_{n∈N_0} of strictly increasing functions h_n : R → R and constants D̂, Ĥ such that D(z^*, n) = D̂ for n ∈ N_0, D(z^*, n) ≥ D̂ for n ∈ N, and ∫_{N_0} ν(dn) h_n(D(z, n)) ≤ Ĥ for all z ∈ Z, with equality for z ∈ Z_0. Then z^* is optimal and D^* = D_*(z^*) = D̂.
In our interpretation we take Z to be either the space of stopping rules or the space of attainable laws or the set of randomizations of the levels of lower and upper thresholds. (Since our problem is law invariant, the final result will be equivalent.) Z_0 is a space of relevant stopping rules or attainable laws or randomizations, for example the set of randomized threshold rules for which the upper barrier lies in some interval. N is a parameterization of the space of utility functions and N_0 is a set of relevant utility functions. We may have N_0 ≠ N if there are utility functions for which the certainty equivalent is never the lowest over the family of utility functions. See Sect. 4.4 for an example. Then D(z, n) is the certainty equivalent using utility function u_n and stopping rule z; D_*(z) is the CSC value of the stopping rule z.
The first idea behind the proof is that we expect the certainty equivalent value of the optimal stopping rule to be constant across the set of (relevant) utility functions. If not, we might expect to be able to improve the certainty equivalent value under the worst utility, at the expense of the certainty equivalent values of those utilities which have a higher certainty equivalent value. This would raise the CSC value. Hence we expect D(z * , n) is constant on N 0 for the optimal choice z * .
The second idea is that we want there to be only one (randomized threshold) stopping rule for which the certainty equivalent is constant (across all relevant utilities). This possibility is precluded by a requirement that no stopping rule can achieve a certainty equivalent value which exceeds that of another relevant stopping rule, uniformly across all relevant utilities.
Hence, for any z ∈ Z and w ∈ Z_0, there exists a non-empty set N_{z,w} ⊆ N_0 such that D(z, n) ≤ D(w, n) for all n ∈ N_{z,w}.

Proof of Theorem 3
The idea is to apply Proposition 3. To this end take Z_0 = Z = P(M), N_0 = N = M, and for χ ∈ P(M) and m ∈ M set D(χ, m) = C^m_χ, the certainty equivalent of the mixture χ under u_m. Then by Proposition 3, if we can find ζ̄ ∈ P(M) such that C^m_{ζ̄} does not depend on m, and ν such that ∫_M f(γ, m) ν(dm) does not depend on γ, then ζ̄ characterizes the optimal mixture of thresholds. The required conditions follow from the next two lemmas.

Lemma 5 For η̂ as in the statement of Theorem 3, C^m_{η̂} does not depend on m ∈ M.
Proof It follows from the definition of C^* and θ^* that C^m_{η̂} = C^* for each m ∈ M.
Lemma 6 Let λ = α/k > 1. Let ν be a mixture of an atom of size φ = (λ^{λ/(λ−1)} − λ + 1)^{−1} at m^* and an absolutely continuous measure ζ on M with density ζ(m) = Dm^β, for constants D and β chosen below.
The two square brackets in this last expression are zero by the choice of D and β.
D Proof of Theorem 4
Here Z is the set of candidate randomizations, and Z_0 is a set of relevant randomizations which are not dominated by some other randomization. N is a parameterization of the utility functions, and N_0 is a set of utility functions such that no member dominates any other element of N.
Recall the definitions of θ, ρ, η̂ and η̃ from the theorem. By the choice of θ, η̂ is a probability measure on [L, R]. Define Φ : [L, R] → R as below; then Φ is differentiable and its derivative follows from the definition of ρ in (22). For ζ ∈ P([0, y) × [y, ∞]) and α ∈ N define D(ζ, α) to be the associated certainty equivalent. Then, for η̃ ∈ Z as in the statement of the Theorem, D(η̃, α) is constant in α, where we use the first inequality in (A-3) to show that the integrand in the penultimate line is zero; moreover D(ζ, α) ≤ D(η̃, α), with equality if α ∈ N_0. It remains to show that there exists a measure ν with support in [L, R] and a constant Ĥ such that ∫_{N_0} ν(dα) D(ζ, α) = Ĥ for ζ ∈ Z_0, and ∫_{N_0} ν(dα) D(ζ, α) ≤ Ĥ for general ζ ∈ Z. (We take h_α(d) = d for all α ∈ N.) Recall that {z : K(z) = J(z)} = ∪^N_{i=1}[ℓ_i, r_i]. Let ν be the measure on {z : K(z) = J(z)} such that ν has atoms of size φ_i at ℓ_i for i = 1, 2, ..., N, together with a density ζ_i on each (ℓ_i, r_i). Here φ_1 = 1 and ζ_1 = φ_1 J(L)/(L(J(L) − L)), and then, proceeding inductively, φ_{i+1} and ζ_{i+1} are determined for 1 ≤ i ≤ N − 1. For any ζ̄ with support in [L, R] we can define ζ̃ = p^{−1}(ζ̄). First we show that Ψ(w) := ∫_{α≤w} (K(α)/w) ν(dα) + ∫_{α>w} ν(dα) is constant for w ∈ N_0. For w ∈ (ℓ_i, r_i) a direct computation gives Ψ′(w) = 0, so that Ψ is constant on (ℓ_i, r_i). To prove that Ψ is constant on N_0 it remains only to show that Ψ(r_i) = Ψ(ℓ_{i+1}).

E Optimal stopping and discounting
In Sect. 2.2, our discussion of the optimal stopping problem, i.e. of finding the certainty equivalent (see (5)) and the CSC value V(S) = sup_{τ∈S} inf_{u∈W_Y} C^u_τ, for Y a time-homogeneous diffusion process and u an increasing function, makes the implicit assumption that the agent does not incorporate discounting into their preferences. Since our main goal is to demonstrate that deliberate randomization is an essential and endogenous feature of optimal stopping under CSC, a major justification for abstracting away from problems with discounting is pedagogic and expositional.
In this section we briefly discuss how discounting might be incorporated. Note that in some applications, e.g. casino gambling and decisions over when to cease gambling and to leave the casino, the time-periods involved are negligible and it is quite reasonable for discounting to be ignored.

E.1 Utility of the discounted payoff
One potential approach to the introduction of discounting is to modify (A-6) so that the agent calculates the certainty equivalent of the expected utility of the discounted stopped value, i.e. C^u_τ = u^{−1}(E[u(e^{−βτ} Y_τ)]),
where β ≥ 0 is a discount parameter. In general, even if Y is a time-homogeneous diffusion, (e^{−βt} Y_t)_{t≥0} is not, and the methods of this paper do not apply. (In our setting of time-homogeneous Markov processes, finding the optimal stopping rule is already challenging; outside this setting the problem becomes essentially impossible to solve except in degenerate cases.) However, in the canonical example of exponential Brownian motion the discounted process is again an exponential Brownian motion, and therefore a time-homogeneous process. Hence, with this set-up, the problem with discounting reduces immediately to the problem without discounting which we have analysed in this paper.
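The reduction claimed here can be seen in one line; writing the exponential Brownian motion explicitly (with σ its volatility and μ its drift):

```latex
Y_t \;=\; y\exp\!\Big(\sigma W_t+\big(\mu-\tfrac{\sigma^{2}}{2}\big)t\Big)
\quad\Longrightarrow\quad
e^{-\beta t}Y_t \;=\; y\exp\!\Big(\sigma W_t+\big(\mu-\beta-\tfrac{\sigma^{2}}{2}\big)t\Big),
```

so the discounted process is again an exponential Brownian motion, with the drift reduced from μ to μ − β; in particular it remains time-homogeneous and the undiscounted analysis applies verbatim.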

E.2 Discounting of the utility of the payoff
An alternative approach is to modify (A-6) so that the agent calculates the certainty equivalent of the expected discounted utility of the stopped value, i.e. C^u_τ = u^{−1}(E[e^{−βτ} u(Y_τ)]).
This approach, whilst attractive in general, brings conceptual issues when the utility function can take negative values. The key point is that in this formulation sup_τ E[e^{−βτ} u(Y_τ)] = sup_τ E[e^{−βτ} u^+(Y_τ)], where u^+ is given by u^+(x) = max{0, u(x)}. To see this, note that if τ is any stopping time, and if σ = σ(τ) is given by σ = τ on {ω : u(Y_{τ(ω)}(ω)) ≥ 0} and σ(τ) = ∞ otherwise, then σ is a stopping time and e^{−βτ} u(Y_τ) ≤ e^{−βσ(τ)} u(Y_{σ(τ)}) = e^{−βσ(τ)} u^+(Y_{σ(τ)}) = e^{−βτ} u^+(Y_τ). (A-7) Hence, provided the problem is perpetual (and if not, then if the stopping horizon is large, never stopping can be approximated by stopping at a very large time with similar results), and since E[e^{−βτ} u^+(Y_τ)] ≥ 0 and u^{−1} is increasing, we have sup_τ E[e^{−βτ} u(Y_τ)] ≤ sup_τ E[e^{−βτ} u^+(Y_τ)], and conversely sup_τ E[e^{−βτ} u^+(Y_τ)] = sup_τ E[e^{−βσ(τ)} u(Y_{σ(τ)})] ≤ sup_τ E[e^{−βτ} u(Y_τ)], with the first equality following from (A-7) and the second from the fact that the set of stopping rules of the form {σ = σ(τ) for a stopping rule τ} is a subset of the set of all stopping rules. In particular, if the goal is to study optimal stopping problems with S-shaped utilities (or indeed any utility which takes negative values), then applying discounting to the utility allows the agent to effectively walk away from any 'losses' by letting the discounting eliminate any negative impacts. Careful consideration needs to be given to the interpretation of the solution to such problems.
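The effect described by (A-7) is already visible in a one-period toy model. The numbers below are our own illustration (Y_1 takes the values 2 or 0.5 with equal probability, u(x) = x − 1 takes negative values, and β = 0.1): refusing to stop whenever u would be negative strictly raises the expected discounted utility, and reproduces the value of the problem with u replaced by u^+.

```python
import math

# One-period toy model (our own hypothetical numbers): Y_1 is 2 or 0.5
# with equal probability, u(x) = x - 1 can be negative, beta = 0.1.
beta = 0.1
outcomes = [(2.0, 0.5), (0.5, 0.5)]          # (value, probability)
u = lambda x: x - 1.0
u_plus = lambda x: max(0.0, u(x))

# tau: always stop at time 1.
v_tau = sum(p * math.exp(-beta) * u(x) for x, p in outcomes)
# sigma(tau): stop at time 1 only if u(Y_1) >= 0, otherwise never stop;
# the discounted utility of the never-stopped path tends to 0.
v_sigma = sum(p * (math.exp(-beta) * u(x) if u(x) >= 0 else 0.0)
              for x, p in outcomes)
# v_sigma coincides with the value computed with u replaced by u_plus.
v_uplus = sum(p * math.exp(-beta) * u_plus(x) for x, p in outcomes)
```

Here v_sigma strictly exceeds v_tau and equals v_uplus: discounting of the utility lets the agent "walk away" from losses, which is exactly the interpretational caveat raised in the text.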