Optimal mean-variance portfolio selection

Assuming that the wealth process $X^u$ is generated self-financially from the given initial wealth by holding its fraction $u$ in a risky stock (whose price follows a geometric Brownian motion with drift $\mu \in \mathbb{R}$ and volatility $\sigma > 0$) and its remaining fraction $1-u$ in a riskless bond (whose price compounds exponentially with interest rate $r \in \mathbb{R}$), and letting $P_{t,x}$ denote a probability measure under which $X^u$ takes value $x$ at time $t$, we study the dynamic version of the nonlinear mean-variance optimal control problem
$$\sup_u \big[ E_{t,X^u_t}(X^u_T) - c\,\mathrm{Var}_{t,X^u_t}(X^u_T) \big]$$
where $t$ runs from $0$ to the given terminal time $T > 0$, the supremum is taken over admissible controls $u$, and $c > 0$ is a given constant. By employing the method of Lagrange multipliers we show that the nonlinear problem can be reduced to a family of linear problems. Solving the latter using a classic Hamilton-Jacobi-Bellman approach we find that the optimal dynamic control is given by
$$u_*(t,x) = \frac{\delta}{2c\sigma}\,\frac{1}{x}\,e^{(\delta^2 - r)(T-t)}$$
where $\delta = (\mu - r)/\sigma$. The dynamic formulation of the problem and the method of solution are applied to the constrained problems of maximising/minimising the mean/variance subject to the upper/lower bound on the variance/mean from which the nonlinear problem above is obtained by optimising the Lagrangian itself.

Goran Peskir, goran@maths.man.ac.uk, School of Mathematics, The University of Manchester, Oxford Road, Manchester M13 9PL, UK
Jesper Lund Pedersen, jesper@math.ku.dk, Department of Mathematical Sciences, University of Copenhagen, 2100 Copenhagen, Denmark


Introduction
Imagine an investor who has an initial wealth which he wishes to exchange between a risky stock and a riskless bond in a self-financing manner dynamically in time so as to maximise his return and minimise his risk at the given terminal time. In line with the mean-variance analysis of Markowitz [11] where the optimal portfolio selection problem of this kind was solved in a single period model (see e.g. Merton [12] and the references therein) we will identify the return with the expectation of the terminal wealth and the risk with the variance of the terminal wealth. The quadratic nonlinearity of the variance then moves the resulting optimal control problem outside the scope of the standard optimal control theory (see e.g. [5]) which may be viewed as dynamic programming in the sense of solving the Hamilton-Jacobi-Bellman (HJB) equation and obtaining an optimal control which remains optimal independently from the initial (and hence any subsequent) value of the wealth. Consequently the results and methods of the standard/linear optimal control theory are not directly applicable in this new/nonlinear setting. The purpose of the present paper is to develop a new methodology for solving nonlinear optimal control problems of this kind and demonstrate its use in the optimal mean-variance portfolio selection problem stated above. This is done in parallel to the novel methodology for solving nonlinear optimal stopping problems that was recently developed in [13] when tackling an optimal mean-variance selling problem.
Assuming that the stock price follows a geometric Brownian motion and the bond price compounds exponentially, we first consider the constrained problem in which the investor aims to maximise the expectation of his terminal wealth X u T over all admissible controls u (representing the fraction of the wealth held in the stock) such that the variance of X u T is bounded above by a positive constant. Similarly the investor could aim to minimise the variance of his terminal wealth X u T over all admissible controls u such that the expectation of X u T is bounded below by a positive constant. A first application of Lagrange multipliers implies that the Lagrange function (Lagrangian) for either/both constrained problems can be expressed as a linear combination of the expectation of X u T and the variance of X u T with opposite signs. Optimisation of the Lagrangian over all admissible controls u thus yields the central optimal control problem under consideration. Due to the quadratic nonlinearity of the variance we can no longer apply standard/linear results of the optimal control theory to solve the problem.
Conditioning on the size of the expectation we show that a second application of Lagrange multipliers reduces the nonlinear optimal control problem to a family of linear optimal control problems. Solving the latter using a classic HJB approach we find that the optimal control depends on the initial point of the controlled wealth process in an essential way. This spatial inconsistency introduces a time inconsistency in the problem that in turn raises the question whether the optimality obtained is adequate for practical purposes. We refer to this optimality as the static optimality (Definition 1) to distinguish it from the dynamic optimality (Definition 2) in which each new position of the controlled wealth process yields a new optimal control problem to be solved upon overruling all the past problems. This in effect corresponds to solving infinitely many optimal control problems dynamically in time with the aim of determining the optimal control (in the sense that no other control applied at present time could produce a more favourable value at the terminal time). While the static optimality has been used in the paper by Strotz [21] under the name of 'pre-commitment' as far as we know the dynamic optimality has not been studied in the nonlinear setting of optimal control before. In Sect. 4 below we give a more detailed account of the mean-variance results and methods on the static optimality starting with the paper by Richardson [19]. Optimal controls in all these papers are time inconsistent in the sense described above. This line of papers ends with the paper by Basak and Chabakauri [1] where a time-consistent control is derived that corresponds to the Strotz's approach of 'consistent planning' [21] realised as the subgame-perfect Nash equilibrium (the optimality concept refining Nash equilibrium proposed by Selten in 1965).
We show that the dynamic formulation of the nonlinear optimal control problem admits a simple closed-form solution (Theorem 3) in which the optimal control no longer depends on the initial point of the controlled wealth process and hence is time consistent. Remarkably we also verify that this control yields the expected terminal value which (i) coincides with the expected terminal value obtained by the statically optimal control (Remark 4) and moreover (ii) dominates the expected terminal value obtained by the subgame-perfect Nash equilibrium control (in the sense of Strotz's 'consistent planning') derived in [1] (Sect. 4). Closed-form solutions to the constrained problems are then derived using the solution to the unconstrained problem (Corollaries 5 and 7). These results are of both theoretical and practical interest. In the first problem we note that the optimal wealth exhibits a dynamic compliance effect (Remark 6) and in the second problem we observe that the optimal wealth solves a meander type equation of independent interest (Remark 8). In both problems we verify that the expected terminal value obtained by the dynamically optimal control dominates the expected terminal value obtained by the statically optimal control.
The novel problems and methodology of the present paper suggest a number of avenues for further research. Firstly, we work within the transparent setting of one-dimensional geometric Brownian motion in order to illustrate the main ideas and describe the new methodology without unnecessary technical complications. Extending the results to higher dimensions and more general diffusion/Markov processes appears to be worthy of further consideration. Secondly, for similar tractability reasons we assume that (i) unlimited short-selling and borrowing are permitted, (ii) transaction costs are zero, (iii) the wealth process may take both positive and negative values of unlimited size. Extending the results under some of these constraints being imposed is also worthy of further consideration. In both of these settings it is interesting to examine to what extent the results and methods laid down in the present paper remain valid under any of these more general or restrictive hypotheses.

Formulation of the problem
Assume that the riskless bond price $B$ solves
$$dB_t = r B_t\,dt \tag{2.1}$$
with $B_0 = b$ for some $b > 0$ where $r \in \mathbb{R}$ is the interest rate, and let the risky stock price $S$ follow a geometric Brownian motion solving
$$dS_t = \mu S_t\,dt + \sigma S_t\,dW_t \tag{2.2}$$
with $S_0 = s$ for some $s > 0$ where $\mu \in \mathbb{R}$ is the drift, $\sigma > 0$ is the volatility, and $W$ is a standard Brownian motion defined on a probability space $(\Omega, \mathcal{F}, P)$. Note that a unique solution to (2.1) is given by $B_t = b\,e^{rt}$ and recall that a unique strong solution to (2.2) is given by $S_t = s \exp(\sigma W_t + (\mu - \sigma^2/2)\,t)$ for $t \ge 0$. Consider the investor who has an initial wealth $x_0 \in \mathbb{R}$ which he wishes to exchange between $B$ and $S$ in a self-financing manner (with no exogenous infusion or withdrawal of wealth) dynamically in time up to the given horizon $T > 0$. It is then well known (see e.g. [2, Chapter 6]) that the investor's wealth $X^u$ solves
$$dX^u_t = \big(r(1-u_t) + \mu u_t\big) X^u_t\,dt + \sigma u_t X^u_t\,dW_t \tag{2.3}$$
with $X^u_{t_0} = x_0$ where $u_t$ denotes the fraction of the investor's wealth held in the stock at time $t \in [t_0, T]$ for $t_0 \in [0, T)$ given and fixed. Note that (i) $u_t < 0$ corresponds to short selling of the stock, (ii) $u_t > 1$ corresponds to borrowing from the bond, and (iii) $u_t \in [0,1]$ corresponds to a long position in both the stock and the bond.
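The wealth dynamics can be explored numerically. The following is a minimal sketch, assuming that the wealth equation (2.3) has the standard self-financing form $dX = (rX + (\mu - r)\pi)\,dt + \sigma \pi\,dW$ with $\pi = u \cdot X$ the amount of wealth held in the stock. The function name `simulate_terminal_wealth`, the Euler-Maruyama discretisation and the parameter values are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def simulate_terminal_wealth(amount_in_stock, x0, r, mu, sigma, T,
                             n_steps=1000, seed=0):
    """Euler-Maruyama approximation of X_T for
    dX = (r*X + (mu - r)*pi) dt + sigma*pi dW,
    where pi = amount_in_stock(t, X) is the wealth held in the stock."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    dt = T / n_steps
    t, x = 0.0, x0
    for _ in range(n_steps):
        pi = amount_in_stock(t, x)     # pi = u(t, x) * x
        dw = rng.gauss(0.0, math.sqrt(dt))
        x += (r * x + (mu - r) * pi) * dt + sigma * pi * dw
        t += dt
    return x


# Illustrative parameters (assumed values).
params = dict(x0=1.0, r=0.1, mu=0.5, sigma=0.4, T=1.0)

# u = 0: all wealth in the bond, so the path is deterministic.
bond_only = simulate_terminal_wealth(lambda t, x: 0.0, **params)

# u = 1: all wealth in the stock, so pi = x.
stock_only = simulate_terminal_wealth(lambda t, x: x, **params)

print(bond_only, math.exp(params["r"] * params["T"]))
print(stock_only)
```

Parametrising the control by the amount held in the stock rather than by the fraction keeps the scheme well defined when the wealth passes through zero.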
To simplify the exposition we will assume that the control $u$ in (2.3) is given by $u_t = u(t, X^u_t)$ where $(t,x) \mapsto u(t,x) \cdot x$ is a continuous function from $[0,T] \times \mathbb{R}$ into $\mathbb{R}$ for which the stochastic differential equation (2.3) understood in Itô's sense has a unique strong solution $X^u$ (meaning that the solution $X^u$ to (2.3) is adapted to the natural filtration of $W$ and if $\tilde{X}^u$ is another solution to (2.3) of this kind then $X^u$ and $\tilde{X}^u$ are equal almost surely). We will call controls of this kind admissible in the sequel. Recalling that the natural filtration of $S$ coincides with the natural filtration of $W$ we see that admissible controls have a natural financial interpretation as they are obtained as deterministic (measurable) functionals of the observed stock price. Moreover, adopting the convention that $u(t,0) \cdot 0 := \lim_{0 \ne x \to 0} u(t,x) \cdot x$ we see that the solution $X^u$ to (2.3) could take both positive and negative values after passing through zero when the latter limit is different from zero (as is the case in the main results below). This convention corresponds to re-expressing (2.3) in terms of the total wealth $u_t X^u_t$ held in the stock as opposed to its fraction $u_t$ which we follow throughout (note that the essence of the wealth equation (2.3) remains the same in both cases). We do always identify $u(t,0)$ with $u(t,0) \cdot 0$ however since $x \mapsto u(t,x)$ may not be well defined at $0$.
Note that the results to be presented below also hold if the set of admissible controls is enlarged to include discontinuous and path dependent controls $u$ that are adapted to the natural filtration of $W$, or even controls $u$ which are adapted to a larger filtration still making $W$ a martingale so that (2.3) has a unique weak solution $X^u$ (meaning that the solution $X^u$ to (2.3) is adapted to the larger filtration and if $\tilde{X}^u$ is another solution to (2.3) of this kind then $X^u$ and $\tilde{X}^u$ are equal in law). Since these extensions follow along the same lines and the needed modifications of the arguments are evident, we will omit further details in this direction and focus on the set of admissible controls as defined above.
For a given admissible control u we let P t,x denote the probability measure (defined on the canonical space) under which the solution X u to (2.3) takes value x at time t for (t, x) ∈ [0, T ] × R. Note that X u is a (strong) Markov process with respect to P t,x for (t, x) ∈ [0, T ]×R.
Consider the optimal control problem
$$V(t,x) = \sup_u \big[ E_{t,x}(X^u_T) - c\,\mathrm{Var}_{t,x}(X^u_T) \big] \tag{2.4}$$
where the supremum is taken over all admissible controls $u$ such that $E_{t,x}[(X^u_T)^2] < \infty$ for $(t,x) \in [0,T] \times \mathbb{R}$ and $c > 0$ is a given and fixed constant. A sufficient condition for the latter expectation to be finite is that $E_{t,x}\big[\int_t^T (1 + u_s^2)(X^u_s)^2\,ds\big] < \infty$ and we will assume in the sequel that all admissible controls by definition satisfy that condition as well.
Due to the quadratic nonlinearity of the second term in the expression $\mathrm{Var}_{t,x}(X^u_T) = E_{t,x}[(X^u_T)^2] - [E_{t,x}(X^u_T)]^2$ it is evident that the problem (2.4) falls outside the scope of the standard/linear optimal control theory for Markov processes (see e.g. [5]). Moreover, we will see below that in addition to the static formulation of the nonlinear problem (2.4) where the maximisation takes place relative to the initial point $(t,x)$ which is given and fixed, one is also naturally led to consider a dynamic formulation of the nonlinear problem (2.4) in which each new position of the controlled process $((t, X^u_t))_{t \in [0,T]}$ yields a new optimal control problem to be solved upon overruling all the past problems. We believe that this dynamic optimality is of general interest in the nonlinear problems of optimal control (as well as nonlinear problems of optimal stopping as discussed in [13]).
The problem (2.4) seeks to maximise the investor's return identified with the expectation of $X^u_T$ and minimise the investor's risk identified with the variance of $X^u_T$ upon applying the control $u$. This identification is done in line with the mean-variance analysis of Markowitz [11]. Moreover, we will see in the proof below that the problem (2.4) is obtained by optimising the Lagrangian of the constrained problems
$$V_1(t,x) = \sup_{u\,:\,\mathrm{Var}_{t,x}(X^u_T) \le \alpha} E_{t,x}(X^u_T) \tag{2.5}$$
$$V_2(t,x) = \inf_{u\,:\,E_{t,x}(X^u_T) \ge \beta} \mathrm{Var}_{t,x}(X^u_T) \tag{2.6}$$
respectively, where $u$ is any admissible control, and $\alpha \in (0, \infty)$ and $\beta \in \mathbb{R}$ are given and fixed constants. Solving (2.4) we will therefore be able to solve (2.5) and (2.6) as well. Note that the constrained problems have transparent interpretations in terms of the investor's return and the investor's risk as discussed above. We now formalise definitions of the optimalities alluded to above. Recall that all controls throughout refer to admissible controls as defined/discussed above.

Definition 1 (Static optimality).
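A control $u_*$ is statically optimal in (2.4) for $(t,x) \in [0,T] \times \mathbb{R}$ given and fixed, if there is no other control $v$ such that
$$E_{t,x}(X^v_T) - c\,\mathrm{Var}_{t,x}(X^v_T) > E_{t,x}(X^{u_*}_T) - c\,\mathrm{Var}_{t,x}(X^{u_*}_T). \tag{2.7}$$
A control $u_*$ is statically optimal in (2.5) for $(t,x) \in [0,T] \times \mathbb{R}$ given and fixed, if $\mathrm{Var}_{t,x}(X^{u_*}_T) \le \alpha$ and there is no other control $v$ satisfying $\mathrm{Var}_{t,x}(X^v_T) \le \alpha$ such that
$$E_{t,x}(X^v_T) > E_{t,x}(X^{u_*}_T). \tag{2.8}$$
A control $u_*$ is statically optimal in (2.6) for $(t,x) \in [0,T] \times \mathbb{R}$ given and fixed, if $E_{t,x}(X^{u_*}_T) \ge \beta$ and there is no other control $v$ satisfying $E_{t,x}(X^v_T) \ge \beta$ such that
$$\mathrm{Var}_{t,x}(X^v_T) < \mathrm{Var}_{t,x}(X^{u_*}_T). \tag{2.9}$$
Note that the static optimality refers to the optimality relative to the initial point $(t,x)$ which is given and fixed. Changing the initial point may yield a different optimal control in the nonlinear problems since the statically optimal controls may and generally will depend on the initial point in an essential way (cf. [21]). This stands in sharp contrast with standard/linear problems of optimal control where in view of dynamic programming (the HJB equation) the optimal control does not depend on the initial point explicitly. This is a key difference between the static optimality in nonlinear problems of optimal control and the standard optimality in linear problems of optimal control (cf. [5]).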

Definition 2 (Dynamic optimality).
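A control $u_*$ is dynamically optimal in (2.4), if for every given and fixed $(t,x) \in [0,T] \times \mathbb{R}$ and every control $v$ such that $v(t,x) \ne u_*(t,x)$, there exists a control $w$ satisfying $w(t,x) = u_*(t,x)$ such that
$$E_{t,x}(X^w_T) - c\,\mathrm{Var}_{t,x}(X^w_T) > E_{t,x}(X^v_T) - c\,\mathrm{Var}_{t,x}(X^v_T). \tag{2.10}$$
A control $u_*$ is dynamically optimal in (2.5), if for every given and fixed $(t,x) \in [0,T] \times \mathbb{R}$ and every control $v$ such that $v(t,x) \ne u_*(t,x)$ with $\mathrm{Var}_{t,x}(X^v_T) \le \alpha$, there exists a control $w$ satisfying $w(t,x) = u_*(t,x)$ with $\mathrm{Var}_{t,x}(X^w_T) \le \alpha$ such that
$$E_{t,x}(X^w_T) > E_{t,x}(X^v_T). \tag{2.11}$$
A control $u_*$ is dynamically optimal in (2.6), if for every given and fixed $(t,x) \in [0,T] \times \mathbb{R}$ and every control $v$ such that $v(t,x) \ne u_*(t,x)$ with $E_{t,x}(X^v_T) \ge \beta$, there exists a control $w$ satisfying $w(t,x) = u_*(t,x)$ with $E_{t,x}(X^w_T) \ge \beta$ such that
$$\mathrm{Var}_{t,x}(X^w_T) < \mathrm{Var}_{t,x}(X^v_T). \tag{2.12}$$
Dynamic optimality above is understood in the 'strong' sense. Replacing the strict inequalities in (2.10)-(2.12) by non-strict inequalities would yield dynamic optimality in the 'weak' sense.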
Note that the dynamic optimality corresponds to solving infinitely many optimal control problems dynamically in time where each new position of the controlled process ((t, X u t )) t∈[0,T ] yields a new optimal control problem to be solved upon overruling all the past problems. The optimal decision at each time tells us to exert the best control among all possible controls. While the static optimality remembers the past (through the initial point) the dynamic optimality completely ignores it and only looks ahead. Nonetheless it is clear that there is a strong link between the static and dynamic optimality (the latter being formed through the beginnings of the former as shown below) and this will be exploited in the proof below when searching for the dynamically optimal controls. In the case of standard/linear optimal control problems for Markov processes it is evident that the static and dynamic optimality coincide under mild regularity conditions due to the fact that dynamic programming (the HJB equation) is applicable. This is not the case for the nonlinear problems of optimal control considered in the present paper as it will be seen below.

Solution to the problem
In this section we present solutions to the problems formulated in the previous section. We first focus on the unconstrained problem.

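Theorem 3 Consider the optimal control problem (2.4) where $X^u$ solves (2.3) with $X^u_{t_0} = x_0$ under $P_{t_0,x_0}$ for $(t_0,x_0) \in [0,T] \times \mathbb{R}$ given and fixed. Recall that $B$ solves (2.1), $S$ solves (2.2), and we set $\delta = (\mu - r)/\sigma$ for $\mu \in \mathbb{R}$, $r \in \mathbb{R}$ and $\sigma > 0$. We assume throughout that $\delta \ne 0$ and $r \ne 0$ (the cases $\delta = 0$ or $r = 0$ follow by passage to the limit when the non-zero $\delta$ or $r$ tends to $0$).
(A) The statically optimal control is given by
$$u^s_*(t,x) = \frac{\delta}{\sigma}\,\frac{1}{x}\,\Big[ x_0\,e^{r(t-t_0)} - x + \frac{1}{2c}\,e^{\delta^2(T-t_0) - r(T-t)} \Big] \tag{3.1}$$
for $(t,x) \in [t_0,T] \times \mathbb{R}$. The statically optimal controlled process is given by
$$X^s_t = x_0\,e^{r(t-t_0)} + \frac{1}{2c}\,e^{\delta^2(T-t_0) - r(T-t)}\,\Big[ 1 - e^{-\delta(W_t - W_{t_0}) - \frac{3\delta^2}{2}(t-t_0)} \Big]$$
$$\phantom{X^s_t} = x_0\,e^{r(t-t_0)} + \frac{1}{2c}\,e^{\delta^2(T-t_0) - r(T-t)}\,\Big[ 1 - e^{\left(\frac{\delta}{\sigma}\left(\mu - \frac{\sigma^2}{2}\right) - \frac{3\delta^2}{2}\right)(t-t_0)}\,\Big(\frac{S_t}{S_{t_0}}\Big)^{-\delta/\sigma} \Big] \tag{3.2}$$
for $t \in [t_0,T]$. The static value function $V_s := E(X^s_T) - c\,\mathrm{Var}(X^s_T)$ is given by
$$V_s(t_0,x_0) = x_0\,e^{r(T-t_0)} + \frac{1}{4c}\,\big[ e^{\delta^2(T-t_0)} - 1 \big] \tag{3.3}$$
for $(t_0,x_0) \in [0,T] \times \mathbb{R}$.
(B) The dynamically optimal control is given by
$$u^d_*(t,x) = \frac{\delta}{2c\sigma}\,\frac{1}{x}\,e^{(\delta^2 - r)(T-t)} \tag{3.4}$$
for $(t,x) \in [t_0,T] \times \mathbb{R}$. The dynamically optimal controlled process is given by
$$X^d_t = x_0\,e^{r(t-t_0)} + \frac{1}{2c}\,e^{-r(T-t)}\,\Big[ e^{\delta^2(T-t_0)} - e^{\delta^2(T-t)} + \delta \int_{t_0}^t e^{\delta^2(T-s)}\,dW_s \Big] \tag{3.5}$$
for $t \in [t_0,T]$. The dynamic value function $V_d := E(X^d_T) - c\,\mathrm{Var}(X^d_T)$ is given by
$$V_d(t_0,x_0) = x_0\,e^{r(T-t_0)} - \frac{1}{8c}\,\big[ e^{2\delta^2(T-t_0)} - 4\,e^{\delta^2(T-t_0)} + 3 \big] \tag{3.6}$$
for $(t_0,x_0) \in [0,T] \times \mathbb{R}$.
Proof We assume throughout that the process $X^u$ solves the stochastic differential equation (2.3) with $X^u_{t_0} = x_0$ under $P_{t_0,x_0}$ for $(t_0,x_0) \in [0,T] \times \mathbb{R}$ given and fixed where $u$ is any admissible control as defined/discussed above. To simplify the notation we will drop the subscript zero from $t_0$ and $x_0$ in the first part of the proof below.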
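(A): Note that the objective function in (2.4) reads
$$E_{t,x}(X^u_T) - c\,\mathrm{Var}_{t,x}(X^u_T) = E_{t,x}(X^u_T) + c\,[E_{t,x}(X^u_T)]^2 - c\,E_{t,x}[(X^u_T)^2] \tag{3.7}$$
where the key difficulty is the quadratic nonlinearity of the middle term on the right-hand side. To overcome this difficulty we will condition on the size of $E_{t,x}(X^u_T)$. This yields
$$V(t,x) = \sup_{M \in \mathbb{R}} \Big[ M + cM^2 - c \inf_{u\,:\,E_{t,x}(X^u_T) = M} E_{t,x}[(X^u_T)^2] \Big]. \tag{3.8}$$
Hence to solve (3.8) and thus (2.4) we need to solve the constrained problem
$$V_M(t,x) = \inf_{u\,:\,E_{t,x}(X^u_T) = M} E_{t,x}[(X^u_T)^2] \tag{3.9}$$
for $M \in \mathbb{R}$ given and fixed where $u$ is any admissible control.
1. To tackle the problem (3.9) we will apply the method of Lagrange multipliers. For this, define the Lagrangian as follows
$$L_{t,x}(u,\lambda) = E_{t,x}[(X^u_T)^2] - \lambda\,\big[ E_{t,x}(X^u_T) - M \big] \tag{3.10}$$
for $\lambda \in \mathbb{R}$ and let $u^\lambda_*$ denote the optimal control in the unconstrained problem
$$L_{t,x}(u^\lambda_*, \lambda) := \inf_u L_{t,x}(u,\lambda) \tag{3.11}$$
upon assuming that it exists. Suppose moreover that there is $\lambda = \lambda(M,t,x) \in \mathbb{R}$ such that
$$E_{t,x}(X^{u^\lambda_*}_T) = M. \tag{3.12}$$
It then follows from (3.10)-(3.12) that
$$E_{t,x}[(X^{u^\lambda_*}_T)^2] = L_{t,x}(u^\lambda_*, \lambda) \le L_{t,x}(u,\lambda) = E_{t,x}[(X^u_T)^2] \tag{3.13}$$
for any admissible control $u$ such that $E_{t,x}(X^u_T) = M$. This shows that $u^\lambda_*$ satisfying (3.11) and (3.12) is optimal in (3.9).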
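2. To tackle the problem (3.11) with (3.12) we consider the optimal control problem
$$V^\lambda(t,x) = \inf_u E_{t,x}\big[ (X^u_T)^2 - \lambda X^u_T \big] \tag{3.14}$$
where $u$ is any admissible control. This is a standard/linear problem of optimal control (see e.g. [5]) that can be solved using a classic HJB approach. For the sake of completeness we present key steps in the derivation of the solution. From (3.14) combined with (2.3) we see that the HJB system reads
$$V^\lambda_t + \inf_{u \in \mathbb{R}} \Big[ (r + \delta\sigma u)\,x\,V^\lambda_x + \tfrac{1}{2}\,\sigma^2 u^2 x^2\,V^\lambda_{xx} \Big] = 0 \tag{3.15}$$
$$V^\lambda(T,x) = x^2 - \lambda x \tag{3.16}$$
on $[0,T] \times \mathbb{R}$. Making the ansatz that $V^\lambda_{xx} > 0$ and minimising the quadratic function of $u$ over $\mathbb{R}$ in (3.15) we find that
$$u = -\frac{\delta}{\sigma}\,\frac{1}{x}\,\frac{V^\lambda_x}{V^\lambda_{xx}}. \tag{3.17}$$
Inserting (3.17) into (3.15) yields
$$V^\lambda_t + r x\,V^\lambda_x - \frac{\delta^2}{2}\,\frac{(V^\lambda_x)^2}{V^\lambda_{xx}} = 0. \tag{3.18}$$
Seeking the solution to (3.18) of the form
$$V^\lambda(t,x) = a(t)\,x^2 + b(t)\,x + d(t) \tag{3.19}$$
and making use of (3.16) we find that
$$a'(t) = (\delta^2 - 2r)\,a(t), \qquad b'(t) = (\delta^2 - r)\,b(t), \qquad d'(t) = \frac{\delta^2}{4}\,\frac{b^2(t)}{a(t)} \tag{3.20}$$
on $[0,T]$ with $a(T) = 1$, $b(T) = -\lambda$ and $d(T) = 0$. Solving (3.20) under these terminal conditions we obtain
$$a(t) = e^{-(\delta^2 - 2r)(T-t)}, \qquad b(t) = -\lambda\,e^{-(\delta^2 - r)(T-t)}, \qquad d(t) = -\frac{\lambda^2}{4}\,\big[ 1 - e^{-\delta^2(T-t)} \big]. \tag{3.21}$$
Inserting (3.21) into (3.19) and calculating (3.17) we find that
$$u(t,x) = \frac{\delta}{\sigma}\,\frac{1}{x}\,\Big[ \frac{\lambda}{2}\,e^{-r(T-t)} - x \Big]. \tag{3.22}$$
Applying Itô's formula to the process $Z$ defined by
$$Z_t := K - e^{-r(t-t_0)}\,X^u_t \tag{3.23}$$
where we set $K := (\lambda/2)\,e^{-r(T-t_0)}$ and making use of (2.3) we find that
$$dZ_t = -\delta^2 Z_t\,dt - \delta Z_t\,dW_t. \tag{3.24}$$
Solving the linear equation (3.24) explicitly we obtain the following closed form expression
$$X^u_t = e^{r(t-t_0)}\,\Big[ K - (K - x_0)\,e^{-\delta(W_t - W_{t_0}) - \frac{3\delta^2}{2}(t-t_0)} \Big]. \tag{3.25}$$
The process $X^u$ defined by (3.25) is a unique strong solution to the stochastic differential equation (2.3) obtained by the control $u$ from (3.22) and yielding the value function $V^\lambda$ given in (3.19) combined with (3.21) above. It is then a matter of routine to apply Itô's formula to $V^\lambda$ composed with $(t, X^v_t)$ for any admissible control $v$ and using (3.15)+(3.16) verify that the candidate control $u$ from (3.22) is optimal in (3.14) as envisaged (these arguments are displayed more explicitly in (3.36)-(3.37) below).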
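The HJB step can be checked numerically. The sketch below assumes that the value function of the linear problem is the quadratic $V^\lambda(t,x) = a(t)x^2 + b(t)x + d(t)$ with $a(t) = e^{-(\delta^2-2r)(T-t)}$, $b(t) = -\lambda e^{-(\delta^2-r)(T-t)}$ and $d(t) = -(\lambda^2/4)(1 - e^{-\delta^2(T-t)})$, and that the minimised HJB equation reads $V_t + rxV_x - (\delta^2/2)V_x^2/V_{xx} = 0$ with terminal condition $V(T,x) = x^2 - \lambda x$. These formulas are our reconstruction of the derivation and the function names are hypothetical. The code evaluates the residual of the equation by a finite difference in time.

```python
import math


def coefficients(t, T, r, delta, lam):
    """Coefficients of the quadratic candidate V(t, x) = a*x^2 + b*x + d
    (assumed closed forms, see the lead-in)."""
    tau = T - t
    a = math.exp(-(delta**2 - 2 * r) * tau)
    b = -lam * math.exp(-(delta**2 - r) * tau)
    d = -(lam**2 / 4) * (1 - math.exp(-delta**2 * tau))
    return a, b, d


def value(t, x, T, r, delta, lam):
    a, b, d = coefficients(t, T, r, delta, lam)
    return a * x * x + b * x + d


def hjb_residual(t, x, T, r, delta, lam, h=1e-5):
    """Residual of V_t + r*x*V_x - (delta^2/2) * V_x^2 / V_xx at (t, x)."""
    a, b, _ = coefficients(t, T, r, delta, lam)
    v_t = (value(t + h, x, T, r, delta, lam)
           - value(t - h, x, T, r, delta, lam)) / (2 * h)  # central difference
    v_x = 2 * a * x + b
    v_xx = 2 * a
    return v_t + r * x * v_x - 0.5 * delta**2 * v_x**2 / v_xx


def minimising_amount(t, x, T, r, sigma, delta, lam):
    """Amount u*x held in the stock that minimises the quadratic in u,
    i.e. -(delta/sigma) * V_x / V_xx."""
    a, b, _ = coefficients(t, T, r, delta, lam)
    return -(delta / sigma) * (2 * a * x + b) / (2 * a)


# Illustrative parameters (assumed values).
T, r, mu, sigma, lam = 1.0, 0.1, 0.5, 0.4, 3.0
delta = (mu - r) / sigma
residuals = [hjb_residual(t, x, T, r, delta, lam)
             for t in (0.1, 0.5, 0.9) for x in (-2.0, 0.7, 5.0)]
print(max(abs(res) for res in residuals))
```

The residual should vanish up to discretisation error, and `minimising_amount` should agree with $(\delta/\sigma)\,[(\lambda/2)\,e^{-r(T-t)} - x]$.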
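3. Having solved the problem (3.14) we still need to meet the condition (3.12). For this, we find from (3.25) that
$$E_{t_0,x_0}(X^u_T) = x_0\,e^{-(\delta^2 - r)(T-t_0)} + \frac{\lambda}{2}\,\big[ 1 - e^{-\delta^2(T-t_0)} \big]. \tag{3.26}$$
To realise (3.12) we need to identify (3.26) with $M$. This yields
$$\lambda = 2\,\frac{M - x_0\,e^{-(\delta^2 - r)(T-t_0)}}{1 - e^{-\delta^2(T-t_0)}} \tag{3.27}$$
for $\delta \ne 0$. Note that the case $\delta = 0$ is evident since in this case $u^\lambda_* = 0$ is optimal in (3.14) for every $\lambda \in \mathbb{R}$ and hence the inequality in (3.13) holds for every admissible control $u$ while from (2.3) we also easily see that (3.26) (with $\delta = 0$) holds for every admissible control $u$ so that we only have one $M$ possible in (3.8) and that is the one given by (3.26) (with $\delta = 0$). This shows that (3.1)-(3.3) are valid when $\delta = 0$ and we will therefore assume in the sequel that $\delta \ne 0$. Moreover, from (3.25) we also find that
$$E_{t_0,x_0}[(X^u_T)^2] = x_0^2\,e^{-(\delta^2 - 2r)(T-t_0)} + \frac{\lambda^2}{4}\,\big[ 1 - e^{-\delta^2(T-t_0)} \big]. \tag{3.28}$$
Note that this expression can also be obtained from (3.14) and (3.26) upon recalling (3.19) with (3.21) above. Inserting (3.27) into (3.28) and recalling (3.13) we see that (3.9) is given by
$$V_M(t_0,x_0) = x_0^2\,e^{-(\delta^2 - 2r)(T-t_0)} + \frac{\big( M - x_0\,e^{-(\delta^2 - r)(T-t_0)} \big)^2}{1 - e^{-\delta^2(T-t_0)}} \tag{3.29}$$
for $\delta \ne 0$. Inserting (3.29) into (3.8) we get
$$V(t_0,x_0) = \sup_{M \in \mathbb{R}} \Big[ M + cM^2 - c\,x_0^2\,e^{-(\delta^2 - 2r)(T-t_0)} - c\,\frac{\big( M - x_0\,e^{-(\delta^2 - r)(T-t_0)} \big)^2}{1 - e^{-\delta^2(T-t_0)}} \Big]. \tag{3.30}$$
Note that the function of $M$ to be maximised on the right-hand side is quadratic with the coefficient in front of $M^2$ strictly negative when $\delta \ne 0$. This shows that there exists a unique maximum point in (3.30) that is easily found to be given by
$$M_* = x_0\,e^{r(T-t_0)} + \frac{1}{2c}\,\big[ e^{\delta^2(T-t_0)} - 1 \big]. \tag{3.31}$$
Inserting (3.31) into (3.27) we find that
$$\lambda_* = 2\,\Big[ x_0\,e^{r(T-t_0)} + \frac{1}{2c}\,e^{\delta^2(T-t_0)} \Big]. \tag{3.32}$$
Inserting (3.32) into (3.22) we establish the existence of the optimal control in (2.4) that is given by (3.1) above. Moreover, inserting (3.32) into (3.25) we obtain the first identity in (3.2). The second identity in (3.2) then follows upon recalling the closed form expressions for $B$ and $S$ stated following (2.2) above. Finally, inserting (3.31) into (3.30) we obtain (3.3) and this completes the first part of the proof.
(B): Identifying $t_0$ with $t$ and $x_0$ with $x$ in the statically optimal control $u^s_*$ from (3.1) we obtain the control $u^d_*$ from (3.4). We claim that this control is dynamically optimal in (2.4). For this, take any other admissible control $v$ such that $v(t_0,x_0) \ne u^d_*(t_0,x_0)$ and set $w = u^s_*$. Then $w(t_0,x_0) = u^d_*(t_0,x_0)$ and we claim that
$$E_{t_0,x_0}(X^w_T) - c\,\mathrm{Var}_{t_0,x_0}(X^w_T) > E_{t_0,x_0}(X^v_T) - c\,\mathrm{Var}_{t_0,x_0}(X^v_T). \tag{3.33}$$
4. To verify (3.33) set $M_v := E_{t_0,x_0}(X^v_T)$ and first consider the case when $M_v \ne M_*$ where $M_*$ is given by (3.31) above. Using (3.9) + (3.29) and (3.30) + (3.31) we then find that
$$E_{t_0,x_0}(X^v_T) - c\,\mathrm{Var}_{t_0,x_0}(X^v_T) \le M_v + cM_v^2 - c\,V_{M_v}(t_0,x_0) < M_* + cM_*^2 - c\,V_{M_*}(t_0,x_0) = E_{t_0,x_0}(X^w_T) - c\,\mathrm{Var}_{t_0,x_0}(X^w_T) \tag{3.34}$$
for $\delta \ne 0$ where the strict inequality follows since $M_*$ is the unique maximum point of the quadratic function as pointed out following (3.30) above. The case $\delta = 0$ is excluded since then as pointed out following (3.27) above we only have $M_*$ possible in (3.8) so that $M_v$ would be equal to $M_*$. This shows that (3.33) is satisfied when $M_v \ne M_*$ as claimed.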
Next consider the case when M v = M * . We then claim that where V λ * is defined in (3.14) and λ * is given by (3.32) above. For this, note that using (3.16) and applying Itô's formula we get 3) by means of Jensen's and Burkholder-Davis-Gundy's inequalities that 2 ] < ∞ and hence upon recalling (3.19) + (3.21) above (with λ * in place of λ) it follows by Hölder's inequality that E t 0 ,x 0 √ M, M T < ∞ so that M is a martingale. Taking E t 0 ,x 0 on both sides of (3.36) we therefore get where the integrand is non-negative due to (3.15) (with λ * in place of λ). Since u d . Assuming x 0 = 0 by the continuity of v and w it then follows that v(s, x) = w(s, x) for all (s, x) ∈ R ε := [t 0 , t 0 +ε]×[x 0 −ε, x 0 +ε] for some ε > 0 small enough such that t 0 +ε ≤ T as well. Moreover, since w(t, x) is the unique minimum point of the continuous function on the left-hand side of (3.15) (with λ * in place of λ) evaluated at (t, x) for every (t, x) ∈ [0, T ]×R, we see that this ε > 0 can be chosen small enough so that on R ε for some β > 0 given and fixed. Setting where in the first inequality we use that the integrand in (3.37) is non-negative as pointed out above and in the final (strict) inequality we use that τ ε > t 0 with P t 0 ,x 0 -probability one due to the continuity of X v . The arguments remain also valid when x 0 = 0 upon recalling that v(t 0 , 0) and w(t 0 , 0) are identified with v(t 0 , 0) · 0 and w(t 0 , 0) · 0 in this case. From (3.39) we see that (3.35) holds as claimed.
Recalling from (3.10)- It follows therefore from (3.8) that This shows that (3.33) holds when M v = M * as well and hence we can conclude that the control u d * from (3.4) is dynamically optimal as claimed. 5. Applying Itô's formula to e r (T −t) X d t where we set X d := X u d * and making use of (2.3) we easily find that the first identity in (3.5) is satisfied. Integrating by parts and recalling the closed form expressions for B and S stated following (2.2) above we then establish that the second identity in (3.5) also holds. From the first identity in (3.5) we get From (3.42) and (3.43) we obtain (3.6) and this completes the proof.

Remark 4
The dynamically optimal control u d * from (3.4) by its nature rejects any past point (t 0 , x 0 ) to measure its performance so that although the static value V s (t 0 , x 0 ) by its definition dominates the dynamic value V d (t 0 , x 0 ) this comparison is meaningless from the standpoint of the dynamic optimality. Another issue with a plain comparison of the values V s (t, x) and V d (t, x) for (t, x) ∈ [t 0 , T ] × R is that the optimally controlled processes X s and X d may never come to the same point x at the same time t so that the comparison itself may be unreal. A more dynamic way that also makes more sense in general is to compare the value functions composed with the controlled processes. This amounts to look at V s (t, X s t ) and V d (t, X d t ) for t ∈ [t 0 , T ] and pay particular attention to t becoming the terminal value T .
Since the variance of the terminal wealth vanishes at the terminal time we have $V_s(T,x) = V_d(T,x) = x$, so that comparing $E_{t_0,x_0}[V_s(T,X^s_T)]$ with $E_{t_0,x_0}[V_d(T,X^d_T)]$ amounts to comparing $E_{t_0,x_0}(X^s_T)$ with $E_{t_0,x_0}(X^d_T)$. It is easily seen from (3.2) and (3.5) that the latter two expectations coincide. We can therefore conclude that
$$E_{t_0,x_0}\big[ V_s(T,X^s_T) \big] = E_{t_0,x_0}\big[ V_d(T,X^d_T) \big] \tag{3.44}$$
for all $(t_0,x_0) \in [0,T] \times \mathbb{R}$. This shows that the dynamically optimal control $u^d_*$ is as good as the statically optimal control $u^s_*$ from this static standpoint as well (with respect to any past point $(t_0,x_0)$ given and fixed). In addition to that however the dynamically optimal control $u^d_*$ is time consistent while the statically optimal control $u^s_*$ is not. Note also from (3.4) that the amount of the dynamically optimal wealth $u^d_*(t,x) \cdot x$ held in the stock at time $t$ does not depend on the amount of the total wealth $x$. This is consistent with the fact that the risk/cost in (2.4) is measured by the variance (applied at a constant rate $c$) which is a quadratic function of the terminal wealth while the return/gain is measured by the expectation (applied at a constant rate too) which is a linear function of the terminal wealth. The former therefore penalises stochastic movements of the large wealth more severely than what the latter is able to compensate for and the investor is discouraged from holding larger amounts of his wealth in the stock. Thus even if the total wealth is large (in modulus) it is still dynamically optimal to hold the same amount of wealth $u^d_*(t,x) \cdot x$ in the stock at time $t$ as when the total wealth is small (in modulus). The same optimality behaviour has also been observed for the subgame-perfect Nash equilibrium controls (cf. Sect. 4).
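The comparison of terminal moments can be made concrete. The sketch below starts from the dynamic control $u_*(t,x) = \frac{\delta}{2c\sigma}\frac{1}{x}e^{(\delta^2-r)(T-t)}$ stated in the abstract. The closed forms used for the mean and variance of the terminal wealth under the dynamic and static controls are our own derivation (obtained by integrating the linear wealth equation) and should be read as assumptions; the function names are hypothetical. The variance under the dynamic control is cross-checked by a midpoint-rule quadrature of the Itô isometry integral.

```python
import math


def dynamic_amount(t, T, r, sigma, delta, c):
    """Amount u(t, x) * x held in the stock under the dynamic control;
    it has no dependence on the current wealth x."""
    return delta / (2 * c * sigma) * math.exp((delta**2 - r) * (T - t))


def dynamic_moments(x0, t0, T, r, delta, c):
    """Assumed closed-form mean and variance of X_T (dynamic control)."""
    a = delta**2 * (T - t0)
    mean = x0 * math.exp(r * (T - t0)) + (math.exp(a) - 1) / (2 * c)
    var = (math.exp(2 * a) - 1) / (8 * c**2)
    return mean, var


def static_moments(x0, t0, T, r, delta, c):
    """Assumed closed-form mean and variance of X_T (static control)."""
    a = delta**2 * (T - t0)
    mean = x0 * math.exp(r * (T - t0)) + (math.exp(a) - 1) / (2 * c)
    var = (math.exp(a) - 1) / (4 * c**2)
    return mean, var


def variance_by_quadrature(t0, T, r, sigma, delta, c, n=2000):
    """Var(X_T) as the integral of (sigma * pi_s * e^{r(T-s)})^2 over [t0, T],
    evaluated by the midpoint rule (pi_s is deterministic here)."""
    h = (T - t0) / n
    total = 0.0
    for k in range(n):
        s = t0 + (k + 0.5) * h
        vol = sigma * dynamic_amount(s, T, r, sigma, delta, c) * math.exp(r * (T - s))
        total += vol * vol * h
    return total


# Illustrative parameters (assumed values).
x0, t0, T, r, mu, sigma, c = 1.0, 0.0, 1.0, 0.1, 0.5, 0.4, 1.0
delta = (mu - r) / sigma

mean_d, var_d = dynamic_moments(x0, t0, T, r, delta, c)
mean_s, var_s = static_moments(x0, t0, T, r, delta, c)
var_check = variance_by_quadrature(t0, T, r, sigma, delta, c)

print(mean_d, mean_s)                       # expected terminal values
print(var_d, var_check, var_s)              # terminal variances
print(mean_s - c * var_s, mean_d - c * var_d)  # static and dynamic values
```

Under these assumptions the two expected terminal values agree, while the static value at the initial point is at least as large as the dynamic value because the static variance is smaller.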
We now turn to the constrained problems. Note in the proofs below that the unconstrained problem above is obtained by optimising the Lagrangian of the constrained problems.
(A) The statically optimal control is given by The statically optimal controlled process is given by The dynamically optimal control is given by for (t, x) ∈[t 0 , T ]×R. The dynamically optimal controlled process is given by Proof We assume throughout that the process X u solves the stochastic differential equation (2.3) with X u (A): Note that we can think of (3.7) as (the essential part of) the Lagrangian for the constrained problem (2.5) defined by for c > 0. By the result of Theorem 3 we know that the control u s * given in (3.1) is optimal in unconstrained problem for c > 0. Suppose moreover that there exists c = c(α, t, x) > 0 such that It then follows that for any admissible control u such that Var t,x (X u T ) ≤ α. This shows that the control u c * from (3.1) with c = c(α, t, x) > 0 is statically optimal in (2.5).
To realise (3.53) note that taking E t 0 ,x 0 in (3.2) and making use of (3.3) we find that Setting this expression equal to α yields (B): Identifying t 0 with t and x 0 with x in the statically optimal control u s * from (3.45) we obtain the control u d * from (3.48). We claim that this control is dynamically optimal in (2.5). For this, take any other admissible control v such that v(t 0 , x 0 ) = u d * (t 0 , x 0 ) and set w = u s * . Then w(t 0 , x 0 ) = u d * (t 0 , x 0 ) and (3.33) holds with c from (3.56). Using that Var t 0 ,x 0 (X w T ) = α by (3.55) and (3.56) we see that (3.33) yields This shows that the control u d * from (3.48) is dynamically optimal in (2.5) as claimed.
Applying Itô's formula to e r (T −t) X d t where we set X d := X u d * and making use of (2.3) we easily find that the first identity in (3.49) is satisfied. Integrating by parts and recalling the closed form expressions for B and S stated following (2.2) above we then establish that the second identity in (3.49) also holds. From the first identity in (3.49) we get for t ∈ [t 0 , T ). Letting t ↑ T in (3.58) we obtain (3.50) and this completes the proof.  (t 0 , x 0 ). To see why this is possible note that using (3.49) we find that

Remark 6 (A dynamic compliance effect)
can indeed be exceeded by the dynamic value $V^1_d(t_0,x_0)$ since the set of admissible controls is virtually larger in the dynamic case. It amounts to what we refer to as a dynamic compliance effect where the investor follows a uniformly bounded risk (variance) strategy at each time (and thus complies with the adopted regulation rule imposed internally/externally) while the resulting static strategy exhibits an unbounded risk (variance). Denoting the stochastic integral (martingale) in (3.49) by $M_t$ we see that $\langle M, M \rangle_t = \int_{t_0}^t e^{2\delta^2(T-s)} / (e^{2\delta^2(T-s)} - 1)\,ds \to \infty$ as $t \uparrow T$. It follows therefore that $M_t$ oscillates from $-\infty$ to $\infty$ with $P_{t_0,x_0}$-probability one as $t \uparrow T$ and hence the same is true for $X^d_t$ whenever $\delta \ne 0$ (for similar behaviour arising from the continuous-time analogue of a doubling strategy see [9, Example 2.3]). We also see from (3.46) and (3.49) that unlike in (3.44) we have the strict inequality
$$E_{t_0,x_0}\big[ V^1_s(T,X^s_T) \big] < E_{t_0,x_0}\big[ V^1_d(T,X^d_T) \big].$$
This shows that the dynamic control $u^d_*$ from (3.48) outperforms the static control $u^s_*$ from (3.45) in the constrained problem (2.5).
(A) The statically optimal control is given by The statically optimal controlled process is given by Fig. 1 below). The static value function V 2 s := Var(X s T ) is given by The dynamically optimal wealth t → X d t and the statically optimal wealth t → X s t in the constrained problem (2.6) of Corollary 7 obtained from the stock price t → S t when t 0 = 0, x 0 = 1, S 0 = 1, β = 2, r = 0.1, μ = 0.5, σ = 0.4 and T = 1. Note that the expected value of S T equals e μT ≈ 1.64 which is strictly smaller than β The dynamically optimal control is given by The dynamically optimal controlled process is given by Fig. 1 above). The dynamic value function V 2 d := Var(X d T ) is given by Proof We assume throughout that the process X u solves the stochastic differential equation T ]×R given and fixed where u is any admissible control as defined/discussed above. To simplify the notation we will drop the subscript zero from t 0 and x 0 in the first part of the proof below.
(A): Note that the Lagrangian for the constrained problem (2.6) is defined by for c > 0. To connect to the results of Theorem 3 observe that which shows that the control u 1/c * given in (3.1) is optimal in the unconstrained problem for c > 0. Suppose moreover that there exists c = c(β, t, x) > 0 such that It then follows that for any admissible control u such that E t,x (X u T ) ≥ β. This shows that the control u 1/c * from (3.1) with c = c(β, t, x) > 0 is statically optimal in (2.6).
To realise (3.70) note that taking E t 0 ,x 0 in (3.2) we find that as claimed. Let us therefore assume that x 0 e r (T −t 0 ) < β in the sequel. Then by (3.70) and (3.71) we can conclude that the control u 1/c * is statically optimal in (2.6). Inserting (3.73) into (3.1) and (3.2) we obtain (3.61) and (3.62) respectively. Inserting (3.73) into (3.55) we obtain (3.63) and this completes the first part of the proof.
(B): Identifying $t_0$ with $t$ and $x_0$ with $x$ in the statically optimal control $u^s_*$ from (3.61) we obtain the control $u^d_*$ from (3.64). We claim that this control is dynamically optimal in (2.6) when $x_0 e^{r(T-t_0)} < \beta$. For this, take any other admissible control $v$ such that $v(t_0, x_0) \neq u^d_*(t_0, x_0)$ and set $w = u^s_*$. Then $w(t_0, x_0) = u^d_*(t_0, x_0)$ and (3.33) holds with $c$ from (3.73). Using that $E_{t_0,x_0}(X^w_T) = \beta$ by (3.72) and (3.73) we see that (3.33) yields

strictly below $\beta$ for $t \in [t_0, T)$ with achieving $X^d$

Static versus dynamic optimality
In this section we address the rationale for introducing the static and dynamic optimality in the nonlinear optimal control problems under consideration and explain their relevance for applications of both theoretical and practical interest. We also discuss the relation of these results to the existing approaches to similar problems in the literature.
1. To simplify the exposition we focus on the unconstrained problem (2.4); similar arguments apply to the constrained problems (2.5) and (2.6) as well. Recall that (2.4) represents the optimal portfolio selection problem for an investor who has an initial wealth $x_0 \in \mathbb{R}$ which he wishes to exchange between a risky stock $S$ and a riskless bond $B$ in a self-financing manner dynamically in time so as to maximise his return (identified with the expectation of his wealth) and minimise his risk (identified with the variance of his wealth) at the given terminal time $T$. Due to the quadratic nonlinearity of the variance (as a function of the expectation) the optimal portfolio strategy (3.1) depends on the initial wealth $x_0$ in an essential way. This spatial inconsistency (not present in the standard/linear optimal control problems) introduces the time inconsistency in the problem because the investor's wealth process moves from the initial value $x_0$ in $t$ units of time to a new value $x_1$ (different from $x_0$ with probability one) which in turn yields a new optimal portfolio strategy that is different from the initial strategy. This time inconsistency repeats itself between any two points in time and the investor may be in doubt about which optimal portfolio strategy to use unless he has already made up his mind. To tackle these inconsistencies we are naturally led to consider two types of investors and consequently introduce the two notions of optimality as stated in Definitions 1 and 2 respectively.

The first investor is a static investor who stays 'pre-committed' to the optimal portfolio strategy evaluated initially and does not re-evaluate the optimality criterion (2.4) at later times. This investor will determine the optimal portfolio strategy at time $t_0$ and follow it blindly to the terminal time $T$.
The second investor is a dynamic investor who remains 'non-committed' to the optimal portfolio strategy evaluated initially as well as subsequently and continuously re-evaluates the optimality criterion (2.4) at each new time. This investor will determine the optimal portfolio strategy at time $t_0$ and continue doing so at each new time until the terminal time $T$. Clearly both the static investor and the dynamic investor embody realistic economic behaviour (see below for a more detailed account coming from economics) and Theorem 3 discloses their optimal portfolio selection strategies in the unconstrained problem (2.4). Similarly Corollary 5 and Corollary 7 disclose their optimal portfolio selection strategies in the constrained problems (2.5) and (2.6). Given that the financial interpretations of these results are easy to draw directly and somewhat lengthy to state explicitly we will omit further details. It needs to be noted that although closely related the three problems (2.4)-(2.6) are still different and hence it is to be expected that their solutions are also different for some values of the parameters. The difference between the static and dynamic optimality is best understood by analysing each problem on its own first as in this case the complexity of the overall comparison is greatly reduced.

2. Apart from the paper [13] where the dynamic optimality was used in a nonlinear problem of optimal stopping, we are not aware of any other paper on optimal control where nonlinear problems were studied using this methodology. The dynamic optimality (Definition 2) appears therefore to be original to the present paper in the context of nonlinear problems of optimal control. There are two streams of papers on optimal control however where the static optimality (Definition 1) has been used. The first one belongs to the economics literature and dates back to the paper by Strotz [21].
The second one belongs to the finance literature and dates back to the paper by Richardson [19]. We present a brief review of these papers to highlight similarities/differences and indicate the applicability of the present methodology in these settings.

3. The stream of papers in the economics literature starts with the paper by Strotz [21] who points out a time inconsistency arising from the presence of the initial point in the time domain when the exponential discounting in the utility model of Samuelson [20] is replaced by a non-exponential discounting. For an illuminating exposition of the problem of intertemporal choices (decisions involving tradeoffs among costs and benefits occurring at different times) lasting over a hundred years and leading to Samuelson's simplifying model containing a single parameter (discount rate) see [7] and the references therein. To tackle the issue of the time inconsistency Strotz proposed two strategies in his paper: (i) the strategy of 'pre-commitment' (where the individual commits to the optimal strategy derived initially) and (ii) the strategy of 'consistent planning' (where the individual rejects any strategy which he will not follow through and aims to find the optimal strategy among those that he will actually follow). Note in particular that Strotz coins the term 'pre-committed' strategy in his paper and this term has since been used in the literature including the most recent papers. Although his setting is deterministic and his time is discrete, on closer look one sees that our financial analysis of the static investor above is fully consistent with his economic reasoning and moreover the statically optimal portfolio strategy derived in the present paper may be viewed as the strategy of 'pre-commitment' in Strotz's sense as already indicated above. The dynamically optimal portfolio strategy derived in the present paper is different however from the strategy of 'consistent planning' in Strotz's sense.
The difference is subtle yet substantial and it will become clearer through the exposition of the subsequent development that continues to the present time. Next to be noted is the paper by Pollak [16] who showed that the derivation of the strategy of 'consistent planning' in the Strotz paper [21] was incorrect (one cannot replace the individual's non-exponential discount function by the exponential discount function having the same slope as the non-exponential discount function at zero). Peleg and Yaari [14] then attempted to find the strategy of 'consistent planning' by backward recursion and concluded that the strategy could exist only under hypotheses too restrictive to be useful. They suggested looking at what we now refer to as a subgame-perfect Nash equilibrium (the optimality concept refining Nash equilibrium proposed by Selten in 1965). Goldman [8] then pointed out that the failure of backward recursion does not disprove the existence as suggested in [14] and showed that the strategy of 'consistent planning' does exist under quite general conditions. All these papers deal with problems in discrete time. A continuous-time extension of these results appears more recently in the paper by Ekeland and Pirvu [6] and the paper by Björk and Murgoci [3] (see also the references therein for other unpublished work). Strotz's strategy of 'consistent planning' is understood as a subgame-perfect Nash equilibrium in this context (satisfying the natural consumption constraint at present time).

4. The stream of papers in the finance literature starting with the paper by Richardson [19] deals with optimal portfolio selection problems under mean-variance criteria similar/analogous to (2.4)-(2.6) above. Richardson's paper [19] derives a statically optimal control in the constrained problem (2.6) using the martingale method suggested by Pliska [15] who makes use of the Legendre transform (convex analysis) rather than the Lagrange multipliers.
For an overview of the martingale method based on Lagrange multipliers see e.g. [2,Sect. 20]. This martingale method can be used to solve the auxiliary optimal control problem (3.14) in the proof of Theorem 3 above. Moreover on closer look it is possible to see that the dynamically optimal control is obtained by setting the Radon-Nikodym derivative of the equivalent martingale measure with respect to the original measure equal to one. Given that the martingale method is applicable to more general problems of optimal control including those in non-Markovian settings as well this observation provides a lead for finding the dynamically optimal controls when a classic HJB approach may not be directly applicable.
Returning to the stream of papers in the finance literature, the paper by Li and Ng [10, Theorems 1 & 2] in discrete time and the paper by Zhou and Li [24, Theorem 3.1] in continuous time show that if there is a statically optimal control in the unconstrained problem (2.4) then this control can be found by solving a linear-quadratic optimal control problem (which in turn also yields statically optimal controls in the constrained problems (2.5) and (2.6)). The methodology in these papers relies upon the results on multi-index optimisation problems from the paper by Reid and Citron [17] and is more involved (in comparison with the simple conditioning combined with a double application of Lagrange multipliers as done in the present paper). In particular, the results of [10] and [24] do not establish the existence of statically optimal controls in the problems (2.4)-(2.6) although they do derive their closed form expressions in discrete and continuous time respectively. In this context it may be useful to recall that the first to point out that nonlinear dynamic programming problems may be tackled using the ideas of Lagrange multipliers was White in his paper [23]. He also considered the constrained problem (2.6) in discrete time (his Sect. 3) and using Lagrange multipliers derived some conclusions on the statically optimal control (without realising its time inconsistency). In his setting the conditioning on the size of the expected value is automatic since he assumed that the expected value in (2.6) equals $\beta$. For this reason his first Lagrangian associated with (2.6) was a linear problem and hence there was no need to untangle the resulting nonlinearity by yet another application of Lagrange multipliers as done in the present paper.
All papers in the finance literature reviewed above (including others not mentioned) study statically optimal controls which in turn are time inconsistent. Thus all of them deal with 'pre-committed' strategies in the sense of Strotz. This was pointed out by Basak and Chabakauri in their paper [1] where they return to Strotz's approach of 'consistent planning' and study the subgame-perfect Nash equilibrium in continuous time. The paper by Björk and Murgoci [3] merges this with the stream of papers from the economics literature (as already stated above) and studies general formulations of time inconsistent problems based on Strotz's approach of 'pre-commitment' vs 'consistent planning' in the sense of the subgame-perfect Nash equilibrium. A recent paper by Czichowsky [4] studies analogous formulations and further refinements in a general semimartingale setting. For applications of statically optimal controls to pension schemes see the paper by Vigna [22].

5. We now return to the question of comparison between Strotz's definition of 'consistent planning' which is interpreted as the subgame-perfect Nash equilibrium in the literature and the 'dynamic optimality' as defined in the present paper. The key conceptual difference is that Strotz's definition of 'consistent planning' is relative (constrained) in the sense that the 'optimal' control at time $t$ is best among all 'available' controls (the ones which will be actually followed) while the present definition of the 'dynamic optimality' is absolute (unconstrained) in the sense that the optimal control at time $t$ is best among all 'possible' controls afterwards. To illustrate this distinction recall that the subgame-perfect Nash equilibrium formulation of the Strotz 'consistent planning' optimality can be informally described as follows. Given the present time $t$ and all future times $s > t$ one identifies the control $c_s$ applied at time $s \ge t$ with an action of the $s$-th player.
The Strotz 'consistent planning' optimality is then obtained through the subgame-perfect Nash equilibrium at a given control $(c_r)_{r \ge 0}$ if the action $c_t$ is best when the actions $c_s$ for $s > t$ are given and fixed, i.e. no other action $\tilde{c}_t$ in place of $c_t$ would do better when the actions $c_s$ for $s > t$ are given and fixed (the requirement is clear in discrete time and requires some right-hand limiting argument in continuous time). Clearly this optimality is different from the 'dynamic optimality' where the optimal control at time $t$ is best among all 'possible' controls afterwards.
To make a more explicit comparison between the two concepts of optimality, recall from [1] (see also [3]) that a subgame-perfect Nash optimal control in the problem (2.4) is given by

for $(t, x) \in [t_0, T] \times \mathbb{R}$, the subgame-perfect Nash optimal controlled process is given by

for $t \in [t_0, T]$, and the subgame-perfect Nash value function is given by

for $(t_0, x_0) \in [t_0, T] \times \mathbb{R}$ (compare these expressions with those given in (3.4)-(3.6) above). Returning to the analysis from the first paragraph of Remark 4 above, one can easily see by direct comparison that the subgame-perfect Nash value $V_n(t_0, x_0)$ dominates the dynamic value $V_d(t_0, x_0)$ (and is dominated by the static value $V_s(t_0, x_0)$ due to its definition). Given that the optimally controlled processes $X^n$ and $X^d$ may never come to the same point $x$ at the same time $t$ we see (as pointed out in Remark 4) that this comparison may be unreal and a better way is to compare the value functions composed with the controlled processes. Noting that $V_n(T, X^n_T) = X^n_T$ and $V_d(T, X^d_T) = X^d_T$ it is easy to verify using (3.5) and (4.2) that

for all $(t_0, x_0) \in [0, T) \times \mathbb{R}$. This shows that the dynamically optimal control $u^d_*$ from (3.4) outperforms the subgame-perfect Nash optimal control $u^n_*$ from (4.1) in the unconstrained problem (2.4). A similar comparison in the constrained problems (2.5) and (2.6) is not possible since subgame-perfect Nash optimal controls are not available in these problems at present.
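As a numerical illustration of the dynamically optimal strategy in the unconstrained problem (2.4), the following sketch uses the control stated in the abstract, $u_*(t, x) = \frac{\delta}{2c\sigma} \frac{1}{x} e^{(\delta^2 - r)(T-t)}$, together with the self-financing wealth dynamics $dX_t = r X_t \, dt + \sigma u X_t (\delta \, dt + dW_t)$. Under this control the amount held in the stock does not depend on $x$, so the wealth equation is linear and a direct calculation (ours, not quoted from the paper) suggests that $X_T$ has mean $x_0 e^{r(T-t_0)} + \frac{1}{2c}(e^{\delta^2 (T-t_0)} - 1)$ and variance $\frac{1}{8c^2}(e^{2\delta^2 (T-t_0)} - 1)$ under $P_{t_0,x_0}$. The function names are hypothetical, the market parameters are borrowed from the caption of Fig. 1, and $c = 1$ is an arbitrary choice.

```python
import math
import random
import statistics

# Illustrative parameters: market values as in the caption of Fig. 1, c chosen arbitrarily.
X0, T0, T_END = 1.0, 0.0, 1.0
R, MU, SIGMA, C = 0.1, 0.5, 0.4, 1.0


def stock_amount(t, T, r, delta, sigma, c):
    """Money held in the stock, x * u(t, x), under the control from the abstract.

    The factor 1/x in u(t, x) cancels, so the amount is deterministic in t.
    """
    return delta / (2.0 * c * sigma) * math.exp((delta ** 2 - r) * (T - t))


def terminal_moments(x0, t0, T, r, delta, c):
    """Mean and variance of X_T obtained by solving the linear wealth equation."""
    tau = T - t0
    mean = x0 * math.exp(r * tau) + math.expm1(delta ** 2 * tau) / (2.0 * c)
    var = math.expm1(2.0 * delta ** 2 * tau) / (8.0 * c ** 2)
    return mean, var


def simulate_terminal_wealth(x0, t0, T, r, mu, sigma, c, n_steps, rng):
    """One Euler path of dX = r X dt + sigma * a(t) * (delta dt + dW)."""
    delta = (mu - r) / sigma
    dt = (T - t0) / n_steps
    x, t = x0, t0
    for _ in range(n_steps):
        a = stock_amount(t, T, r, delta, sigma, c)
        dw = rng.gauss(0.0, math.sqrt(dt))
        x += r * x * dt + sigma * a * (delta * dt + dw)
        t += dt
    return x


def monte_carlo_moments(n_paths=4000, n_steps=100, seed=0):
    """Sample mean and sample variance of simulated terminal wealth."""
    rng = random.Random(seed)
    samples = [
        simulate_terminal_wealth(X0, T0, T_END, R, MU, SIGMA, C, n_steps, rng)
        for _ in range(n_paths)
    ]
    return statistics.fmean(samples), statistics.variance(samples)


if __name__ == "__main__":
    delta = (MU - R) / SIGMA
    print("closed form:", terminal_moments(X0, T0, T_END, R, delta, C))
    print("Monte Carlo:", monte_carlo_moments())
```

The simulation is only a consistency check of the two moment formulas; it says nothing about optimality, which rests on the arguments of Theorem 3.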
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.