Dynamic minimisation of the commute time for a one-dimensional diffusion

Motivated in part by a problem in simulated tempering (a form of Markov chain Monte Carlo), we seek to minimise, in a suitable sense, the time it takes a (regular) diffusion with instantaneous reflection at 0 and 1 to travel to $1$ and then return to the origin (the so-called commute time from 0 to 1). Substantially extending results in a previous paper, we consider a dynamic version of this problem where the control mechanism is related to the diffusion's drift via the corresponding scale function. We are only able to choose the drift at each point at the time of first visiting that point, and the drift is constrained on a set of the form $[0,\ell)\cup(i,1]$. This leads to a type of stochastic control problem with infinite-dimensional state.


Introduction
Suppose that $X^\mu$ is a diffusion on $[0,1]$, started at 0, with instantaneous reflection at 0 and 1 (see [10] or [7] for details), and given by
$$ dX^\mu_t = \mu(X^\mu_t)\,dt + \sigma(X^\mu_t)\,dB_t. \qquad (1.1) $$
Define $T_x$ to be the first time that the diffusion reaches $x$, and then define $S = S(X^\mu)$, the commute time (between 0 and 1), by
$$ S(X^\mu) := \inf\{t > T_1(X^\mu) : X^\mu_t = 0\}. $$
In [8], motivated by a question arising in simulated tempering (see [2]), we considered the following problem (and several variants and generalisations):
$$ \text{minimise } \mathbb{E}\big[S(X^\mu)\big] \text{ over admissible drifts } \mu, \qquad (1.2) $$
where the drift at each level must be chosen at or before $X^\mu$'s first visit to that level and thereafter remains fixed.
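To fix ideas, here is a minimal Monte Carlo sketch (ours, not from [8]) of the commute time for a given drift, using an Euler scheme with crude clamping in place of instantaneous reflection; the function names and the driftless example are our own choices.

```python
import numpy as np

def commute_time(mu, sigma, dt=1e-4, rng=None):
    """One realisation of the commute time S: run dX = mu(X) dt + sigma(X) dB
    on [0, 1] (clamping as a crude stand-in for instantaneous reflection)
    until the path has hit 1 and then returned to 0."""
    rng = np.random.default_rng() if rng is None else rng
    x, t, hit_one = 0.0, 0.0, False
    while True:
        x += mu(x) * dt + sigma(x) * np.sqrt(dt) * rng.standard_normal()
        x = min(max(x, 0.0), 1.0)
        t += dt
        if x == 1.0:
            hit_one = True
        elif hit_one and x == 0.0:
            return t

rng = np.random.default_rng(1)
# driftless, unit volatility: by the commute time identity (3.1) below,
# E[S] = (s(1) - s(0)) * m([0,1]) = 1 * 2 = 2
print(np.mean([commute_time(lambda x: 0.0, lambda x: 1.0, rng=rng)
               for _ in range(50)]))
```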
The commute time is defined for random walks on graphs in [1]. The original commute time identity (a version of which we give later in (3.1)) was only discovered in 1989 and first appeared in [3]. We gave the optimal drift minimising the quantity in (1.2) in Theorem 4.6 of [8], under the assumptions that $\mu$ was already fixed on some interval $[0, y)$, that the drift was otherwise unconstrained, and that the starting state was in $[0, y)$. We left open the question of the optimal control when $\mu$ is, initially, fixed on some other interval. The key observation in [8] was that we can follow the same solution as for the static case (where we choose the drift function at time 0) because 'there can be no surprises' (in the path of $X$). This statement is no longer valid when the set on which the drift is constrained is not of the form $[0, y)$, and we gave, as an example in Remark 4.1 of [8], a heuristic argument for why a different solution would be optimal in the case where the constraint set is of the form $(i, 1]$. Our aim, in the current paper, is to present the solution (in Theorem 4.2) to the dynamic problem in this case, where the 'surprises' are how far down the controlled process gets before time $T_1$.
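For orientation, the discrete identity reads as follows (stated here, in our paraphrase, for simple random walk on a finite connected graph $G = (V, E)$, with $R_{\mathrm{eff}}$ the effective resistance between two vertices when each edge carries unit resistance):
$$ \mathbb E_x[T_y] + \mathbb E_y[T_x] = 2|E|\,R_{\mathrm{eff}}(x, y). $$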
The structure of the paper is as follows: in section 2 we give a formal definition of the model; section 3 is devoted to calculating the candidate value function; section 4 gives the proof that this is correct and we then give some concluding remarks.

The model and some notation
2.1. The model. Let $X^{\mu,i_0}$ be a regular diffusion on $[0,1]$ with instantaneous reflection at 0 and 1, starting at $i_0 \in [0,1]$, and given by
$$ dX^{\mu,i_0}_t = \mu(X^{\mu,i_0}_t)\,dt + \sigma(X^{\mu,i_0}_t)\,dB_t, \qquad X^{\mu,i_0}_0 = i_0. \qquad (2.1) $$
We need to define the set of admissible controls quite carefully. Two approaches are possible: the first is to restrict controls to choosing the drift $\mu$, whilst the second is to control the corresponding random scale function. We adopt the second approach, although we should caveat that the identified optimal policy will not, in general, be in this class (or, equivalently, the relevant infimum will not be attained by any policy in this class). We assume the usual Markovian setup, so that each stochastic process lives on a suitable filtered space $(\Omega, \mathcal F, (\mathcal F_t)_{t\ge0})$, with the usual family of probability measures $(\mathbb P_x)_{x\in[0,1]}$ corresponding to the possible initial values. Let $X^\mu$ be the diffusion with instantaneous reflection as given in (1.1). Denote by $s_\mu$ the standardised scale measure of $X^\mu$ and by $m_\mu$ the corresponding speed measure.
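For the reader's convenience we recall the standard formulas (see e.g. [10]; up to the choice of base point and normalisation) linking drift, scale and speed for (1.1); this is the sense in which choosing $s'$ is equivalent to choosing the drift:
$$ s_\mu'(x) = \exp\Big(-\int_0^x \frac{2\mu(z)}{\sigma^2(z)}\,dz\Big), \qquad m_\mu(dx) = \frac{2\,dx}{\sigma^2(x)\,s_\mu'(x)}. $$
In particular, a large downward drift at a level corresponds to a large value of $s'$ there.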
From now on, we consider the more general case where we only know that $s$ and $m$ are absolutely continuous with respect to Lebesgue measure (denoted by $\lambda$) so that, denoting the respective Radon-Nikodym derivatives by $s'$ and $m'$, we have $s(dx) = s'(x)\,\lambda(dx)$ and $m(dx) = m'(x)\,\lambda(dx)$. For such a pair we shall denote the corresponding diffusion, when it exists, by $X^s$. We emphasize that we are only considering regular diffusions with Brownian "martingale part" $\sigma\,dB$ or, more precisely, diffusions $X$ with scale functions $s$ such that, for some Brownian motion $B$, the martingale part of $s(X)$ is $\int_0^\cdot s'(X_u)\,\sigma(X_u)\,dB_u$, so that, for example, sticky points are excluded (see [5] for a description of driftless sticky Brownian Motion and its construction; see also [4] for other problems arising in solving stochastic differential equations), as are singular scale measures.
Remark 2.2. Note that our assumptions do allow generalised drift: if $s$ is the difference of two convex functions (which we will not necessarily assume) then
$$ X_t = X_0 + \int_0^t \sigma(X_u)\,dB_u - \int_{[0,1]} \frac{s''(da)}{2\,s'_-(a)}\,L^a_t(X), $$
where $s'_-$ denotes the left-hand derivative of $s$, $s''$ denotes the signed measure induced by $s'_-$ and $L^a_t(X)$ denotes the local time at $a$ developed by time $t$ by $X$ (see [9] Chapter VI for details).
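As a consistency check (ours): when $s \in C^2$ and $\tfrac12\sigma^2 s'' + \mu s' = 0$, the measure $-s''(da)/(2 s'_-(a))$ has density $\mu(a)/\sigma^2(a)$, and the occupation times formula collapses the local-time integral back to an ordinary drift,
$$ \int_{[0,1]} \frac{\mu(a)}{\sigma^2(a)}\,L^a_t(X)\,da = \int_0^t \mu(X_u)\,du, $$
recovering the dynamics (1.1).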
Remark 2.3. Essentially, we treat (2.1) as the canonical dynamics for our problem, but note that we shall consider random scale measures for which $s(0)$ is not known at time 0.

2.2. The control setting.
As is usual, we will adopt a weak approach to the control problem, so that we work on the canonical pathspace $\Omega = D_{[0,\infty)}[0,1]$, the space of paths in $[0,1]$ which are right-continuous with left limits, indexed by $[0,\infty)$, equipped with the Borel $\sigma$-algebra with respect to the Skorokhod metric and the natural filtration $(\mathcal F_t)_{t\ge0}$, with $X : \omega \mapsto \omega$ the coordinate process (see [6] for details).
Let $\mathcal M$ be the set of scale functions/measures on $[0,1]$ that are absolutely continuous with respect to Lebesgue measure. Given a fixed scale function $s_0 \in \mathcal M$ and a Borel subset $F$ of $[0,1]$, define the set $\mathcal M^{s_0}_F$ as follows:
$$ \mathcal M^{s_0}_F := \{s \in \mathcal M : s' = s_0' \text{ on } F\}. $$
Then define the set of available controls $\mathcal C^{s_0}_F$ to be the corresponding collection of random scale functions: those agreeing with $s_0$ on $F$, with $s'$ at each remaining level chosen no later than the first visit to that level. Now define the admissible control policies $\mathcal A^{x,s_0}_F$ to be the set of $s \in \mathcal C^{s_0}_F$ such that the corresponding controlled process starting at $x \in [0,1]$ with random scale function $s$ exists; in other words, there exists a probability measure (p.m.) $\mathbb P^{s,x}$ with $X_0 = x$ $\mathbb P^{s,x}$-a.s. and satisfying (2.4). We denote the expectation corresponding to such a p.m. $\mathbb P^{s,x}$ by $\mathbb E^{s,x}$.
Recall that $T_y$ is the first hitting time of level $y$ by the process $X$, that is,
$$ T_y := \inf\{t \ge 0 : X_t = y\}, $$
and $S$ denotes the first time the controlled process reaches 0 after having hit level 1, that is,
$$ S := \inf\{t > T_1 : X_t = 0\}. $$
Define the running infimum of $X$ by setting $I_t := \inf_{u \le t} X_u$. We are able to choose $s'$ (the derivative of the scale function $s$) dynamically, but only once for each level, and we seek to minimise an expected cost of the form $\mathbb E^{s,x}\big[\int_0^S f(X_u)\,du\big]$ (for a positive cost function $f$, made precise below).
More precisely, we will assume the following.

Assumption 2.1. For a given level $\ell$ and a starting point $i$, with $\ell < i$, suppose that $s'$ has been fixed at every level in $[0, \ell) \cup (i, 1]$. For a given positive cost function $f$, the control problem consists in finding
$$ V := \inf_{s \in \mathcal A^{i,s_0}_{C_0}} \mathbb E^{s,i}\Big[\int_0^S f(X_u)\,du\Big]. $$

2.3. Heuristic for the optimal strategy. Strategy $s$ has been fixed on $C_0 := [0, \ell) \cup (i_0, 1]$, and we need to determine how to proceed on $[\ell, i_0]$. Since we are only allowed to choose $s'$ once for each level, we choose a strategy at $y \in (\ell, i_0)$ before time $T_1$ only if such a level is reached from above before hitting 1. Conversely, if $I_{T_1} > y$, then we need only choose the drift at level $y$ after the process has hit level 1. Consequently we are not then constrained to enable the process to hit level 1 (again), so we may choose an arbitrarily large downward drift at any such level (see the display below for why this costs nothing).
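Indeed, anticipating the measure $m_f$ of section 3 (this reading of the formulas is ours rather than a statement from [8]): on an interval $(a, b)$ on which $s' \to \infty$,
$$ m_f\big((a,b)\big) = \int_a^b \frac{\rho(z)}{s'(z)}\,dz \longrightarrow 0, \qquad s(b) - s(a) = \int_a^b s'(z)\,dz \longrightarrow \infty, $$
so the expected cost of passing down through $(a,b)$ vanishes, while climbing back up through $(a,b)$ becomes impossible; after $T_1$ the process never needs to climb again, so only the first effect matters.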
Our optimal control should respect this observation, and we proceed to calculate the optimal control in this class.
The candidate optimal payoff

3.1. Initial calculations. The commute time identity
$$ \mathbb E_0\big[S(X^s)\big] = \big(s(1) - s(0)\big)\,m([0,1]) \qquad (3.1) $$
is generalised as follows. For any pair $x, y \in [0,1]$, and any $s \in \mathcal M$, define the function
$$ \phi_s(x, y) := \mathbb E^{s,x}\Big[\int_0^{T_y} f(X_u)\,du\Big]. $$
It follows (by [8, Theorem 2.4]) that, defining the measure $m_f$ by
$$ m_f(dz) := \frac{\rho(z)}{s'(z)}\,dz, \quad \text{where } \rho : z \mapsto \frac{2f(z)}{\sigma^2(z)}, $$
the weighted commute time satisfies
$$ \mathbb E^{s,0}\Big[\int_0^S f(X_u)\,du\Big] = \big(s(1) - s(0)\big)\,m_f([0,1]). $$
Now we suppose that $s$ is an arbitrary (deterministic) scale function which is assumed to equal $s_0$ on the intervals $[0, \ell]$ and $[i, 1]$ (for some $\ell \in [0,1]$ fixed in advance) and which will be dynamically reset to give infinite downward drift on the interval $[\ell, I_{T_1})$ once level 1 has been hit. We denote the corresponding (random) scale function by $s^*$. Define $\Phi(s, x)$ to be the payoff $\mathbb E^{s,x}\big[\int_0^S f(X_u)\,du\big]$ of the deterministic scale function $s$ started from $x$. We make the following standing assumption, whose relevance follows.

Assumption 3.1. $\int_0^1 \sqrt{\rho(z)}\,dz < \infty$.

Remark 3.2. Suppose that $s$ is a scale measure; then the Cauchy-Schwarz inequality tells us that
$$ \Big(\int_0^1 \sqrt{\rho(z)}\,dz\Big)^2 = \Big(\int_0^1 \sqrt{s'(z)}\,\sqrt{\rho(z)/s'(z)}\,dz\Big)^2 \le \big(s(1) - s(0)\big)\,m_f([0,1]) = \Phi(s, 0), $$
so that Assumption 3.1 is a necessary condition for the existence of a scale function with finite payoff.
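For intuition, (3.1) follows from the standard hitting-time formulas for a reflected diffusion (a textbook computation, see e.g. [7], reproduced here in our notation):
$$ \mathbb E_0[T_1] = \int_0^1 m([0,u])\,s'(u)\,du, \qquad \mathbb E_1[T_0] = \int_0^1 m([u,1])\,s'(u)\,du, $$
and adding the two gives $\mathbb E_0[S] = \int_0^1 m([0,1])\,s'(u)\,du = (s(1)-s(0))\,m([0,1])$. The weighted version is the same computation with $m$ replaced by $m_f$.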
We compute the corresponding payoff of a controlled process $\{X_t\}_{t \ge 0}$, where $i$ is the starting point, and note that, since $s'$ is fixed on $C_0$, $t$ is the only one of the parameters in (3.3) which is not determined by $s'$ restricted to $C_0$.
To complete the (infinite-dimensional) state of the problem, we define the following.

Lemma 3.4. Let $s \in \mathcal M$; then the payoff of the policy $s^*$ is given by an explicit formula in terms of a function $H$ (defined via the scale function $s$) and, as stated earlier, $\rho : z \mapsto \frac{2f(z)}{\sigma^2(z)}$.
Proof. First note that $s^* = s$ on the event $(I_{T_1} \le \ell)$, whereas, on $(I_{T_1} > \ell)$, $s^{*\prime} = s'$ on $(I_{T_1}, 1]$ and, when $X^{s^*}$ reaches $I_{T_1}$ for the first time after time $T_1$, the effect of the infinite downward drift is that $X$ is instantaneously translated to level $\ell$, and thereafter a reflecting (downward) barrier is imposed at level $\ell$.
It follows that the payoff decomposes according to the value of $I_{T_1}$, where $\phi^\ell_s(x, 0)$ denotes $\mathbb E\big[\int_0^{T_0} f(X^{(\ell),x}_t)\,dt\big]$ and $X^{(\ell),x}$ is the controlled process started at $x$ and with a reflecting barrier at $\ell$. It is easy to see that $\phi^\ell_s(\ell, 0) = \kappa$, while, under the control $s$, with $X$ starting at $i$, the distribution function of $I_{T_1}$ is
$$ \mathbb P^{s,i}(I_{T_1} \le x) = \begin{cases} \dfrac{s(1) - s(i)}{s(1) - s(x)} & \text{for } i \ge x \ge 0,\\[6pt] 0 & \text{for } x < 0. \end{cases} \qquad (3.6) $$
Now equation (3.5) implies an expression for the payoff which, on integrating by parts, recalling (3.2) and (3.6), and using $e^{H} = c\,s(z)\,e^{1+t/c}$, yields an integral of the form $\int \rho\,H\,e^{H}\,H'\,dz$, as required.
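As a sanity check on (3.6) (our illustration, not in the paper): for a driftless unit-volatility diffusion, $s(x) = x$, so starting from $i$ we should have $\mathbb P(I_{T_1} \le x) = (1-i)/(1-x)$ for $0 \le x \le i$. A quick simulation:

```python
import numpy as np

def inf_before_hitting_one(i, dt=1e-4, rng=None):
    """Running infimum before T_1 for a driftless unit-volatility diffusion
    on [0, 1], reflected at 0, started at i (Euler scheme)."""
    rng = np.random.default_rng() if rng is None else rng
    x, running_inf = i, i
    while x < 1.0:
        x = abs(x + np.sqrt(dt) * rng.standard_normal())  # reflection at 0
        running_inf = min(running_inf, x)
    return running_inf

rng = np.random.default_rng(0)
i, x0 = 0.5, 0.25
samples = np.array([inf_before_hitting_one(i, rng=rng) for _ in range(500)])
print((samples <= x0).mean())  # (3.6) predicts (1 - 0.5)/(1 - 0.25) = 0.667
```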

3.2. A calculus of variations approach. We wish to find the infimum (3.7) of the payoff over this class of scale functions, since our candidate optimal control lies in this class. To do this, we first treat $H(\ell)$ and $H(i)$ as fixed parameters and then optimise over suitable values for these parameters.
Proof. Since $\phi$ is strictly decreasing, $\Psi$ is finite and positive on $(\gamma_\delta, \infty)$. Differentiating twice and observing that the relevant function $g$ is positive, we deduce that $\Psi'' > 0$, establishing that $\Psi$ is strictly convex.
We denote the convex conjugate of $\Psi$ by $\Psi^*_\delta$, or just $\Psi^*$, so that
$$ \Psi^*(r) = \sup_{x > \gamma_\delta}\big(rx - \Psi(x)\big). $$
We have established the following result, which gives $V(i)$ in terms of $\Psi^*$.
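For later use in the proof of Theorem 4.2, recall the standard smooth duality facts (textbook convex analysis, not specific to [8]): when $\Psi$ is differentiable and strictly convex, the supremum above is attained at $R = (\Psi')^{-1}(r)$, and
$$ \Psi^{*\prime}(r) = R, \qquad \Psi^*(r) = rR - \Psi(R). $$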

The dynamic solution
Adopting the usual dynamic approach, we want to give the (conditional) future optimal payoff when $I_t = i$ and $X_t = x$ (with $x \ge i$), the process not yet having reached level 1. Since we cannot control the process until it reaches $I_t$ again, the payoff is given by the conditional expectation of the remaining cost. Observe that $\kappa = \mathbb E\big[\int_0^{T_0} f(X^{(\ell),\ell}_u)\,du\big]$. Equivalently, we can rewrite this payoff in terms of the quantities introduced above. As is usual, to show that $V$ really is the optimal payoff, we consider the processes $S^s$ corresponding to using the policy $s$ until time $t$ and then behaving optimally. What is non-standard here is the 'phase change' at time $T_1$: after $T_1$, $S$ just looks like $T_0$. Consequently, we enlarge the state by including a flag process $F$, $F_t := \mathbf 1_{\{t \ge T_1\}}$, so that the generic state becomes $(s, X_t, I_t \vee \ell, F_t)$ (or, more precisely, $(s|_{C_t}, C_t, X_t, F_t)$, where $C_t = [0, \ell) \cup (I_t, 1]$), and define the value function $V$ on this enlarged state, where $\tilde V$ is our proposed 'post-$T_1$' payoff.

Theorem 4.1. Suppose that $(S^s_{t \wedge S})_{t \ge 0}$ is a $\mathbb P^{s,i}$-submartingale for any admissible policy $s$, and that:
1. $\mathbb E^{\hat s, x, i, 0}\big[S^{\hat s}_{T_1}\big] = V(s, x, i)$ for some admissible policy $\hat s$ with $\hat s \in \mathcal A^{i, s_0}_C$;
2. $\liminf_n \mathbb E^{\hat s_n, x, i, 1}\big[S^{\hat s_n}_S\big] = \tilde V(s, C, x)$ for some sequence $(\hat s_n)$ of admissible policies.
Then $\hat s^*$ is an optimal policy.
Proof. Suppose that $s$ is admissible, that $(S^s_{t \wedge S})_{t \ge 0}$ is a submartingale, and that $F_0 = 1$. Letting $t \uparrow \infty$ we see, by dominated convergence, that $\mathbb E^{s,x}\big[S^s_S\big] \ge S^s_0 = \tilde V(s, C, x)$. Minimising over admissible $s$ yields $V(s, C, x, 1) \ge \tilde V(s, C, x)$. Conversely, $\liminf_n \mathbb E^{\hat s_n, x, i, 1}[S^{\hat s_n}_S] = \tilde V(s, C, x)$ implies that $V(s, C, x, 1) \le \tilde V(s, C, x)$, establishing equality. Now suppose that $s$ is admissible, that $(S^s_{t \wedge S})_{t \ge 0}$ is a submartingale, and that $F_0 = 0$. Letting $t \uparrow \infty$ we see that $\mathbb E^{s,x}\big[S^s_S\big] \ge S^s_0$. Since $s$ is arbitrary, we deduce $V(s, C, x, 0) \ge V(s, C, x, i)$. Conversely, $\mathbb E^{x,i,0}[S^{\hat s}_S] = V(s, C, x, i)$ implies that $V(s, C, x, 0) \le V(s, C, x, i)$, establishing equality. Now we establish our main result.
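Both halves of the proof run on the same optional-sampling chain (our summary of the argument above): for any admissible $s$,
$$ S^s_0 \le \mathbb E^{s,x}\big[S^s_{t \wedge S}\big] \xrightarrow{\ t \to \infty\ } \mathbb E^{s,x}\big[S^s_S\big], $$
where $S^s_0$ is the proposed value at the initial state and $S^s_S$ is the realised cost; taking the infimum over $s$ shows that the true value dominates the proposed one, and the attaining policy (or sequence of policies) supplies the reverse inequality.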
Theorem 4.2. The value function $V$ is given by the candidate payoff of section 3.

Proof. We prove that $V$ and $\tilde V$ satisfy the conditions of Theorem 4.1.
1. Consider $S^s_t$ and its semimartingale decomposition $S^s = S^s_0 + N + A$, where $N$ is a local martingale.
So, to establish that $S^s$ is a submartingale on $[0, T_1]$, since $I$ is a decreasing process which only decreases when $I_t = X_t$, it is sufficient to show that $V^s_i(i, i) \le 0$ and that there exists $\hat s$ such that $V^{\hat s}_i(i, i) = 0$. Differentiating with respect to $i$, setting $R_\delta = R = \Psi^{*\prime}(r)$, substituting in (4.4) and applying (4.5), and finally using the fact, noted in [8], that $a + b \ge 2\sqrt{ab}$ for $a, b \ge 0$, we obtain $V^s_i(i, i) \le 0$, with equality for the candidate policy.

2. Note that on the stochastic interval $[[t \vee T_1, \inf\{u \ge t : X^s_u \in [\ell, I_{T_1}]\}]]$, $S^s$ coincides with a $\mathbb P^{s,x}$-martingale, $N$ say, and the local-time terms below are constant on this interval. It follows that, denoting $V'(x) - V'(x-)$ by $\Delta V'(x)$,
$$ dS^s_t = dN_t + \Delta V'(i)\,dL^i_t + \Delta V'(\ell)\,dL^\ell_t $$
(recall that $L^a_t$ denotes the local time of $X^s$ at the level $a$ by time $t$). Now $\Delta V'(\ell) = 0$ and $\Delta V'(i) = s^*(i)\,m_f(i) > 0$, so we conclude that, since $L^\ell$ is an increasing process, $S^s$ is a $\mathbb P^{s,x}$-submartingale on $[[T_1, S]]$. Now, defining