Dynamic programming principle for classical and singular stochastic control with discretionary stopping

We prove the dynamic programming principle (DPP) for a class of problems in which an agent controls a $d$-dimensional diffusive dynamics via both classical and singular controls and, moreover, is able to terminate the optimisation at a time of her choosing, prior to a given maturity. The time horizon of the problem is random: it is the minimum of a fixed terminal time and the first exit time of the state dynamics from a Borel set. We consider both the case in which the total available fuel for the singular control is bounded and the case in which it is unbounded. We build upon existing proofs of the DPP and extend results available in the traditional literature on singular control (e.g., Haussmann and Suo, SIAM J. Control Optim., 33, 1995) by relaxing some key assumptions and including the discretionary stopping feature. We also connect with more general versions of the DPP (e.g., Bouchard and Touzi, SIAM J. Control Optim., 49, 2011) by showing in detail how our class of problems meets the abstract requirements therein.


Introduction
In this paper, we prove the dynamic programming principle (DPP) for problems of the form $\sup_{\tau,\alpha,\xi} J(\tau, \alpha, \xi)$, where $\tau$ is a stopping time and $\alpha$ and $\xi$ are, respectively, a classical control and a singular control. The objective function $J(\tau, \alpha, \xi)$ takes the form of an expected reward that depends on the path of the controlled diffusion process $X^{\alpha,\xi}$ and on the amount of control exerted. The optimisation runs over a random time horizon, determined as the minimum of a deterministic time $T$ and the first time the state dynamics leaves a given Borel set $\mathcal{O}$. Furthermore, we treat both the so-called finite- and infinite-fuel problems, meaning that the total amount of singular control that can be exerted is either capped or uncapped (see Karatzas [26,27]).
The DPP is easy to state, rather intuitive, and quite literally forms the backbone of the whole stochastic (and deterministic) control theory. That is why it is important to develop a full understanding of all the mechanisms underpinning the DPP and its rigorous mathematical proof across different classes of stochastic control problems. Moreover, the DPP goes hand in hand with the theory of viscosity solutions for partial differential equations (see Crandall, Ishii and Lions [12]). In that respect, the DPP is the first step in proving that the value function of a stochastic control problem is a viscosity solution to a suitable (problem-specific) Hamilton-Jacobi-Bellman (HJB) equation. The link to viscosity theory opens the door to the study of stochastic control problems via PDE methods that are often more versatile than the classical techniques based on Sobolev-type solutions (strong and weak). Notably, Bayraktar and Sirbu (see, e.g., [4,5,40,41]) developed an alternative approach to showing that the value functions of certain stochastic control problems/games are viscosity solutions of the corresponding HJB equations. Their approach does not require the DPP, which is instead obtained as a by-product. It does not seem to us that a direct application of those results is immediate in our set-up.
Over the course of the past three decades, mathematicians have increasingly come to realise that there are numerous subtleties hidden in a rigorous mathematical proof of the DPP. Perhaps the best known of such subtleties concerns the so-called 'measurable selection' (see, e.g., Soner and Touzi [42] and references therein), which is needed in order to concatenate $\varepsilon$-optimal controls starting from random initial conditions. That becomes problematic when the value function is only known to be measurable, but it is not an issue when, for example, the value function is known to be continuous. Works by Bouchard and Touzi [7] and by Bouchard and Nutz [6] develop a notion of weak DPP that overcomes the measurable selection problem without even requiring continuity of the value function (an extension of those ideas to the case of non-linear expectations is provided by Dumitrescu et al. in [16]). It is also worth mentioning an approach based on optimisation over families of probability measures, associated with controlled dynamics, on the space of càdlàg paths as in, e.g., El Karoui and Tan [17,18] or Zitkovic [46]. Further delicate technical problems arise from the use of regular conditional probabilities and from the role played by null sets when changing the so-called reference probability system while following the trajectories of the controlled dynamics. Those difficulties are indicated in the monograph by Fabbri, Gozzi and Swiech [19] and in the work by Claisse, Talay and Tan [11], which we take as the main building blocks for our study.
For an overview of classical results on the DPP we refer to traditional monographs on stochastic control (e.g., Krylov [30], Fleming and Soner [20], Yong and Zhou [44]) and to the references therein. Due to the singular control feature, our work is closely related to that of Haussmann and Suo [21,22], who originally developed the DPP for singular control problems and their connection with viscosity solutions of suitable HJB equations. Ma and Yong [32,33] extended the results by Haussmann and Suo to a more general set-up and gave sufficient conditions under which the value function of the problem is the unique continuous viscosity solution of a suitable HJB equation. In a one-dimensional setting, Chiarolla [10] obtained analogous results when the diffusion coefficient of the singularly controlled dynamics is not Lipschitz, which poses additional technical difficulties in proving continuity of the value function. Atar and Budhiraja [2] study the DPP for state-constrained singular stochastic control problems and obtain that the value is the unique viscosity solution of the corresponding HJB equation. In particular, their controlled state process is a Brownian motion constrained to evolve inside a cone.
Our proof of the DPP encompasses a framework that is more general than in the papers from the paragraph above (except that we do not have a constraint on the controlled dynamics as in [2]). We include the discretionary stopping feature and the exit time from the domain $\mathcal{O}$, and we allow the cost of exerting singular controls to be state-dependent. We also avoid making specific assumptions on the problem data, which other papers normally introduce as sufficient conditions for growth estimates and continuity of the value function. Instead, we shift the focus to mild regularity of the objective function $J$ and pathwise uniqueness of the controlled dynamics (Assumptions 3.4 and 3.7). Of course, our assumptions are implied by the more specific ones made, e.g., in [33]. Continuity of the value function is also not required; it is replaced by a weaker condition on the convergence of the expected values of suitable stochastic processes (Assumption 3.9). Again, that requirement is satisfied if, for example, the value function can be shown to be continuous. Finally, we do not impose any sign restriction on the gain functions and costs appearing in the objective function $J$.
The works [6,7,17,46] provide the DPP in great generality. It seems reasonable to expect that a suitable adaptation and combination of the results and techniques in those papers would allow one to devise a DPP in our set-up. However, the task is highly non-trivial. The generality attained in [6,7,17,46] relies in part on abstract overarching assumptions, including concatenability of controls and stability under conditioning (in the language of [17]), that need to be verified on a case-by-case basis. Our work is self-contained and complements those results: we present a constructive approach based on probabilistic concepts and tools from the general theory of stochastic processes, under assumptions for which we also provide simple sufficient conditions with wide applicability. The overall philosophy of our paper is certainly inspired by [19], but the actual derivation of the key technical results requires a different line of argument, due to the structural differences between our set-up and that in [19].
The class of problems we consider encompasses modelling features that have already attracted sustained attention from the scientific community. For applications in singular control, it is interesting to consider exit times from a domain $\mathcal{O}$ as they appear, for example, in the famous optimal dividend problem (see, e.g., Jeanblanc and Shiryaev [24]); in that context, $\mathcal{O}$ is the solvency region. Discretionary stopping is also a desirable feature in several models addressed in the literature. In the case with only classical controls and without exit times from a domain, we find work by Karatzas and Zamfirescu [29] characterising martingale properties of the value process. In a similar setting, but adopting relaxed controls (a notion close to that of randomised controls), we find work by Bassan and Ceci [8], who prove that the value function is a viscosity solution of a suitable HJB equation. Explicit solutions to some particular problems of singular control with discretionary stopping over an infinite time horizon are obtained by Davis and Zervos in [13] for one-dimensional controlled dynamics. In a similar set-up, Morimoto [35] also adds an exit time of the controlled dynamics (a controlled geometric Brownian motion) from an interval of the real line. Using variational methods and penalisation techniques, [35] proves that the value function solves a suitable HJB equation. A finite-fuel singular control problem with discretionary stopping and infinite time horizon is solved by Karatzas et al. in [28] in closed form, using free boundary methods applied to a parametric family of ordinary differential equations. Chen and Yi [9] use (parabolic) PDE methods and free boundary theory to solve a finite-time horizon problem of singular control with discretionary stopping with one-dimensional controlled dynamics. Our contribution to this stream of the literature is to provide a rigorous derivation of the DPP, which was missing so far, in sufficient generality to cover all the models mentioned above and more.
The paper is organised as follows. In Section 2 we collate some notation that will be used throughout the paper. In Section 3 we set up the problem and state our standing assumptions. The main results of the paper are presented in Section 4, but their proofs are given in subsequent sections. In particular, in Section 5 we prove independence of the value function of our problem from the reference probability system adopted; this leads to the equivalence of the weak and strong formulations of the problem and to several useful equivalences in law of the controlled dynamics under different reference probability systems. In Section 6 we use the technical results from Section 5 to finally prove the DPP and the other results stated in Section 4. The paper is completed by a technical appendix gathering useful results (largely known) on regular conditional probabilities.

Notation and terminology
In this section we summarise the notation and terminology adopted throughout the paper. While this is a useful compendium of symbols, it can be skipped on a first read as all concepts will be introduced in the paper when they first appear.
(a) $T > 0$ is the time horizon; $d, d', l \in \mathbb{N}$ denote the dimensions of various state processes.
(b) For a vector $x \in \mathbb{R}^d$, we denote by $|x|$ its Euclidean norm. For $x, y \in \mathbb{R}^d$, we denote by $\langle x, y\rangle = \sum_{i=1}^d x_i y_i$ the inner product. Given a set $A \subseteq \mathbb{R}^d$, we denote $A^c = \mathbb{R}^d \setminus A$. For $z \in \mathbb{R}$, we denote $(z)^\pm = \max\{0, \pm z\}$.
(c) For a Polish space Γ, we denote by B(Γ) the Borel σ-algebra on Γ.
(d) For a random variable $X : (\Omega, \mathcal{F}) \to (\Gamma, \mathcal{S})$, defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and with values in a measurable space $(\Gamma, \mathcal{S})$, we denote by $\mathcal{L}_{\mathbb{P}}(X)$ its law. Notice that, in particular, $X$ could be a stochastic process (see, e.g., [25, Page 24]).
(e) For a bounded variation function $f : [0, T] \to \mathbb{R}^d$ with $f = (f^1, \ldots, f^d)$, we denote by $f^\pm = (f^{1,\pm}, \ldots, f^{d,\pm})$ the two components of its Jordan decomposition. Namely, for every $i = 1, \ldots, d$, we have $f^i = f^{i,+} - f^{i,-}$ with $f^{i,\pm}$ non-decreasing. For each $i$, we denote by $V_{[0,T]}(f^i) = f^{i,+}(T) + f^{i,-}(T)$ the variation of $f^i$. Then, the variation of $f$ reads $V_{[0,T]}(f) = \sum_{i=1}^d \big(f^{i,+}(T) + f^{i,-}(T)\big)$.

(h) For $t \in [0, T]$, a reference probability system starting at time $t$ is a 5-tuple $\nu = (\Omega, \mathcal{F}, \mathbb{P}, \{\mathcal{F}^t_s\}_{s \in [t,T]}, W)$, where $(\Omega, \mathcal{F}, \mathbb{P})$ is a complete probability space carrying a $d'$-dimensional Brownian motion $W$ starting at time $t$, $\mathcal{F}^{t,0}_s := \sigma(W_u : u \in [t, s])$, and $\mathcal{F}^t_s$ is the augmentation of $\mathcal{F}^{t,0}_s$ with the $\mathbb{P}$-null sets. The class $\mathcal{V}_t$ contains all reference probability systems starting at time $t \in [0, T]$.
(i) We say that a reference probability system $\nu \in \mathcal{V}_t$ is standard if there exists a $\sigma$-algebra $\mathcal{F}^0$ such that $\mathcal{F}^{t,0}_T \subseteq \mathcal{F}^0 \subseteq \mathcal{F}$, where $\mathcal{F}$ is the completion of $\mathcal{F}^0$ with the $\mathbb{P}$-null sets and $(\Omega, \mathcal{F}^0)$ is a standard measurable space. Recall that a measurable space is standard if it is Borel isomorphic to, e.g., $(\mathbb{N}, \mathcal{B}(\mathbb{N}))$, with $\mathcal{B}(\mathbb{N})$ the Borel $\sigma$-algebra for the discrete topology.
(j) For $t \in [0, T]$, the canonical reference probability system (starting at time $t$) is the 5-tuple $\nu^* = (\Omega^*, \mathcal{B}(\Omega^*), \mathbb{P}^*, \{\mathcal{B}^t_s\}_{s \in [t,T]}, W^*)$, where $\mathcal{B}^t_s$ is the augmentation of $\mathcal{B}^{t,0}_s$ with the $\mathbb{P}^*$-null sets.

(k) For a fixed $\nu \in \mathcal{V}_t$, we denote by $\mathcal{A}^\nu_t$ the collection of $\{\mathcal{F}^t_s\}$-progressively measurable processes $\alpha = (\alpha_s)_{s \in [t,T]}$, taking values in a (possibly compact) subset $K \subseteq \mathbb{R}^l$ ($l \in \mathbb{N}$ as in (a)).
(l) For a fixed $\nu \in \mathcal{V}_t$ and a given $u \in [t, T]$, we denote by $\mathcal{X}^\nu_u$ the collection of processes $\xi = (\xi_s)_{s \in [t,T]}$ such that (i) $\xi$ is $\mathbb{R}^d$-valued and $\{\mathcal{F}^t_s\}$-adapted; (ii) $\xi$ is left-continuous and of bounded variation $\mathbb{P}$-a.s.; (iii) $\xi_s = 0$ for every $s \in [t, u]$, $\mathbb{P}$-a.s.; (iv) $\mathbb{E}\big[|V_{[u,T]}(\xi)|^p\big] < \infty$, for some fixed $p > 0$ (depending on the problem).
Similarly, for a given random variable $\zeta \in [0, \infty)$, $\mathbb{P}$-a.s., we denote by $\mathcal{X}^\nu_u(\zeta)$ the class of finite-fuel controls, i.e., those for which condition (iv) above is replaced by the constraint $V_{[u,T]}(\xi) \le \zeta$, $\mathbb{P}$-a.s.

(m) Given $\xi \in \mathcal{X}^\nu_u$ and an $\mathcal{F}^t_u$-measurable random variable $Z \ge z$, $\mathbb{P}$-a.s., we define the truncation of $\xi^\pm$ at $Z$ (after time $u$) by $(\xi^\pm_{s \wedge \sigma_Z})_{s \in [t,T]}$, where $\sigma_Z$ denotes the first time after $u$ at which the variation of $\xi$ exceeds $Z$; the increments after time $u$ of the truncated process are denoted accordingly.

(n) For a fixed $\nu \in \mathcal{V}_t$ and a given $u \in [t, T]$, we denote by $\mathcal{T}^\nu_u$ the collection of $\{\mathcal{F}^t_s\}$-stopping times $\tau$ such that $u \le \tau \le T$, $\mathbb{P}$-a.s.
(q) For a fixed $\nu \in \mathcal{V}_t$, we denote by $\mathcal{P}_\Omega$ the $\sigma$-algebra of $\{\mathcal{F}^{t,0}_s\}$-predictable sets. Recall that $\mathcal{P}_\Omega$ is a $\sigma$-algebra on $[t, T] \times \Omega$ and it is generated by the sets of the form $(s, u] \times A$ with $t \le s < u \le T$ and $A \in \mathcal{F}^{t,0}_{s-}$, and of the form $\{t\} \times A$ with $A \in \mathcal{F}^{t,0}_t$. Notice that, by left-continuity of the raw Brownian filtration, we have $\mathcal{F}^{t,0}_{s-} = \mathcal{F}^{t,0}_s$ for every $s \in (t, T]$ (and, by convention, $\mathcal{F}^{t,0}_{t-} = \mathcal{F}^{t,0}_t$). A process that is measurable with respect to $\mathcal{P}_\Omega$ is said to be $\{\mathcal{F}^{t,0}_s\}$-predictable.

Problem formulation
Let $p > 0$ and $d, d', l \in \mathbb{N}$. Fix $t \in [0, T]$ and let $\mathcal{V}_t$ be the collection of reference probability systems starting at time $t$. That is, $\nu \in \mathcal{V}_t$ means $\nu = (\Omega, \mathcal{F}, \mathbb{P}, \{\mathcal{F}^t_s\}_{s \in [t,T]}, W)$, where $(\Omega, \mathcal{F}, \mathbb{P})$ is a complete probability space equipped with a $d'$-dimensional Brownian motion $W = (W_s)_{s \in [t,T]}$ starting at time $t$, i.e., $\mathbb{P}(W_t = 0) = 1$, and $\mathcal{F}^t_s$ is the augmentation of $\mathcal{F}^{t,0}_s := \sigma(W_u : u \in [t, s])$ with the $\mathbb{P}$-null sets. With no loss of generality, we assume that $s \mapsto W_s(\omega)$ is continuous for all $\omega \in \Omega$. For completeness we should use the notation $W^t$ for the Brownian motion, in order to keep track of the starting condition $W^t_t = 0$. We omit this notation here as the time $t$ will be fixed (but arbitrary) throughout the paper.
Here $V_{[t,T]}(\xi)$ is the (random) total variation of the process $\xi$ over the time interval $[t, T]$, defined as the sum of the variations of each coordinate $\xi^i$, $i = 1, \ldots, d$; that is, $V_{[t,T]}(\xi) = \sum_{i=1}^d \big(\xi^{i,+}_T + \xi^{i,-}_T\big)$, where $\xi^i = \xi^{i,+} - \xi^{i,-}$ is the Jordan decomposition of the $i$-th entry of the vector $\xi$ and we use the monotonicity of $\xi^{i,\pm}$. Alternatively, we can also consider control-stopping trebles $(\tau, \alpha, \xi)$ where, for a given random variable $\zeta \in [0, \infty)$, $\mathbb{P}$-a.s., we denote by $\mathcal{X}^\nu_t(\zeta)$ the class of finite-fuel controls, i.e., those satisfying (iii) above but with the integrability condition replaced by the constraint $V_{[t,T]}(\xi) \le \zeta$, $\mathbb{P}$-a.s. The results in this paper hold for both types of control pairs (finite and infinite fuel) and we will explicitly refer to small differences in the arguments of proof as needed. For future reference, given $u \in [t, T]$, we also introduce the subsets $\mathcal{X}^\nu_u \subset \mathcal{X}^\nu_t$ and $\mathcal{X}^\nu_u(z) \subset \mathcal{X}^\nu_t(z)$ of processes from (iii) above but such that $\mathbb{P}(\xi_s = 0 \text{ for } s \in [t, u]) = 1$.
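To fix ideas, the coordinate-wise Jordan decomposition and the total variation $V_{[t,T]}(\xi)$ can be illustrated on a discretised one-dimensional path. The sketch below is purely illustrative: the function name and the grid-based setting are our own choices, not notation from the paper.

```python
import numpy as np

def jordan_decomposition(xi):
    """Given samples of a 1-d bounded-variation path on a time grid, return the
    non-decreasing components (xi_plus, xi_minus) of its Jordan decomposition
    and the total variation V of the path along the grid."""
    dxi = np.diff(xi)  # increments of the path
    xi_plus = np.concatenate(([0.0], np.cumsum(np.maximum(dxi, 0.0))))
    xi_minus = np.concatenate(([0.0], np.cumsum(np.maximum(-dxi, 0.0))))
    V = xi_plus[-1] + xi_minus[-1]  # V = xi_plus(T) + xi_minus(T)
    return xi_plus, xi_minus, V
```

By construction the path is recovered as `xi[0] + xi_plus - xi_minus`, and the total variation is the sum of the terminal values of the two monotone components; in $d$ dimensions one simply sums the variations over the coordinates.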
Given an admissible pair $(\alpha, \xi)$, the controlled state process for our problem follows the stochastic differential equation (SDE) (3.1). We will assume that the SDE (3.1) admits a unique, càglàd, $\{\mathcal{F}^t_s\}$-adapted solution for any admissible pair $(\alpha, \xi)$, up to indistinguishability (see Assumption 3.4).
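As a concrete illustration of an admissible pair acting on the state, the following Euler-type sketch simulates a controlled path together with its exit time from a domain. It assumes, purely for illustration, a one-dimensional state, an interval domain $\mathcal{O}$, and a dynamics of the additive form $\mathrm{d}X = \mu\,\mathrm{d}s + \sigma\,\mathrm{d}W + \mathrm{d}\xi$; the function names and this coefficient structure are our simplifying assumptions, not the paper's general setting.

```python
import numpy as np

def simulate_controlled_sde(x0, T, n, mu, sigma, alpha, dxi, O=(-np.inf, np.inf), seed=0):
    """Euler scheme for a 1-d controlled dynamics of the illustrative form
    dX = mu(s, X, a) ds + sigma(s, X, a) dW + dxi,
    stopped at the first exit from the open interval O or at the horizon T."""
    rng = np.random.default_rng(seed)
    dt = T / n
    X = np.full(n + 1, float(x0))
    rho = T  # rho_O ^ T: by default the path never leaves O before T
    for k in range(n):
        s = k * dt
        a = alpha(s, X[k])                 # classical control
        dW = rng.normal(0.0, np.sqrt(dt))  # Brownian increment
        X[k + 1] = X[k] + mu(s, X[k], a) * dt + sigma(s, X[k], a) * dW + dxi(s)
        if not (O[0] < X[k + 1] < O[1]):   # first exit from O
            rho = (k + 1) * dt
            X[k + 2:] = X[k + 1]           # freeze the path after exit
            break
    return X, rho
```

For instance, with zero volatility, unit drift, no singular control and $\mathcal{O} = (-1, 0.5)$, the path reaches the boundary $0.5$ and the exit time satisfies $\rho \approx 0.5$.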
Sometimes it is convenient to denote the solution of (3.1) by $X^{t,x;\nu;\alpha,\xi}$ to highlight the dependence on the reference probability system $\nu$, the admissible controls $(\alpha, \xi)$ and the initial condition $(t, x)$. However, when no confusion shall arise we may also use the notations $X^{t,x;\alpha,\xi}$, $X^{t,x}$, $X^{\alpha,\xi}$ or simply $X$, depending on the circumstances. Similarly, we denote by $\rho^{t,x;\nu;\alpha,\xi}_{\mathcal{O}}$ the first exit time of the process $X^{t,x;\nu;\alpha,\xi}$ from a domain $\mathcal{O} \in \mathcal{B}(\mathbb{R}^d)$, i.e.,
$$\rho^{t,x;\nu;\alpha,\xi}_{\mathcal{O}} := \inf\{s \ge t : X^{t,x;\nu;\alpha,\xi}_s \notin \mathcal{O}\} \wedge T. \tag{3.2}$$
When no confusion shall arise we may also use the simpler notations $\rho^{t,x;\alpha,\xi}_{\mathcal{O}}$, $\rho^{\alpha,\xi}_{\mathcal{O}}$ and $\rho_{\mathcal{O}}$. Given an initial condition $(t, x) \in [0, T] \times \mathbb{R}^d$, a reference probability system $\nu \in \mathcal{V}_t$ and an admissible treble $(\tau, \alpha, \xi) \in \mathcal{T}^\nu_t \times \mathcal{A}^\nu_t \times \mathcal{X}^\nu_t$, the objective function of our problem is given in (3.3). The integrals with respect to the controls $\xi^\pm$ appearing there are Lebesgue-Stieltjes integrals. We allow for different costs $c_+$ and $c_-$ associated with the two increasing processes $\xi^+$ and $\xi^-$, respectively. For the sake of generality, $c_\pm$ take values in $\mathbb{R}^d$, so that negative costs may be allowed too.
Remark 3.1. It is worth emphasising that while in the case of state-independent costs, $c_\pm(t)$, there is a unique interpretation of the integrals above, in the state-dependent case, $c_\pm(t, x)$, the classical Lebesgue-Stieltjes integral is known to cause some technical problems with the use of Hamilton-Jacobi-Bellman (HJB) equations. This fact is well-illustrated in [1], where the absence of an admissible optimal control is demonstrated in a one-dimensional problem. This issue can be resolved, at least in the case when the cost of control is the same in all directions, i.e., $c^i_\pm = c^j_\pm = c$ for all $i, j$, by taking a different type of integral. Namely, for the singular control one should use the representation $\xi_t = \int_{[0,t)} n_s \, \mathrm{d}|\xi|_s$, where $n_s \in \mathbb{R}^d$ is a unitary vector, $(s, \omega) \mapsto n_s(\omega)$ is progressively measurable and $|\xi|_s(\omega)$ denotes the total variation of $\xi(\omega)$ on $[0, s]$. Then, the cost per unit of control exerted is defined through the measure $\mathrm{d}|\xi|_s$, where $\xi^c$ is the continuous part of $\xi$. With this formulation, it is normally possible to connect the singular stochastic control problem with an HJB equation with gradient constraint (see, e.g., [45]). From the point of view of our analysis, the specific choice of the integral is irrelevant, as long as it is a measurable function of the paths of the controlled state process. So we avoid delving further into this matter, as our results continue to hold under, for example, the specification in (3.4).
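For the reader's convenience, one specification of the control cost consistent with this representation, in the spirit of the gradient-constraint literature, reads as follows; this display is our hedged reconstruction with a scalar cost $c$, not necessarily the paper's exact formula (3.4).

```latex
\int_{[0,t)} c(s, X_s)\,\mathrm{d}|\xi^{c}|_s
\;+\; \sum_{s < t \,:\, \Delta \xi_s \neq 0}\;
\int_0^{|\Delta \xi_s|} c\big(s, X_s + r\, n_s\big)\,\mathrm{d}r
```

Here $\Delta \xi_s$ denotes the jump of $\xi$ at time $s$: the first term charges the continuous part $\xi^c$ at the current state, while the second charges each jump along the straight segment it traverses, which is what restores compatibility with HJB equations with gradient constraint.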
The controller-stopper aims at maximising the objective function $J^\nu_{t,x}(\tau, \alpha, \xi)$ over all admissible trebles $(\tau, \alpha, \xi)$. To simplify the notation, we denote by $v$ the corresponding value function. A priori there are two formulations of the problem: a strong one, in (3.5), and a weak one, in (3.6). In the case of finite fuel, we must fix the total fuel $\bar z \in [0, \infty)$ and add one state variable to the problem that accounts for the remaining fuel at each moment in time. That means that we will consider the state process $(s, X^{\alpha,\xi}_s, Z^\xi_s)_{s \in [t,T]}$ with the additional dynamics of the remaining-fuel process $Z$, started from $z \in [0, \bar z]$. Analogously to $X$, we may use the simpler notations $Z^{t,z;\xi}$ and $Z$ when no confusion would arise. The objective function may be taken as in (3.3) or it may also depend on the total amount of fuel exerted. In the latter case, we denote it by $J^\nu_{t,x,z}(\tau, \alpha, \xi)$, where now $\rho_{\mathcal{O}} = \inf\{s \ge t : (X^{\alpha,\xi}_s, Z^\xi_s) \notin \mathcal{O}\} \wedge T$ for some $\mathcal{O} \subset \mathbb{R}^d \times [0, \bar z]$ and with obvious changes to the domains of the functions $f$, $g_1$ and $g_2$. It will be completely clear from our analysis below that our results hold if we take an even more general form of the cost per unit of control exerted, i.e., $c_\pm(s, X_s, Z_s)$. We refrain from adding that extension to avoid further notational complexity. The value functions in the two formulations with finite fuel read as in (3.8) and (3.9). As it will be shown, the weak and strong formulations (both for finite and infinite fuel) are in fact identical. The dual approach is, however, essential in order to prove the DPP (see Remark 6.3).

Remark 3.2. We adopt the same terminology as in [19, Section 2.1]. The term "weak" in (3.6) and (3.9) refers to the fact that the reference probability system can vary together with the controls, whereas in the "strong" formulation $\nu$ is fixed (see (3.5) and (3.8)).
Remark 3.3. Our set-up is well suited to cover also the following situations: (a) controls that do not operate in all directions, i.e., $\xi^i \equiv 0$ for some $i = 1, 2, \ldots, d$; (b) monotone controls, i.e., either $\xi^+ \equiv 0$ or $\xi^- \equiv 0$. We can accommodate fully degenerate controlled diffusions. In particular, this allows one to consider a state-dependent discount factor in the definition of the objective functions $J^\nu_{t,x}$ and $J^\nu_{t,x,z}$ by taking, e.g., discounted versions of the problem data, for some functions $\bar f$, $\bar g_i$, $i = 1, 2$ (and analogously for the costs $c_\pm$).
Throughout the paper we make a number of standing assumptions that simplify the exposition. Such assumptions concern mainly the controlled dynamics and the objective function, and can be checked on a case-by-case basis in practical applications. In particular, ideas contained in [32] for singular control problems can be easily adapted to our setting and, more specifically, [34, Ch. III, Sec. 9] contains mild sufficient conditions for problems of singular control with discretionary stopping. Assumptions 3.4-3.8 below hold throughout the paper, whereas Assumption 3.9 is only needed in Theorems 4.4 and 4.5.

Assumption 3.4 (Indistinguishability). For every admissible pair $(\alpha, \xi)$, the SDE (3.1) admits a unique $\{\mathcal{F}^t_s\}$-adapted solution, up to indistinguishability.

When $V_{[t,T]}(\xi)$ is $p$-integrable with $p \ge 2$ and $\mu$ and $\sigma$ are Lipschitz continuous, Assumption 3.4 holds by standard SDE techniques. For a more general case, when no integrability on $\xi$ is assumed, one can use results from [15]. Assumption 3.4 also yields the following simple lemma.
For concreteness, we assume some integrability of the reward/cost functions appearing in the objective of our optimisation. Given an admissible treble $(\tau, \alpha, \xi)$ on a reference probability system $\nu$, we introduce the process $N$ in (3.10), with $(X, Z) = (X^{t,x;\alpha,\xi}, Z^{t,z;\xi})$. An analogous process is clearly defined for the infinite-fuel problem, by dropping the dependence on $Z$. We also denote $(x)^\pm := \max\{0, \pm x\}$.

Assumption 3.6 (Objective function I).
There is a constant $\bar g > 0$ such that, for $i = 1, 2$, it holds $g_i(t, x, z) \ge -\bar g$ for all $(t, x, z)$. Moreover, for any $(t, x, z)$ and any admissible treble $(\tau, \alpha, \xi)$ in a reference probability system $\nu$, the process $N$ satisfies the required integrability. The assumption is immediately satisfied when, for example, the total variation of $\xi$ is at least $p$-integrable with $p \ge 1$, $f$ is bounded from below and $c_\pm$ are bounded from above. If the coefficients of the SDE (3.1) are Lipschitz, then, when $p \ge 2$, it is enough to assume that $f$ satisfies a polynomial growth condition with some constant $c > 0$ and that $c_\pm$ are bounded from above.
We also impose some continuity on the objective function. This is easy to state for infinite-fuel problems, but it requires an additional notion of truncated controls in the case of finite-fuel problems.
The continuity requirements for the objective function can be easily verified in the infinite-fuel case when the coefficients in the SDE (3.1) are Lipschitz continuous, the functions $f$, $g_1$ and $g_2$ are, for example, Hölder continuous and $c_\pm$ are functions of time only (see [34, Proposition III.9.5]). In the finite-fuel case, the uniform bound on the total variation of the singular controls allows one to prove these continuity requirements (by a similar argument) also if $c_\pm$ depend on the space variable and are, e.g., Hölder continuous. The next assumption is a minimal technical assumption on measurability and finiteness of the problem's value function in its weak formulation, which guarantees well-posedness of the optimisation problem.
It is not difficult to check that, in the infinite-fuel case, Assumptions 3.7 and 3.8 imply that, for each $u \in [t, T]$ and any $\varepsilon > 0$, there exists $\delta > 0$ such that initial points at distance at most $\delta$ have $\varepsilon$-close values. Analogously, in the finite-fuel case, for each $u \in [t, T]$ and any $\varepsilon > 0$, there exists $\delta > 0$ such that the same conclusion holds for the value $v(u, x, z)$. Here, notice that the inequality $v(u, x_1, z_1) \le v(u, x_2, z_2) + \varepsilon$ follows easily from Assumption 3.7, since $z_2 \ge z_1$, whereas the opposite inequality is slightly more delicate and can be shown by truncating the controls as in Assumption 3.7. For our last assumption, given an admissible treble $(\tau, \alpha, \xi)$, let the process $M = (M^{\nu;\alpha,\xi}_{u \wedge \tau})_{u \in [t,T]}$ be defined as in (3.12), with $(X, Z) = (X^{\nu;\alpha,\xi}, Z^{\nu;\xi})$. Notice that $M_u = N_u + v(u, X_u, Z_u)$ on the event $\{\tau \wedge \rho_{\mathcal{O}} \ge u\}$. An analogous definition holds in the infinite-fuel problem if we drop the dependence on $Z$.
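In symbols, the two continuity statements referred to above plausibly take the following form; this is our hedged transcription, consistent with the inequalities discussed in the text, not the paper's exact displays. For each $u \in [t,T]$ and $\varepsilon > 0$ there exists $\delta > 0$ such that

```latex
|x_1 - x_2| \le \delta \;\Longrightarrow\; |v(u, x_1) - v(u, x_2)| \le \varepsilon,
\qquad \text{and, for } z_1 \le z_2, \qquad
|x_1 - x_2| + |z_2 - z_1| \le \delta \;\Longrightarrow\; |v(u, x_1, z_1) - v(u, x_2, z_2)| \le \varepsilon .
```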
Recalling that $s \mapsto (X_s, Z_s)$ is left-continuous $\mathbb{P}$-a.s., the convergence in (3.13) can be obtained if, e.g., $s \mapsto v(s, X_s, Z_s)$ is also left-continuous and if the dominated convergence theorem applies. Continuity of the value function and suitable growth estimates are therefore sufficient.
Continuity of $v$ is generally satisfied when, for example, the coefficients in the SDE (3.1) are Lipschitz continuous, the functions $f$, $g_1$ and $g_2$ are Hölder continuous and the costs $c_\pm$ depend continuously on time only (see Corollary III.9.8 and Proposition III.9.10 in [34] for the infinite-fuel case when $p \ge 2$; the finite-fuel case is analogous. See also [32, Theorem 3.3] for singular control problems with both finite and infinite fuel).
As for the growth estimates, the dominated convergence theorem can be applied in the finite-fuel case if, e.g., $f$, $g_1$, $g_2$ have linear growth and $c_\pm$ are bounded (thanks also to standard SDE estimates). In the infinite-fuel case, when the total variation of $\xi$ is $p$-integrable with $p \ge 1$, the functions $f$, $g_1$, $g_2$ are bounded from above and $c_\pm$ are bounded from below (as assumed in [32]), the resulting value function is bounded.

Main results
Under the standing Assumptions 3.4-3.8 we obtain the following versions of the dynamic programming principle (DPP), which are the main results of this paper. The proofs are given in Section 6 and build upon a series of technical lemmas and propositions.
The first two theorems state the DPP for deterministic times in both the infinite-and finite-fuel setting, respectively.
We also have a more probabilistic interpretation of the above results. We state it in the next proposition, where we recall the definition of the process $M$ in (3.12) and of $N$ in (3.10).

Proposition 4.3. For any admissible treble $(\tau, \alpha, \xi)$, the process $(M^{\nu;\alpha,\xi}_{s \wedge \tau})_{s \in [t,T]}$ defined in (3.12) is a supermartingale in the reference probability system $\nu$. Assume further that the treble $(\tau^*, \alpha^*, \xi^*)$ is optimal and satisfies a suitable integrability condition. Then, the associated process $(M^{\nu;\alpha^*,\xi^*}_{s \wedge \tau^*})_{s \in [t,T]}$ is a martingale. The results hold for both finite and infinite fuel.
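In the usual martingale-optimality form (our transcription of the statement, with $M$ as in (3.12)), Proposition 4.3 reads: for all $t \le u \le s \le T$,

```latex
\mathbb{E}\big[\, M^{\nu;\alpha,\xi}_{s \wedge \tau} \,\big|\, \mathcal{F}^t_u \big]
\;\le\; M^{\nu;\alpha,\xi}_{u \wedge \tau}
\quad \text{for every admissible treble } (\tau, \alpha, \xi),
\qquad
\mathbb{E}\big[\, M^{\nu;\alpha^*,\xi^*}_{s \wedge \tau^*} \,\big|\, \mathcal{F}^t_u \big]
\;=\; M^{\nu;\alpha^*,\xi^*}_{u \wedge \tau^*}.
```

The supermartingale property encodes that no strategy can, on average, improve the running total of collected rewards plus continuation value, while equality along an optimal treble is the classical martingale optimality principle.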
Finally, under the additional Assumption 3.9, we obtain the DPP for stopping times.
In the next sections we will develop the theoretical framework that allows us to prove the main results stated above. There are two key steps: (1) showing the equivalence of the strong and weak formulations via the so-called independence of the reference probability system (Section 5); (2) combining the use of the strong and weak formulations with the use of regular conditional probabilities to arrive at the DPP (Section 6). For our analysis we follow closely the approach and main ideas in [19, Chapter 2], where the DPP is obtained in an infinite-dimensional setting. In [19] only classical controls are considered, without discretionary stopping or exit times from a given domain. As stated in [19, Remark 2.15], the fine technical details of the proofs are extremely sensitive to variations in the problem setting and, particularly, to conditions imposed on the class of admissible controls that go beyond their measurability (for instance, left-continuity, bounded variation and the integrability/finite-fuel condition for singular controls, in our case). Additional difficulties arise from the discretionary stopping and the exit time $\rho_{\mathcal{O}}$. Thus, we develop specific arguments to address our needs.

Independence of the reference probability system
In this section we show that the problem is independent of the choice of the reference probability system $\nu \in \mathcal{V}_t$ and thus that the strong formulation (3.5) and the weak formulation (3.6) coincide (respectively, for finite fuel, (3.8) and (3.9)). In particular, for every $(t, x) \in [0, T] \times \mathbb{R}^d$ and $\nu \in \mathcal{V}_t$ given and fixed, we are going to show that (5.1) holds, and analogously for the finite-fuel case. We develop all arguments in this section for the infinite-fuel problem for notational simplicity and, when necessary, we show that their analogue for the finite-fuel setting holds with obvious changes. From now on, let $(t, x) \in [0, T] \times \mathbb{R}^d$ be fixed (analogously, $(t, x, z)$ is fixed in the finite-fuel setting). Unless stated otherwise, $\nu \in \mathcal{V}_t$ is an arbitrary reference probability system.

It is useful to note an equivalent representation of stopping times in terms of a non-decreasing process. Indeed, for $\tau \in \mathcal{T}^\nu_t$ we can define $\eta^\tau_s = \mathbf{1}_{\{s > \tau\}}$ for $s \in [t, T]$, so that $(\eta^\tau_s)_{s \in [t,T]}$ is non-decreasing, left-continuous, with a single jump at time $\tau$. Motivated by this simple observation, given $u \in [t, T]$, we denote by $\mathcal{E}^\nu_u$ the collection of processes $\eta = (\eta_s)_{s \in [t,T]}$ such that (i) $\eta$ is $\{\mathcal{F}^t_s\}$-adapted; (ii) $\eta$ is left-continuous and non-decreasing $\mathbb{P}$-a.s.; (iii) $\eta_s \in \{0, 1\}$ for every $s \in [t, T]$ and $\eta_s = 0$ for every $s \in [t, u]$, $\mathbb{P}$-a.s. Then, $\eta^\tau \in \mathcal{E}^\nu_t$ for $\tau \in \mathcal{T}^\nu_t$ and, conversely, given $\eta \in \mathcal{E}^\nu_t$ we define a $\{\mathcal{F}^t_s\}$-stopping time $\tau^\eta$ as in (5.2). Clearly $\mathcal{E}^\nu_t \subset \mathcal{X}^\nu_t$, so that properties which we will prove below for elements of $\mathcal{X}^\nu_t$ immediately hold for elements of $\mathcal{E}^\nu_t$ as well. Noticing that $\mathbf{1}_{\{s \le \tau\}} = 1 - \eta^\tau_s$ and $\mathbf{1}_{\{s < \tau\}} = 1 - \eta^\tau_{s+}$, we can also rewrite the objective function in another convenient form, denoted $I^\nu_{t,x}$. For every admissible treble $(\tau, \alpha, \xi)$ we have
(5.3) $J^\nu_{t,x}(\tau, \alpha, \xi) = I^\nu_{t,x}(\eta^\tau, \alpha, \xi)$.
Conversely, for every $(\eta, \alpha, \xi) \in \mathcal{E}^\nu_t \times \mathcal{A}^\nu_t \times \mathcal{X}^\nu_t$, we have
(5.4) $I^\nu_{t,x}(\eta, \alpha, \xi) = J^\nu_{t,x}(\tau^\eta, \alpha, \xi)$.
For the finite-fuel case, $J^\nu_{t,x,z}(\tau, \alpha, \xi) = I^\nu_{t,x,z}(\eta^\tau, \alpha, \xi)$ and $I^\nu_{t,x,z}(\eta, \alpha, \xi) = J^\nu_{t,x,z}(\tau^\eta, \alpha, \xi)$ by the exact
same argument. For simplicity, but with a slight abuse of notation, we sometimes say that $(\eta^\tau, \alpha, \xi)$ is an admissible treble belonging to either $\mathrm{Adm}^\nu_t$ or $\mathrm{Adm}^\nu_{t,\bar z - z}$, provided that $(\tau, \alpha, \xi)$ is such. The first difficulty in establishing the equivalence in (5.1) is that null sets and probabilistic properties vary along with the underlying reference probability systems. We now show that, while all processes are adapted to the $\mathbb{P}$-augmented Brownian filtration $\{\mathcal{F}^t_s\}$, it is possible to select representatives (in a suitable sense) that are adapted to the raw Brownian filtration $\{\mathcal{F}^{t,0}_s\}$. For classical controls the result can be found directly in [19], hence we do not prove it here. We refer the reader to the final item in our Section 2 for the formal definition of an $\{\mathcal{F}^{t,0}_s\}$-predictable process used below.
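The correspondence $\tau \leftrightarrow \eta^\tau$ above can be sanity-checked on a time grid. The sketch below is a toy discrete-time illustration (the function names are ours), where the infimum defining the stopping time is recovered only up to one mesh step.

```python
import numpy as np

def eta_from_tau(tau, grid):
    """Discrete sketch of eta^tau_s = 1_{s > tau}: left-continuous,
    non-decreasing, {0,1}-valued, with a single jump at tau."""
    return (grid > tau).astype(float)

def tau_from_eta(eta, grid, T):
    """Discrete sketch of the stopping time recovered from eta:
    the first grid time at which eta equals 1 (and T if eta never reaches 1)."""
    idx = np.flatnonzero(eta == 1.0)
    return float(grid[idx[0]]) if idx.size else T
```

On a grid of mesh $h$, the round trip `tau_from_eta(eta_from_tau(tau, grid), grid, T)` returns $\tau$ up to an error of at most $h$, mirroring the continuous-time identity $\inf\{s : \mathbf{1}_{\{s > \tau\}} = 1\} = \tau$.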
A slightly stronger result, which we prove in the next lemma, holds for general left-continuous $\{\mathcal{F}^t_s\}$-adapted processes, and hence also for those in $\mathcal{X}^\nu_t$ and $\mathcal{E}^\nu_t$, for the state process $X$ and for the fuel process $Z$. An analogous result is stated without proof in [14] after Theorem IV.78.

Lemma 5.2. Let $\nu = (\Omega, \mathcal{F}, \mathbb{P}, \{\mathcal{F}^t_s\}, W) \in \mathcal{V}_t$ and let $\gamma$ be an $\mathbb{R}^d$-valued, $\{\mathcal{F}^t_s\}$-adapted process which is $\mathbb{P}$-a.s. left-continuous. Then, there exists an $\{\mathcal{F}^{t,0}_s\}$-predictable process $\gamma^0$ which is indistinguishable from $\gamma$.
Our next goal is to show that any $\{\mathcal{F}^{t,0}_s\}$-predictable process on a reference probability system $\nu$ can be expressed as a deterministic, measurable function of the Brownian paths. For that we must introduce the canonical reference probability system $\nu^* = (\Omega^*, \mathcal{B}(\Omega^*), \mathbb{P}^*, \{\mathcal{B}^t_s\}_{s \in [t,T]}, W^*)$, where $\mathcal{B}^t_s$ is the augmentation of $\mathcal{B}^{t,0}_s$ with the $\mathbb{P}^*$-null sets.
Since $\Omega^*$ with the usual supremum norm is a Polish space, $(\Omega^*, \mathcal{B}(\Omega^*))$ is a standard measurable space (see, e.g., [37, Chapter V, Theorem 2.2]) and so $\nu^*$ is a standard reference probability system. We denote by $\mathcal{P}_{\Omega^*}$ the $\sigma$-algebra of $\{\mathcal{B}^{t,0}_s\}$-predictable sets (see the last item in Section 2).
Lemma 5.3. Let ν = (Ω, F, P, {F^t_s}, W) ∈ V_t and let γ be an R^d-valued, {F^{t,0}_s}-predictable process. Then there exists a P_{Ω*}/B(R^d)-measurable map ψ: [t, T] × Ω* → R^d such that γ_s = ψ(s, W_•) for s ∈ [t, T], as in (5.5). Since the sets of the form A^r_{s,u} and A_t generate P_{Ω*}, we obtain the claim (see [25, Lemma 1.13]).
Next we show how the map ψ: [t, T] × Ω* → R^d in (5.5) can be used to connect admissible controls in different reference probability systems that have the same law. The first lemma below is a standard result on distributional properties induced by measurable maps.

Lemma 5.4. Let ψ: [t, T] × Ω* → R^n be P_{Ω*}/B(R^n)-measurable, let W and W̃ be Brownian motions on two reference probability systems and set γ_s := ψ(s, W_•) and γ̃_s := ψ(s, W̃_•). Then L_P(γ) = L_{P̃}(γ̃).

Proof. We use a monotone class theorem. Let Q_0 be the collection of all P_{Ω*}/B(R^n)-measurable bounded functions ψ and let Q_1 ⊂ Q_0 be the subset of those ψ's for which L_P(γ) = L_{P̃}(γ̃). Clearly Q_1 contains the indicators of the sets that generate P_{Ω*} and the functions that are finite products of such indicators. Let us denote the latter set by Q_2, so that Q_2 is closed under multiplication, and Q_1 is closed under bounded monotone limits by dominated convergence. Therefore, by [14, Th. I.21, p. 14], Q_1 contains all bounded σ(Q_2)-measurable functions and, since σ(Q_2) = P_{Ω*}, we obtain the claim for bounded, P_{Ω*}/B(R^n)-measurable functions. Given a generic P_{Ω*}/B(R^n)-measurable function ψ which is not necessarily bounded, we can approximate it with bounded functions ψ_M by truncation, so that ψ_M → ψ pointwise as M → ∞. Then the claim follows in the limit with γ^M_s = ψ_M(s, W_•) and γ̃^M_s = ψ_M(s, W̃_•), where the equality in law of γ^M and γ̃^M holds by the first part of the proof.
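The truncation used in the final step can be written out explicitly; the following is a minimal rendering (componentwise, for M > 0), consistent with the pointwise convergence ψ_M → ψ required above:

```latex
\psi_M^i(s,w) \;:=\; (-M) \,\vee\, \big( \psi^i(s,w) \wedge M \big),
\qquad i = 1, \dots, n, \quad (s,w) \in [t,T] \times \Omega^* ,
```

so that each ψ_M is bounded, P_{Ω*}/B(R^n)-measurable (as the composition of ψ with a continuous map), and ψ_M(s, w) → ψ(s, w) for every (s, w) as M → ∞.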
Setting ξ̃ = ξ̃^+ − ξ̃^−, it remains to prove the integrability (or finite-fuel) property of ξ̃. If ξ ∈ X^ν_t(z), one simply adds the P-a.s. condition V_{[t,T]}(ξ) = Σ^d_{i=1}(ξ^{i,+}_T + ξ^{i,−}_T) ≤ z in (5.9) and thus obtains the analogous condition under P̃, where Ẽ denotes the expectation with respect to the measure P̃. This will follow once we prove (5.6). By a construction analogous to the one for ξ̃, we also find an {F̃^{t,0}_s}-predictable process η̃^τ ∈ E^ν̃_t, from which we obtain the stopping time τ̃ as in (5.2). Next we show that the equality in law (5.6) holds, by proving that all finite-dimensional distributions of (α, ξ, η^τ, W) and (α̃, ξ̃, η̃^τ, W̃) are the same (see, e.g., [25, Proposition 2.2]). Fix a finite sequence of times {s_k}^n_{k=1} ⊆ [t, T] and corresponding sequences of vectors. Then, by dominated convergence and left-continuity of η^τ and ξ, we can express the relevant finite-dimensional distributions as limits, understood over sequences {r^j_k}_{j∈N} such that r^j_k ↑ s_k for each k and r^j_k ∈ D for each (j, k). By (5.10) we know the analogous representation of (ξ̃_s(ω̃), η̃^τ_s(ω̃)). Therefore, by (5.7), (5.8) and Lemma 5.4, the two limits coincide, where the final equality is by dominated convergence and P̃-a.s. left-continuity of (ξ̃, η̃^τ). Then (5.6) holds, which also implies the required identity under Ẽ. The equality in law under different reference probability systems extends also to the processes X and Z and to the stopping time ρ_O, as illustrated in the next lemma, where we use the same notation as in Lemma 5.5. The proof of the lemma relies on a result by Kurtz [31] about strong solutions of stochastic equations, which generalises the classical results by Yamada and Watanabe [23, Ch. III]. Informally, [31] states that if Y is a stochastic input and X is a stochastic output of an equation of the form (5.11) Γ(X, Y) = 0, then pointwise uniqueness (in the language of [31]) implies X = Ψ(Y) for some measurable function Ψ. Moreover, Ψ is uniquely determined, in the sense that if (X̃, Ỹ) is another pair solving (5.11) with Ỹ equal in law to Y, then X̃ = Ψ(Ỹ). We will apply this result in the case where X and Y are càglàd processes, so that pointwise uniqueness coincides with pathwise uniqueness.
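Schematically, the result of [31] invoked above can be summarised as follows (our informal rendering, with Γ and Ψ as in the preceding paragraph):

```latex
\Gamma(X, Y) = 0 \ \text{a.s.} \ \text{ and pointwise uniqueness}
\;\Longrightarrow\; X = \Psi(Y) \ \text{a.s.;}
\qquad
\Gamma(\widetilde X, \widetilde Y) = 0 \ \text{a.s.}, \ \ \widetilde Y \overset{d}{=} Y
\;\Longrightarrow\; \widetilde X = \Psi(\widetilde Y) \ \text{a.s.}
```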

Dynamic programming principle
Using regular conditional probabilities, in this section we prove the main results of the paper: Theorems 4.1 and 4.2, Proposition 4.3 and Theorems 4.4 and 4.5. For completeness, in Appendix A we provide a detailed digression on regular conditional probabilities, while here we only introduce the related notation.
Unless otherwise stated, we fix (t, x, z) ∈ [0, T] × R^d × [0, z̄] and a standard reference probability system ν = (Ω, F, P, {F^t_s}, W) ∈ V_t. We also fix u ∈ (t, T) and denote by P_ω the regular conditional probability on (Ω, F) given F^{t,0}_u; that is, P_ω(A) = P(A|F^{t,0}_u)(ω) for every A ∈ F, for P-a.e. ω. The expectation with respect to P_ω is denoted by E_ω and the σ-algebra F^ω is the completion of F^0 with the P_ω-null sets. To be precise, we should write P^u_ω instead of P_ω in order to keep track of the time u with respect to which we evaluate the regular conditional probabilities; however, u will be fixed throughout and so we can use the simpler notation. Let W^u_s := W_s − W_u for s ∈ [u, T] be the increments of W after time u and let F^{u,0}_s := σ(W^u_r : r ∈ [u, s]) be the raw filtration generated by such increments. We denote by F^{u,ω}_s the augmentation of F^{u,0}_s with the P_ω-null sets. Now, for P-a.e. ω ∈ Ω, we can define a standard reference probability system ν_ω := (Ω, F^ω, P_ω, {F^{u,ω}_s}, W^u) ∈ V_u, which will be frequently needed in the proofs below (it is indeed shown in Proposition A.4 that ν_ω ∈ V_u and so, in particular, W^u is an {F^{u,ω}_s}-Brownian motion under P_ω for P-a.e. ω ∈ Ω). In order to simplify notation in some of the proofs, for s ∈ [t, T], on the event {s ≤ τ ∧ ρ_O}, we introduce the processes Γ and Λ, where (X, Z) = (X^{ν;α,ξ}, Z^{ν;ξ}) for a given couple of admissible controls (α, ξ) on ν. Then, from the same arguments as in (5.3), we have (6.3) and, setting s = t, the analogous identity at the initial time. Clearly, removing the state variable Z from the definitions of Γ and Λ, we obtain the analogous expressions for the infinite-fuel case.
Remark 6.1. In the proofs of this section we often use the expression "for P-a.e. ω" to indicate that a certain property Π_i holds on a set Ω_i ∈ F with P(Ω_i) = 1. Clearly, the nature of the set Ω_i depends, in general, on the property Π_i of interest. Since in our proofs we only ever consider a finite number of properties Π_1, Π_2, …, Π_n, the expression "for P-a.e. ω" refers to a universal set Ω′ := ∩^n_{i=1} Ω_i ∈ F with P(Ω′) = 1 and such that all properties Π_1, Π_2, …, Π_n hold for every ω ∈ Ω′.
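The remark rests on nothing more than the union bound: with Ω′ := ∩^n_{i=1} Ω_i,

```latex
\mathbb{P}\big( (\Omega')^{c} \big)
= \mathbb{P}\Big( \bigcup_{i=1}^{n} \Omega_i^{c} \Big)
\le \sum_{i=1}^{n} \mathbb{P}\big( \Omega_i^{c} \big) = 0 ,
\qquad \text{hence} \quad \mathbb{P}(\Omega') = 1 .
```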
Here we prove Theorem 4.2. The proof of Theorem 4.1 is similar but easier, as it involves one fewer state variable, so we omit it in order to avoid repetition.
The proof is split into two steps.

Step 1. (inequality ≤).
By indistinguishability of X^{α^0,ξ^0} and X^0 we have, P-a.s. for all s ∈ [u, T], the corresponding identity of the paths. Since (X^0_u, Z^0_u) is F^{t,0}_u-measurable, by a well-known property of regular conditional probabilities (see (A.2) in the Appendix), the same identity holds under P_ω. It is also recalled in Lemma A.1 that if P(Ω_1) = 1 for some Ω_1 ∈ F, then Ω_1 ∈ F^ω and P_ω(Ω_1) = 1 for P-a.e. ω ∈ Ω. Thus, taking Ω_1 as the set where the SDE (6.10) holds, we have that (6.11) holds P_ω-a.s., for P-a.e. ω ∈ Ω and every s ∈ [u, T]. Likewise, P_ω(ρ_O = ρ^0_O) = 1 for P-a.e. ω ∈ Ω. There are two subtle points related to (6.11): (i) the first one (raised in [11]) is that the stochastic integral in (6.11) is constructed with respect to the regular conditional probability P_ω and it is P-indistinguishable from the original one (see Lemma A.5 for details); (ii) the second one is that it is possible to check that the treble (η^{0,u}, α^0, ξ^{0,u}) belongs to the admissible class Adm^{ν_ω}_{u, z̄−Z^0_u(ω)} for P-a.e. ω ∈ Ω (see Proposition A.4). Finally, X^0 is {F^{t,0}_s}-adapted and therefore it is {F^{u,ω}_s}-adapted for P-a.e. ω (see Lemma A.3). In conclusion, up to P_ω-indistinguishability, the process (X^0, Z^0) is the unique solution of (6.11) in the reference probability system ν_ω, for P-a.e. ω ∈ Ω. Consistently with the notation introduced in Section 3 around the SDE (3.1) and the process Z in (3.7), we should say that for P-a.e. ω the process (X^0_s, Z^0_s)_{s∈[u,T]} is indistinguishable from the pair (X^{u,X^0_u(ω);ν_ω;α^0,ξ^{0,u}}, Z^{u,Z^0_u(ω);ν_ω;ξ^{0,u}}), but we avoid such heavy notation as no confusion shall arise.
Notice that, on the event {ρ^0_O ≥ u}, ρ^0_O is equal to the first time after u at which the process (X^0_s, Z^0_s)_{s∈[u,T]} leaves the set O. Thus, ρ^0_O defines a stopping time in the reference probability system ν_ω. This fact will be used in the next group of equations without further mention.
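In formulas, and assuming the natural infimum form of the display omitted above, the identity reads:

```latex
\rho^{0}_{\mathcal{O}}
= \inf\big\{ s \in [u, T] : (X^{0}_{s}, Z^{0}_{s}) \notin \mathcal{O} \big\}
\qquad \text{on the event } \{\rho^{0}_{\mathcal{O}} \ge u\} .
```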
Then, continuing from (6.9), by the tower property we obtain (6.12), where the second equality is by property (A.1) of the measure P_ω (and Assumption 3.6); the third one uses the definition of I^{ν_ω} (see (5.3)) and the fact that the processes (η^{0,u}, α^0, ξ^{0,u}, X^0, Z^0) are well-defined in the reference probability system ν_ω for P-a.e. ω (see Proposition A.4); the inequality follows from the admissibility of the treble (η^{0,u}, α^0, ξ^{0,u}) ∈ Adm^{ν_ω}_{u, z̄−Z^0_u(ω)} for P-a.e. ω and from Proposition 5.7; in the final expression we recall that (X^0_u, Z^0_u, η^{τ,0}, ρ^0_O) and (X_u, Z_u, η^τ, ρ_O) are equal P-a.s. Combining (6.7), (6.9) and (6.12), we obtain (6.6) which, together with (6.5), gives the desired bound. Taking the supremum over all admissible trebles we obtain the first inequality.

Step 2. (inequality ≥). At the start of the proof we fixed (t, x, z), u ∈ [t, T] and ν ∈ V_t. Let now (τ, α, ξ) ∈ Adm^ν_{t, z̄−z} be arbitrary but fixed too. The idea is to construct another admissible treble (τ̄, ᾱ, ξ̄) ∈ Adm^ν_{t, z̄−z} which coincides with (τ, α, ξ) up to time u and whose restriction to [u, T] is δ-optimal with respect to v(u, X^{α,ξ}_u, Z^ξ_u) for some δ > 0.
Let us denote by B_1 ⊂ B_2 ⊂ ⋯ an exhaustion of R^d × [0, z̄] by Borel sets. By Assumption 3.7 and (3.11), for any ε > 0 we can pick δ_1 > 0 and a partition {B^1_n}_{n∈N} of B_1 into countably many disjoint Borel sets with diam(B^1_n) < δ_1 for every n ∈ N, so that (6.13) holds. Thus, we can also choose a partition {B^2_n}_{n∈N} of B_2 \ B_1 into countably many disjoint Borel sets with diam(B^2_n) < δ_2 for every n ∈ N, so that (6.13) holds for all (x_1, z_1), (x_2, z_2) ∈ B^2_n. Iterating this argument, we find a partition {B^k_n}_{k,n∈N} of R^d × [0, z̄] into countably many disjoint Borel sets such that, for arbitrary n, k ∈ N and (x_1, z_1), (x_2, z_2) ∈ B^k_n with z_1 ≤ z_2, condition (6.13) holds. With a slight abuse of notation we relabel the partition {B^k_n}_{k,n∈N} simply as {B_n}_{n∈N}. Let us fix a reference probability system ν̃ = (Ω̃, F̃, P̃, {F̃^u_s}, W̃) ∈ V_u, which will be used as an auxiliary system when dealing with regular conditional probabilities. Let n, m ∈ N and take (x_n, z_n) ∈ B_n. For convenience, we pick z_n so that z_n ≥ z for all (x, z) ∈ B_n; that is, with no loss of generality, the partition of [0, z̄] is of the form {{0}, (0, a], (a, b], …}. By Proposition 5.7, there exists a treble (τ_n^{(m)}, α_n^{(m)}, ξ_n^{(m)}) ∈ Adm^{ν̃}_{u, z̄−z_n} which is 1/m-optimal for v(u, x_n, z_n). For the moment m is fixed and we drop the superscript for notational convenience; hence, we write (τ_n, α_n, ξ_n) instead of (τ_n^{(m)}, α_n^{(m)}, ξ_n^{(m)}).
Plugging (6.22) and (6.23) back into (6.21) and using (6.16), we obtain (6.24). Now recall that the treble (τ_n, α_n, ξ_n) ∈ Adm^{ν̃}_{u, z̄−z_n} is 1/m-optimal. Thus (6.25) holds, where the second inequality uses once again (6.13) and the fact that (X_u(ω), Z_u(ω)) ∈ B_n for ω ∈ O_n. Combining (6.24) and (6.25) and summing over the indicator functions, we arrive at (6.26). On the event U^c_k we have η̄^k_{u+} = 1, which corresponds to τ̄^k = u. Then, in (6.20), the corresponding term reduces to (6.27), where the final equality is by (6.18). Putting together (6.26), (6.27) and (6.19), we arrive at (6.28). Letting k → ∞ we have U_k ↑ Ω and U^c_k ↓ ∅. Moreover, v ≥ g_2 by definition and both g_1 and g_2 are bounded from below by Assumption 3.6. Then, by Fatou's lemma, we can pass to the limit in (6.28). Letting m → ∞ and ε → 0, and then taking the supremum over (τ, α, ξ) ∈ T^ν_t × A^ν_t × X^ν_t in (6.28), we obtain our second inequality. Recalling the result in Step 1 of this proof, we conclude.

Remark 6.2. The proof of Theorem 4.1 follows by identical arguments but dropping the dependence on the state process Z (i.e., the state dynamics is only given by the couple (t, X)). The only note of caution is that, instead of the finite-fuel condition, in the infinite-fuel problem one must check the admissibility condition E[|V_{[t,T]}(ξ̄^k)|^p] < ∞. That is not hard. Indeed, recalling (6.15), we obtain, for all n ≤ k, the required estimate, where the final equality is by P-indistinguishability of ξ_n and ξ^0_n and the inequality holds because ξ_n ∈ X^{ν̃}_u.

Remark 6.3. In the proof of Theorem 4.2 we use the equivalence between the weak formulation and the strong formulation of the stochastic control problem (see Proposition 5.7). For example, when we pass to the reference probability system ν_ω at time u ∈ [t, T], we need the weak formulation to argue that the inequality in (6.12) holds. Similar technical steps in other parts of the proof use the same argument.
Our next goal is to extend the dynamic programming principle to stopping times σ ∈ (t, T). In the process we also prove Proposition 4.3, which turns out to be a useful tool in the proofs of Theorems 4.4 and 4.5.
Proof of Proposition 4.3. We give the full proof only in the case of finite fuel; the analogue for the infinite-fuel problem follows by the same arguments. Following [38, Ch. II.1], we need to show that E[(M^{ν;α,ξ}_{s∧τ})^−] < ∞ for all s ∈ [t, T] and E[M^{ν;α,ξ}_{s∧τ} | F^t_u] ≤ M^{ν;α,ξ}_{u∧τ} for all t ≤ u ≤ s ≤ T. The integrability condition is immediate from Assumption 3.6. Let us now prove the supermartingale inequality.
We can now prove the dynamic programming principle for stopping times (Theorems 4.4 and 4.5). Once again we only provide the full proof for the finite-fuel problem, as the one for the infinite-fuel case follows the same arguments but with one fewer state variable.
Proof of Theorem 4.5. The proof is split into two steps and uses a standard approximation argument for stopping times, combined with Theorem 4.2. Fix (t, x, z) ∈ [0, T] × R^d × [0, z̄]. For every n ∈ N let Π_n := {t^i_n}^{2^n}_{i=0} be the n-th dyadic partition of [t, T], i.e., t^i_n := t + (i/2^n)(T − t).
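For concreteness, the standard dyadic approximation of a stopping time σ alluded to here can be sketched as follows (the paper's exact display may differ in details):

```latex
\sigma_n \;:=\; t\,\mathbf{1}_{\{\sigma = t\}}
\;+\; \sum_{i=1}^{2^{n}} t^{i}_{n}\, \mathbf{1}_{\{ t^{i-1}_{n} < \sigma \le t^{i}_{n} \}} ,
```

so that each σ_n takes values in the finite set Π_n, is a stopping time (indeed {σ_n ≤ t^i_n} = {σ ≤ t^i_n} ∈ F^t_{t^i_n}), and σ_n ↓ σ as n → ∞; one can then apply Theorem 4.2 at the deterministic times t^i_n and pass to the limit.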
In the next proposition, for P-a.e. ω ∈ Ω, we generate reference probability systems ν_ω starting at time u ∈ [t, T], i.e., ν_ω ∈ V_u, and then we construct admissible controls on ν_ω starting from ones in the standard reference probability system ν ∈ V_t. The probability measure in the reference probability system ν_ω is the regular conditional probability P_ω. To obtain these results it is convenient to recall the notation from Section 5. Recall also that, given any admissible treble (τ, α, ξ), either in Adm^ν_t or in Adm^ν_{t, z̄−z}, and letting η^τ be the increasing process associated with τ, it is possible to construct an {F^{t,0}_s}-predictable treble (η^{τ,0}, α^0, ξ^0) which is also admissible (see Lemmas 5.1 and 5.2).
Here f: [0, T] × R^d → R denotes the running gain, g_1: [0, T] × R^d → R the terminal gain when the controlled process leaves the set O prior to τ, g_2: [0, T] × R^d → R the terminal gain when τ occurs prior to ρ_O, and c^±: [0, T] × R^d → R^d are the vectors of costs per unit of singular control exerted.
Combined with Step 1 above, we conclude.