Non-Markovian Impulse Control Under Nonlinear Expectation

We consider a general class of non-Markovian impulse control problems under adverse non-linear expectation or, more specifically, the zero-sum game problem where the adversary player decides the probability measure. We show that the upper and lower value functions satisfy a dynamic programming principle (DPP). We first prove the DPP for a truncated version of the upper value function in a straightforward manner. A uniform convergence argument then enables us to show the DPP for the general setting. Following this, we use an approximation based on a combination of truncation and discretization to show that the upper and lower value functions coincide, thus establishing that the game has a value and that the DPP holds for the lower value function as well. Finally, we show that the DPP admits a unique solution and give conditions under which a saddle point for the game exists. As an example, we consider a stochastic differential game (SDG) of impulse versus classical control of path-dependent stochastic differential equations (SDEs).


Introduction
We solve a robust impulse control problem where the aim is to find an impulse control, u*, that solves (1.1), where E^u[·] := sup_{P∈P(u)} E^P[·] is a non-linear expectation and the terminal reward ϕ is a random function that maps impulse controls u = (τ_j, β_j)_{j=1}^∞ (with N := max{j : τ_j < T} the number of interventions made before the horizon) to values in the real line and is measurable with respect to F ⊗ B(D), where B(D) is the Borel σ-field of the space D in which the control u takes values and F is the σ-field generated by the canonical process on the space of continuous trajectories starting at 0. The intervention cost c is defined similarly to ϕ but is, in addition, assumed to be progressively measurable in the last intervention time and bounded from below by a positive constant.
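Written out, a plausible display for (1.1) is the following sketch; the exact argument of the intervention cost c (here the truncated control [u]_j) is our assumption rather than something fixed by the text above:

```latex
\inf_{u\in\mathcal{U}}\;\mathcal{E}^{u}\Big[\varphi(u)+\sum_{j=1}^{N} c\big([u]_{j}\big)\Big],
\qquad
\mathcal{E}^{u}[\,\cdot\,]:=\sup_{P\in\mathcal{P}(u)}\mathbb{E}^{P}[\,\cdot\,],
```

where N := max{j : τ_j < T} counts the interventions made strictly before the horizon T.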
In fact, we take this formulation one step further by showing that, under mild conditions, the zero-sum game where we play an impulse control while the adversary player (nature) chooses a probability measure has a value when allowing our adversary to play strategies, i.e. that equality (1.2) between the upper and lower values holds, where P^S is the set of non-anticipative strategies mapping impulse controls to probability measures, and we derive additional conditions under which a saddle point, (u*, P^{*,S}) ∈ U × P^S, for the game exists.
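In display form, the value statement (1.2) can be sketched as follows; the shorthand J for the total cost is ours, not the paper's:

```latex
\inf_{u\in\mathcal{U}}\;\sup_{P\in\mathcal{P}(u)} J(u,P)
\;=\;
\sup_{P^{S}\in\mathcal{P}^{S}}\;\inf_{u\in\mathcal{U}} J\big(u,P^{S}(u)\big),
\qquad
J(u,P):=\mathbb{E}^{P}\Big[\varphi(u)+\sum_{j=1}^{N} c\big([u]_{j}\big)\Big].
```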
Our approach relies on the tower property of non-linear expectations discovered in [22] and applied in [23] to solve an optimal stopping problem under adverse non-linear expectation. In this regard we need to assume that ϕ and c are uniformly bounded and uniformly continuous under a suitable metric.
To indicate the applicability of the results, we consider the special case when (1.1) corresponds to the stochastic differential game (SDG) of impulse versus classical control

inf_{u∈U} sup_{α∈A} E[ ∫_0^T φ(s, X^{u,α}_s) ds + ψ(X^{u,α}_T) − Σ_{j=1}^N ℓ(τ_j, X^{u,α}_{τ_j}, β_j) ],

where A is a set of classical controls and X^{u,α} solves an impulsively-continuously controlled path-dependent stochastic differential equation (SDE) that implements u in feedback form. To ensure sufficient regularity in this setting, we impose an additional L²-Lipschitz condition on the coefficients of the SDE.
The main contributions of the present work are threefold. First, we show that the game (1.2) has a solution when the sets (P(u) : u ∈ U) satisfy standard conditions translated to our setting and the functions ϕ and c are bounded and uniformly continuous. Second, we extract a saddle point under additional weak-compactness assumptions on the family (P(u) : u ∈ U). Finally, we give a set of conditions under which the cost/reward pair defined by (1.4)-(1.5) satisfies the assumptions in the first part of the paper, enabling us to show that the path-dependent SDG of classical versus impulse control in (1.3) has a value.

Related literature
The optimal stopping problem under adverse non-linear expectation was considered by Nutz and Zhang [23] and by Bayraktar and Yao [5], where the latter allows for a slightly more general setting, not having to assume a uniform bound on the rewards. Nevertheless, as explained above, [23] is based on the tower property of non-linear expectations developed in [22] and is, therefore, more closely related to the present work.
Non-Markovian impulse control under standard (linear) expectation was first considered by Hamadène et al. in [9], where it was assumed that the impulses do not affect the dynamics of the underlying process. This approach was extended to incorporate delivery lag in [17] and, more recently, to an infinite-horizon setting in [10]. A different approach to non-Markovian impulse control was initiated in [25] and further developed in [18], where interconnected Snell envelopes indexed by controls were used to find solutions to problems with impulsively controlled path-dependent SDEs. We mention also the general formulation of impulse controls in [26], which can be seen as a linear-expectation version of the present work, and the work on impulse control of path-dependent SDEs under g-expectation (see [24]) and related systems of backward SDEs (BSDEs) in [28]. Although the latter work considers a path-dependent SDG of impulse versus classical control, the classical control only enters the drift term. Effectively, this corresponds to the situation where the set P(u) in our framework is dominated, and the extension to non-dominated sets would have to go through the incorporation of second-order BSDEs (2BSDEs) (see [8,32]).
The idea of having one player implement a strategy in the zero-sum game setting was first proposed by Elliott and Kalton [14] to counter the unrealistic notion that one of the players has to give up their control to the opponent in games of control versus control. The approach was combined with the theory of viscosity solutions to represent the upper and lower value functions in deterministic differential games as solutions to Hamilton-Jacobi-Bellman-Isaacs (HJBI) equations by Evans and Souganidis [15]. Using a discrete-time approximation technique, this was later translated to the stochastic setting by Fleming and Souganidis [16]. Notably, while the framework of [14], which has been the prevailing formulation in the literature since its introduction, assumes that the first player to act always implements a strategy, our formulation only allows the adversary player to implement a strategy when she acts first, while our impulse control is always an open-loop control. The reason that our approach is still successful lies in the weak formulation of the game, which effectively turns the impulse control into a feedback control when we turn to the SDG in the latter part of the paper. Moreover, our approach avoids the asymmetric information structure that results from implementing the game formulation in [14], which enables us to derive saddle points.
Related to the SDG of impulse versus classical control that we consider are the works of Azimzadeh [1], where the intervention costs are deterministic, and of Bayraktar et al. [2], which considers a robust impulse control problem where the impulse control is of switching type. Both of these are restricted to the Markovian setting. The latter implements the switching control in feedback form while the classical control is open loop; in this sense it is probably the closest work to the present one that can be found in the literature. However, the approach used in [2] is based on the stochastic Perron method of Bayraktar and Sîrbu [3,4] (see also [31] for another application to SDGs), which does not easily translate to a path-dependent setting. The setup in [1] was later extended in [27] to allow for stochastic intervention costs and g-nonlinear expectation. However, this extension also falls within the Markovian framework and uses standard BSDEs to define the cost/reward.
Previous works on non-Markovian SDGs in the standard framework of classical versus classical control can be found in Pham and Zhang [29] and Possamaï et al. [30], where the latter uses a weak formulation of feedback versus feedback control and shows that the game has a value under Isaacs' condition and uniqueness of solutions to the corresponding path-dependent HJBI equation [12,13]. We remark that an interesting further development of the present work would be to consider the corresponding path-dependent quasi-variational inequalities, reminiscent of the relation between Markovian impulse control problems and classical quasi-variational inequalities [6].

Outline
In the next section we give some preliminary definitions, recall some prior results (in particular, the solution to the optimal stopping problem under non-linear expectation in [23]) and give an immediate extension of the tower property in [22] to our setting. Then, in Section 3 we show that a type of dynamic programming principle holds for the upper value function. In Section 4, we use an approximation routine applied to both value functions to conclude that the game has a value, i.e. that (1.2) holds. In Section 5 it is shown that our DPP admits a unique solution, and conditions are given under which an optimal pair can be extracted from the value function. Finally, in Section 6 we relate the result to path-dependent SDGs of impulse versus classical control.

Notation
Throughout, we shall use the following notation, where we fix a horizon T ∈ (0, ∞):
• We fix a positive integer d, define the sample space Ω := {ω ∈ C(R_+ → R^d) : ω_0 = 0} and set Λ := [0, T] × Ω. For t ∈ [0, T] we introduce the pseudo-norm ‖ω‖_t := sup_{s∈[0,t]} |ω_s| and extend the corresponding distance to Λ by defining d[(t, ω), (t′, ω′)] := |t′ − t| + ‖ω_{·∧t} − ω′_{·∧t′}‖_T.
• The set of all probability measures on Ω, equipped with the topology of weak convergence, i.e. the weak topology induced by the bounded continuous functions on Ω, is denoted P(Ω).
• We let B denote the canonical process, i.e. B_t(ω) := ω_t, denote by P_0 the probability measure under which B is a Brownian motion, and let E be expectation with respect to P_0.
• F := {F t } 0≤t≤T is the natural (raw) filtration generated by B and F * := {F * t } 0≤t≤T , where F * t is the universal completion of F t .
• We let T be the set of all F-stopping times and, for each η ∈ T, we let T_η be the set of all τ ∈ T such that τ(ω) ≥ η(ω) for all ω ∈ Ω. For fixed t ∈ [0, T], we let T̄_t be the set of all τ ∈ T_t such that ω → τ(ω) is independent of ω|[0,t].
• For t ∈ [0, T], we let U_t (resp. U^k_t) be the subset of U (resp. U^k) of all controls for which τ_1 ≥ t, and denote by Ū_t the subset of U_t of all controls u such that ω → u(ω) is independent of ω|[0,t]. Furthermore, for t ≥ 0 we let N(t) := max{j ≥ 0 : τ_j ≤ t} and define u_t := [u]_{N(t)} and u^t := (τ_j, β_j)_{j=N(t)+1}^N.
• For each κ ≥ 0, we introduce the pseudo-distance on D_κ.

We stretch the definition of uniform continuity to maps on Λ × D as follows.

Definition 2.1. We say that a map on Λ × D is uniformly continuous if it is uniformly continuous with respect to the above pseudo-distances.

We define the concatenation of controls and, for technical reasons, extend the composition to pairs (v, v′) when v has infinite length by letting v • v′ = v in this case. The extension allows us to decompose any control u ∈ U as u = u_τ • u^τ. For τ ∈ T and ω, ω′ ∈ Ω we introduce the corresponding composition on Ω.

The results we present rely on the notion of regular conditional probability: any P ∈ P(Ω) has a regular conditional probability distribution (P^ω_τ)_{ω∈Ω} given F_τ, satisfying the usual consistency properties for all ω ∈ Ω (see e.g. page 34 in [33]). We define the probability measure P^{τ,ω} ∈ P(Ω) in terms of the regular conditional probability, for all A ∈ F. We then let {P(τ, ω, u)} be a family of subsets of P(Ω) satisfying the standing assumptions below, and set P(u) := P(0, ω, u).
We recall that a subset of a Polish space is analytic if it is the image of a Borel subset of another Polish space under a Borel map and that an R-valued function f is upper semi-analytic if the set {f > c} is analytic for each c ∈ R.
For P ∈ P(Ω) we let E^P_t[f](ω) := E^P[f^{t,ω}] (here E^P is expectation with respect to P) and define the non-linear expectation E^u_τ[f](ω) := sup_{P∈P(τ,ω,u)} E^P_τ[f](ω) for all (τ, ω, u) ∈ T × Ω × U and all upper semi-analytic functions f : Ω → R. Then E^u_τ[f] is F*_τ-measurable and upper semi-analytic [22].
The idea is that the adversary player, given a trajectory ω|[0,t] and a sequence of impulses v ∈ D[0,t], chooses a probability measure on Ω^{t,ω} to maximize (1.1). In this regard we introduce the set of non-anticipative maps from controls to probability measures:

Definition 2.3. We denote by P^S the set of non-anticipative maps P^S : U → P(Ω) mapping u ∈ U to P = P^S(u) ∈ P(u). By non-anticipativity, we mean that if u_{τ−} = ũ_{τ−} for some τ ∈ T and u, ũ ∈ U, then P^S(u) = P^S(ũ) on F_τ. Moreover, for (t, ω) ∈ Λ we let P^S(t, ω) denote the set of all non-anticipative maps P^S : U → P(Ω) such that P^S(u) ∈ P(t, ω, u). Often, we suppress the dependence on ω and write P^S_t for P^S(t, ω).
d) Let ũ ∈ U_t be such that u_{τ−} = ũ_{τ−}; then there is a P′ ∈ P(t, ω, v • ũ) such that P′ = P on F_θ.
In the above assumption, conditions a)-c) are standard (see e.g. [20]) and basically mean that dynamic programming holds. The last condition implies that the non-anticipativity postulated in Definition 2.3 is achievable.
Assumption 2.6. The functions ϕ : Ω × D → R and c : Ω × D → [δ, ∞) (with δ > 0) are uniformly bounded, i.e. there is a constant C_0 > 0 such that |ϕ(ω, v)| ∨ c(ω, v) ≤ C_0 for all (ω, v) ∈ Ω × D. For each κ ≥ 0, there is a modulus of continuity ρ_{ϕ,κ} bounding the increments of ϕ for all ω, ω′ ∈ Ω and v, v′ ∈ D_κ with t_j ≤ t′_j. Moreover, c(u) is F_{τ_N}-measurable, and for each κ ≥ 0 there is a modulus of continuity ρ_{c,κ} bounding the increments of c, where t_κ and t′_κ are the times of the last interventions in v and v′, respectively. Finally, for any v ∈ D and b ∈ U, the inequality (2.3) holds for all ω ∈ Ω.
Remark 2.7. In view of (2.3) it is never optimal to intervene on the system at time T. In light of this, and to simplify notation later on, we will assume that ϕ is such that ϕ(v • (T, b)) = ϕ(v) for all v ∈ D and b ∈ U.

Assumption 2.8. For each i ≥ 0, there is a modulus of continuity ρ_{E,i} such that for all κ, k ≥ 0, v, v′ ∈ D_κ, t ∈ [0, T], ω, ω′ ∈ Ω and u ∈ U^k_t, there is a u^ω ∈ U^k_t for which the corresponding non-linear expectations differ by at most ρ_{E,i} applied to the distance between the data.

Assumption 2.9. For each modulus of continuity ρ there is a modulus of continuity ρ′ such that, for any τ ∈ T, we have sup_{P∈P(t,ω,u)} E^P[ρ(ε + sup_{s∈[τ,(τ+ε)∧T]} |B_s − B_τ|)] ≤ ρ′(ε).

The tower property for our family of nonlinear expectations
The following trivial extension of Theorem 2.3 in [22] follows immediately from the above assumptions:

Proposition 2.10. For each τ, η ∈ T with τ ≤ η, v ∈ D, u ∈ U_τ and ũ ∈ U_η, we have E^{v•u•ũ}_τ[f] = E^{v•u}_τ[E^{v•u•ũ}_η[f]] for each upper semi-analytic function f : Ω → R.
Proof. We first note that, since E^{v•u•ũ} satisfies Assumption 2.1 in [22], Theorem 2.3 of the same article applies once we note that for each P ∈ P(t, ω, v•u•ũ) there is a P′ ∈ P(t, ω, v•u) such that P = P′ on F_η (and thus on F*_η), and vice versa. Finally, as E^{v•u•ũ}_η[f] is F*_η-measurable, the result follows.

Optimal stopping under non-linear expectation
We recall the following result on optimal stopping under a non-linear expectation pair P, E (satisfying the assumptions imposed on, say, P(∅), E^∅ above).
Theorem 2.11 (Nutz and Zhang (2015) [23]). Assume that the process (X_t)_{0≤t≤T} has càdlàg paths, is progressively measurable and bounded, and satisfies |X_t(ω) − X_{t′}(ω′)| ≤ ρ_X(d[(t, ω), (t′, ω′)]) for some modulus of continuity function ρ_X and all (t, ω), (t′, ω′) ∈ Λ. Then: i) the corresponding value process satisfies a dynamic programming principle (in particular, suitably stopped, it is a supermartingale under each admissible measure); ii) the game has a value; iii) if P(t, ω) is weakly compact for each (t, ω) ∈ Λ, then there is an optimal measure P*.

A dynamic programming principle
We define the upper value function Y^v_t(ω) for all ω ∈ Ω and set Y = Y^∅. Moreover, we define the lower value function Z^v_t(ω), in which the adversary plays a strategy, for all ω ∈ Ω.
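Consistent with the introduction (the upper value pits the impulse controller against a worst-case measure, while the lower value lets the adversary play a non-anticipative strategy), a plausible sketch of the two definitions is:

```latex
Y^{v}_{t}(\omega):=\inf_{u\in\mathcal{U}_{t}}\;\mathcal{E}^{\,v\circ u}_{t}\Big[\varphi(v\circ u)+\sum_{j:\,\tau_{j}\ge t} c\big([v\circ u]_{j}\big)\Big],
\qquad
Z^{v}_{t}(\omega):=\sup_{P^{S}\in\mathcal{P}^{S}_{t}}\;\inf_{u\in\mathcal{U}_{t}}\;\mathbb{E}^{P^{S}(v\circ u)}_{t}\Big[\varphi(v\circ u)+\sum_{j:\,\tau_{j}\ge t} c\big([v\circ u]_{j}\big)\Big],
```

where the sums run over the interventions of u only; the precise handling of costs already incurred in v is an assumption on our part.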
Remark 3.1. As noted in the introduction, it may appear as though the setup is somewhat asymmetric, as the optimization problem for the lower value function contains a strategy whereas the corresponding problem for the upper value function does not. However, as the impulse controls in U are F-adapted and the opponent controls the probability measure (effectively deciding the likelihoods for different trajectories), this can be seen as the impulse player implementing a type of non-anticipative strategy as well. This becomes more evident when turning to the application in Section 6.
In this section we will concentrate on the upper value function, and the main result is the following dynamic programming principle:

Theorem 3.2. The map Y is bounded, uniformly continuous and satisfies the recursion (3.3).

The proof of Theorem 3.2 is given through a sequence of lemmata, where the main obstacle that we need to overcome is to show uniform continuity of the map (t, ω, v) → Y^v_t(ω). This will be obtained through a uniform convergence argument, and we introduce the truncated upper value function Y^{v,k}, defined for all (t, ω, v) ∈ Λ × D and k ≥ 0; in the definition of Y^{v,k}, the impulse controller is restricted to using a maximum of k impulses. The following approximation result, showing that the truncation error vanishes uniformly as k → ∞, is central:

Proof. We first note that

Now, as this holds for any u ∈ U_t, we find that in the infimum of (3.3) we can restrict our attention to impulse controls satisfying (3.5) for each k ≥ 0, since any impulse control not satisfying (3.5) is dominated by u = ∅. For any u ∈ U_t satisfying (3.5) we then obtain a chain of estimates, where, to arrive at the last inequality, we have used (3.5). Since u was an arbitrary impulse control satisfying (3.5) and [u]_k ∈ U^k_t, the result follows.
Proof. We have the claimed comparison with u^ω as in Assumption 2.8, and this immediately gives the result by the same assumption.
As usual, the dynamic programming principle for Y^{•,k} will be proved by leveraging regularity, and we introduce the following partition of the set Λ × U:

Definition 3.6. For ε > 0:
• Finally, we let (U^ε_l)_{1≤l≤n^ε_U} be a Borel partition of U such that the diameter of U^ε_l does not exceed ε for l = 1, ..., n^ε_U, and we let (b^ε_l)_{1≤l≤n^ε_U} be corresponding representative points.

In the next lemma we show that for any ε > 0, the impulse size can be chosen ε-optimally.

Lemma 3.7. Let g : Λ × D → R be bounded and uniformly continuous. Then, for each k ≥ 0, ε > 0, u ∈ U^k and τ ∈ T_{τ_k}, there is an F_τ-measurable random variable β with values in U attaining the supremum over impulse sizes up to an error of ε.

Proof. Step 1. We first prove the result for u = v ∈ D^k. For arbitrary ε_1 > 0 there is, by properties of the supremum, a double sequence (b_{i,j})_{j≥1} of ε_1-optimal impulses, and we find that the error is controlled by ρ_g, the modulus of continuity of g; the result, for deterministic u, then follows from Assumption 2.9.
Step 2. We turn to the general setting with u ∈ U^k and construct β by patching together the selections from Step 1.
By definition, β is then F_τ-measurable. Finally, by choosing a concave modulus of continuity that dominates ρ_g, the result follows from Assumption 2.9 after taking non-linear expectations on both sides and choosing ε_1 sufficiently small.

Lemma 3.8. For each k ≥ 0, the map (t, ω, v) → Y^{v,k}_t(ω) is bounded and uniformly continuous. Moreover, the family (Y^{•,k})_{k≥0} satisfies the recursion (3.6).

Proof. We note that |Y^{v,k}_t| ≤ C_0, so boundedness clearly holds. The remaining assertions will follow by induction in k, and we note by a simple argument that Y^{•,0} is uniformly continuous and bounded. In fact, for 0 ≤ t ≤ s ≤ T we have by the tower property an estimate that, together with Lemma 3.5, yields the desired continuity. We thus assume that for each κ ≥ 0 the map Y^{•,k−1} : Λ × D_κ → R is uniformly continuous with modulus of continuity ρ_{κ,k−1}, and prove that then (3.6) holds and that this, in turn, implies uniform continuity of Y^{•,k}. Let us denote the right-hand side of (3.6) by Ŷ^{v,k}_t(ω); by our induction assumption and Theorem 2.11.i) we have a representation of Ŷ^{v,k}.

Step 1. We show that Ŷ ≥ Y. For this we fix κ ≥ 0 and v ∈ D_κ and note that for each ε > 0, our induction assumption together with Lemma 3.7 implies the existence of an ε-optimal intervention pair, where ρ′ is a modulus of continuity function. On the other hand, for each (i, j) there is a suitable control on E^ε_{i,j}. Moreover, Assumption 2.8 implies the existence of a comparison control for all ω ∈ E^ε_{i,j}. By once again using the uniform continuity of Y^{•,k−1} we obtain corresponding estimates for all ω ∈ E^ε_{i,j}. We can combine the above impulse controls into a single control, and the tower property then gives, for some modulus of continuity ρ, an estimate from which it follows that Ŷ^{v,k}_t(ω) ≥ Y^{v,k}_t(ω), since ε > 0 was arbitrary.
Step 2. We now show that Ŷ ≤ Y. Pick û = (τ̂_j, β̂_j)_{j=1}^k ∈ U^k_t and note the corresponding estimate; moreover, for each ω ∈ Ω we have a matching bound. We conclude by the tower property that Ŷ^{v,k}_t(ω) ≤ Y^{v,k}_t(ω), since this time û ∈ U^k_t was arbitrary.
Step 3. It remains to show that Y^{•,k} is uniformly continuous. As uniform continuity in (ω, v) follows from Lemma 3.5, we only need to consider the time variable and find a modulus of continuity that is independent of v. Let 0 ≤ t ≤ s ≤ T and note that, by the preceding steps and (4.4) of [23], the increment |Y^{v,k}_t − Y^{v,k}_s| splits into two terms. By arguing as in (3.7) we can bound the first term by some ρ independent of v. Concerning the second term, we obtain a bound in terms of some modulus of continuity ρ′, and we conclude that (t, ω) → Y^{v,k}_t(ω) is uniformly continuous with a modulus of continuity that does not depend on v. Uniform continuity of the map (t, ω, v) → Y^{v,k}_t(ω) then follows by Lemma 3.5. This completes the induction step.
Proof of Theorem 3.2. The sequence (Y^{v,k}_t(ω))_{k≥0} is bounded, uniformly in (t, ω, v), and since U^k ⊂ U^{k+1} it is non-increasing in k; it follows by Lemma 3.4 that it converges to Y^v_t(ω). Moreover, again by Lemma 3.4, the convergence is uniform, and Lemma 3.8 then gives that Y^v_t(ω) is bounded and uniformly continuous. Concerning the dynamic programming principle (3.3), we obtain one inequality by (3.6) and uniform convergence; on the other hand, monotonicity implies the reverse inequality for each k ≥ 0, and the result follows by letting k → ∞.
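For orientation, the recursion (3.3) established in Theorem 3.2 has the shape of an optimal stopping problem (Theorem 2.11) wrapped around an infimum over the impulse size; the following is a hedged sketch of that shape, with the exact arguments of c assumed:

```latex
Y^{v}_{t}(\omega)\;=\;\inf_{\tau\in\mathcal{T}_{t}}\;\mathcal{E}^{v}_{t}\Big[\mathbf{1}_{[\tau=T]}\,\varphi(v)
\;+\;\mathbf{1}_{[\tau<T]}\,\inf_{b\in U}\Big(c\big(v\circ(\tau,b)\big)+Y^{\,v\circ(\tau,b)}_{\tau}\Big)\Big].
```

The stopping time τ plays the role of the next intervention time: either no further intervention is made before T, or an impulse b is chosen optimally at τ and the problem restarts from the extended control v ∘ (τ, b).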

Value of the game
We show that the game has a value by proving that Y ≡ Z. As a bonus, this immediately gives that the lower value function Z satisfies the DPP in (3.3).
The approach is once again to arrive at the result through a sequence of lemmata. Similarly to the above, we define the truncated lower value function Z^{v,k} for all (t, ω, v) ∈ Λ × D and have the following:

Lemma 4.1. There is a C > 0 such that |Z^v_t(ω) − Z^{v,k}_t(ω)| ≤ C/k for all (t, ω, v) ∈ Λ × D and each k ≥ 0.
Proof. Given P^S ∈ P^S(t, ω) and ε > 0, let u^ε be an ε-optimal impulse control, and set N^ε := max{j ≥ 0 : τ_j < T}. Then, since ϕ is bounded and c ≥ δ > 0, the expected number of interventions made by u^ε is bounded by a constant. Now, since P^S is a non-anticipative map, the truncation [u^ε]_k changes the cost only on the event that u^ε makes more than k interventions. Put together, we find the asserted bound. As ε > 0 was arbitrary and C does not depend on either P^S or ε, the result follows by first letting ε → 0 and then taking the supremum over all P^S ∈ P^S(t, ω).
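The quantitative heart of this proof, bounding the number of interventions of a near-optimal control, can be made explicit as follows (a sketch, assuming u^ε is ε-optimal, with C_0 the uniform bound and δ the lower bound from Assumption 2.6):

```latex
\delta\,\mathbb{E}^{P}\big[N^{\varepsilon}\big]
\;\le\;\mathbb{E}^{P}\Big[\sum_{j=1}^{N^{\varepsilon}} c\big([u^{\varepsilon}]_{j}\big)\Big]
\;\le\; 2C_{0}+\varepsilon,
\qquad\text{whence}\qquad
P\big[N^{\varepsilon}\ge k\big]\;\le\;\frac{2C_{0}+\varepsilon}{\delta k}
```

by Markov's inequality; truncating u^ε after k impulses then changes the total cost by at most a constant multiple of 1/k, which is the kind of bound asserted in the lemma.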
is uniformly continuous with a modulus of continuity that does not depend on t.
Proof. We once again have the comparison with u^ω as in Assumption 2.8. This gives the result by again using the same assumption.
Step 1. We first show that Ȳ^{v,k,ε} satisfies the recursion step in (4.4). By definition and the tower property we obtain one of the two inequalities. To arrive at the opposite inequality we introduce the impulse control u* := (τ*_j, β*_j)_{j=1}^{N*}, built from the partition in Definition 3.6. Then u* ∈ Ū^{k,ε}_{t_{i′}}, and it easily follows by repeated use of (4.4) and the tower property that the opposite inequality also holds; we conclude that Ȳ^{v,k,ε} satisfies the recursion step in (4.4).
Step 2. For the sake of completeness, we also prove rigorously that Z̄^{v,k,ε} satisfies the corresponding recursion. By the tower property for E^P and the induction hypothesis we obtain the required bound on the supremum, and we conclude that the recursion holds for Z̄^{v,k,ε}.
In particular, our induction assumption then implies a comparison between Z̄^{v,k,ε} and Ȳ^{v,k,ε}.

Step 3. Finally, we show that the two truncated value functions agree up to an arbitrarily small error. We do this by showing that for any ε′ > 0 there is a suitable strategy; the relevant map is Borel-measurable and thus upper semi-analytic. Given u ∈ Ū^{k,ε}_{t^ε_{i′}} we can then, by repeatedly arguing as in the proof of Lemma 4.11 in [23], find a sequence of measures (P^u_i). This implies the existence of a strategy P^{S,ε′} ∈ P^S(t^ε_{i′}, ω). By repeating this process n^ε_t − i′ times, followed by using (4.6) and the tower property, and since u ∈ Ū^k_{t^ε_{i′}} was arbitrary, we conclude that Z̄^{v,k,ε}(ω) and Ȳ^{v,k,ε}(ω) differ by at most ε′, from which the assertion follows as ε′ > 0 was arbitrary. This concludes the induction step.
Proof. Repeating the above proof we find that Ȳ^{•,k,ε} and Z̄^{•,k,ε} both satisfy the same recursion and hence coincide.

Theorem 4.6. The game has a value, i.e. Y ≡ Z.
Proof. Applying an argument identical to the one in the proof of Lemma 4.3 gives that Ȳ^{v,k,ε} − Y^{v,k} → 0 as ε → 0. By uniqueness of limits we find that Y^{•,k} = Z^{•,k} and, taking the limit as k tends to infinity, the result follows by Lemma 3.4 and Lemma 4.1.
Corollary 4.7. The map Z is bounded, uniformly continuous and satisfies the recursion (3.3) for all (t, ω, v) ∈ Λ × D.
Proof. Since Z ≡ Y, the statement in Theorem 3.2 applies to Z as well.

A verification theorem
In the previous two sections we have shown that the dynamic programming principle is a necessary condition for a map to be the value function of (1.1): if Y is an upper or lower value function then it satisfies (3.3). In this section we turn to sufficiency and its implications, starting with the following uniqueness result:

Proposition 5.1. Suppose that there is a progressively measurable map 𝒴 : Λ × D → R that is uniformly continuous and bounded and satisfies the recursion (3.3). Then 𝒴 ≡ Y.

Proof. For any û = (τ̂_j, β̂_j)_{j=1}^k ∈ U^k_t we obtain a first estimate. Repeating this argument k times we find, since û ∈ U^k_t was arbitrary, that 𝒴 is dominated by the truncated value function. On the other hand, from Theorem 2.11 and Lemma 3.7 there is, for each ε > 0, an ε-optimal pair (τ^ε, β^ε). Moreover, Step 1 in the proof of Lemma 3.8 implies the existence of a corresponding ε-optimal continuation.

Repeating this argument indefinitely gives us an infinite sequence
for each k ≥ 1, and we find by applying the tower property that the corresponding estimate holds at every level. Taking the limit as k → ∞ on the right-hand side thus gives the desired inequality. Now, since Y and ϕ are uniformly bounded and c ≥ δ > 0, arguing as in the proof of Lemma 3.4 shows that the controls u^ε make finitely many interventions. In particular, P[τ^ε_k < T, ∀k ≥ 0] = 0 for any P ∈ P(t, ω, v • u^ε), and we can apply Fatou's lemma in the above inequality to conclude. Since ε > 0 was arbitrary, the claimed identity follows.
Having proved that the value function is the unique solution to the dynamic programming equation (3.3), we show that, under additional measurability and compactness assumptions, Y can be used to extract an optimal control/strategy pair.
Theorem 5.2. Assume that u* = (τ*_j, β*_j)_{j=1}^∞ is such that:
• the sequence (τ*_j)_{j=1}^∞ is defined recursively, using the convention that inf ∅ = ∞, with τ*_0 = 0.
Then u* ∈ U is an optimal impulse control for (1.1) in the sense that (5.4) holds. Moreover, if P(t, ω, v) is weakly compact for each (t, ω, v) ∈ Λ × D and P^P_τ(u) := {P′ ∈ P(u) : P′ = P on F_τ} is weakly compact for all u ∈ U, P ∈ P(u) and τ ∈ T, then there is an optimal response P^{*,S} ∈ P^S for which (5.5) holds.

Proof of (5.4). Repeated use of the dynamic programming principle for Y gives the first chain of identities. Taking the limit as k → ∞ and repeating the argument in the second part of the proof of Proposition 5.1, (5.4) follows.
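A standard verification-style choice for the recursion defining (τ*_j), consistent with the DPP (3.3), would be the following sketch (the exact arguments of c are our assumption):

```latex
\tau^{*}_{j}:=\inf\Big\{s\ge\tau^{*}_{j-1}\;:\;
Y^{[u^{*}]_{j-1}}_{s}
=\inf_{b\in U}\Big(c\big([u^{*}]_{j-1}\circ(s,b)\big)+Y^{[u^{*}]_{j-1}\circ(s,b)}_{s}\Big)\Big\},
```

with β*_j an F_{τ*_j}-measurable minimizer of the inner infimum; the convention inf ∅ = ∞ and the initialization τ*_0 = 0 are as in the theorem statement.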
To prove the second statement we need two lemmas, the first of which (loosely speaking) shows that for any measure/control pair (P, u) with u ∈ U^k and P ∈ P(u), we can extend P optimally from τ_k until the time that (Y^u_t)_{τ_k≤t≤T} hits the corresponding barrier. This extension then acts as a minimizer up until the first hitting time in (3.3). The main obstacle we face is that Y^u is not necessarily continuous in ω, disqualifying direct use of Lemma 4.5 in [11] as in, for example, Lemma 4.13 of [23].
Lemma 5.3. Let u ∈ U^k and assume that for some P ∈ P(u) the set P^P_{τ_k}(u) is weakly compact. Then there is a P^⋄ ∈ P^P_{τ_k}(u) such that (5.6) holds.

Proof. To simplify notation we introduce the barrier process S^u and consider the following sequence of stopping times: η_l := inf{s ≥ η_{l−1} : S^u_s − Y^u_s ≤ 1/l} ∧ T for l ≥ 1, with η_0 := τ_k. Then, by Proposition 7.50.b) of [7] and a standard approximation result, there is for each l ≥ 0 an F_{τ_k}-measurable kernel ν_l : Ω → P(Ω) such that ν_l(·) ∈ P(τ_k, ·, u) and the associated optimality relation holds for P-a.e. ω ∈ Ω. We then define the measure P_l ∈ P(u) by pasting ν_l onto P at time τ_k. Since P^P_{τ_k}(u) is weakly compact we may assume, by possibly going to a subsequence, that P_l → P^⋄ weakly for some P^⋄ ∈ P^P_{τ_k}(u). We need to show that P^⋄ satisfies (5.6). We do this over two steps.

Step 1. We first find approximations of u and η_l that allow us to take the limit as l → ∞ along the sequence P_l and use weak convergence. Let (c_m)_{m≥1} be a sequence of positive numbers. We note that we can (by approximating stopping times from the right and U by a finite set) find a discrete approximation û of u. Then, with η̂^i_l := inf{t ≥ η̂^i_{l−1} : S^{v_i}_t − Y^{v_i}_t ≤ 1/l} ∧ T, we can, since S^{v_i} and Y^{v_i} are both uniformly continuous, repeat the argument in Step 1 of the proof of Theorem 3.3 in [11] to find continuous [0, T]-valued random variables (θ^i_l)_{l≥1} and F_T-measurable sets Ω^i_l ⊂ Ω with the corresponding approximation properties. Hence, as S^û_{η̂_l} − Y^û_{η̂_l} = 1/l on [η̂_l < T] by continuity, the approximation carries over to the stopping times. Moreover, there is a modulus of continuity function ρ′ controlling the resulting errors and, similarly, for Ω^+_l := {ω ∈ Ω : η̂_l ≥ η_{l+1}} we have an analogous bound. We can thus choose c_l such that 8l²ρ′(c_l) ≤ 2^{−l} for all l ≥ 1 and define θ̂_l accordingly. Hence, since η_l(ω) → τ^⋄(ω) for all ω ∈ Ω, we get that (θ̂_l)_{l≥1} is a sequence of random variables such that θ̂_l is continuous on A^l_i for i = 1, ..., M_l and, by the Borel-Cantelli lemma, θ̂_l → τ^⋄, P′-a.s. for all P′ ∈ P(u).
Next, given (c′_l)_{l≥1} there are P-continuity sets ((D^l_i)_{i=1}^{M_l})_{l≥1}. Then, for each P′ ∈ P^P_{τ_k}(u), the sets in the double sequence ((D^l_i)_{i=1}^{M_l})_{l≥1} are also P′-continuity sets, and we can define θ′_l accordingly. Choosing c′_l so that the corresponding errors are bounded by 2^{−l}, we thus find that θ′_l → τ^⋄, P′-a.s. for each P′ ∈ P^P_{τ_k}(u); moreover, θ′_l (and therefore also Y) is continuous on D^l_i for i = 0, ..., M_l.
Step 2. Using the approximation constructed in Step 1, we now show that (5.6) holds. For any t ∈ [τ_k, T], with Ω^∆_l := ∪_{i=1}^{M_l} A^l_i ∆ D^l_i and τ′_{l,k} the time of the k-th intervention in u′_l (where τ′_k := T on D^l_0), we find that there is a C > 0 such that the corresponding estimate holds for all l ≥ 2; here we have used the fact that, as Ω^∆_l ∈ F_{τ_k}, the relevant limit superior is controlled. On the other hand, by Theorem 2.11.i), Y^u_{·∧τ^⋄} is a P′-supermartingale for each P′ ∈ P(u); to reach the resulting inequality we have used (5.7) and the definition of η_m, and the right-hand side of the last inequality tends to E^P[Y^u_{τ_k}] as m → ∞. Hence, we may combine this with (5.8) and (5.9). Now, since η_l is not continuous, we cannot immediately proceed to take the limit. However, setting Ω′_l := Ω̂_l \ Ω^∆_l, we may define a function ψ_l that is continuous on D^l_i for i = 0, ..., M_l, so that the limit superior can be evaluated; the assertion then follows since Y^u_{·∧τ^⋄} is a P^⋄-supermartingale.
Since stopping before τ^⋄ in the above lemma is never optimal, the P^⋄ in Lemma 5.3 is optimal up until τ^⋄. It remains to determine a continuation after τ^⋄ such that, at time τ^⋄, stopping is optimal.

Lemma 5.4. Let P(t, ω, v) be weakly compact for all (t, ω, v) ∈ Λ × D and let u ∈ U^k. There is an F*_{τ_k}-measurable kernel ν : Ω → P(Ω) such that ν(ω) ∈ P(τ_k, ω, u) and the corresponding optimality relation holds for all ω ∈ Ω.
Proof. Although the statement of the lemma differs from that of Lemma 4.16 in [23], the proof is almost identical, and we give the main steps for the sake of completeness. We simplify notation by letting V(v, ω, P) denote the conditional value for v ∈ D^k. Then, by Lemma 4.15 of [23], the map P → V(v, ω, P) is upper semi-continuous for every (ω, v) ∈ Ω × D. To use a measurable selection theorem we need to ascertain that the map (ω, P) → V(u(ω), ω, P) is Borel. Since u ∈ U^k, the map ω → u(ω) is Borel, and the claim will be obtained by showing that (v, ω) → V(v, ω, P) is Borel for any P ∈ P(Ω). However, for any τ ∈ T, the relevant map is Borel by Fubini's theorem. Hence, taking the infimum over a countable set of stopping times that is dense outside of a P-null set, the assertion follows.
Step 1. We first construct a candidate for an optimal strategy. Note that by Theorem 2.11, there is a P*_0 ∈ P(∅) such that (5.11) holds. We fix u ∈ U and define (P^u_k)_{k≥1} iteratively, where we start by setting P^u_0 := P*_0. For k = 1, 2, ..., we construct P^u_k from P^u_{k−1} by first letting P^{u,⋄}_k be as in the statement of Lemma 5.3 with P = P^u_{k−1} and u ← [u]_k. Then there is, by Lemma 5.4, an F*_{τ^⋄}-measurable kernel ν^{u,k} : Ω → P(Ω) such that ν^{u,k}(ω) ∈ P(τ_k, ω, [u]_k) and (5.12) holds for all ω ∈ Ω and k ≥ 1. We can thus define the measure P^u_k by pasting ν̃^{u,k} onto P^{u,⋄}_k, for all A ∈ F, where ν̃^{u,k} is an F_{τ_k}-measurable kernel such that (5.12) holds P^{u,⋄}_k-a.s. Combining Lemma 5.3 and Lemma 5.4, we see that the above construction extends (5.11), yielding that (5.13) holds for each k ≥ 0, where the relevant hitting times are capped at T and P^u_{−1} = P_0. Our candidate strategy is then P^{*,S}(u) := P^u, where P^u is such that (a subsequence of) (P^u_k)_{k≥0} converges weakly to P^u. Since the relevant family is weakly compact, we note that P^u = P^u_k on F_{τ_k} for each k ≥ 0, and as P^u_k ∈ P([u]_k) it follows by Assumption 2.4 that P^u ∈ P(u). Furthermore, it is clear from the above construction that the strategy P^{*,S} is non-anticipative, and we conclude that P^{*,S} ∈ P^S.
Step 2. Next, we show that if a subsequence of (P^{u*}_k)_{k≥0} converges weakly to some P*, then P* is an optimal response to the optimal impulse control u*. We thus assume that u in the previous step equals u*. By (5.11) and the fact that Y^{(T,β*_1)}_T = ϕ(∅) (see Remark 2.7), we get that However, (5.13) gives that Repeating this procedure k ≥ 1 times, we find that In fact, since P*_k = P*_l on F_{τ_{k+1}∧τ_{l+1}}, this holds with the measure P*_k replaced by P*_l whenever l ≥ k, and we have that By weak compactness of P^{P*_l}_{τ*_l}(u*) for each l ≥ 0, possibly after passing to a subsequence, P*_l converges weakly to some P* ∈ P(u*) such that P* = P*_l on F_{τ*_l} for each l ≥ 0, and we find that by dominated convergence under P*.
Step 3. It remains to show that u* is an optimal impulse control under the strategy P^{*,S}. For arbitrary u ∈ U we have, again using that Y^{(T,β_1)}_T = ϕ(∅), Now, (5.13) gives that Repeating k times, we find that
Arguing as in Step 2, there is a subsequence along which P^u_k converges weakly to some P^u ∈ P(u) such that P^u = P^u_l on F_{τ_l} for each l ≥ 0, and then Now, either u makes infinitely many interventions on some set of positive measure under P^u, in which case the right-hand side equals +∞, or we can use dominated and monotone convergence under P^u to find that Combined, this proves (5.5).
Remark 5.5. Theorem 5.2 presumes the existence of an F_{τ_j}-measurable minimizer β*; we note that such a minimizer always exists if U is a finite set. The canonical example in which the compactness assumption holds is when the uncertainty/ambiguity stems from the set of all Itô processes ∫_0^t b_s ds + ∫_0^t σ_s dB_s with (b, σσ⊤) ∈ A for some compact, convex set A ⊂ R^d × S^+ (here S^+ denotes the set of positive semidefinite symmetric d × d matrices).
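As a concrete illustration of this canonical ambiguity set, the following minimal Python sketch (function and parameter names are ours, not from the paper) draws one Euler–Maruyama sample of ∫_0^t b_s ds + ∫_0^t σ_s dB_s for a fixed constant admissible pair (b, σ); ranging (b, σσ⊤) over the compact convex set A produces the corresponding family of laws.

```python
import numpy as np

def euler_path(b, sigma, T=1.0, n=1000, seed=0):
    """One Euler-Maruyama sample of X_t = int_0^t b ds + int_0^t sigma dB_s
    with constant coefficients; each admissible pair (b, sigma sigma^T) in A
    induces one measure in the ambiguity set."""
    rng = np.random.default_rng(seed)
    dt = T / n
    dB = rng.normal(0.0, np.sqrt(dt), size=n)        # Brownian increments
    return np.concatenate([[0.0], np.cumsum(b * dt + sigma * dB)])

path = euler_path(b=0.1, sigma=0.2)                   # starts at 0, as required
```

A driftless choice b = 0 recovers the setting of the application section below.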

Application to path-dependent zero-sum stochastic differential games
In the present section we extend the literature on stochastic differential games by considering the application of the above developments to zero-sum games of impulse versus continuous control in a path-dependent setting. For simplicity, we only consider the driftless setting, as this is sufficient to capture the main features when applying the above results to controlled path-dependent SDEs.
Throughout this section, we only consider impulse controls for which τ_j(ω) → T as j → ∞ for all ω ∈ Ω. We will make frequent use of the following decomposition of càdlàg paths:

Definition 6.1. For x ∈ D (the set of R^d-valued càdlàg paths) we let J(x) = (η_j, Δx_j)_{j=1}^{n(T)}, where η_j is the time of the j-th jump of x, Δx_j := x_{η_j} − x_{η_j−} the corresponding jump (with x_{0−} := 0) and, for each t ∈ [0, T], n(t) is the number of jumps of x in [0, t]. Moreover, we let C(x) be the path without jumps, i.e. (C(x))_s = x_s − ∑_{i=1}^{n(s)} Δx_i.
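The decomposition of Definition 6.1 can be illustrated on a discretized path. The sketch below (helper names are ours) uses the heuristic that, on a grid, any increment exceeding a threshold counts as a jump; it is a proxy for the exact càdlàg decomposition, and it assumes the path starts at its first grid value with no jump at time 0.

```python
import numpy as np

def decompose(times, values, tol=1e-8):
    """Approximate J(x) and C(x) for a path sampled on a time grid.

    A grid increment with |dx| > tol is recorded as a jump (eta_j, dx_j);
    the continuous part C(x)_s subtracts the accumulated jumps from x_s.
    """
    jumps, cont, acc = [], [values[0]], 0.0
    for i in range(1, len(values)):
        dx = values[i] - values[i - 1]
        if abs(dx) > tol:
            jumps.append((times[i], dx))   # jump time and size
            acc += dx
        cont.append(values[i] - acc)       # path with jumps removed
    return jumps, cont

t = np.linspace(0.0, 1.0, 5)
x = np.where(t >= 0.5, 1.0, 0.0)           # pure jump of size 1 at t = 0.5
jumps, cont = decompose(t, x)              # jumps = [(0.5, 1.0)], cont flat at 0
```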

Problem formulation
We let A be the set of all progressively measurable càdlàg processes α := (α_s)_{0≤s≤T} taking values in a bounded Borel-measurable subset A of R^d, and let A^S : U → A be the corresponding set of non-anticipative strategies. We then consider the problem of showing that where J is the cost functional (recall that E is expectation with respect to P_0, the probability measure under which B is a Brownian motion)

Proposition 6.4. Under Assumption 6.3 there is, for each k, κ ≥ 0, a modulus of continuity ρ_{X,κ+k} such that

Proof. Fix ω, ω′ and v, v′, and set X^l := X^{t,ω,[v]_l,α} and X̃^l := X^{t,ω′,[v′]_l,α} for l = 0, …, κ + k. Moreover, we let δX^l := X^l − X̃^l and set δX := δX^{κ+k}. For s ∈ [η̃_l, T] with η̃_l := η_l ∨ η′_l we have Now, as X^l = X^j on [0, η_{j+1}) for j ≤ l, we have |Γ(η_j, X^j, γ_j) − Γ(η′_j, X̃^j, γ′_j)| ≤ C(d_{j−1}[(η_j, C(X^{j−1}), J(X^{j−1})), (η′_j, C(X̃^{j−1}), J(X̃^{j−1}))] + and induction gives that |Γ(η_j, X^j, γ_j) − Γ(η′_j, X̃^j, γ′_j)| ≤ C( We proceed by induction and assume that there is a C such that But then, repeating the above argument and using (6.8), gives that and the statement of the proposition follows by Jensen's inequality.
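The chain of estimates in the proof above is of the standard Grönwall type for SDEs with Lipschitz (path-dependent) coefficients. Schematically, with K a generic Lipschitz constant and ε collecting the distances between the two sets of data (a sketch of the standard argument, not the paper's exact inequality):

```latex
\mathbb{E}\Big[\sup_{r\le s}|\delta X_r|^2\Big]
  \le C\Big(\varepsilon^2
  + K^2\int_0^s \mathbb{E}\Big[\sup_{r\le \tilde r}|\delta X_r|^2\Big]\,d\tilde r\Big),
  \qquad 0\le s\le T,
```

so that Grönwall's lemma yields E[sup_{r≤T} |δX_r|^2] ≤ Cε² e^{CK²T}. The constant C produced by iterating this bound over the κ + k intervention intervals is exactly the constant whose dependence on k + κ is discussed in Remark 6.5.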
Remark 6.5. Note that the constant C > 0 in the proof of Proposition 6.4 generally depends on k + κ and may tend to infinity as either k or κ goes to infinity. This is the main reason why we needed to truncate the maximal number of interventions to obtain the dynamic programming principle in Section 3 and to prove that the game has a value in Section 4.
Finally, we have the following result.

Lemma 6.7. Assumption 2.8 holds in this setting.
Proof. We fix k ≥ 0, u ∈ U^{t,k} and ω′ ∈ Ω and note that x ↦ I^u(x) maps progressively measurable (resp. càdlàg) processes to progressively measurable (resp. càdlàg) processes, and then so does the composition σ^u := σ(·, I^u(·), ·). We can thus repeat the argument in the appendix to [23] to show that there is an F_t ⊗ F-measurable map (ω, ω̃) ↦ u^ω(ω̃) such that u^ω ∈ U^{t,k} for all ω ∈ Ω and u^ω(0 ⊗_t X^{u,α,t,ω}) = u(0 ⊗_t X^{u,α,t,ω′}), P_0-a.s. Now, since the set of open-loop controls is at least as rich as the set of feedback controls, we have, for v, v′ ∈ D_κ, that for some modulus of continuity ρ′′. The result now follows by taking the supremum over all u ∈ U and using Proposition 6.4 combined with a standard estimate for SDEs.

Corollary 4.5. We can extend the definition of Ȳ to Λ × D by letting