On the Finite Horizon Optimal Switching Problem with Random Lag

We consider an optimal switching problem with random lag and the possibility of component failure. The random lag is modeled by letting the operation mode follow a regime-switching Markov model with transition intensities that depend on the switching mode. The possibility of failures is modeled by having absorbing components. We show existence of an optimal control for the problem by applying a probabilistic technique based on the concept of Snell envelopes.


Introduction
The standard optimal switching problem (sometimes referred to as the starting and stopping problem) is a stochastic optimal control problem of impulse type that arises when an operator controls a dynamical system by switching between the different members in a set of switching modes I = {b_1, . . . , b_m}. In the two-modes case (m = 2) the modes may represent, for example, "operating" and "closed" when maximizing the revenue from mineral extraction in a mine as in [8]. In the multi-modes case the operating modes may represent different levels of power production in a power plant when the owner seeks to maximize her total revenue from producing electricity [10], or the states "operating" and "closed" of single units in a multi-unit production facility as in [7].
In optimal switching the control takes the form u = (τ_1, . . . , τ_N; β_1, . . . , β_N), where τ_1 ≤ τ_2 ≤ · · · ≤ τ_N is a sequence of (random) times when the operator intervenes on the system and β_j ∈ I is the switching mode that the operator switches to at time τ_j. The standard multi-modes optimal switching problem in finite horizon (T < ∞) can then be formulated as finding the control that maximizes

J(u) = E[ ∫_0^T ψ_{ξ_t}(t) dt + ϒ_{ξ_T} − Σ_{j=1}^N c_{β_{j−1},β_j}(τ_j) ],

where ξ_t := β_0 1_{[0,τ_1)}(t) + Σ_{j=1}^N β_j 1_{[τ_j,τ_{j+1})}(t) is the operation mode (when starting in a predefined mode β_0 ∈ I), ψ_b and ϒ_b are the running and terminal revenue in mode b ∈ I, respectively, and c_{b,b'}(t) is the cost of switching from mode b to mode b' at time t ∈ [0, T].

This work was supported by the Swedish Energy Agency through Grant Numbers 42982-1 and 48405-1.

Magnus Perninge (magnus.perninge@lnu.se), Department of Physics and Electrical Engineering, Linnaeus University, Växjö, Sweden.
The standard optimal switching problem has been thoroughly investigated in the last decades after being popularised in [8]. In [19] a solution to the two-modes problem was found by rewriting the problem as an existence and uniqueness problem for a doubly reflected backward stochastic differential equation. In [13] existence of an optimal control for the multi-modes optimal switching problem was shown by a probabilistic method based on the concept of Snell envelopes. Furthermore, existence and uniqueness of viscosity solutions to the related Bellman equation was shown for the case when the switching costs are constant and the underlying uncertainty is modeled by a stochastic differential equation (SDE) driven by a Brownian motion. In [14] the existence and uniqueness results for viscosity solutions were extended to the case when the switching costs depend on the state variable. Since then, results have been extended to Knightian uncertainty [11,20,21] and non-Brownian filtration and signed switching costs [32]. For the situation when the underlying uncertainty can be modeled by a diffusion process, generalization to the case when the control enters the drift and volatility terms was treated in [17]. This was further developed to include state constraints in [26]. Another important generalization is to the case when the operator only has partial information about the present state of the diffusion process, as treated in [30].
As many physical systems do not immediately respond to changes in the control variables, including delays is an important aspect when seeking to derive applicable results in optimal control. General impulse control problems with deterministic lag have been considered in a variety of different settings including the novel paper [3], where an explicit solution to an inventory problem with uniform delivery lag is found by taking the current stock plus pending orders as one of the states. Similar approaches are taken in [2] where explicit optimal solutions of impulse control problems with uniform delivery lags are derived for a large set of different problems and in [9] where an iterative algorithm is proposed. In [33] the authors propose a solution to general impulse control problems with lag, by defining an operator that circumvents the delay period. The optimal switching problem with non-uniform (but deterministic) lag and ramping was solved in [34] by state space augmentation in combination with the probabilistic approach initially developed in [13].
The aim of the present article is to extend the applicability of optimal switching further by considering the case of random lag and component failure during startup. As in [34] we consider the problem of operating n > 0 different production units, that can be either in operation or turned off, and thus let the switching modes be the set of all n-dimensional vectors of zeroes and ones, i.e. I := {0, 1}^n. To model the random lags and failures we let the operation mode, α^u_t, be a continuous-time, finite-state, observable Markov process taking values in A := {−1, 0, 1}^n, where −1 represents "malfunction", 0 represents "off" and 1 represents "operating". We assume that the transition intensities of α^u_t depend on the control both through the present switching mode, ξ_t := Σ_{j=1}^N β_j 1_{[τ_j,τ_{j+1})}(t), but also through the time of the last switch from off to operating in each of the different production units. As opposed to the situation in the standard optimal switching problem, the switching mode and the operation mode may thus differ due to the lag.
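The controlled operation-mode dynamics described above can be sketched in a small simulation. Everything below is our own illustration, not the paper's construction: the constant rates and the Euler stepping are assumptions (the paper only requires bounded, F-progressively measurable intensities). The sketch captures the three qualitative features of the model: a unit ordered on starts only after a random lag, an operating unit may fail into the absorbing state −1, and switching off acts immediately.

```python
import random

# Hypothetical per-unit intensities (not from the paper):
LAMBDA_UP = 2.0    # rate at which a unit ordered "on" finishes its random lag
LAMBDA_FAIL = 0.3  # rate of failure (absorbing state -1) while operating

def step_operation_mode(alpha, xi, dt):
    """One Euler step of the operation mode alpha in {-1,0,1}^n given the
    switching mode xi in {0,1}^n.  Turning a unit off acts immediately
    (so alpha_i <= xi_i is preserved) and failures (-1) are absorbing."""
    new = list(alpha)
    for i, (a, b) in enumerate(zip(alpha, xi)):
        if a == -1:            # failed units stay failed (absorbing)
            continue
        if b == 0:             # switching off takes effect immediately
            new[i] = 0
            continue
        if a == 0:             # ordered on: random lag ~ Exp(LAMBDA_UP)
            if random.random() < LAMBDA_UP * dt:
                new[i] = 1
        else:                  # operating: may fail during the step
            if random.random() < LAMBDA_FAIL * dt:
                new[i] = -1
    return tuple(new)

random.seed(1)
alpha, xi = (0, 0), (1, 1)     # both units off, operator orders both on
t, dt = 0.0, 0.01
while t < 5.0:
    alpha = step_operation_mode(alpha, xi, dt)
    t += dt
print(alpha)
```

A time-dependent, control-dependent intensity λ^{ν,b}_{a,a'}(s) would replace the two constants; the absorbing-state and immediate-shutdown logic stays the same.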
We will consider the problem of finding a strategy u that maximizes the expected total revenue

J(u) = E[ ∫_0^T ψ_{α^u_t}(t, θ^u_t) dt + ϒ_{α^u_T}(θ^u_T) − Σ_{j=1}^N c_{β_{j−1},β_j}(τ_j) ],

where the process θ^u is such that the ith component gives the elapsed time in the present "on"-cycle for Plant i. The process θ^u will allow us to model increased production costs during startup or lower production during ramp-up periods (see e.g. [35] for a situation where ramping is important). The results presented will be derived under the assumption that the ψ_a, ϒ_a and c_{b,b'} are adapted w.r.t. a filtration generated by a Brownian motion. However, these results readily extend to more general (quasi-left continuous) filtrations, e.g. a filtration generated by a Brownian motion and an independent Poisson random measure.

The remainder of the article is organized as follows. In the next section we state the problem, set the notation used throughout the article and detail the set of assumptions that are made. Then, in Sect. 3 a verification theorem is derived. This verification theorem is an extension of the original verification theorem for the multi-modes optimal switching problem developed in [13]. In Sect. 4 we show that there exists a family of processes that satisfies the requirements of the verification theorem, thus proving existence of an optimal control for the optimal switching problem with random lag. Then, in Sect. 5 we focus on the case when the underlying uncertainty in the processes ψ_a and ϒ_a can be modeled by an SDE and derive a dynamic programming relation for the corresponding value functions.

Preliminaries
We consider the finite horizon problem and thus assume that the terminal time T is fixed with T < ∞. We will assume that turning off a unit has an immediate effect on the operation mode, so that α_t ≤ ξ_t for all t ∈ [0, T]. The state space for (α, ξ) is then J := {(a, b) ∈ A × I : a ≤ b}. Furthermore, we define the following sets:
• For each b ∈ I, we let A_b := {a ∈ A : a ≤ b} and for each a ∈ A we let I_a := {b ∈ I : b ≥ a}.
• For each (a, b) ∈ J we let A_{a,b} := {a' ∈ A_b : |a'_i| ≥ |a_i| and a'_i = a_i when a_i ∈ {−1, −b_i}}.
• For each b ∈ I we let I_{−b} := I \ {b} and for each a' ∈ A_{a,b} we let A^{−a}_{a,b} := A_{a,b} \ {a}. Note here that A_{a,b} is the set of all a' ∈ A_b that the operation mode may transition to from a when ξ = b and A^{abs}_b is the set of all states in A that are absorbing for α when ξ = b.
We let (Ω, G, G, P) be a probability space endowed with a d-dimensional Brownian motion (B_t : 0 ≤ t ≤ T) whose augmented natural filtration is F. We let ((A^{t,ν,a,b}_s)_{0≤s≤T} : (t, ν, a, b) ∈ D_A) be a mixed Markov chain (sometimes also referred to as a stochastic hybrid system [31]; see [5,6,23] for applications in credit-risk models and [4,22] for continuous-time conditionally-Markov chains) with càdlàg sample paths and state-space A_{a,b}. We assume that A^{t,ν,a,b}_s = a for s ∈ [0, t ∨ max_i ν_i] and that on (t ∨ max_i ν_i, T], the transition rate from a to a', λ^{ν,b}_{a,a'}(s), is F-progressively measurable for all a' ∈ A_{a,b}. For a' ∉ A_{a,b} we let λ^{ν,b}_{a,a'}(s) ≡ 0. We assume that G := (G_t)_{0≤t≤T} is the augmented filtration generated by B and the family ((A^{t,ν,a,b}_s)_{0≤s≤T} : (t, ν, a, b) ∈ D_A), satisfying the usual conditions in addition to being quasi-left continuous (more information about enlargement of filtrations can be found in, e.g., Chapter 6 of [36]).
Recall here the concept of left continuity in expectation: A process (X_t : 0 ≤ t ≤ T) is strongly left continuous in expectation (SLCE) if for each stopping time γ and each sequence of stopping times γ_k ↗ γ we have lim_{k→∞} E[X_{γ_k}] = E[X_γ]. Throughout we will use the following notation:
• P_F (resp. P_G) is the σ-algebra of F-progressively measurable (resp. G-progressively measurable) subsets of [0, T] × Ω.
• We let S² be the set of all R-valued, P_G-measurable, càdlàg processes (X_t : 0 ≤ t ≤ T) such that E[sup_{t∈[0,T]} |X_t|²] < ∞. We let S²_e (resp. S²_c) be the subset of processes that are non-negative and SLCE (resp. continuous).
• We let S 2 F , S 2 F,e and S 2 F,c be the subset of S 2 , S 2 e and S 2 c , respectively, of processes that are P F -measurable.
• We let T (resp. T^F) be the set of all G- (resp. F-)stopping times and for each γ ∈ T (resp. T^F) we let T_γ (resp. T^F_γ) be the subset of stopping times τ such that τ ≥ γ, P-a.s.
• We let U be the set of all u = (τ_1, . . . , τ_N; β_1, . . . , β_N), where (τ_j)^N_{j=1} is an increasing sequence of G-stopping times and β_j ∈ I_{−β_{j−1}} is G_{τ_j}-measurable.
• We let U^f be the subset of controls u ∈ U for which N is finite P-a.s. (i.e. U^f := {u ∈ U : P[{ω ∈ Ω : N(ω) > k, ∀k > 0}] = 0}) and U^k the subset of controls for which N ≤ k, P-a.s. For γ ∈ T let U_γ (resp. U^f_γ and U^k_γ) be the subset of U (resp. U^f and U^k) with τ_1 ∈ T_γ. Our problem will be characterized by four objects, for which we make the following assumptions. Furthermore, we assume that there are constants k_ψ > 0 and k_ϒ > 0 such that the corresponding Lipschitz bounds hold, P-a.s. (where the exception set does not depend on the tuple (t, z, z')).
We assume that λ^{ν,b}_{a,a'}(·) ∈ H^∞_F and that there is a constant K_λ > 0 such that |λ^{ν,b}_{a,a'}(s)| ≤ K_λ, dP ⊗ ds-a.e. Furthermore, we assume that each element of λ^{ν,b} is Lipschitz continuous in ν: there is a constant k_λ > 0 such that |λ^{ν,b}_{a,a'}(s) − λ^{ν',b}_{a,a'}(s)| ≤ k_λ |ν − ν'| for all s ∈ [0, T], P-a.s. (where the exception set does not depend on the tuple (s, ν, ν')).
The above assumptions are mainly standard assumptions for optimal switching problems. Assumptions i and ii.a together imply that the expected maximal reward is finite. Assumption ii.b implies that there is always a positive switching cost associated with making a loop of switches, and Assumption iii implies that it is never optimal to switch at time T.
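In its usual form (cf. [13]), the no-free-loop condition says that any cycle of switches incurs a strictly positive total cost; a sketch in LaTeX, with the lower bound ε > 0 an assumed constant rather than one taken from the assumptions above:

```latex
% No free loop: for every sequence of distinct modes b_1, \dots, b_k \in I
% forming a loop (b_{k+1} = b_1) and every t \in [0, T]:
\sum_{j=1}^{k} c_{b_j, b_{j+1}}(t) \geq \varepsilon > 0,
% so an infinite number of loops accumulates infinite switching cost
% (this is what drives Proposition 2.5 below).
```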
Each control u = (τ_1, . . . , τ_N; β_1, . . . , β_N) defines the switching mode starting in b ∈ I, which is a process (ξ^b_t : 0 ≤ t ≤ T) given by ξ^b_t := b 1_{[0,τ_1)}(t) + Σ_{j=1}^N β_j 1_{[τ_j,τ_{j+1})}(t), with τ_{N+1} = ∞ (for notational simplicity we will write ξ for ξ^0). The switching mode thus, in some sense, tells us the preferred operation state. For each initial mode b and each vector ν ∈ [0, T]^b, we let the control u define the sequence of activation times. Given that the operating mode at time t is a, the switching mode is b ∈ I and given the vector of activation times ν, such that (t, ν, a, b) ∈ D_A, the family of mixed Markov chains (A^{·,·,·,·}_s : 0 ≤ s ≤ T) defines the sequence of operating modes, ᾱ_0, . . . , ᾱ_N, at the intervention times as ᾱ_0 := a and then recursively. For notational simplicity we use the same shorthand as above and write α^u for α^{0,0,0,0,u}.
with off-diagonal entries λ^{ν,b}_{a,a'}(s) when a' ∈ A^{−a}_{a,b} and diagonal entries λ^{ν,b}_{a,a}(s) = − Σ_{a'∈A^{−a}_{a,b}} λ^{ν,b}_{a,a'}(s).
We let ((θ^{t,ν,z,a,b}_s)_{0≤s≤T} : (t, ν, z, a, b) ∈ D) be the corresponding elapsed-time process. To define the time in the present on-mode for a control u = (τ_1, . . . , τ_N; β_1, . . . , β_N) ∈ U_t starting in z at time t we let θ^0_s := θ^{t,ν,z,a,b}_s and then proceed recursively. This allows us to define θ^{t,ν,z,a,b,u} and again we let θ^u := θ^{0,0,0,0,0,u}. In addition, further shorthand notation is introduced in some of the proofs to simplify the exposition.
We are now ready to state the optimal switching problem with random lag:

Problem 1. Find u* ∈ U such that J(u*) = sup_{u∈U} J(u). (2.1)
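The role of θ^u (elapsed time in the current on-cycle) can be illustrated with a small bookkeeping function for a single unit; the interface below is our own discrete sketch, not the paper's construction. The clock restarts when the unit is switched from off to on and reads zero while the unit is off.

```python
def elapsed_on_time(switch_times, modes, t):
    """Elapsed time of the current on-cycle at time t for a single unit.

    switch_times: increasing intervention times tau_1 < tau_2 < ...
    modes:        modes[j] in {0, 1} is the switching mode on [tau_j, tau_{j+1}).
    Returns 0.0 while the unit is off."""
    last_on = None
    for tau, b in zip(switch_times, modes):
        if tau > t:
            break
        # each off->on switch restarts the on-cycle clock
        last_on = tau if b == 1 else None
    return t - last_on if last_on is not None else 0.0

# Unit switched on at t=1, off at t=3, on again at t=4:
print(elapsed_on_time([1.0, 3.0, 4.0], [1, 0, 1], 2.0))  # → 1.0
```

A running reward ψ_a(t, z) that penalises small z then models reduced output (or extra cost) during ramp-up.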

Remark 2.4 Note that replacing ψ_a by ψ_a + ψ̄ and ϒ_a by ϒ_a + ῩЬchanges J(u) by a quantity that does not depend on the control u. Hence, we can without loss of generality assume that for each a ∈ A, ϒ_a and ψ_a are both non-negative.
The following proposition is a standard result for optimal switching problems and is due to the "no-free-loop" condition.

Proposition 2.5
Suppose that there is a u* ∈ U such that J(u*) ≥ J(u) for all u ∈ U. Then u* ∈ U^f.

Proof Assume that u ∈ U \ U^f and let B := {ω ∈ Ω : N(ω) > k, ∀k > 0}; then P[B] > 0. Furthermore, on B the switching mode must make an infinite number of loops and we have J(u) = −∞ by Assumption 2.1 (i) and (ii). Now, by the above non-negativity assumption on ϒ and ψ we have J(u) ≥ 0 for the control u = ∅ that never switches, and the assertion follows. □
We end this section with two useful lemmas.

Lemma 2.7 For any (t, ν, z, a, b) ∈ D and any s ∈ [t, T] we have the following estimate.

Proof If a ∈ A^{abs}_b then A^{t,ν,a,b}_r = A^{t,ν',a,b}_r = a, P-a.s., for all r ∈ [0, T] and the result follows. Assume instead that a ∉ A^{abs}_b and let η and η' be the first transition times of A^{t,ν,a,b} and A^{t,ν',a,b}, respectively. For all (t, ν, z, a, b) ∈ D and all r ∈ [t, s] we use the transition rates λ^{ν,b}_{a,a'} and the fact that the compensated jump processes are martingales, P-a.s., for any Z ∈ S²_F (see e.g. Corollary 5.1.3 and the preceding comment on p. 148 in [5]). Now, as θ^{r,ν',z,a,b}_r − θ^{r,ν,z,a,b}_r = 0, P-a.s., whenever a ∈ A^{abs}_b, we can use an induction argument to deduce a bound in terms of E[∫ |ψ_{a'}(r', ζ)| dr' | F_t], and the assertion follows as the last term is P-a.s. bounded by Assumption 2.1.i and Doob's maximal inequality. □

The Snell Envelope
In this section we gather the main results concerning the Snell envelope that will be useful later on. When presenting the theory we introduce an auxiliary probability space (Ω̃, F̃, (F̃_t)_{0≤t≤T}, P) that we assume satisfies the usual conditions in addition to the filtration, F̃ := (F̃_t)_{0≤t≤T}, being quasi-left continuous. For any F̃-stopping time η, we let T̃_η be the set of F̃-stopping times τ such that η ≤ τ ≤ T, P-a.s., and recall that a progressively measurable process (U_t)_{0≤t≤T} is of class [D] if the set of random variables {U_τ : τ ∈ T̃_0} is uniformly integrable.
Theorem 2.8 (The Snell envelope) Let U = (U_t)_{0≤t≤T} be an F̃-adapted, R-valued, càdlàg process of class [D]. Then there exists a unique (up to indistinguishability), R-valued càdlàg process Z = (Z_t)_{0≤t≤T}, called the Snell envelope, such that Z is the smallest supermartingale that dominates U. Furthermore, the following holds:
(i) For any stopping time γ, Z_γ = ess sup_{τ∈T̃_γ} E[U_τ | F̃_γ].
(ii) Z admits the Doob–Meyer decomposition Z_t = M_t − K^c_t − K^d_t, where M is a martingale, K^c is a continuous, non-decreasing, predictable process and K^d is a pure-jump, non-decreasing, predictable process.
(iii) If U is SLCE, then for any stopping time η the stopping time τ*_η := inf{s ≥ η : Z_s = U_s} ∧ T is optimal, i.e. Z_η = E[U_{τ*_η} | F̃_η]. Furthermore, in this setting the Snell envelope, Z, is left continuous in expectation, i.e. K^d ≡ 0, and Z is a martingale on [η, τ*_η].
(iv) Let (U^k)_{k≥0} be a sequence of càdlàg processes converging increasingly and pointwisely to the càdlàg process U and let Z^k be the Snell envelope of U^k. Then the sequence Z^k converges increasingly and pointwisely to a process Z̄ and Z̄ is the Snell envelope of U.
(v) We have the following dynamic programming relation: for any γ ∈ T̃_0 and any F̃-stopping time η with η ≥ γ, P-a.s., we have Z_γ = ess sup_{τ∈T̃_γ} E[U_τ 1_{[τ<η]} + Z_η 1_{[τ≥η]} | F̃_γ].

In the above theorem (i)–(iii) are standard. Proofs can be found in [15] (see [29] for an English version), in Appendix D of [18,25,27] and in the appendix of [12]. Statement (iv) was proved in [13]. The last statement follows by a splitting argument: if (τ_j)_{j≥0} is an increasing maximizing sequence for the outer supremum and (τ'_j)_{j≥0} an increasing maximizing sequence for the inner supremum, then τ̃_j := 1_{[τ_j<η]} τ_j + 1_{[τ_j≥η]} τ'_j is a maximizing sequence for the expression on the right-hand side and the two values must be equal.
The Snell envelope will be the main tool in showing that Problem 1 has a unique solution.
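In discrete time, the Snell envelope of Theorem 2.8 reduces to the backward recursion Z_k = max(U_k, E[Z_{k+1} | F_k]). The sketch below runs this recursion on a toy binomial tree; the payoff values are made up purely for illustration and have nothing to do with the paper's reward processes.

```python
def snell_envelope(payoff, p=0.5):
    """Backward induction for the Snell envelope on a recombining binomial tree.

    payoff[k][i] is the reward U at step k in node i (i up-moves out of k).
    Returns Z with Z[k][i] = max(U, p*Z_up + (1-p)*Z_down): the smallest
    supermartingale dominating U.  Stopping as soon as Z == U is optimal
    (cf. Theorem 2.8.iii)."""
    n = len(payoff)
    Z = [list(row) for row in payoff]          # terminal layer: Z_T = U_T
    for k in range(n - 2, -1, -1):
        for i in range(k + 1):
            cont = p * Z[k + 1][i + 1] + (1 - p) * Z[k + 1][i]
            Z[k][i] = max(payoff[k][i], cont)  # stop now vs. continue
    return Z

# Toy payoff on a 3-step tree: immediate exercise vs. continuation value.
U = [[1.0], [0.5, 2.0], [0.0, 1.0, 3.0]]
Z = snell_envelope(U)
print(Z[0][0])  # → 1.25: continuing beats stopping immediately (1.0)
```

The domination and martingale-up-to-τ* properties of the theorem are visible directly: Z ≥ U everywhere, and on nodes where Z > U the value equals the conditional expectation of the next layer.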

A Verification Theorem
The method for solving Problem 1 will be based on deriving an optimal control under the assumption that a specific family of processes exists, and then (in the next section) showing that the family indeed does exist. We will refer to any such family of processes as a verification family.

Definition 3.1 We define a verification family to be a family of càdlàg supermartingales ((Y^{t,ν,z,a,b}_s)_{0≤s≤T} : (t, ν, z, a, b) ∈ D) satisfying properties (a)–(d).

The purpose of the present section is to reduce the solution of Problem 1 to showing existence of a verification family. This is done in the verification theorem below. First we give a lemma that will be used in the proof of the verification theorem:

Proof For s ∈ [0, t] the result follows immediately from property (c) and Doob's maximal inequality. We thus let s ∈ (t, T] and note that if a ∈ A^{abs}_b then the difference tends to 0 as (p, q) → (0, 0) by property (c), where to get the last inequality we have applied Doob's maximal inequality. For a ∈ A_b \ A^{abs}_b we find, by arguing as in the proof of Lemma 2.7, that the corresponding bound holds for (t, ν, z) and (t', ν', z'), where η and η' are the first transition times of A^{t,ν,a,b} and A^{t,ν',a,b}, respectively. □

We have the following verification theorem:

Theorem 3.3 Suppose that ((Y^{t,ν,z,a,b}_s)_{0≤s≤T} : (t, ν, z, a, b) ∈ D) is a verification family. Then ((Y^{t,ν,z,a,b}_s) : (t, ν, z, a, b) ∈ D) is unique (i.e. there is at most one verification family, up to indistinguishability) and:
(i) Satisfies Y^{0,0,0,0,0}_0 = sup_{u∈U} J(u).
Step 1 We start by showing that for all (t, ν, z, a, b) ∈ D the recursion (3.1) can be written in terms of stopping times. From (3.1) we have that, for each (t, ν, z, a, b) ∈ D, Y^{t,ν,z,a,b} is the smallest supermartingale that dominates the process in question. We will show that under assumptions (a)–(c) the dominated process satisfies the assumptions of Theorem 2.8.iii, after which the assertion follows. Fix b' ∈ I_{−b} and note that for all s ≤ s' ≤ T we have the following trivial relation. If γ_m is a sequence of stopping times such that γ_m ↗ γ ∈ T, P-a.s., we thus obtain the corresponding limit, where the first equality follows by (a) and (c) and the second equality follows from the L²-boundedness assumed in (b) in combination with Lemma 2.6.
The dominated process is thus L²-bounded by Assumption 2.1 and (b), positive by Remark 2.4 and SLCE on [0, T). At time T it may have a jump, but the jump has to be positive by Assumption 2.1.iii. Theorem 2.8.iii now implies that, for each γ ∈ T, there is a stopping time, τ_γ ∈ T_γ, at which the supremum is attained.

Step 2 We now show that the asserted equality holds, P-a.s., for each τ*_{j−1} ≤ s ≤ T, and that the same equality holds for j + 1. This will be done over two sub-steps: the relevant integral process is the product of a G_t-measurable positive r.v. and a supermartingale; thus, it is a supermartingale for s ≥ t. Hence, as a sum of a finite number of supermartingales it is also a supermartingale. By Lemma 3.2 we have convergence as M → ∞. This implies that there is a subsequence (M_ι)_{ι≥1} along which convergence holds P-a.s. as ι → ∞. In particular, we note that (Y^{t,M_ι}_t : τ*_j ≤ t ≤ T) is a sequence of càdlàg processes that converges P-a.s. to a càdlàg process. Furthermore, by Lemma 2.7 and dominated convergence we obtain the required inequality, where we have used the supermartingale property. Hence, Y^{t,M}_s dominates the càdlàg process uniformly in t as ι → ∞ and we conclude that U is a càdlàg process. Appealing once again to Lemma 2.7 and property (c), the statement follows.

Sub-step c) U ∈ S²_e. We note that the results obtained in Step 1 imply that for any sequence of stopping times the last term can be made arbitrarily small, and we thus have the claim P-a.s. By induction we get the equality for each K ≥ 0, where τ*_{N*+1} = τ*_{N*+2} = · · · = ∞. Now, arguing as in the proof of Proposition 2.5, we find by property (b) that u* ∈ U^f. Letting K → ∞ we conclude that Y^{0,0,0,0,0}_0 = J(u*).
Step 3 It remains to show that the strategy u* is optimal. To do this we pick any other strategy û := (τ̂_1, . . . , τ̂_N̂; β̂_1, . . . , β̂_N̂) ∈ U^f and let the triple (θ̂_j, ẑ_j, â_j)_{1≤j≤N̂} be defined by the recursion θ̂_j := θ̂_{j−1}β̂_j + (β̂_j − β̂_{j−1})^+ τ̂_j, and the other components in the same way, P-a.s. By repeating this argument and using the dominated convergence theorem we find that J(u*) ≥ J(û), which proves that u* is in fact optimal. □

Remark 3.4
Note that the above proof can be trivially extended to arbitrary initial conditions (t, ν, z, a, b) ∈ D. To obtain a satisfactory solution to Problem 1, we thus need to establish that there exists a family of processes ((Y^{t,ν,z,a,b}_s)_{0≤s≤T} : (t, ν, z, a, b) ∈ D) satisfying properties (a)–(d) in the definition of a verification family. We will follow the standard existence proof, which goes by applying a Picard iteration (see [10,13,20]). We thus define a sequence ((Y^{t,ν,z,a,b,k}_s)_{0≤s≤T} : (t, ν, z, a, b) ∈ D)_{k≥0} of families of processes as

Existence
and for k ≥ 1.
In this section we will show that the limiting family, ((Ỹ^{t,ν,z,a,b}_s)_{0≤s≤T} : (t, ν, z, a, b) ∈ D), obtained when letting k → ∞ is a verification family, thus proving existence of an optimal control for Problem 1. This will be done over a number of steps, where we start by showing that for each k the family defined by the above recursions satisfies properties (a)–(c). We then show that property (d) follows from Theorem 2.8.iv. However, we start by showing that the above defined family is uniformly L²-bounded. We let ψ̄ := max_{a'∈A} max_{z∈[0,T]^{a'}} |ψ_{a'}(·, z)|, Ῡ := max_{a'∈A} max_{z∈[0,T]^{a'}} |ϒ_{a'}(z)| and define Ȳ accordingly. We have the following:

Proposition 4.1 For each k ≥ 0, the family of processes ((Y^{t,ν,z,a,b,k}_s)_{0≤s≤T} : (t, ν, z, a, b) ∈ D) is uniformly bounded in S².

Proof Let Ȳ^k := sup_{(t,ν,z,a,b)∈D} |Y^{t,ν,z,a,b,k}|. Since Y^{t,ν,z,a,b,k} is dominated, the right hand side is bounded by Assumption 2.1. □
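The Picard scheme has a transparent discrete-time analogue in which Y^k is the value when at most k further interventions are allowed. The following toy two-mode, deterministic sketch (rewards, cost and horizon are our own choices, not the paper's setting) iterates the scheme until the fixed point is reached:

```python
def picard_switching(psi, cost, steps):
    """Value iteration for a deterministic two-mode switching toy problem.

    psi[b][t]: running reward in mode b at step t; cost: switching cost.
    Y[b][t] approximates the optimal remaining reward from (t, b); the
    k-th iterate allows at most k switches, and the sequence of iterates
    is nondecreasing (cf. the monotone convergence result of Sect. 4)."""
    modes = range(2)
    # k = 0: no switching allowed, just collect the running reward.
    Y0 = [[sum(psi[b][s] for s in range(t, steps)) for t in range(steps + 1)]
          for b in modes]
    Ys = [Y0]
    for _ in range(steps + 1):             # enough iterations for this horizon
        prev = Ys[-1]
        nxt = [[0.0] * (steps + 1) for _ in modes]
        for b in modes:
            for t in range(steps - 1, -1, -1):
                stay = psi[b][t] + nxt[b][t + 1]       # keep the current mode
                switch = prev[1 - b][t] - cost         # one more switch, pay cost
                nxt[b][t] = max(stay, switch)
        Ys.append(nxt)
        if nxt == prev:                    # fixed point: extra switches don't help
            break
    return Ys

psi = [[0.0] * 4, [1.0] * 4]   # mode 1 pays 1 per step, mode 0 idles
Ys = picard_switching(psi, cost=0.5, steps=4)
print(Ys[-1][0][0])  # → 3.5: switch once at t=0 and collect 4 - 0.5
```

The strictly positive cost is what makes the iteration stabilise after finitely many steps, mirroring the role of the no-free-loop condition in Proposition 2.5.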
It should be noted that the above bound is uniform in k, which implies that the limit family (if it exists) satisfies the same inequality. In particular, we conclude that property (b) holds for all k. Properties (a) and (c) will be shown by induction, and we make the following induction hypothesis: for k ≥ 0, the family ((Y^{t,ν,z,a,b,k}_s)_{0≤s≤T} : (t, ν, z, a, b) ∈ D) is such that, for every (a, b) ∈ J, the stated continuity property holds. We note that, under the induction hypotheses H.0–H.k, arguing as in the proof of the verification theorem, the supremum is attained by a control u^{*,k+1} ∈ U^{k+1}_s. Furthermore, the characterisation of Y^{t,ν,z,a,b,k}_t in terms of the recursion (4.1) and (4.2) can be further simplified by noting that the possible transitions of A^{t,ν,a,b}_· form paths in a directed acyclic graph whose leaves are the members of the set A^{abs}_b. Letting η ∈ T_t denote the first transition time of A^{t,ν,a,b}_·, Theorem 2.8.v allows us to rewrite the recursion, where in both equations the fact that η > t, P-a.s., allows us to take conditional expectation with respect to F_t instead of G_t. Actually, since A^{t,ν,a,b} is a pure jump Markov process, σ(A^{t,ν,a,b}_s : 0 ≤ s ≤ ·) is generated by sets of the type {s < η} on [0, η). Hence, for each τ ∈ T_t there is a τ' ∈ T^F_t such that τ' ∧ η = τ ∧ η, P-a.s., and we only need to take the essential supremum over T^F_t (the set of F-stopping times τ ≥ t) in (4.5).
Furthermore, we note that when a ∈ A^{abs}_b, then by the definition of A^{t,ν,a,b} we have η = ∞, and thus (4.7) and (4.8) hold. We start by showing that the induction hypothesis holds for k = 0. From this it is immediate that H.0.ii holds whenever a ∈ A^{abs}_b. Concerning H.0.i, we note that in the general case Y^{t,ν,z,a,b,0} is the sum of a continuous process and a martingale in a quasi-left continuous filtration; hence, it has a version which is càdlàg and SLCE and thus belongs to S²_e by Proposition 4.1.
Let (t, ν, z) ∈ D_{(a,b)} and assume that t' > t. Between t and t' we may have a transition in A^{t,ν,a,b} (the probability of which is bounded by 1 − e^{−K_λ(t'−t)}). Note that the first two terms on the right hand side in the above equation vanish, P-a.s., as |t' − t| → 0, by the P-a.s. boundedness of ψ̄ and Ȳ. Concerning the third term, let us again consider the case when a ∈ A^{abs}_b. Define K_{t,t'}(X) := |E[X | F_t] − E[X | F_{t'}]|; then K_{t,t'} is a subadditive operator, where δ(M) is the diameter of a partition (G^M_l)^M_{l=1} of [0, T]^a_+ and z^M_l ∈ G^M_l, and we note that the integrals are well defined by Assumption 2.1.i. The last of the above inequalities follows by noting that for all z, z' ∈ [0, T]^a_+,

K_{t,t'}(ψ_a(s, z)) = K_{t,t'}(ψ_a(s, z') + (ψ_a(s, z) − ψ_a(s, z')))
≤ K_{t,t'}(ψ_a(s, z')) + K_{t,t'}(ψ_a(s, z) − ψ_a(s, z'))
≤ K_{t,t'}(ψ_a(s, z')) + 2k_ψ |z' − z|,

where the integral w.r.t. ds is well defined since the integrand is P-a.s. equal to that on the last row of (4.9). Noting that, P-a.s., δ(M) can be made arbitrarily small, we conclude that H.0 holds for all a ∈ A^{abs}_b. We now apply an induction scheme and assume that for some a ∈ A_b hypothesis H.0 holds for all a' ∈ A^{−a}_{a,b}. We consider continuity in ν, with η the first transition time of A^{t,ν,a,b} and η' the first transition time of A^{t,ν',a,b}.

Proof Since we have already shown that the induction hypothesis holds for k = 0, we will assume that H.0–H.k hold for some k ≥ 0. Now, as noted above, this implies the existence of a control u := (τ_1, . . . , τ_N; β_1, . . . , β_N), with N ≤ k + 1, P-a.s., at which the supremum is attained. Fix (a, b) ∈ J. We will prove the Proposition in 3 different steps:

Step 1 We first show continuity in z.
and in particular a bound on the difference of the θ-processes. Repeating this argument and noting that k is finite, we conclude that the estimate holds for all s ∈ [0, T]. By the above we find that it holds, P-a.s., for all t ∈ [0, T], where we note that the constant C does not depend on k.
Step 2 Next we show continuity in ν. Again we apply an induction argument and assume that for some a ∈ A_b hypothesis H.k+1 holds for all a' ∈ A^{−a}_{a,b}, noting the identity when a ∈ A^{abs}_b. Relying on the induction argument and the fact that the control u ∈ U^f for all k, we conclude by symmetry that the bound holds for all t ∈ [0, T], P-a.s. Applying Doob's maximal inequality, we find the corresponding bound on the expected supremum for p ∈ R^n, where again the constant C does not depend on k. Put together, this implies that H.k+1.ii holds for a.
Step 3 To show that H.k+1.i holds we note that Y^{t,ν,z,a,b,k+1} is the Snell envelope of a process in S²_e and thus itself belongs to S²_e. Subtracting the continuous process, between t and t' we may have a transition (the probability of which is bounded by 1 − e^{−K_λ(t'−t)}), it may be optimal to switch to another mode, or neither of the above, and the claim follows. We are now ready to show that the limit family, lim_{k→∞} ((Y^{t,ν,z,a,b,k}_s)_{0≤s≤T} : (t, ν, z, a, b) ∈ D), exists and satisfies the properties of a verification family. We start with existence: for each (t, ν, z, a, b) ∈ D, the limit Ỹ^{t,ν,z,a,b} := lim_{k→∞} Y^{t,ν,z,a,b,k} exists as an increasing pointwise limit, P-a.s. Furthermore, for the process (Ỹ^{t,ν,z,a,b}_t : 0 ≤ t ≤ T) we have by (4.4) that, P-a.s., the sequence is bounded in L². Hence, the sequence ((Y^{t,ν,z,a,b,k}_s)_{0≤s≤T} : (t, ν, z, a, b) ∈ D) converges P-a.s. for all s ∈ [0, T].
Concerning the second claim, note that by Proposition 4.1 there is for each δ > 0 and p ∈ (1, 2) a constant K > 0 such that the corresponding set B has probability P(B) ≥ 1 − δ. By the "no-free-loop" condition (Assumption 2.1.(ii)) and the finiteness of I we get that for any control (τ_1, . . . , τ_N; β_1, . . . , β_N), Σ_{j=1}^N c_{β_{j−1},β_j}(τ_j) ≥ (N − m)/m, P-a.s. Hence, there is a P-null set N ⊂ Ω such that the bound holds for all ω ∈ B \ N. This implies the estimate for k > 0; furthermore, it holds for all t ∈ [0, T] and all 0 ≤ k' ≤ k. We conclude that for all ω ∈ B \ N, the sequence is a sequence of continuous functions that converges uniformly in t, which implies that the limit is continuous. Since δ > 0 was arbitrary, we conclude that P-almost all trajectories of (Ỹ^{t,ν,z,a,b}_t : 0 ≤ t ≤ T) are continuous. The family ((Ỹ^{t,ν,z,a,b}_s)_{0≤s≤T} : (t, ν, z, a, b) ∈ D) is a verification family.
Proof As noted above, property (b) of a verification family follows immediately from Proposition 4.1. We now show that the limit satisfies the additional properties of the verification theorem as well, starting with the recursion. Since the process appearing in the recursion is the limit of an increasing sequence of càdlàg supermartingales, it is also a càdlàg supermartingale (see e.g. [24]).
It remains to show that the limit is SLCE. Rather than appealing to a uniform convergence argument as in the proof of Proposition 4.5, we give a direct, more intuitive proof along the lines of [13] and [32]. We look for a contradiction and let (γ_j)_{j≥1} be a sequence of G-stopping times such that γ_j ↗ γ ∈ T, and assume that lim_{j→∞} E[Ỹ^{t,ν,z,a,b}_{γ_j}] > E[Ỹ^{t,ν,z,a,b}_γ]. Repeating this argument l times, we find that there is a sequence of measurable sets M_1 ⊃ M_2 ⊃ · · · ⊃ M_l with P(M_l) > 0 and a sequence β̄_2, β̄_3, . . . , β̄_l of F_γ-measurable random variables such that β̄_{i+1} ∈ I_{β̄_i} for i = 1, . . . , l − 1, and the limits lim_{j→∞} Ỹ^{γ_j, ν β̄_i + γ_j (β̄_i − β̄_{i−1})^+, θ^{t,z,a,b}, . . .} yield the desired contradiction. □

A Dynamic Programming Relation

We will consider the problem of finding a feedback strategy u ∈ U_t that, for each (t, x, ν, z, a, b), maximizes J^{(a,b)}(t, x, ν, z; u), where for each a ∈ A the deterministic function ϕ_a : [0, T] × R^m × [0, T]^a_+ → R_+ is locally Lipschitz and of polynomial growth in x, Lipschitz in z and such that ϕ_a(·, x, z) is a càdlàg function, and the deterministic function h_a : R^m × [0, T]^a_+ → R_+ is locally Lipschitz and of polynomial growth in x and Lipschitz in z. Furthermore, we assume that the switching costs are deterministic and that, for each (a, b) ∈ J, ν ∈ [0, T]^b and each a' ∈ A_{a,b}, the transition rates are λ^{ν,b}_{a,a'}(r) := ρ^{ν,b}_{a,a'}(r, X^{t,x}_r), (5.4) where ρ^{ν,b}_{a,a'} : [0, T] × R^m → R is Lipschitz in ν, bounded and locally Lipschitz in x, and ρ^{ν,b}_{a,a'}(·, x) is càdlàg. We note that, using the results in Sects. 3 and 4, there exists a control u* ∈ U_t such that J^{(a,b)}(t, x, ν, z; u*) ≥ J^{(a,b)}(t, x, ν, z; u).

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.