A Limited-Feedback Approximation Scheme for Optimal Switching Problems with Execution Delays

We consider a class of optimal switching problems with non-uniform execution delays and ramping. Such problems frequently occur in the operation of economic and engineering systems. We first provide a solution to the problem by applying a probabilistic method. The main contribution, however, is a scheme for approximating the optimal control by limiting the information in the state feedback. In a numerical example the approximation routine gives a considerable reduction in computation time compared to a conventional algorithm.


Introduction
Consider a set of n production units F := {1, . . . , n}, where each unit can be operated at two levels, {0, 1}, representing "off" and "on". We assume that a central operator can switch each unit between the two operating levels. Following a switch from "off" to "on" in Unit i, the output will in general not immediately jump to the installed capacity \bar{p}_i. Rather, we assume that production ramps up during a delay period [0, δ_i], with δ_i > 0. The output of Unit i following a switch from "off" to "on" is thus described by a Lipschitz continuous function R_i : [0, δ_i] → [0, \bar{p}_i], with R_i(0) = 0 and R_i(δ_i) = \bar{p}_i. Turning off the unit, on the other hand, is assumed to halt production immediately.
We consider the problem where the central operator wants to maximize her return over a predefined operation period [0, T] (with T < ∞), representing, for example, the net profit from electricity production in n production units or from mineral extraction in n mines. The profit depends on the operating mode and the output of the n units, but also on an observable diffusion process (X_t : 0 ≤ t ≤ T).
For i = 1, . . . , n, we let 0 ≤ τ^i_1 ≤ · · · ≤ τ^i_{N_i} < T represent the times at which the operator intervenes on Unit i. We assume, without loss of generality, that all units are off at the start of the period, so that intervention τ^i_{2j−1} turns operation on, while intervention τ^i_{2j} turns it off. We define the operating mode (ξ_t : 0 ≤ t ≤ T) of the system to be the J := {0, 1}^n-valued process describing the evolution of the operating modes of the n units. The operating mode of Unit i at time t ∈ [0, T] is then (where ⌈a⌉ denotes the smallest integer k such that k ≥ a) and the output of the same unit is with the convention that τ^i_{N_i+1} = ∞. Each intervention on Unit i incurs a cost c^0_i : [0, T] → R_+ when turning operation from "off" to "on" and a cost c^1_i : [0, T] → R when turning the unit off. We assume that a given operation strategy u := (τ^1_1, . . . , τ^1_{N_1}; . . . ; τ^n_1, . . . , τ^n_{N_n}) gives the total reward where, for each b := (b_1, . . . , b_n) ∈ J, ψ_b : [0, T] × R^m × R^n_+ → R and h_b : R^m × R^n_+ → R are deterministic, locally Lipschitz continuous functions of at most polynomial growth and ⌊a⌋ denotes the largest integer k such that k ≤ a.
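To make the intervention conventions concrete (odd-indexed interventions turn a unit on, even-indexed turn it off, and an "on" unit ramps up over [0, δ_i]), the following sketch computes the output of a single unit at time t from its intervention times. The linear ramp and all parameter names are illustrative assumptions, not taken from the paper.

```python
def unit_output(t, taus, delta, p_bar, ramp=None):
    """Output of one unit at time t given sorted intervention times taus:
    odd-numbered interventions switch the unit on, even-numbered switch it
    off (halting production immediately).  delta is the ramp duration and
    p_bar the installed capacity; the default ramp is the linear
    R(s) = p_bar * s / delta on [0, delta] (an illustrative choice)."""
    if ramp is None:
        ramp = lambda s: p_bar * s / delta
    # number of interventions up to and including time t
    n = sum(1 for tau in taus if tau <= t)
    if n % 2 == 0:
        return 0.0          # even count: the unit is off
    t_on = taus[n - 1]      # most recent switch-on time
    return ramp(min(t - t_on, delta))
```

For example, with a 4-hour linear ramp to capacity 100, a unit switched on at time 0 and off at time 5 produces 50.0 at t = 2 (halfway up the ramp) and 0.0 at t = 6.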
The problem of finding a maximizer of (1.1) is a multi-modes optimal switching problem with execution delays. The multi-modes optimal switching problem was popularized by Carmona and Ludkovski in [7], where they suggested an application to valuation of energy tolling agreements (see also the paper by Deng and Xia [8]).
A formal solution to the multi-modes optimal switching problem, without delays, was derived by Djehiche, Hamadène and Popier in [9]. The authors adopted a probabilistic approach by defining a verification theorem for a family of stochastic processes that specifies sufficient conditions for optimality. They further proved existence of a family of processes that satisfies the verification theorem and showed that these processes can be used to define continuous value functions that form solutions, in the viscosity sense, to a set of variational inequalities. El-Asri and Hamadène [10] extended the approach to switching problems where the switching costs are functions also of the state and proved uniqueness of the viscosity solutions.
Previous work on more general impulse control problems with execution delays includes the novel paper by Bar-Ilan and Sulem [3], where an explicit solution to an inventory problem with a uniform delivery lag is found by taking the current stock plus pending orders as one of the states. Similar approaches are taken by Aïd et al. in [2], where explicit optimal solutions of impulse control problems with uniform delivery lags are derived for a large set of different problems, and by Bruder and Pham [6], who propose an iterative algorithm. Øksendal and Sulem [17] propose a solution to general impulse control problems with execution delays by defining an operator that circumvents the delay period.
A state space augmentation approach to switching problems with non-uniform delays and ramping is taken by Perninge and Söder in [19] and by Perninge in [18], where application to real-time operation of power systems is considered. In these papers, numerical solution algorithms are proposed by means of the regression Monte Carlo approach (see Longstaff and Schwartz [16]), which has previously been used to solve multi-modes switching problems by Carmona and Ludkovski [7] and by Aïd et al. [1].
Although many approaches have been proposed to give solutions, both exact and approximate, to impulse control problems with execution delays, they either consider models where delays only enter through uniform lags, or they propose methods that become intractable for systems with many production units. A computational difficulty that arises when trying to find a maximizer of (1.1) by augmenting the state with a suitable set of "times since last intervention" is the curse of dimensionality, which may become apparent already with a relatively low number of production units [5].
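As a rough illustration of this growth, the sketch below counts lattice points for a fully augmented state (time, a scalar X on a grid, the mode in {0, 1}^n, and one delay clock per unit). All grid sizes here are hypothetical assumptions chosen only to illustrate the exponential blow-up in the number of units.

```python
def augmented_grid_size(n_units, n_t=241, n_x=201, n_z=20):
    """Number of lattice points for the fully augmented state
    (t, X, mode, delay clocks): n_t time steps, n_x grid points for a
    scalar X, 2^n operating modes and n_z clock values per unit.
    All grid sizes are illustrative assumptions."""
    return n_t * n_x * (2 ** n_units) * (n_z ** n_units)

# the count multiplies by a constant factor (here 2 * n_z = 40) per unit
for n in (2, 3, 6):
    print(n, augmented_grid_size(n))
```

With these illustrative sizes, each additional unit multiplies the lattice by a factor 2·n_z, so a six-unit system is already far beyond what a direct lattice method can handle.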
In this paper we take a different approach by limiting the feedback used in the optimization. This turns out to be a computationally efficient approximation that does not sacrifice too much accuracy through its deviation from optimality. Furthermore, we extend some of the main results of [9] to problems with non-uniform execution delays and ramping.

Preliminaries
Throughout, we will assume that (X_t : 0 ≤ t ≤ T) is an R^m-valued stochastic process, defined on the filtered probability space (Ω, F, P), given as the strong solution to a stochastic differential equation and for some constant C > 0. We let F := (F_t)_{0≤t≤T} denote the filtration (F^0_t)_{0≤t≤T} completed with all P-null sets.
• For each b ∈ J we let δ_b ∈ R^n be given by (δ_b)_i := b_i δ_i, for i = 1, . . . , n, and let D_b
• We define D_p to be the domain of the production vector. Hence, D_p := {p ∈ R^n : 0 ≤ p_i ≤ \bar{p}_i, for i = 1, . . . , n}. Furthermore, for each b ∈ J we define D^b_p := {p ∈ R^n : 0 ≤ p_i ≤ b_i \bar{p}_i, for i = 1, . . . , n}.
• For each b ∈ J and each u ∈ U we extend the definition of ξ_s to general initial conditions by defining the càdlàg process (ξ^b
• We let S^2 be the set of all progressively measurable, continuous processes (Z_t :
• We say that a family of processes ((Y^y_t)_{0≤t≤T} : y ∈ R^k) is continuous in the parameter y if and use the notation
Further, we assume that:
• The switching costs c^0_i : [0, T] → R_+ and c^1_i : [0, T] → R_+ are Lipschitz continuous functions such that min_{t∈[0,T]} c^0_i(t) + min_{t∈[0,T]} c^1_i(t) > 0, for i = 1, . . . , n.
• We make the additional assumption that the terminal rewards (h b ) b∈J satisfy which rules out any switching at time T .
To be able to consider feedback-control formulations we will, for all t ∈ [0, T] and x ∈ R^m, define the process (X^{t,x}_s : 0 ≤ s ≤ T) as the strong solution to
A standard result (see e.g. Theorem 6.16, p. 49 in [20]) is that, for any θ ≥ 1, there exist constants C^X_1 > 0 and C^X_2 > 0 such that and for all t′ ∈ [0, T] and all
As mentioned above, we assume that ψ_b and h_b are locally Lipschitz continuous and of polynomial growth, for all b ∈ J. Hence, there exist constants C_ψ > 0, C_h > 0 and γ ≥ 1 such that |ψ_b(t, x, p)| ≤ C_ψ(1 + |x|^γ) and |h_b(x, p)| ≤ C_h(1 + |x|^γ), for all (t, x, p) ∈ [0, T] × R^m × D_p and all b ∈ J. Now, (2.2) implies that, for each θ ≥ 1, there are constants C^ψ_1 (= C^ψ_1(θ)) and C^h_1. Hence, we have and in particular
Local Lipschitz continuity implies that, for every ρ > 0, there exist C^ψ_ρ, C^h_ρ > 0 such that, where we have used Markov's inequality (see e.g. Gut [12, p. 120]) in the last step. Since ρ > 0 was arbitrary we get and by a similar argument we have Furthermore, the Lipschitz continuity of R_i implies that there is a constant C_R > 0 such that
The above estimates will be used to provide a solution to the operator's problem, defined as:
To facilitate the solution of Problem 1 we use the following proposition, which is a standard result for optimal switching problems with strictly positive switching costs.

Solution by state space augmentation
The problem of finding a control that maximizes (1.1) is non-Markovian in the state (t, X_t, ξ_t) due to the delays, which prevent us from uniquely determining p(t) from the operating mode ξ_t. To remove delays in impulse control problems with uniform delivery lags, it was proposed in [4] to augment the state space with the additional state "capacity of projects in the pipe". With non-uniform delays and ramping this approach is not applicable. However, we can still apply a state space augmentation to remove the delays (see e.g. [5]).
By adding the càdlàg, F_t-adapted process (ζ_t : 0 ≤ t ≤ T) defined as we retain a Markovian problem in the state (t, X_t, ξ_t, ζ_t). The output vector can now be written
is an optimal strategy for Problem 1.
Proof. Note that the proof amounts to showing that, for all (t, z, b) ∈ D_ζ, we have P-a.s. and β_0 = b. Then uniqueness is immediate, (i) follows from Proposition 2.1 and (ii) follows from repeated use of the definition of the Snell envelope (see e.g. Appendix D of Karatzas and Shreve [14] or Proposition 2 of Djehiche, Hamadène and Popier [9]).
First, define Then, by Proposition 2 of [9], Z_s is the smallest supermartingale that dominates and is the smallest supermartingale that dominates the process ; thus, it is a supermartingale for s ≥ τ*_{j′}. Hence, as is the sum of a finite number of supermartingales it is also a supermartingale. By the continuity of Y^{t,z,b}_s in (t, z) and the continuity of R and ψ_b we get where the first part follows from the supermartingale property and the second inequality follows from Since this holds for all (t, z, b) ∈ D_ζ we get where (τ*_0, β*_0) = (0, 0). Letting N → ∞ while assuming that u* ∈ U_f we find that Y^{0,0,0}_0 = J(u*).
It remains to show that the strategy u* is optimal. To do this we pick any other strategy û := (τ̂_1, . . . , τ̂_N̂; β̂_1, . . . , β̂_N̂) ∈ U_f and let (ẑ_j)_{1≤j≤N̂} be defined by the recursion ẑ_j := (ẑ_{j−1} + (τ̂_j − τ̂_{j−1})β̂_{j−1}) ∧ δ_{β̂_j}. By the definition of Y^{0,0,0}_0 in (3.1) we have but in the same way P-a.s. By repeating this argument and using the dominated convergence theorem we find that J(u*) ≥ J(û), which proves that u* is in fact optimal and thus belongs to U_f.
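The clock recursion ẑ_j := (ẑ_{j−1} + (τ̂_j − τ̂_{j−1})β̂_{j−1}) ∧ δ_{β̂_j} used above, together with the output map of the augmented state, can be sketched componentwise as follows. The linear ramp and all function names are illustrative assumptions.

```python
import numpy as np

def step_zeta(zeta, dt, b_prev, b_new, delta):
    """One step of the clock recursion
    z_j = (z_{j-1} + (tau_j - tau_{j-1}) * beta_{j-1}) ∧ delta_{beta_j},
    taken componentwise.  b_prev is the mode held during the elapsed time dt
    and b_new the mode after the intervention (b_prev == b_new between
    interventions); off units are capped at (delta_b)_i = 0."""
    b_prev = np.asarray(b_prev, dtype=float)
    b_new = np.asarray(b_new, dtype=float)
    return np.minimum(np.asarray(zeta, dtype=float) + dt * b_prev, delta * b_new)

def output_vector(b, zeta, delta, p_bar):
    """Output map: unit i produces R_i(zeta_i) when on and 0 when off.
    A linear ramp R_i(s) = p_bar_i * s / delta_i is assumed for illustration."""
    b = np.asarray(b, dtype=float)
    return b * p_bar * np.minimum(np.asarray(zeta, dtype=float), delta) / delta
```

Note that switching a unit off needs no explicit reset: the cap δ_{β̂_j} has a zero component for every off unit, so its clock drops to 0 automatically, matching the convention in the text.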

Existence
Theorem 3.1 presumes existence of the families ((Y^{t,z,b}_s)_{0≤s≤T} : (t, z, b) ∈ D_ζ). To obtain a satisfactory solution to Problem 1 we thus need to establish existence. The general existence proof (see [7, 9]) goes by defining a sequence ((Y^{t,z,b,k}_s)_{0≤s≤T} : (t, z, b) ∈ D_ζ)_{k≥0} of families of processes as for k ≥ 1, and then showing that this sequence converges to a family ((Ỹ^{t,z,b}_s)_{0≤s≤T} : (t, z, b) ∈ D_ζ) of S^2-processes that satisfies the verification theorem. First we note that, by letting U^k_t := {(τ_1, . . . , τ_N; β_1, . . . , β_N) ∈ U_t : N ≤ k} and using a reasoning similar to that in the proof of Theorem 3.1, it follows that with β_0 = b.
Proof. Mean square integrability can be deduced by noting that (3.7) and Doob's maximal inequality imply that there is a constant C > 0 such that, for k ≥ 0, and the right hand side is bounded by (2.4) and (2.5). Now, to show that b) holds we note that, for any control u ∈ U, we have P-a.s. Using symmetry we find that the same inequality holds for and the right hand side goes to 0 as (t′, z′) → (t, z) by (2.8) and (2.9).
It remains to show that, for each (t, z, b) ∈ D_ζ, the process (Y^{t,z,b,k}_s : 0 ≤ s ≤ T) is continuous for all k ≥ 0. To do this we will also show, for each k ≥ 0 and each (t, z, b) ∈ D_ζ, that: First consider the case k = 0. We have Hence, (Y^{t,z,b,0}_s : 0 ≤ s ≤ T) is the sum of a continuous process and a martingale w.r.t. the Brownian filtration and is thus continuous. Furthermore, for all s ≤ s′ ≤ T and all b′ ∈ J, Hence, continuity of (Y^{s,(z+(s−t)b)^+∧δ_{b′},b′,0}_s : 0 ≤ s ≤ T) follows from continuity of (Y^{t,z,b,0}_s : 0 ≤ s ≤ T) and continuity of ψ, h and R. Moving on, we assume that a)-c) hold for some k ≥ 0. The process (Y^{t,z,b,k+1}_s + ∫_0^s ψ_b(r, X_r, R(z + (r − t)b)) dr : 0 ≤ s ≤ T) is the Snell envelope of the process It is well known that the Snell envelope of a process (U_s : 0 ≤ s ≤ T) is continuous if U only has positive jumps. Now, (∫_0^s ψ_b(r, X_r, R(z + (r − t)b)) dr : 0 ≤ s ≤ T) is continuous and, since (Y^{s,(z+(s−t)b)^+∧δ_β,β,k}_s : 0 ≤ s ≤ T) was assumed continuous, for all β ∈ J, in c), : 0 ≤ s ≤ T is a continuous process.
By a similar argument, since (Y c) holds for k + 1. But, then a)-c) hold for k + 1 as well. By an induction argument the proposition now follows.
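The truncated control sets U^k_t suggest a simple numerical analogue of the sequence Y^k: value iteration in the maximal number of remaining interventions. The sketch below implements this for a deterministic toy problem with one unit, a discrete ramp clock and a given price path standing in for X, so that the conditional expectations collapse to a plain backward recursion. Everything here (linear ramp, cost values, grids) is an illustrative assumption, not the paper's algorithm.

```python
import numpy as np

def k_switch_values(price, delta_steps, p_bar, c_on, c_off, dt, k_max):
    """Toy analogue of the iteration Y^k: the value of running one unit
    with at most k interventions left.  State = (time index, mode b in
    {0,1}, ramp clock z in {0,...,delta_steps}); price is a deterministic
    path, the terminal reward is 0 and the ramp is linear."""
    T = len(price)
    nz = delta_steps + 1
    Y = np.zeros((k_max + 1, T + 1, 2, nz))   # Y[k, T, :, :] = 0 (terminal)
    ramp = lambda z: p_bar * z / delta_steps  # illustrative linear ramp R(z)
    for k in range(k_max + 1):
        for t in range(T - 1, -1, -1):
            for b in (0, 1):
                for z in range(nz):
                    zn = min(z + 1, delta_steps) if b else 0
                    # continue in the current mode, collecting the running reward
                    best = dt * price[t] * (ramp(z) if b else 0.0) + Y[k, t + 1, b, zn]
                    if k > 0:                 # option: intervene now, restart the clock
                        b2 = 1 - b
                        zn2 = min(1, delta_steps) if b2 else 0
                        sw = -(c_on if b2 else c_off) + Y[k - 1, t + 1, b2, zn2]
                        best = max(best, sw)
                    Y[k, t, b, z] = best
    return Y
```

By construction Y^0 is the no-intervention value and k ↦ Y^k is non-decreasing, mirroring the monotone convergence used in the existence proof.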
Next we show that the limiting family, lim k→∞ ((Y t,z,b,k s ) 0≤s≤T : (t, z, b) ∈ D ζ ), exists and satisfies the verification theorem.
Proof. We need to show that the limit family ((Ỹ t,z,b s ) 0≤s≤T : (t, z, b) ∈ D ζ ) exists as a member of S 2 , that it is continuous in (t, z) and that it satisfies (3.1). This is done in four steps as follows.
(i) Convergence. We have, P-a.s., where the right hand side is bounded P-a.s. by the estimates of Section 2. Hence, the sequence ((Y^{t,z,b,k}_s)_{0≤s≤T} : (t, z, b) ∈ D_ζ)_{k≥0} is increasing and P-a.s. bounded; thus it converges P-a.s. for all s ∈ [0, T].
(ii) Limit satisfies (3.1). Applying the convergence result to the right hand side of (3.6) and using (iv) of Proposition 2 in [9] we find that
(iii) Limit belongs to S^2. Using the same reasoning as above we find that there exists a constant C > 0 such that, which is bounded by the estimates of Section 2. To prove continuity in s we note that Ỹ^{t,z,b}_s + ∫_0^s ψ_b(r, X_r, R((z + (r − t)b)^+ ∧ δ_b)) dr is the limit of an increasing sequence of continuous supermartingales and thus càdlàg [13]. Now, for each b ∈ J and each (t, z, b) ∈ D_ζ, the processes (∫_0^s ψ_b(r, X_r, R((z + (r − t)b)^+ ∧ δ_b)) dr : 0 ≤ s ≤ T) are continuous. Hence, by the properties of the Snell envelope, if Ỹ^{t,z,b}_s has a (necessarily negative) jump at s_1 ∈ [0, T], then, for some β_1 ∈ J_{−b}, Ỹ^{s_1,(z+(s_1−t)b)^+∧δ_{β_1},β_1}_s also has a jump at s_1. If Ỹ^{t,z,b} has a (negative) jump at s_1, then for some β_2 ∈ J_{−b} the process Ỹ will have a negative jump at s_1 and Ỹ Repeating this argument we get a sequence (β_k)_{k≥0}, with β_0 = b and β_k ∈ J_{−β_{k−1}} for k ≥ 1, such that, for any j > k ≥ 0, we have Since is a decreasing sequence that takes values in a finite set and J is a finite set, there are j > k ≥ 0 such that Σ^j_{l=1} δ_{β_l} = Σ^k_{l=1} δ_{β_l} and β_j = β_k. But then . . . , n}. Hence, Ỹ^{t,z,b}_s must be continuous and thus belongs to S^2.
(iv) Limit continuous in (t, z). By the dominated convergence theorem we have, This finishes the proof.
We have thus far derived a verification theorem for the solution of Problem 1, and shown that there exists a (unique) family of processes satisfying the verification theorem. To finish the solution of Problem 1 we show that the families of processes in the verification theorem define continuous value functions.

Value function representation
We first extend the definition of the families of processes in the verification theorem to a full state-feedback form by introducing general initial conditions as follows. For all (r, x) ∈ [0, T] × R^m we let ((Y^{r,x,t,z,b}_s)_{0≤s≤T} : (t, z, b) ∈ D_ζ) be the family of processes that satisfies the verification theorem for the process (X^{r,x}_s : 0 ≤ s ≤ T) and let ((Y^{r,x,t,z,b,k}_s)_{0≤s≤T} : (t, z, b) ∈ D_ζ)_{k≥0} be the corresponding versions of ((Y^{t,z,b,k}_s)_{0≤s≤T} : (t, z, b) ∈ D_ζ)_{k≥0} defined by (3.5) and (3.6) with X replaced by X^{r,x}. The following estimates hold: Furthermore, Y^{r,x,t,z,b}_r is deterministic and
Proof. For the first part we note that, again using Doob's maximal inequality, there exists a C > 0 such that . For the second part we pick any control u = (τ_1, . . . , τ_N; β_1, . . . , β_N) ∈ U_r and let u′ = (τ_l ∨ r′, τ_{l+1}, . . . , τ_N; β_l, β_{l+1}, . . . , β_N), where l := max{j ≥ 1 : τ_j ≤ r′} ∨ 1 with max{∅} = 0. Then u′ ∈ U_{r′} and we have sup P-a.s. and, by Lipschitz continuity of c^0_i and c^1_i, the switching costs obey for some C_c > 0. Hence, since u was arbitrary we have Considering (2.4) we see that the first two integrals on the right hand side go to zero as r′ → r. By arguing as in part (iv) of the proof of Theorem 3.3 we find that the remainder goes to 0 as (r′, x′, t′, z′) → (r, x, t, z). Now, by symmetry this applies to Y^{r′,x′,t′,z′,b}_{r′} − Y^{r,x,t,z,b}_r as well and the second inequality follows.
Repeated use of Theorem 8.5 in [11] shows that for k ≥ 0, there exist functions (3.8)

Limited feedback
When searching for a numerical solution to Problem 1, by means of a lattice or a Monte Carlo approximation of the value function in (3.8), the curse of dimensionality will generally become apparent through an explosion in the computational burden as the number of units increases. To limit this effect we present an alternative, sub-optimal, scheme where only a part of the available state information is considered when making decisions.
Assume that, at time t, the system is operated in mode b ∈ J with ζ_t = δ_b when one or more units are intervened on, giving the new mode b′ ∈ J_{−b}. The production in the period [t, T] can then be written We define the intervention times τ^{t,x,b,k}_1, . . . , τ^{t,x,b,k}_N and the corresponding sequence of active units β^{t,x,b,k}_1, . . . , β^{t,x,b,k}_N as τ^{t,x,b,k} For each b ∈ J and each (t, x, z) ∈ [0, T] × R^m × D^b_ζ we define the cost-to-go when applying the control u ∈ U_t, as an extension of J in the formulation of Problem 1, (· + ∫_0^s ψ_b(r, X_r, p_b) dr : 0 ≤ s ≤ T) is the Snell envelope of a càdlàg process and thus càdlàg. Now, assume that, for some k ≥ 1, (Ŷ^{t,x,b,k} is càdlàg as the Snell envelope of a càdlàg process. From the proof of Theorem 1 in [9] and the Markov property we get where, for each b, b′ ∈ J, the discrete-time delay revenue Γ̂^b_{b′} : Π × R^m → R is given by with, for all i ∈ I(b′),

1   150   5000   2000   3   2   7
2   125   3000   2000   4   1   6
3   100   2000   1500   4   2   5
4    75   1500   1000   4   1   4
5    50    750   1000   5   1   3
6    25    500   1000   7   1   2

Table 1: Data for the production units in the example.
The problem is numerically solved by means of a Markov-Chain approximation of the process (X t : 0 ≤ t ≤ T ) as prescribed in [15]. We use a time-discretization with N Π = 241 points and discretize the state space of (X t : 0 ≤ t ≤ T ) using 201 grid-points.
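A generic construction of such a chain, for a scalar diffusion on a uniform grid, might look as follows. This is an illustrative explicit scheme that matches the local mean and variance of the diffusion to first order; it is not necessarily the exact construction of [15], and the function names and grid sizes are assumptions.

```python
import numpy as np

def trinomial_chain(x_grid, drift, sigma, dt):
    """Markov chain approximation of dX = drift(x) dt + sigma(x) dW on a
    uniform grid: from grid point x_j the chain moves one step up, one step
    down, or stays put, with probabilities chosen so that the one-step mean
    and variance match those of the diffusion to O(dt).  Returns the
    one-step transition matrix (boundary mass is reflected inward)."""
    h = x_grid[1] - x_grid[0]
    n = len(x_grid)
    P = np.zeros((n, n))
    for j, x in enumerate(x_grid):
        b, s2 = drift(x), sigma(x) ** 2
        pu = s2 * dt / (2 * h * h) + b * dt / (2 * h)
        pd = s2 * dt / (2 * h * h) - b * dt / (2 * h)
        pm = 1.0 - pu - pd
        assert min(pu, pd, pm) >= 0, "dt too large for this grid spacing"
        up, dn = min(j + 1, n - 1), max(j - 1, 0)
        P[j, up] += pu
        P[j, dn] += pd
        P[j, j] += pm
    return P
```

Backward induction for the limited-feedback values then amounts to repeated multiplication of a value vector by this matrix, interleaved with the pointwise maximization over interventions.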
With this discretization, the numerical solution was obtained in 4, 18 and 720 seconds for the limited feedback algorithm. For the fully augmented solution method the first two settings, with two and three units, were solved in around 220 and 12000 seconds, respectively (it seemed computationally impossible to obtain a solution with the full system of six units). Figures 2-4 show the expected operation costs at time zero for the limited feedback approach (solid blue lines) and the corresponding minimal operation costs obtained by state space augmentation (dashed magenta lines), for the three different forecasts. In all cases the expected operation costs decreased with more units; in particular, the expected operation cost with units {3, 5} was always higher than the expected operation cost with units {2, 4, 6}.
In Figures 5-7 the relative error of the limited feedback approximation is plotted for the three different forecasts. Note that the level of sub-optimality induced by the limited feedback approximation depends on the properties of the process (X_t : 0 ≤ t ≤ T) but also on the available units. The seemingly higher error with three units (F_2) compared to two units (F_1) can, however, be partially explained by the lower operation cost for F_2, which gives the absolute error a higher weight in the relative error.