Mathematical Methods of Operations Research, Volume 87, Issue 3, pp 347–382

# A limited-feedback approximation scheme for optimal switching problems with execution delays

• Magnus Perninge
Open Access
Article

## Abstract

We consider a class of optimal switching problems with non-uniform execution delays and ramping. Such problems frequently occur in the operation of economic and engineering systems. We first provide a solution to the problem by applying a probabilistic method. The main contribution is, however, a scheme for approximating the optimal control by limiting the information in the state-feedback. In a numerical example the approximation routine gives a considerable computational performance enhancement when compared to a conventional algorithm.

## Keywords

Optimal switching · Impulse control · Real options · Delivery lag · Execution delay · Stopping time · Snell envelope · Numerical algorithm

## 1 Introduction

Consider a set of n production units $$F:=\{1,\ldots ,n\}$$ where each unit can be operated at two different levels, $$\{0,1\}$$, representing “off” and “on”. We assume that a central operator can switch production between the two operating levels in each unit. Following a switch from “off” to “on” in Unit i the output will, in general, not immediately jump to the installed capacity, $${\bar{p}}_i$$. Rather we assume that the production ramps up during a delay period $$[0,\delta _i]$$, with $$\delta _i>0$$. We thus assume that the output of Unit i following a switch from “off” to “on” is described by a Lipschitz continuous function $$R_i:[0,\delta _i]\rightarrow [0,{\bar{p}}_i]$$, with $$R_i(0)=0$$ and $$R_i(\delta _i)={\bar{p}}_i$$. Turning off the unit is, on the other hand, assumed to render an immediate halt of production.

We consider the problem where a central operator wants to maximize her return over a predefined operation period [0, T] (with $$T<\infty$$) that can represent, for example, the net profit from electricity production in n production units or mineral extraction from n mines. The profit depends on the operating-mode and the output from the n units, but also on an observable diffusion process $$(X_t: 0\le t\le T)$$.

For $$i=1,\ldots ,n$$ we let $$0\le \tau ^i_1\le \cdots \le \tau ^i_{N_i}< T$$ represent the times that the operator intervenes on Unit i. We assume, without loss of generality, that all units are off at the start of the period so that intervention $$\tau ^i_{2j-1}$$ turns operation on, while intervention $$\tau ^i_{2j}$$ turns operation to the “off”-mode. We define the operating-mode $$(\xi _t: 0\le t \le T)$$ of the system to be the $${\mathcal {J}}:=\{0,1\}^n$$-valued process representing the evolution of the operation modes for the n units. The operation-mode of Unit i, at time $$t\in [0,T]$$, is then
\begin{aligned} (\xi _t)_i:=\sum _{j=1}^{\lceil N_i/2 \rceil }\mathbb {1}_{[\tau ^i_{2j-1},\tau ^i_{2j})}(t), \end{aligned}
(where $$\lceil a \rceil$$ is the smallest integer k such that $$k\ge a$$) and the output of the same unit is
\begin{aligned} p_i(t):=\sum _{j=1}^{\lceil N_i/2 \rceil }\mathbb {1}_{[\tau ^i_{2j-1},\tau ^i_{2j})}(t)R_i\left( \left( t-\tau ^i_{2j-1}\right) \wedge \delta _i\right) , \end{aligned}
with the convention that $$\tau ^i_{N_i+1}=\infty$$. Each intervention on Unit i renders a cost $$c_i^{0}:[0,T]\rightarrow {\mathbb {R}}_+$$, when turning operation from “off” to “on” and a cost $$c_i^{1}:[0,T]\rightarrow {\mathbb {R}}$$, when the intervention is turning off the unit. We assume that a given operation strategy $$u:=(\tau ^1_1,\ldots ,\tau ^1_{N_1};\ldots ;\tau ^n_1,\ldots ,\tau ^n_{N_n})$$ gives the total reward
\begin{aligned} J(u)&:={\mathbb {E}}\left[ \int _0^T \psi _{\xi _t}\left( t,X_t,p(t)\right) dt+h_{\xi _T}\left( X_T,p(T)\right) \right. \nonumber \\&\quad \;\left. -\sum _{i=1}^n\left\{ \sum _{j=1}^{\lceil N_i/2 \rceil }c_i^{0}\Big (\tau ^i_{2j-1}\Big )+\sum _{j=1}^{\lfloor N_i/2 \rfloor }c_i^{1}\Big (\tau ^i_{2j}\Big )\right\} \right] , \end{aligned}
(1)
where, for each $$\mathbf{b}:=(b_1,\ldots ,b_n)\in {\mathcal {J}}$$, $$\psi _{\mathbf{b}}:[0,T]\times {\mathbb {R}}^m \times {\mathbb {R}}^n_+\rightarrow {\mathbb {R}}$$ and $$h_{\mathbf{b}}:{\mathbb {R}}^m \times {\mathbb {R}}^n_+\rightarrow {\mathbb {R}}$$ are deterministic, locally Lipschitz continuous functions of at most polynomial growth and $$\lfloor a \rfloor$$ is the largest integer k such that $$k\le a$$.
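As a minimal sketch of the definitions above, the following Python computes the operating mode $$(\xi _t)_i$$ and the output $$p_i(t)$$ of a single unit from its intervention times, together with the number of "on" and "off" switches entering the cost sums in (1). The linear ramp `linear_ramp` and all function names are illustrative assumptions and do not come from the paper.

```python
import math


def linear_ramp(s, delta, p_bar):
    """Hypothetical ramp R_i: linear from 0 at s = 0 to p_bar at s = delta."""
    return p_bar * min(max(s, 0.0), delta) / delta


def mode_and_output(t, taus, delta, p_bar, ramp=linear_ramp):
    """Return ((xi_t)_i, p_i(t)) for intervention times taus = [tau_1, ..., tau_N].

    Odd-numbered interventions switch the unit on, even-numbered ones
    switch it off, and tau_{N+1} is taken to be infinity.
    """
    padded = list(taus) + [math.inf]
    # In 0-based indexing, positions 0, 2, 4, ... hold tau_1, tau_3, tau_5, ...
    for j in range(0, len(taus), 2):
        on, off = padded[j], padded[j + 1]
        if on <= t < off:
            return 1, ramp(min(t - on, delta), delta, p_bar)
    return 0, 0.0


def n_switches(taus):
    """Number of on-switches (ceil(N/2)) and off-switches (floor(N/2))."""
    n = len(taus)
    return math.ceil(n / 2), n // 2
```

For example, with `taus = [1.0, 4.0, 6.0]`, `delta = 2.0` and `p_bar = 10.0`, the unit is halfway up its ramp at `t = 2.0` and produces nothing at `t = 4.0`, where the halt is immediate.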

The problem of finding a maximizer of (1) is a multi-modes optimal switching problem with execution delays. The multi-modes optimal switching problem was popularized by Carmona and Ludkovski (2008), where they suggested an application to valuation of energy tolling agreements (see also the paper by Deng and Xia 2005).

A formal solution to the multi-modes optimal switching problem, without delays, was derived by Djehiche et al. (2009). The authors adopted a probabilistic approach by defining a verification theorem for a family of stochastic processes that specifies sufficient conditions for optimality. They further proved existence of a family of processes that satisfies the verification theorem and showed that these processes can be used to define continuous value functions that form solutions, in the viscosity sense, to a set of variational inequalities. El Asri and Hamadéne (2009) extended the approach to switching problems where the switching costs are functions also of the state and proved uniqueness of the viscosity solutions.

Previous work on more general impulse control problems with execution delays includes the novel paper by Bar-Ilan and Sulem (1995), where an explicit solution to an inventory problem with uniform delivery lag is found by taking the current stock plus pending orders as one of the states. Similar approaches are taken by Aïd et al. (2015), where explicit optimal solutions of impulse control problems with uniform delivery lags are derived for a large set of different problems, and by Bruder and Pham (2009), who propose an iterative algorithm. Øksendal and Sulem (2008) propose a solution to general impulse control problems with execution delays, by defining an operator that circumvents the delay period.

A state space augmentation approach to switching problems with non-uniform delays and ramping is taken by Perninge and Söder (2014) and by Perninge (2015) where application to real-time operation of power systems is considered. In these papers numerical solution algorithms are proposed by means of the regression Monte Carlo approach (see Longstaff and Schwartz 2001), that has previously been proposed to solve multi-modes switching problems by Carmona and Ludkovski (2008) and by Aïd et al. (2014).

Although many approaches have been proposed to give solutions, both exact and approximate, to impulse control problems with execution delays, they either consider models where delays only enter through uniform lags, or they propose methods that become intractable for systems with many production units. When trying to find a maximizer of (1) by augmenting the state with a suitable set of “times since last intervention” the state space dimension increases with the number of active units. The curse of dimensionality (see e.g. Bertsekas 2005) thus renders numerical solution intractable already at a relatively low number of production units.

In this paper we first extend the existence and uniqueness results in Djehiche et al. (2009) to problems with non-uniform execution delays and ramping. Further, we propose an approximation routine based on limiting the feedback information used in the optimization. More specifically, we assume that all decisions are taken based on the assumption that the last switch from “off” to “on” of Unit i was made more than $$\delta _i$$ time units ago. By adding a correction term to the switching costs we still manage to retain an unbiased estimate of the future costs. This seems to be a computationally efficient approximation that does not sacrifice too much accuracy by deviating from optimality.

## 2 Preliminaries

Throughout we will assume that $$(X_t:0\le t\le T)$$ is an $${\mathbb {R}}^m$$-valued stochastic process, living in the filtered probability space $$({\varOmega },{\mathcal {F}},{\mathbb {P}})$$, defined as the strong solution to a stochastic differential equation (SDE) as follows
\begin{aligned} dX_t&=a(t,X_t)dt+\sigma (t,X_t)dW_t,\quad t\in [0,T],\\ X_0&=x_0, \end{aligned}
where $$(W_t; 0\le t \le T)$$ is an m-dimensional Brownian motion whose natural filtration is $$({\mathcal {F}}^0_t)_{ 0\le t\le T}$$, $$x_0\in {\mathbb {R}}^m$$ and $$a:[0,T]\times {\mathbb {R}}^m \rightarrow {\mathbb {R}}^m$$ and $$\sigma :[0,T]\times {\mathbb {R}}^m \rightarrow {\mathbb {R}}^{m\times m}$$ are two deterministic, continuous functions that satisfy
\begin{aligned} |a(t,x)|+|\sigma (t,x)|\le C(1+|x|) \end{aligned}
and
\begin{aligned} |a(t,x)-a(t,x')|+|\sigma (t,x)-\sigma (t,x')|\le C|x-x'|, \end{aligned}
for some constant $$C>0$$. We let $${\mathbb {F}}:=({\mathcal {F}}_t)_{0\le t\le T}$$ denote the filtration $$({\mathcal {F}}^0_t)_{ 0\le t\le T}$$ completed with all $${\mathbb {P}}$$-null sets.
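To make the role of the diffusion concrete, here is a minimal Euler–Maruyama sketch for simulating a path of $$X$$ in the scalar case $$m=1$$. This is a standard discretisation, not a method taken from the paper; the function name, the step count `n_steps` and the seeded generator are assumptions made for illustration.

```python
import math
import random


def euler_maruyama(a, sigma, x0, T, n_steps, rng=None):
    """Simulate dX = a(t, X) dt + sigma(t, X) dW on [0, T] for scalar X.

    a and sigma are callables (t, x) -> float, assumed to satisfy the
    linear growth and Lipschitz conditions stated above.
    Returns the list [X_0, X_dt, ..., X_T] of length n_steps + 1.
    """
    rng = rng or random.Random(0)  # fixed seed only for reproducibility
    dt = T / n_steps
    x = x0
    path = [x0]
    for k in range(n_steps):
        t = k * dt
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over [t, t + dt]
        x = x + a(t, x) * dt + sigma(t, x) * dw
        path.append(x)
    return path
```

A mean-reverting example such as `euler_maruyama(lambda t, x: 2.0 * (1.0 - x), lambda t, x: 0.3, 1.0, 1.0, 100)` has coefficients that are Lipschitz in `x` with linear growth, so it fits the assumptions on $$a$$ and $$\sigma$$.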
We will use the following notations throughout the paper:
• We let $${\mathcal {U}}$$ be the set of all $$u:=(\tau ^1_1,\ldots ,\tau ^1_{N_1};\ldots ;\tau ^n_1,\ldots ,\tau ^n_{N_n})$$ where the $$\tau ^i_j$$ are $${\mathbb {F}}$$-stopping times and define $${\mathcal {U}}_t:=\{(\tau ^1_1,\ldots ,\tau ^1_{N_1};\ldots ;\tau ^n_1,\ldots ,\tau ^n_{N_n})\in {\mathcal {U}}: \tau ^i_1\ge t, \text { for }i=1,\ldots ,n\}$$.

• It is sometimes convenient to represent a control $$u=(\tau ^1_1,\ldots ,\tau ^1_{N_1};\ldots ;\tau ^n_1,\ldots ,\tau ^n_{N_n})\in {\mathcal {U}}$$ by a sequence of intervention times $$0\le \tau _1<\cdots<\tau _N< T$$, and a sequence of corresponding interventions $$\beta _1,\ldots ,\beta _N$$ where $$\tau _1:=\min _{(i,k)} \tau ^i_k$$ and $$\tau _j:=\min _{(i,k)} \{\tau ^i_k: \tau ^i_k>\tau _{j-1}\}$$ for $$j=2,\ldots ,N$$, and $$\beta _j:=\xi _{\tau _j}$$ for $$j=1,\ldots ,N$$. With this notation we may write the operation-mode in the more familiar form
\begin{aligned} \xi _t:=\beta _0\mathbb {1}_{[0,\tau _{1})}(t)+\sum _{j=1}^{N}\beta _j\mathbb {1}_{[\tau _{j},\tau _{j+1})}(t), \end{aligned}
with $$\beta _0:=\mathbf{0}$$ and using the convention that $$\tau _{N+1}=\infty$$.
• We define $$D_p$$ to be the domain of the production vector. Hence, $$D_p:=\{p\in {\mathbb {R}}^n: 0\le p_i\le {\bar{p}}_i, \text { for }i=1,\ldots ,n\}$$. Furthermore, for each $$\mathbf{b}\in {\mathcal {J}}$$ we define $$D_p^\mathbf{b}:=\{p\in {\mathbb {R}}^n: 0\le p_i\le b_i{\bar{p}}_i, \text { for }i=1,\ldots ,n\}$$.

• For each $$\mathbf{b}\in {\mathcal {J}}$$ we let $$\delta ^{\mathbf{b}}\in {\mathbb {R}}^n$$ be given by $$(\delta ^{\mathbf{b}})_i:=b_i \delta _i$$ for $$i=1,\ldots ,n$$ and let $$D_\zeta ^{\mathbf{b}}:=\{z\in {\mathbb {R}}^n: 0\le z_i\le \delta ^{\mathbf{b}}_i, \text { for }i=1,\ldots ,n\}$$. We define the sets $${\mathcal {I}}(\mathbf{b}):=\{i\in \{1,\ldots ,n\} : b_i=1\}$$, $${\mathcal {J}}^{-\mathbf{b}}:=\{\mathbf{b}'\in {\mathcal {J}}:\mathbf{b}'\ne \mathbf{b}\}$$, $${\mathcal {D}}_{\zeta }:=[0,T]\times \cup _{\mathbf{b}\in {\mathcal {J}}} (D_\zeta ^{\mathbf{b}}\times \{\mathbf{b}\})$$ and $${\mathcal {D}}_{p}:=[0,T]\times \cup _{\mathbf{b}\in {\mathcal {J}}} (D_p^{\mathbf{b}}\times \{\mathbf{b}\})$$.

• We extend the ramp functions $$R_i$$ by defining $$R:{\mathbb {R}}^n\rightarrow D_p$$ as $$(R(z))_i:=R_i(z_i^+\wedge \delta _i)$$ for $$i=1,\ldots ,n$$, with $$s^+=\max (s,0)$$, and define the residual to the ramp function $${\tilde{R}}_i:{\mathbb {R}}\rightarrow [0,{\bar{p}}_i]$$ as $${\tilde{R}}_i(s):={\bar{p}}_i-R_i(s^+\wedge \delta _i)$$ for $$i=1,\ldots ,n$$.

• For each $$\mathbf{b},\mathbf{b}'\in {\mathcal {J}}$$ we let $$c^\mathbf{b}_i:=c^{b_i}_i$$ and $$c^{\mathbf{b}}_{\mathbf{b}'}:=\sum _{i=1}^n \mathbb {1}_{[b_i\ne b'_i]}c^{\mathbf{b}}_i$$.

• For each $$\mathbf{b}\in {\mathcal {J}}$$ and each $$u\in {\mathcal {U}}$$ we extend the definition of $$\xi _s$$ to general initial conditions by defining the càdlàg process $$(\xi ^{\mathbf{b}}_s: 0\le s\le T)$$ as $$\xi ^{\mathbf{b}}_s:= \mathbf{b}\mathbb {1}_{[0,\tau _1)}(s)+\sum _{j=1}^N \beta _j \mathbb {1}_{[\tau _j,\tau _{j+1})}(s)$$.

• We let $${\mathcal {S}}^2$$ be the set of all progressively measurable, continuous processes $$(Z_t: 0\le t\le T)$$ such that $${\mathbb {E}}\left[ \sup _{t\in [0,T]} |Z_t|^2\right] <\infty$$.

• We say that a family of processes $$((Y_t^y)_{0\le t\le T}: y\in {\mathbb {R}}^k)$$ is continuous in the parameter y if
\begin{aligned} \lim _{y'\rightarrow y}{\mathbb {E}}\bigg [\sup _{t\in [0,T]} |Y_t^{y'}-Y_t^y|\bigg ]=0,\quad \forall y\in {\mathbb {R}}^k, \end{aligned}
and use the notation $$\Vert Y^{y'}-Y^y\Vert _{\theta }:={\mathbb {E}}\Big [\sup _{t\in [0,T]} |Y_t^{y'}-Y_t^y|^{\theta }\Big ]$$.
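The passage from per-unit intervention times to the merged sequence $$(\tau _j,\beta _j)$$ described in the notation above can be sketched as follows. The function name and the data layout (one list of times per unit) are assumptions made for illustration, and all units are taken to start in the "off" mode.

```python
def merge_interventions(unit_taus):
    """Merge per-unit intervention times into [(tau_1, beta_1), ..., (tau_N, beta_N)].

    unit_taus[i] is the nondecreasing list of intervention times of unit i.
    The returned tau_j are strictly increasing and beta_j is the mode
    vector xi at time tau_j.
    """
    # Distinct intervention times across all units, in increasing order.
    times = sorted({t for taus in unit_taus for t in taus})
    merged = []
    for t in times:
        # A unit that starts off is on at time t exactly when an odd number
        # of its interventions have occurred at or before t.
        beta = tuple(sum(1 for s in taus if s <= t) % 2 for taus in unit_taus)
        merged.append((t, beta))
    return merged
```

Simultaneous interventions on several units collapse into a single entry, which is why the merged times are strictly increasing even though the per-unit times need not be.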
Further, we assume that:
• The switching costs, $$c_i^{0}:[0,T]\rightarrow {\mathbb {R}}_+$$ and $$c_i^{1}:[0,T]\rightarrow {\mathbb {R}}_+$$, are Lipschitz continuous functions such that $$\min _{t\in [0,T]}c_i^{0}(t) + \min _{t\in [0,T]} c_i^{1}(t) > 0$$ for $$i=1,\ldots ,n$$.

• The terminal rewards $$(h_\mathbf{b})_{\mathbf{b}\in {\mathcal {J}}}$$ satisfy
\begin{aligned} h_{\mathbf{b}}(x,p)\ge \max _{\beta \in {\mathcal {J}}^{-\mathbf{b}}} \left\{ -c^\mathbf{b}_\beta (T) + h_{\beta }(x,p\wedge R(\delta ^{\beta }))\right\} ,\quad \forall (x,p)\in {\mathbb {R}}^m\times D_p^{\mathbf{b}}, \end{aligned}
(2)
which rules out any switching at time T.
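Condition (2) can be checked pointwise for a candidate family of terminal rewards. The sketch below does this for one point $$(x,p)$$ and one mode $$\mathbf{b}$$, using the switching cost $$c^{\mathbf{b}}_{\mathbf{b}'}$$ from the notation list and the fact that $$R_i(\delta _i)={\bar{p}}_i$$ and $$R_i(0)=0$$. The signature `h(b, x, p)` and the representation of costs as lists of callables are assumptions, not the paper's interface.

```python
from itertools import product


def switching_cost(b, beta, c_on, c_off, t):
    """c^b_beta(t): sum of c_i^{b_i}(t) over the units whose mode changes."""
    return sum(
        (c_off[i](t) if b[i] == 1 else c_on[i](t))
        for i in range(len(b))
        if b[i] != beta[i]
    )


def terminal_condition_holds(h, b, x, p, p_bar, c_on, c_off, T):
    """Check inequality (2) at the point (x, p) for the mode tuple b."""
    n = len(b)
    h_stay = h(b, x, p)
    for beta in product((0, 1), repeat=n):
        if beta == b:
            continue
        # p ∧ R(delta^beta): output of units that are off in beta drops to zero.
        p_after = tuple(min(p[i], beta[i] * p_bar[i]) for i in range(n))
        if h_stay < -switching_cost(b, beta, c_on, c_off, T) + h(beta, x, p_after):
            return False
    return True
```

With a terminal reward that is identically zero and positive switching costs the condition holds trivially, whereas a reward that pays a large bonus for being "on" at time T violates it from the "off" mode.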
To be able to consider feedback-control formulations we will, for all $$t\in [0,T]$$ and $$x\in {\mathbb {R}}^m$$, define the process $$(X_s^{t,x};0\le s\le T)$$ as the strong solution to
\begin{aligned} dX_s^{t,x}&=a(s,X^{t,x}_s)ds+\sigma (s,X^{t,x}_s)dW_s,\quad \forall s\in [t,T],\\ X^{t,x}_s&=x,\quad \forall s\in [0,t]. \end{aligned}
A standard result (see e.g. Theorem 6.16 in Chap. 1 of Yong and Zhou 1999) is that, for any $$\theta \ge 1$$, there exist constants $$C^X_1>0$$ and $$C^X_2>0$$ such that
\begin{aligned} {\mathbb {E}}\bigg [\sup _{s\in [0,T]}|X^{t,x}_s|^\theta \bigg ] \le C^X_1 (1+|x|^{\theta }) \end{aligned}
(3)
and for all $$t'\in [0,T]$$ and all $$x'\in {\mathbb {R}}^m$$
\begin{aligned}&{\mathbb {E}}\bigg [\sup _{s\in [0,T]}|X^{t,x}_s-X^{t',x'}_s|^\theta \bigg ]\le C^X_2 (1+|x|^\theta )(|x-x'|^\theta +|t'-t|^{\theta /2}). \end{aligned}
(4)
As mentioned above we will assume that $$\psi _{\mathbf{b}}$$ and $$h_{\mathbf{b}}$$ are locally Lipschitz continuous and of polynomial growth for all $$\mathbf{b}\in {\mathcal {J}}$$. Hence, there exist constants $$C^\psi >0$$, $$C^h>0$$ and $$\gamma \ge 1$$ such that $$|\psi _{\mathbf{b}}(t,x,p)|\le C^\psi (1+|x|^\gamma )$$ and $$|h_{\mathbf{b}}(x,p)|\le C^h(1+|x|^\gamma )$$ for all $$(x,t,p,\mathbf{b})\in {\mathbb {R}}^m\times {\mathcal {D}}_{p}$$.
Now, (3) implies that, for each $$\theta \ge 1$$, there are constants $$C^{\psi }_1$$ $$(=C^{\psi }_1(\theta ))$$ and $$C^h_1$$ $$(=C^{h}_1(\theta ))$$ such that, for all $$(x,t,p,\mathbf{b})\in {\mathbb {R}}^m\times {\mathcal {D}}_{p}$$,
\begin{aligned} {\mathbb {E}}\bigg [\sup _{s\in [t,T]} |\psi _{\mathbf{b}}(s,X_s^{t,x},p)|^\theta \bigg ]&\le C^{\psi }_1(1+|x|^{\gamma \theta }) \end{aligned}
(5)
and
\begin{aligned} {\mathbb {E}}\left[ |h_{\mathbf{b}}(X_T^{t,x},p)|^\theta \right]&\le C^h_1(1+|x|^{\gamma \theta }). \end{aligned}
(6)
Hence, we have
\begin{aligned} {\mathbb {E}}\bigg [\int _{0}^{T} \max _{\mathbf{b}\in {\mathcal {J}}} \sup _{p\in D_p^{\mathbf{b}}}|\psi _{\mathbf{b}}(s,X_s^{t,x},p)|^\theta ds\bigg ] \le T C^{\psi }_1(1+|x|^{\gamma \theta }) \end{aligned}
(7)
and in particular
\begin{aligned} {\mathbb {E}}\bigg [\int _{0}^{T}\max _{\mathbf{b}\in {\mathcal {J}}} \sup _{p\in D_p^{\mathbf{b}}}|\psi _{\mathbf{b}}(s,X_s,p)|^\theta ds\bigg ] \le T C^{\psi }_1(1+|x_0|^{\gamma \theta }). \end{aligned}
(8)
Local Lipschitz continuity implies that, for every $$\rho >0$$, there exist $$C^{\psi }_\rho ,C^{h}_\rho >0$$ such that,
\begin{aligned} |\psi _{\mathbf{b}}(t,x,p)-\psi _{\mathbf{b}}(t,x',p')|^2\mathbb {1}_{[|x|\vee |x'| \le \rho ]}\le C^{\psi }_\rho (|x-x'|+|p-p'|) \end{aligned}
and
\begin{aligned} |h_{\mathbf{b}}(x,p)-h_{\mathbf{b}}(x',p')|^2\mathbb {1}_{[|x|\vee |x'| \le \rho ]}\le C^{h}_\rho (|x-x'|+|p-p'|) \end{aligned}
for all $$(x,t,p,\mathbf{b})\in {\mathbb {R}}^m\times {\mathcal {D}}_{p}$$ and $$(x',t',p',\mathbf{b})\in {\mathbb {R}}^m\times {\mathcal {D}}_{p}$$. We thus have
\begin{aligned}&{\mathbb {E}}\bigg [\sup _{s\in [0,T]} |\psi _{\mathbf{b}}(s,X_s^{t,x},p)-\psi _{\mathbf{b}}(s,X_s^{t',x'},p')|^2\bigg ] \\&\quad = {\mathbb {E}}\bigg [\sup _{s\in [0,T]} |\psi _{\mathbf{b}}(s,X_s^{t,x},p)-\psi _{\mathbf{b}}(s,X_s^{t',x'},p')|^2\mathbb {1}_{[|X_s^{t,x}|\vee |X_s^{t',x'}|\le \rho ]} \\&\qquad +|\psi _{\mathbf{b}}(s,X_s^{t,x},p)-\psi _{\mathbf{b}}(s,X_s^{t',x'},p')|^2\mathbb {1}_{[|X_s^{t,x}|\vee |X_s^{t',x'}|> \rho ]}\bigg ] \\&\quad \le {\mathbb {E}}\bigg [\sup _{s\in [t\vee t',T]} \Big \{C^{\psi }_\rho (|X_s^{t,x}-X_s^{t',x'}|+|p-p'|) \\&\qquad +C^\psi (2+|X_s^{t,x}|^{2\gamma }+|X_s^{t',x'}|^{2\gamma })\mathbb {1}_{[|X_s^{t,x}|\vee |X_s^{t',x'}|> \rho ]}\Big \}\bigg ] \\&\quad \le C^{\psi }_\rho (C^X_2 (1+|x|)(|x-x'|+|t'-t|^{1/2})+|p-p'|) \\&\qquad + C^\psi (2+C^X_1(2+|x|^{2\gamma }+|x'|^{2\gamma }))\frac{C^X_1(2+|x|^{2\gamma }+|x'|^{2\gamma })}{\rho }, \end{aligned}
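where we have used Markov’s inequality (see e.g. Gut 2005, p. 120) in the last step. For fixed $$\rho$$ the first term vanishes as $$(t',x',p')\rightarrow (t,x,p)$$, and since $$\rho >0$$ was arbitrary the second term can be made arbitrarily small. Hence we get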
\begin{aligned} \lim _{(t',x',p')\rightarrow (t,x,p)}{\mathbb {E}}\bigg [\sup _{s\in [0,T]} |\psi _{\mathbf{b}}(s,X_s^{t,x},p)-\psi _{\mathbf{b}}(s,X_s^{t',x'},p')|^2\bigg ]=0 \end{aligned}
(9)
and by a similar argument we have
\begin{aligned} \lim _{(t',x',p')\rightarrow (t,x,p)}{\mathbb {E}}\left[ |h_{\mathbf{b}}(X_T^{t,x},p)-h_{\mathbf{b}}(X_T^{t',x'},p')|^2\right] =0. \end{aligned}
(10)
Furthermore, the Lipschitz continuity of $$R_i$$ implies that there is a constant $$C^R>0$$ such that $$|R_i(t)-R_i(s)|\le C^R|t-s|$$ for all $$(t,s)\in [0,\delta _i]^2$$.

The above estimates will be used to provide a solution to the operator’s problem, defined as:

### Problem 1

Let $${\mathcal {U}}$$ be the set of all $$u:=(\tau ^1_1,\ldots ,\tau ^1_{N_1};\ldots ;\tau ^n_1,\ldots ,\tau ^n_{N_n})$$ where the $$\tau ^i_j$$ are $${\mathbb {F}}$$-stopping times. Find $$u^*\in {\mathcal {U}}$$, such that
\begin{aligned} J(u^*)=\sup _{u\in {\mathcal {U}}} J(u). \end{aligned}
(11)

The following proposition is a standard result for optimal switching problems with strictly positive switching costs.

### Proposition 1

Let $${\mathcal {U}}^f$$ be the set of finite strategies, i.e. $${\mathcal {U}}^f:=\{u\in {\mathcal {U}}:\, {\mathbb {P}}\left[ (\omega : \sum _{i=1}^n N_i(\omega )>k, \,\forall k>0)\right] =0\}$$. Then,
\begin{aligned} \sup _{u\in {\mathcal {U}}} J(u)=\sup _{u\in {\mathcal {U}}^f} J(u). \end{aligned}
(12)

### Proof

Assume that $$u\in {\mathcal {U}}\setminus {\mathcal {U}}^f$$ and let $$B:=(\omega : \sum _{i=1}^n N_i(\omega )>k, \,\forall k>0)$$, then $${\mathbb {P}}[B]>0$$ and we have
\begin{aligned} J(u)&\le {\mathbb {E}}\bigg [\int _0^T \max _{\mathbf{b}\in {\mathcal {J}}} \sup _{p\in D_p^{\mathbf{b}}} |\psi _{\mathbf{b}}(s,X_s,p)|ds \\&\quad \;-\mathbb {1}_B\sum _{i=1}^n{\lfloor N_i/2 \rfloor }\Big (\min _{t\in [0,T]}c_i^{0}(t)+\min _{t\in [0,T]}c_i^{1}(t)\Big )\bigg ]=-\infty , \end{aligned}
since $$\min _{t\in [0,T]}c_i^{0}(t) + \min _{t\in [0,T]} c_i^{1}(t) > 0$$. Now, by (8) there is a constant $$C>0$$ such that $$J(u)>-C(1+|x_0|^\gamma )$$ for $$u=\emptyset$$, and (12) follows. $$\square$$

## 3 Solution by state space augmentation

The problem of finding a control that maximizes (1) is non-Markovian in the state $$(t,X_t,\xi _t)$$ due to the delays, which prevent us from uniquely determining p(t) from the operating mode $$\xi _t$$. To remove delays in impulse control problems with uniform delivery lags it was proposed in Bar-Ilan et al. (2002) to augment the state space with the additional state, capacity of “projects in the pipe”. With non-uniform delays and ramping this approach is not applicable. However, we can still apply a state space augmentation to remove the delays (see e.g. Ch. 1.4, pp. 35–36 in Bertsekas 2005).

We let $$(\zeta _t:0\le t\le T)$$ denote the càdlàg, $${\mathcal {F}}_t$$-adapted process
\begin{aligned} (\zeta _t)_i:=\sum _{j=1}^{\lceil N_i/2 \rceil }\left( \left( t-\tau ^i_{2j-1}\right) \wedge \delta _i\right) \mathbb {1}_{[\tau ^i_{2j-1},\tau ^i_{2j})}(t). \end{aligned}
The output vector can now be written $$p(t)=R(\zeta _t)$$ and we call $$\zeta _t$$ the ramp-time. We now retain a Markov problem in the augmented state $$(t,X_t,\xi _t,\zeta _t)$$.
To utilize the Markov property we want to be able to start the problem “afresh” at any given time. For each $$\mathbf{b}\in {\mathcal {J}}$$ and each $$z\in D_\zeta ^\mathbf{b}$$ we therefore define the càdlàg, $${\mathcal {F}}_t$$-adapted process $$(\zeta _s^{t,z,\mathbf{b}}:0\le s\le T)$$ as
\begin{aligned} (\zeta _s^{t,z,\mathbf{b}})_i&:=\,\mathbb {1}_{[b_i=0]}\sum _{j=1}^{\lceil N_i/2 \rceil }\left( \left( s-\tau ^i_{2j-1}\right) \wedge \delta _i\right) \mathbb {1}_{[\tau ^i_{2j-1},\tau ^i_{2j})}(s) \\&\quad \;+\mathbb {1}_{[b_i=1]}\Big \{\left( (s-t+z_i)^+\wedge \delta _i\right) \mathbb {1}_{[t,\tau ^i_{1})}(s) \\&\quad \;+\sum _{j=1}^{\lfloor N_i/2 \rfloor }\left( \left( s-\tau ^i_{2j}\right) \wedge \delta _i\right) \mathbb {1}_{[\tau ^i_{2j},\tau ^i_{2j+1})}(s)\Big \}. \end{aligned}
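The ramp-time process with a general initial condition can be sketched for a single unit as below. The function follows the two cases $$b_i=0$$ and $$b_i=1$$ of the definition above; its name and argument order are assumptions made for illustration, and the interventions are assumed to satisfy $$\tau ^i_1\ge t$$.

```python
import math


def ramp_time(s, t, z, b, taus, delta):
    """(zeta_s^{t,z,b})_i for one unit, evaluated at a time s >= t.

    b is the mode of the unit at time t (0 = off, 1 = on), z its ramp-time
    at time t, and taus = [tau_1, ..., tau_N] its later intervention times.
    """
    padded = list(taus) + [math.inf]  # convention tau_{N+1} = infinity
    if b == 1:
        # Initial stretch: the unit keeps ramping from z until tau_1.
        if t <= s < padded[0]:
            return min(max(s - t + z, 0.0), delta)
        first_on = 1  # on-switches are tau_2, tau_4, ... (0-based 1, 3, ...)
    else:
        first_on = 0  # on-switches are tau_1, tau_3, ... (0-based 0, 2, ...)
    for j in range(first_on, len(taus), 2):
        if padded[j] <= s < padded[j + 1]:
            return min(s - padded[j], delta)
    return 0.0  # the unit is off, so its ramp-time is zero
```

Feeding the result into a ramp function $$R_i$$ then gives the output of the unit at time `s`, in line with $$p(t)=R(\zeta _t)$$.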

### 3.1 Verification theorem

The following verification theorem is an adaptation of Theorem 1 in Djehiche et al. (2009) to the case with execution delays:

### Theorem 1

Assume that there exists a family of processes $$((Y^{t,z,\mathbf{b}}_s)_{0\le s\le T}: (t,z,\mathbf{b})\in {\mathcal {D}}_{\zeta })$$ each in $${\mathcal {S}}^2$$ such that $$Y^{t,z,\mathbf{b}}_s$$ is continuous in $$(t,z)$$ and
\begin{aligned} Y^{t,z,\mathbf{b}}_s&:=\mathop {{\mathrm{ess}}\sup }\limits _{\tau \in {\mathcal {T}}_{s}} {\mathbb {E}}\bigg [\int _s^{\tau \wedge T}\psi _{\mathbf{b}}\left( r,X_r,R(z+(r-t)\mathbf{b})\right) dr \nonumber \\&\quad +\;\mathbb {1}_{[\tau \ge T]}h_{\mathbf{b}}\left( X_T,R(z+(T-t)\mathbf{b})\right) \nonumber \\&\quad +\;\mathbb {1}_{[\tau < T]}\max _{\beta \in {\mathcal {J}}^{-\mathbf{b}}}\left\{ -c^{\mathbf{b}}_{\beta }(\tau )+Y^{\tau ,(z+(\tau -t)\mathbf{b})^+\wedge \delta ^{\beta },\beta }_\tau \right\} \Big | {\mathcal {F}}_s\bigg ]. \end{aligned}
(13)
Then $$((Y^{t,z,\mathbf{b}}_s)_{0\le s\le T}: (t,z,\mathbf{b})\in {\mathcal {D}}_{\zeta })$$ is unique and
1. (i) Satisfies $$Y_0^{0,0,0}=\sup _{u\in {\mathcal {U}}} J(u)$$.

2. (ii) Defines the sequence $$(\tau _1^*,\ldots ,\tau _{N^*}^*;\beta _1^*,\ldots ,\beta ^*_{N^*})$$, where $$(\tau _j^*)_{1\le j\le {N^*}}$$ is a sequence of $${\mathbb {F}}$$-stopping times given by
\begin{aligned} \tau ^*_1:=\inf \Big \{s\ge 0:\, Y_s^{0,0,0}=\max _{\beta \in {\mathcal {J}}^{-0}}\left\{ -c^{0}_{\beta }(s)+Y^{s,0,\beta }_s\right\} \Big \} \end{aligned}
(14)
and
\begin{aligned} \tau ^*_j&:=\inf \Big \{s \ge \tau ^*_{j-1}:\,Y_s^{\tau ^*_{j-1},z^*_{j-1},\beta ^*_{j-1}} \nonumber \\&=\max _{\beta \in {\mathcal {J}}^{-\beta ^*_{j-1}}}\Big \{-c^{\beta ^*_{j-1}}_{\beta }(s)+Y^{s,(z^*_{j-1}+(s-\tau ^*_{j-1})\beta ^*_{j-1})\wedge \delta ^{\beta },\beta }_s\Big \}\Big \}, \end{aligned}
(15)
for $$j\ge 2$$, and $$(\beta _j^*)_{1\le j\le {N^*}}$$ is defined as a measurable selection of
\begin{aligned} \beta ^*_j\in \mathop {\arg \max }_{\beta \in {\mathcal {J}}^{-\beta _{j-1}^*}}\Big \{-c^{\beta _{j-1}^*}_{\beta }(\tau ^*_j) +Y^{\tau ^*_j,(z^*_{j-1}+(\tau ^*_j-\tau ^*_{j-1})\beta ^*_{j-1})\wedge \delta ^{\beta },\beta }_{\tau ^*_j}\Big \}, \end{aligned}
(16)
where $$z^*_{j}:=(z^*_{j-1}+(\tau ^*_j-\tau ^*_{j-1})\beta ^*_{j-1})\wedge \delta ^{\beta ^*_{j}}$$, with $$z^*_0:=0$$ and $$\beta ^*_0=0$$; and $$N^*:=\max \{j:\tau _j^*<T\}$$. Then $$u^*=(\tau _1^*,\ldots ,\tau _{N^*}^*;\beta _1^*,\ldots ,\beta _{N^*}^*)$$ is an optimal strategy for Problem 1.
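To illustrate the recursion (13) and the stopping rule (14)–(15), here is a minimal discrete-time analogue for a single unit and a deterministic price path. It is a sketch under strong simplifying assumptions (one unit, no randomness, linear ramp, running reward equal to output times price, zero terminal reward, constant switching costs) and is not the paper's algorithm; all names are hypothetical.

```python
def solve_single_unit(prices, dt, delay_steps, p_bar, c_on, c_off):
    """Backward induction over the augmented state (time, mode b, ramp-time z).

    prices[k] is the (deterministic) price on [k*dt, (k+1)*dt); the delay is
    delta = delay_steps * dt and z is measured in whole time steps.
    Returns the value of starting at time 0 in the off mode.
    """
    D = delay_steps

    def ramp(z):
        # Assumed linear ramp: R(0) = 0 and R(delta) = p_bar.
        return p_bar * z / D

    # Value at the horizon (terminal reward h = 0 by assumption).
    V_next = {(b, z): 0.0 for b in (0, 1) for z in range(D + 1)}
    for k in reversed(range(len(prices))):

        def stay(b, z):
            # Collect the running reward for one step, then continue in mode b.
            output = ramp(z) if b == 1 else 0.0
            z_next = min(z + 1, D) if b == 1 else 0
            return output * prices[k] * dt + V_next[(b, z_next)]

        V = {}
        for z in range(D + 1):
            # Either continue in the current mode or pay the switching cost
            # and restart in the other mode with ramp-time zero.
            V[(0, z)] = max(stay(0, 0), -c_on + stay(1, 0))
            V[(1, z)] = max(stay(1, z), -c_off + stay(0, 0))
        V_next = V
    return V_next[(0, 0)]
```

With `prices = [10.0] * 4`, `dt = 1.0`, `delay_steps = 2`, `p_bar = 1.0`, `c_on = 1.0` and `c_off = 0.5`, switching on immediately is best: the unit earns 0, 5, 10 and 10 over the four steps, minus the start-up cost of 1. The sketch also shows why the ramp-time has to be part of the state: the value of staying "on" depends on `z`, not only on the mode.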

### Proof

Note that the proof amounts to showing that for all $$(t,z,\mathbf{b})\in {\mathcal {D}}_{\zeta }$$, we have
\begin{aligned} Y^{t,z,\mathbf{b}}_s&=\mathop {{\mathrm{ess}}\sup }\limits _{u\in {\mathcal {U}}_s^{f}} {\mathbb {E}}\bigg [\int _s^{T}\psi _{\xi _r^{\mathbf{b}}}\left( r,X_r,R(\zeta ^{t,z,\mathbf{b}}_r)\right) dr \\&\quad \;+h_{\xi _T^{\mathbf{b}}}\left( X_T,R(\zeta ^{t,z,\mathbf{b}}_T)\right) -\sum _{j=1}^N c^{\beta _{j-1}}_{\beta _{j}}(\tau _j)\Big |{\mathcal {F}}_s\bigg ], \end{aligned}
for all $$s\in [t,T]$$, where $${\mathcal {U}}^{f}_t$$ is the subset of $${\mathcal {U}}^f$$ with $$\tau _1\ge t$$, $${\mathbb {P}}$$-a.s. and $$\beta _0=\mathbf{b}$$. Then uniqueness is immediate, (i) follows from Proposition 1 and (ii) follows from repeated use of the definition of the Snell envelope (see e.g. Appendix D of Karatzas and Shreve 1998 or Proposition 2 in Djehiche et al. 2009).
First, define
\begin{aligned} Z_s:=Y_s^{0,0,0}+\int _0^s\psi _0(r,X_r,0)dr. \end{aligned}
Then by Proposition 2 in Djehiche et al. (2009) $$Z_s$$ is the smallest supermartingale that dominates
\begin{aligned}&\left( \int _0^{s}\psi _0\left( r,X_r,0\right) dr+\mathbb {1}_{[s=T]}h_0\left( X_T,0\right) \right. \\&\quad \left. + \mathbb {1}_{[s < T]}\max _{\beta \in {\mathcal {J}}^{-0}}\left\{ -c^{0}_{\beta }(s)+Y^{s,0,\beta }_s\right\} :0\le s\le T\right) \end{aligned}
and
\begin{aligned} Y_0^{0,0,0}&=\mathop {{\mathrm{ess}}\sup }\limits _{\tau \in {\mathcal {T}}_0} {\mathbb {E}}\bigg [\int _0^{\tau \wedge T}\psi _0\left( r,X_r,0\right) dr+\mathbb {1}_{[\tau \ge T]}h_0\left( X_T,0\right) \\&\quad + \mathbb {1}_{[\tau< T]}\max _{\beta \in {\mathcal {J}}^{-0}}\left\{ -c^{0}_{\beta }(\tau )+Y^{\tau ,0,\beta }_\tau \right\} \bigg ] \\&={\mathbb {E}}\bigg [\int _0^{\tau ^*_1\wedge T}\psi _0\left( r,X_r,0\right) dr+\mathbb {1}_{[\tau ^*_1\ge T]}h_0\left( X_T,0\right) \\&\quad + \mathbb {1}_{[\tau ^*_1< T]}\max _{\beta \in {\mathcal {J}}^{-0}}\left\{ -c^{0}_{\beta }(\tau ^*_1)+Y^{\tau ^*_1,0,\beta }_{\tau ^*_1}\right\} \bigg ] \\&={\mathbb {E}}\bigg [\int _0^{\tau ^*_1\wedge T}\psi _0\left( r,X_r,0\right) dr+\mathbb {1}_{[\tau ^*_1\ge T]}h_0\left( X_T,0\right) \\&\quad + \mathbb {1}_{[\tau ^*_1 < T]}\left\{ -c^{0}_{\beta ^*_1}(\tau ^*_1)+Y^{\tau ^*_1,z_1^*,\beta ^*_1}_{\tau ^*_1}\right\} \bigg ] \end{aligned}
Now suppose that, for some $$j'>0$$ we have, for all $$j\le j'$$,
\begin{aligned} Y_s^{\tau ^*_{j-1},z^*_{j-1},\beta ^*_{j-1}}&={\mathbb {E}}\bigg [\int _s^{\tau ^*_j\wedge T}\psi _{\beta ^*_{j-1}}\left( r,X_r,R(z^*_{j-1}+(r-\tau ^*_{j-1})\beta ^*_{j-1})\right) dr \\&\quad +\mathbb {1}_{[\tau ^*_j \ge T]}h_{\beta ^*_{j-1}}\left( X_T,R(z^*_{j-1}+(T-\tau ^*_{j-1})\beta ^*_{j-1})\right) \\&\quad + \mathbb {1}_{[\tau ^*_j < T]}\left\{ -c^{\beta ^*_{j-1}}_{\beta ^*_{j}}(\tau ^*_j)+Y^{\tau ^*_j,z^*_{j},\beta ^*_{j}}_{\tau ^*_j}\right\} \Big | {\mathcal {F}}_s\bigg ], \end{aligned}
under $$\Vert \cdot \Vert _1$$, for each $$\tau ^*_{j-1}\le s\le T$$. By the definition of $$Y^{t,z,\mathbf{b}}_s$$ in (13) we have that, for each $$(t,z,\mathbf{b})\in {\mathcal {D}}_{\zeta }$$,
\begin{aligned} Z^{t,z,\mathbf{b}}:=\Big (Y_s^{t,z,\mathbf{b}}+\int _0^s\psi _{\mathbf{b}}\left( r,X_r,R(z+(r-t)\mathbf{b})\right) dr:\, 0\le s\le T\Big ) \end{aligned}
is the smallest supermartingale that dominates the process
\begin{aligned}&\left( \int _0^s\psi _{\mathbf{b}}\left( r,X_r,R(z+(r-t)\mathbf{b})\right) dr+\mathbb {1}_{[s=T]}h_{\mathbf{b}}\left( X_T,R(z+(T-t)\mathbf{b})\right) \right. \\&\quad \left. + \mathbb {1}_{[s < T]}\max _{\beta \in {\mathcal {J}}^{-\mathbf{b}}}\left\{ -c^{\mathbf{b}}_{\beta }(s)+Y^{s,(z+(s-t)\mathbf{b})^+\wedge \delta ^{\beta },\beta }_s\right\} :\, 0\le s\le T\right) . \end{aligned}
For all $$M\ge 1$$, let $$(G_l^M)_{1\le l \le M}$$ be an $$\epsilon (M)$$-partition of $$D_\zeta ^\mathbf{b}$$ (with $$\epsilon (M)\rightarrow 0$$ as $$M\rightarrow \infty$$) and let $$(z_{l}^M)_{1\le l\le M}$$ be a sequence of points such that $$z_{l}^M\in G_l^M$$ for $$l=1,\ldots ,M$$. For $$M,N\ge 1$$ and $$s\ge \tau ^*_{j'}$$, define
\begin{aligned} {\hat{Y}}^{M,N}_s:=\sum _{\mathbf{b}\in {\mathcal {J}}}\mathbb {1}_{[\beta ^*_{j'}=\mathbf{b}]}\sum _{k=0}^{N-1}\mathbb {1}_{[kT/N\le \tau ^*_{j'} <(k+1)T/N]}\sum _{l=1}^M\mathbb {1}_{[z^*_{j'}\in G_l^M]} Y_s^{kT/N,z_{l}^M,\mathbf{b}}. \end{aligned}
Now, $$\mathbb {1}_{[\beta ^*_{j'}=\mathbf{b}]}\mathbb {1}_{[kT/N\le \tau ^*_{j'} <(k+1)T/N]}\mathbb {1}_{[z^*_{j'}\in G_l^M]}\Big (Y^{kT/N,z_{l}^M,\mathbf{b}}_s+\int _{\tau ^*_{j'}}^s\psi _{\mathbf{b}}(r,X_r,R(z_{l}^M+(r-kT/N)\mathbf{b}))dr\Big )$$ is the product of a $${\mathcal {F}}_{\tau _{j'}^*}$$-measurable positive r.v., $$\mathbb {1}_{[\beta ^*_{j'}=\mathbf{b}]}\mathbb {1}_{[kT/N\le \tau ^*_{j'} <(k+1)T/N]}\mathbb {1}_{[z^*_{j'}\in G_l^M]}$$, and a supermartingale, thus, it is a supermartingale for $$s\ge \tau ^*_{j'}$$. Hence, as
\begin{aligned}&\left( {\hat{Y}}^{M,N}_s+\sum _{k=0}^{N-1}\mathbb {1}_{[kT/N\le \tau ^*_{j'} <(k+1)T/N]}\sum _{l=1}^M \mathbb {1}_{[z^*_{j'}\in G_l^M]} \right. \\&\quad \left. \cdot \int _{\tau ^*_{j'}}^s\psi _{\beta ^*_{j'}}(r,X_r,R(z_{l}^M+(r-kT/N)\beta ^*_{j'}))dr:\tau ^*_{j'}\le s\le T\right) \end{aligned}
is the sum of a finite number of supermartingales it is also a supermartingale.
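The construction of $${\hat{Y}}^{M,N}$$ replaces the random pair $$(\tau ^*_{j'},z^*_{j'})$$ by a grid representative $$(kT/N,z_l^M)$$. For a single unit and a uniform partition of $$[0,\delta ]$$ the lookup of the indices can be sketched as follows. The uniform cells and the choice of the left endpoint as representative are assumptions made for illustration; the proof only requires the cell diameter to tend to zero.

```python
import math


def grid_indices(tau, z, T, N, delta, M):
    """Return (k, l) with k*T/N <= tau < (k+1)*T/N and z in the l-th cell.

    Cells are [l*delta/M, (l+1)*delta/M) for l = 0, ..., M-1 (0-based),
    with the right endpoint z = delta assigned to the last cell.
    """
    k = min(int(math.floor(tau * N / T)), N - 1)
    l = min(int(math.floor(z * M / delta)), M - 1)
    return k, l


def representatives(k, l, T, N, delta, M):
    """Grid time k*T/N and left endpoint of the l-th cell."""
    return k * T / N, l * delta / M
```

As `N` and `M` grow, the representative converges to `(tau, z)`, which is where the assumed continuity of $$Y^{t,z,\mathbf{b}}_s$$ in $$(t,z)$$ enters the argument.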
By the continuity of $$Y^{t,z,\mathbf{b}}_s$$ in $$(t,z)$$ and the continuity of R and $$\psi _\mathbf{b}$$ we get
\begin{aligned}&Y_s^{\tau ^*_{j'},z^*_{j'},\beta ^*_{j'}}+\int _{\tau ^*_{j'}}^s\psi _{\beta ^*_{j'}}(r,X_r,R(z^*_{j'} + (r-\tau ^*_{j'})\beta ^*_{j'}))dr \\&\quad =\mathop {\lim \,\inf }_{N,M\rightarrow \infty }\Big \{{\hat{Y}}^{M,N}_s+\sum _{k=0}^{N-1}\mathbb {1}_{[kT/N\le \tau ^*_{j'} <(k+1)T/N]}\sum _{l=1}^M \mathbb {1}_{[z^*_{j'}\in G_l^M]} \\&\qquad \cdot \int _{\tau ^*_{j'}}^s\psi _{\beta ^*_{j'}}(r,X_r,R(z_{l}^M+(r-kT/N)\beta ^*_{j'}))dr\Big \}, \end{aligned}
under $$\Vert \cdot \Vert _1$$, for all $$s\in [\tau ^*_{j'},T]$$. For all $$\tau ^*_{j'}\le t\le s$$ we have
\begin{aligned}&\mathop {\lim \,\inf }_{N,M\rightarrow \infty }\Big \{{\hat{Y}}^{M,N}_t+\sum _{k=0}^{N-1}\mathbb {1}_{[kT/N\le \tau ^*_{j'}<(k+1)T/N]}\sum _{l=1}^M \mathbb {1}_{[z^*_{j'}\in G_l^M]} \\&\qquad \cdot \int _{\tau ^*_{j'}}^t\psi _{\beta ^*_{j'}}(r,X_r,R(z_{l}^M+(r-kT/N)\beta ^*_{j'}))dr\Big \} \\&\quad \ge \mathop {\lim \,\inf }_{N,M\rightarrow \infty }{\mathbb {E}}\bigg [{\hat{Y}}^{M,N}_s+\sum _{k=0}^{N-1}\mathbb {1}_{[kT/N\le \tau ^*_{j'}<(k+1)T/N]}\sum _{l=1}^M \mathbb {1}_{[z^*_{j'}\in G_l^M]} \\&\qquad \cdot \int _{\tau ^*_{j'}}^s\psi _{\beta ^*_{j'}}(r,X_r,R(z_{l}^M+(r-kT/N)\beta ^*_{j'}))dr\Big |{\mathcal {F}}_t\bigg ] \\&\quad \ge {\mathbb {E}}\bigg [\mathop {\lim \,\inf }_{N,M\rightarrow \infty }{\hat{Y}}^{M,N}_s+\sum _{k=0}^{N-1}\mathbb {1}_{[kT/N\le \tau ^*_{j'} <(k+1)T/N]}\sum _{l=1}^M \mathbb {1}_{[z^*_{j'}\in G_l^M]} \\&\qquad \cdot \int _{\tau ^*_{j'}}^s\psi _{\beta ^*_{j'}}(r,X_r,R(z_{l}^M+(r-kT/N)\beta ^*_{j'}))dr\Big |{\mathcal {F}}_t\bigg ]. \end{aligned}
where the first inequality follows from the supermartingale property and the second inequality follows from Fatou’s lemma. Hence, $$\Big (Y_s^{\tau ^*_{j'},z^*_{j'},\beta ^*_{j'}}+\int _{\tau ^*_{j'}}^s\psi _{\beta ^*_{j'}}(r,X_r,R(z^*_{j'}+(r-\tau ^*_{j'})\beta ^*_{j'}))dr:\, \tau ^*_{j'}\le s\le T\Big )$$ is a supermartingale that dominates
\begin{aligned}&\left( \int _{\tau ^*_{j'}}^s\psi _{\beta ^*_{j'}}\left( r,X_r,R(z^*_{j'}+(r-\tau ^*_{j'})\beta ^*_{j'})\right) dr \right. \nonumber \\&\quad +\mathbb {1}_{[s=T]} h_{\beta ^*_{j'}}\left( X_T,R(z^*_{j'}+(T-\tau ^*_{j'})\beta ^*_{j'})\right) \nonumber \\&\quad \left. + \mathbb {1}_{[s< T]}\max _{\beta \in {\mathcal {J}}^{-\beta ^*_{j'}}}\Big \{-c^{\beta ^*_{j'}}_{\beta }(s)+Y^{s,(z^*_{j'}+(s-\tau ^*_{j'})\beta ^*_{j'})\wedge \delta ^{\beta },\beta }_s\Big \}:\, \tau ^*_{j'}\le s\le T\right) . \end{aligned}
(17)
It remains to show that it is the smallest supermartingale with this property. Let $$(Z_s:\,0\le s\le T)$$ be a supermartingale that dominates (17) for all $$s\in [\tau ^*_{j'},T]$$. Then for each $$(t,z,\mathbf{b})\in {\mathcal {D}}_{\zeta }$$ and $$s\ge t$$, we have
\begin{aligned}&\mathbb {1}_{[\beta ^*_{j'}=\mathbf{b}]}\mathbb {1}_{[\tau ^*_{j'}=t]}\mathbb {1}_{[z^*_{j'}=z]}Z_s \\&\quad \ge \mathbb {1}_{[\beta ^*_{j'}=\mathbf{b}]}\mathbb {1}_{[\tau ^*_{j'}=t]}\mathbb {1}_{[z^*_{j'}=z]} \Big (\int _{t}^s\psi _{\mathbf{b}}\left( r,X_r,R(z+(r-t)\mathbf{b})\right) dr \\&\qquad +\mathbb {1}_{[s=T]}h_{\mathbf{b}}\left( X_T,R(z+(T-t)\mathbf{b})\right) \\&\qquad + \mathbb {1}_{[s < T]}\max _{\beta \in {\mathcal {J}}^{-\mathbf{b}}}\Big \{-c^{\mathbf{b}}_{\beta }(s)+Y^{s,(z+(s-t)\mathbf{b})\wedge \delta ^{\beta },\beta }_s\Big \}\Big ), \end{aligned}
which by (13) gives that
\begin{aligned}&\mathbb {1}_{[\beta ^*_{j'}=\mathbf{b}]}\mathbb {1}_{[\tau ^*_{j'}=t]}\mathbb {1}_{[z^*_{j'}=z]}Z_s \\&\quad \ge \mathbb {1}_{[\beta ^*_{j'}=\mathbf{b}]}\mathbb {1}_{[\tau ^*_{j'}=t]}\mathbb {1}_{[z^*_{j'}=z]} \Big (Y^{t,z,\mathbf{b}}_s+\int _t^s \psi _{\mathbf{b}}\left( r,X_r,R(z+(r-t)\mathbf{b})\right) dr\Big ). \end{aligned}
Since this holds for all $$(t,z,\mathbf{b})\in {\mathcal {D}}_{\zeta }$$ we get
\begin{aligned} Z_s\ge Y^{\tau ^*_{j'},z^*_{j'},\beta ^*_{j'}}_s+\int _{\tau ^*_{j'}}^s \psi _{\beta ^*_{j'}}\left( r,X_r,R(z^*_{j'}+(r-\tau ^*_{j'})\beta ^*_{j'})\right) dr \end{aligned}
for all $$s\ge \tau ^*_{j'}$$. Hence, $$\Big (Y^{\tau ^*_{j'},z^*_{j'},\beta ^*_{j'}}_s+\int _{\tau ^*_{j'}}^s \psi _{\beta ^*_{j'}}(r,X_r,R(z^*_{j'}+(r-\tau ^*_{j'})\beta ^*_{j'}))dr: \, \tau ^*_{j'}\le s\le T\Big )$$ is the Snell envelope of (17) and
\begin{aligned} Y_s^{\tau ^*_{j'},z^*_{j'},\beta ^*_{j'}}&={\mathbb {E}}\bigg [\int _s^{\tau ^*_{j'+1}\wedge T}\psi _{\beta ^*_{j'}}\left( r,X_r,R(z^*_{j'}+(r-\tau ^*_{j'})\beta ^*_{j'})\right) dr \\&\quad +\mathbb {1}_{[\tau ^*_{j'+1}\ge T]}h_{\beta ^*_{j'}}\left( X_T,R(z^*_{j'}+(T-\tau ^*_{j'})\beta ^*_{j'})\right) \\&\quad + \mathbb {1}_{[\tau ^*_{j'+1} < T]}\left\{ -c^{\beta ^*_{j'}}_{\beta ^*_{j'+1}}(\tau ^*_{j'+1})+Y^{\tau ^*_{j'+1},z^*_{j'+1},\beta ^*_{j'+1}}_{\tau ^*_{j'+1}}\right\} \Big | {\mathcal {F}}_s\bigg ] \end{aligned}
under $$\Vert \cdot \Vert _1$$. By induction we get that for each $$N\ge 0$$
\begin{aligned} Y^{0,0,0}_0&={\mathbb {E}}\bigg [\int _0^{\tau ^*_N \wedge T}\sum _{j=0}^{N\wedge N^*} \mathbb {1}_{[\tau ^*_{j}\le r< \tau ^*_{j+1}]} \psi _{\beta ^*_j}(r,X_r,R(z^*_j+(r-\tau ^*_j)\beta ^*_j))dr \\&\quad +\sum _{j=0}^{N\wedge N^*} \mathbb {1}_{[\tau ^*_{j}< T\le \tau ^*_{j+1}]}h_{\beta ^*_j}(X_T,R(z^*_j+(T-\tau ^*_j)\beta ^*_j)) \\&\quad \;-\sum _{j=1}^{N\wedge N^*}c^{\beta ^*_{j-1}}_{\beta ^*_{j}}(\tau ^*_j)+\mathbb {1}_{[\tau ^*_N < T]}Y^{\tau ^*_N,z^*_{N},\beta ^*_{N}}_{\tau ^*_N}\bigg ], \end{aligned}
where $$(\tau ^*_0,\beta ^*_0)=(0,\mathbf{0})$$. Letting $$N\rightarrow \infty$$ while assuming that $$u^*\in {\mathcal {U}}^f$$ we find that $$Y^{0,0,0}_0=J(u^*)$$.
It remains to show that the strategy $$u^*$$ is optimal. To do this we pick any other strategy $$\hat{u}:=(\hat{\tau }_1,\ldots ,\hat{\tau }_{\hat{N}};\hat{\beta }_1,\ldots ,\hat{\beta }_{\hat{N}})\in {\mathcal {U}}^f$$ and let $$(\hat{z}_j)_{1\le j\le \hat{N}}$$ be defined by the recursion $$\hat{z}_j:=(\hat{z}_{j-1}+(\hat{\tau }_j-\hat{\tau }_{j-1})\hat{\beta }_{j-1})\wedge \delta ^{\hat{\beta }_j}$$. By the definition of $$Y^{0,0,0}_0$$ in (13) we have
\begin{aligned} Y^{0,0,0}_0&\ge {\mathbb {E}}\bigg [\int _0^{{\hat{\tau }}_1\wedge T}\psi _0\left( r,X_r,0\right) dr \\&\quad +\mathbb {1}_{[{\hat{\tau }}_1\ge T]}h_0\left( X_T,0\right) + \mathbb {1}_{[{\hat{\tau }}_1< T]}\max _{\beta \in {\mathcal {J}}^{-0}}\left\{ -c^{0}_{\beta }({\hat{\tau }}_1)+Y^{{\hat{\tau }}_1,0,\beta }_{{\hat{\tau }}_1}\right\} \bigg ] \\&\ge {\mathbb {E}}\bigg [\int _0^{{\hat{\tau }}_1\wedge T}\psi _0\left( r,X_r,0\right) dr \\&\quad +\mathbb {1}_{[{\hat{\tau }}_1\ge T]}h_0\left( X_T,0\right) + \mathbb {1}_{[{\hat{\tau }}_1 < T]}\left\{ -c^{0}_{{\hat{\beta }}_1}({\hat{\tau }}_1)+Y^{{\hat{\tau }}_1,{\hat{z}}_1,{\hat{\beta }}_1}_{{\hat{\tau }}_1}\right\} \bigg ] \end{aligned}
but in the same way
\begin{aligned} Y^{{\hat{\tau }}_1,{\hat{z}}_1,{\hat{\beta }}_1}_{{\hat{\tau }}_1}&\ge {\mathbb {E}}\bigg [\int _{{\hat{\tau }}_1}^{{\hat{\tau }}_2\wedge T}\psi _{{\hat{\beta }}_1} \left( r,X_r,R({\hat{z}}_1+(r-{\hat{\tau }}_1){\hat{\beta }}_1)\right) dr \\&\quad \;+\mathbb {1}_{[{\hat{\tau }}_2 \ge T]}h_{{\hat{\beta }}_1}\left( X_T,R({\hat{z}}_1+(T-{\hat{\tau }}_1){\hat{\beta }}_1)\right) \\&\quad \; + \mathbb {1}_{[{\hat{\tau }}_2 < T]} \left\{ -c^{{\hat{\beta }}_1}_{{\hat{\beta }}_2}({\hat{\tau }}_2)+Y^{{\hat{\tau }}_2,{\hat{z}}_2,{\hat{\beta }}_2}_{{\hat{\tau }}_2}\right\} \Big |{\mathcal {F}}_{{\hat{\tau }}_1}\bigg ], \end{aligned}
$${\mathbb {P}}$$-a.s. By repeating this argument and using the dominated convergence theorem we find that $$J(u^*)\ge J({\hat{u}})$$ which proves that $$u^*$$ is in fact optimal and thus belongs to $${\mathcal {U}}^f$$. $$\square$$

### Remark 1

The main difference between the above theorem and Theorem 1 in Djehiche et al. (2009) is that, since the state trajectory depends on the control policy through the $$\zeta$$-component, we are forced to consider an infinite family of processes rather than a q-tuple for some finite positive q.

### 3.2 Existence

Theorem 1 presumes existence of the families $$((Y^{t,z,\mathbf{b}}_s)_{0\le s\le T}: (t,z,\mathbf{b})\in {\mathcal {D}}_{\zeta })$$. To obtain a satisfactory solution to Problem 1, we thus need to establish existence. The general existence proof (see Carmona and Ludkovski 2008; Djehiche et al. 2009) goes by defining a sequence $$((Y^{t,z,\mathbf{b},k}_s)_{0\le s\le T}: (t,z,\mathbf{b})\in {\mathcal {D}}_{\zeta })_{k\ge 0}$$ of families of processes as
\begin{aligned} Y^{t,z,\mathbf{b},0}_s&:={\mathbb {E}}\bigg [\int _s^T\psi _{\mathbf{b}}\left( r,X_r,R(z+(r-t)\mathbf{b})\right) dr \nonumber \\&\quad +\; h_{\mathbf{b}}\left( X_T,R(z+(T-t)\mathbf{b})\right) \Big | {\mathcal {F}}_s\bigg ] \end{aligned}
(18)
and
\begin{aligned} Y^{t,z,\mathbf{b},k}_s&:=\mathop {{\mathrm{ess}}\sup }\limits _{\tau \in {\mathcal {T}}_{s}}{\mathbb {E}}\bigg [\int _s^{\tau \wedge T}\psi _{\mathbf{b}}\left( r,X_r,R(z+(r-t)\mathbf{b})\right) dr \nonumber \\&\quad \;+\mathbb {1}_{[\tau \ge T]}h_{\mathbf{b}}\left( X_T,R(z+(T-t)\mathbf{b})\right) \nonumber \\&\quad \;+\mathbb {1}_{[\tau < T]}\max _{\beta \in {\mathcal {J}}^{-\mathbf{b}}}\left\{ -c^{\mathbf{b}}_{\beta }(\tau )+Y^{\tau ,(z+(\tau -t)\mathbf{b})^+\wedge \delta ^{\beta },\beta ,k-1}_\tau \right\} \Big | {\mathcal {F}}_s\bigg ] \end{aligned}
(19)
for $$k\ge 1$$, and then showing that this sequence converges to a family $$(({\tilde{Y}}^{t,z,\mathbf{b}}_s)_{0\le s\le T}: (t,z,\mathbf{b})\in {\mathcal {D}}_{\zeta })$$ of $${\mathcal {S}}^2$$-processes that satisfy the verification theorem. First we note that by letting $${\mathcal {U}}^k_t:=\{(\tau _1,\ldots ,\tau _{N};\beta _1,\ldots ,\beta _N)\in {\mathcal {U}}_t:\, N\le k\}$$ and using a reasoning similar to that in the proof of Theorem 1 it follows that
\begin{aligned} Y^{t,z,\mathbf{b},k}_s&=\mathop {{\mathrm{ess}}\sup }\limits _{u\in {\mathcal {U}}_s^k} {\mathbb {E}}\bigg [\int _s^{T}\psi _{\xi _{r}^\mathbf{b}}\left( r,X_r,R(\zeta ^{t,z,\mathbf{b}}_r)\right) dr \nonumber \\&\quad +h_{\xi _T^\mathbf{b}}\left( X_T,R(\zeta ^{t,z,\mathbf{b}}_T)\right) -\sum _{j=1}^N c^{\beta _{j-1}}_{\beta _j}(\tau _j)\Big |{\mathcal {F}}_s\bigg ], \end{aligned}
(20)
with $$\beta _0=\mathbf{b}$$.

### Proposition 2

For each $$k\ge 0$$ we have:
(a) The process $$(Y^{t,z,\mathbf{b},k}_s:\,0\le s\le T)$$ belongs to $${\mathcal {S}}^2$$.

(b) The family $$((Y^{t,z,\mathbf{b},k}_s)_{0\le s\le T}: (t,z,\mathbf{b})\in {\mathcal {D}}_{\zeta })_{k\ge 0}$$ is continuous in (t, z).

### Proof

We will need the continuity property in b) to show that $$(Y^{t,z,\mathbf{b},k}_s:\,0\le s\le T)$$ is continuous. We therefore start by showing that part b) holds. For any control $$u\in {\mathcal {U}}$$, we have
\begin{aligned} \sup _{s\in [0, T]}|R(\zeta ^{t,z,\mathbf{b}}_s(u))-R(\zeta ^{t',z',\mathbf{b}}_s(u))|\le n C^R \left( |z-z'|+|t-t'|\right) , \end{aligned}
$${\mathbb {P}}$$-a.s. Hence, with $$D^{2,\mathbf{b}}_p(\rho ):=\{(p,p')\in D^{\mathbf{b}}_p\times D^{\mathbf{b}}_p: |p-p'|\le n C^R\rho \}$$ we get for all $$k\ge 0$$,
\begin{aligned}&Y^{t,z,\mathbf{b},k}_s-Y^{t',z',\mathbf{b},k}_s \\&\quad \le \mathop {{\mathrm{ess}}\sup }\limits _{u\in {\mathcal {U}}^k_s} {\mathbb {E}}\bigg [\int _s^{T}\psi _{\xi _r}\left( r,X_r,R(\zeta ^{t,z,\mathbf{b}}_r)\right) -\psi _{\xi _r}\left( r,X_r,R(\zeta ^{t',z',\mathbf{b}}_r)\right) dr \\&\qquad +h_{\xi _T}\left( X_T,R(\zeta ^{t,z,\mathbf{b}}_T)\right) -h_{\xi _T}\left( X_T,R(\zeta ^{t',z',\mathbf{b}}_T)\right) \Big |{\mathcal {F}}_s\bigg ] \\&\quad \le {\mathbb {E}}\bigg [\int _0^{T}\max _{\mathbf{b}\in {\mathcal {J}}}\max _{(p,p')\in D^{2,\mathbf{b}}_p \left( |z-z'|+|t-t'|\right) } |\psi _{\mathbf{b}}\left( r,X_r,p\right) -\psi _{\mathbf{b}}\left( r,X_r,p'\right) |dr \\&\qquad +\max _{\mathbf{b}\in {\mathcal {J}}}\max _{(p,p')\in D^{2,\mathbf{b}}_p \left( |z-z'|+|t-t'|\right) } |h_{\mathbf{b}}\left( X_T,p\right) -h_{\mathbf{b}}(X_T,p')|\Big |{\mathcal {F}}_s\bigg ], \end{aligned}
$${\mathbb {P}}$$-a.s. Using symmetry we find that the same inequality holds for $$Y^{t',z',\mathbf{b},k}_s-Y^{t,z,\mathbf{b},k}_s$$. Now, by Doob’s maximal inequality, there is a $$C>0$$ such that
\begin{aligned}&{\mathbb {E}}\bigg [\sup _{s\in [0,T]}|Y^{t',z',\mathbf{b},k}_s-Y^{t,z,\mathbf{b},k}_s|^2\bigg ] \\&\le C {\mathbb {E}}\bigg [\int _0^{T}\max _{\mathbf{b}\in {\mathcal {J}}}\max _{(p,p')\in D^{2,\mathbf{b}}_p \left( |z-z'|+|t-t'|\right) } |\psi _{\mathbf{b}}\left( r,X_r,p\right) -\psi _{\mathbf{b}}\left( r,X_r,p'\right) |^2 dr \\&\qquad +\max _{\mathbf{b}\in {\mathcal {J}}}\max _{(p,p')\in D^{2,\mathbf{b}}_p \left( |z-z'|+|t-t'|\right) } |h_{\mathbf{b}}\left( X_T,p\right) -h_{\mathbf{b}}(X_T,p')|^2\bigg ] \end{aligned}
and the right hand side goes to 0 as $$(t',z')\rightarrow (t,z)$$ by (9) and (10).
To prove part (a) we need to show that the process $$(Y^{t,z,\mathbf{b},k}_s:\,0\le s\le T)$$ is square integrable and continuous. Square integrability can be deduced by noting that (20) and Doob’s maximal inequality imply that there is a constant $$C>0$$ such that
\begin{aligned}&{\mathbb {E}}\left[ \sup _{s\in [0,T]}|Y^{t,z,\mathbf{b},k}_s|^2\right] \\&\le C{\mathbb {E}}\bigg [\bigg (\int _0^T \max _{\mathbf{b}\in {\mathcal {J}}}\max _{p\in D^{\mathbf{b}}_p}\psi _{\mathbf{b}}(r,X_r,p)dr + \max _{\mathbf{b}\in {\mathcal {J}}}\max _{p\in D^{\mathbf{b}}_p} h_{\mathbf{b}}(X_T,p)\bigg )^2 \bigg ] \\&\le 2C{\mathbb {E}}\left[ T\int _0^T \max _{\mathbf{b}\in {\mathcal {J}}}\max _{p\in D^{\mathbf{b}}_p}|\psi _{\mathbf{b}}(r,X_r,p)|^2 dr + \max _{\mathbf{b}\in {\mathcal {J}}}\max _{p\in D^{\mathbf{b}}_p} |h_{\mathbf{b}}(X_T,p)|^2 \right] \end{aligned}
for $$k\ge 0$$, and the right hand side is bounded by (5) and (6). It remains to show that for each $$(t,z,\mathbf{b})\in {\mathcal {D}}_{\zeta }$$, the process $$(Y^{t,z,\mathbf{b},k}_s: 0\le s\le T)$$ is continuous for all $$k\ge 0$$. The proof will be based on an induction argument where, as an intermediate step, we will show that for each $$k\ge 0$$ and each $$(t,z,\mathbf{b})\in {\mathcal {D}}_{\zeta }$$ the following holds:
(c) For each $$\mathbf{b}'\in {\mathcal {J}}^{-\mathbf{b}}$$ the process $$(Y^{s,(z+(s-t)\mathbf{b})^+\wedge \delta ^{\mathbf{b}'},\mathbf{b}',k}_s: 0\le s\le T)$$ is continuous.

First consider the case $$k=0$$. We have
\begin{aligned} Y^{t,z,\mathbf{b},0}_s&=\,{\mathbb {E}}\bigg [\int _0^T\psi _{\mathbf{b}}\left( r,X_r,R(z+(r-t)\mathbf{b})\right) dr \\&\quad +h_{\mathbf{b}}\left( X_T,R(z+(T-t)\mathbf{b})\right) \Big | {\mathcal {F}}_s\bigg ] \\&\quad \;-\int _0^s\psi _{\mathbf{b}}\left( r,X_r,R(z+(r-t)\mathbf{b})\right) dr. \end{aligned}
Hence, $$(Y^{t,z,\mathbf{b},0}_s: 0\le s\le T)$$ is the sum of a continuous process and a martingale w.r.t. the Brownian filtration and is thus continuous. Furthermore, for all $$s\le s'\le T$$ and all $$\mathbf{b}'\in {\mathcal {J}}$$,
\begin{aligned}&|Y^{s,(z+(s-t)\mathbf{b})^+\wedge \delta ^{\mathbf{b}'},\mathbf{b}',0}_s - Y^{s',(z+(s'-t)\mathbf{b})^+\wedge \delta ^{\mathbf{b}'},\mathbf{b}',0}_{s'}| \\&\quad \le |Y^{s,(z+(s-t)\mathbf{b})^+\wedge \delta ^{\mathbf{b}'},\mathbf{b}',0}_s - Y^{s,(z+(s-t)\mathbf{b})^+\wedge \delta ^{\mathbf{b}'},\mathbf{b}',0}_{s'}| \\&\qquad + |Y^{s,(z+(s-t)\mathbf{b})^+\wedge \delta ^{\mathbf{b}'},\mathbf{b}',0}_{s'} - Y^{s',(z+(s'-t)\mathbf{b})^+\wedge \delta ^{\mathbf{b}'},\mathbf{b}',0}_{s'}|. \end{aligned}
Hence, continuity of $$(Y^{s,(z+(s-t)\mathbf{b})^+\wedge \delta ^{\mathbf{b}'},\mathbf{b}',0}_s: 0\le s\le T)$$ follows from continuity of $$(Y^{t,z,\mathbf{b},0}_s: 0\le s\le T)$$ and continuity of $$\psi$$, h and R. Statements (a)–(c) thus hold for $$k=0$$.
Moving on we assume that (a)–(c) hold for some $$k\ge 0$$. The process ($$Y^{t,z,\mathbf{b},k+1}_s + \int _0^s \psi _{\mathbf{b}}\left( r,X_r,R(z+(r-t)\mathbf{b})\right) dr : 0\le s\le T$$) is the Snell envelope of the process
\begin{aligned}&\bigg (\int _0^s\psi _{\mathbf{b}}\left( r,X_r,R(z+(r-t)\mathbf{b})\right) dr+\mathbb {1}_{[s=T]}h_{\mathbf{b}}\left( X_T,R(z+(T-t)\mathbf{b})\right) \\&\quad + \mathbb {1}_{[s < T]}\max _{\beta \in {\mathcal {J}}^{-\mathbf{b}}}\left\{ -c^{\mathbf{b}}_{\beta }(s)+Y^{s,(z+(s-t)\mathbf{b})^+\wedge \delta ^{\beta },\beta ,k}_s\right\} :\, 0\le s\le T\bigg ). \end{aligned}
It is well known that the Snell envelope of a process $$(U_s:0\le s\le T)$$ is continuous if U only has positive jumps. Now, $$\Big (\int _0^s\psi _{\mathbf{b}}\left( r,X_r,R(z+(r-t)\mathbf{b})\right) dr:\, 0\le s\le T\Big )$$ is continuous and, since $$\Big (Y^{s,(z+(s-t)\mathbf{b})^+\wedge \delta ^{\beta },\beta ,k}_s:\, 0\le s\le T\Big )$$ was assumed continuous, for all $$\beta \in {\mathcal {J}}$$, in (c),
\begin{aligned}&\bigg (\mathbb {1}_{[s=T]}h_{\mathbf{b}}\left( X_T,R(z+(T-t)\mathbf{b})\right) \\&\quad + \mathbb {1}_{[s < T]}\max _{\beta \in {\mathcal {J}}^{-\mathbf{b}}}\left\{ -c^{\mathbf{b}}_{\beta }(s)+Y^{s,(z+(s-t)\mathbf{b})^+\wedge \delta ^{\beta },\beta ,k}_s\right\} :\, 0\le s\le T\bigg ) \end{aligned}
is continuous on [0, T) and may have a jump at $$\{T\}$$. By (2) any possible jump at time T is positive, hence, $$\Big (Y^{t,z,\mathbf{b},k+1}_s:0\le s\le T\Big )$$ is a continuous process.
By a similar argument, since $$(Y^{s,(z+(s-t)\mathbf{b})^+\wedge \delta ^{\mathbf{b}'},\mathbf{b}',k}_s + \int _0^s \psi _{\mathbf{b}'}\big (r,X_r,R((z+(s-t)\mathbf{b})\wedge \delta ^{\mathbf{b}'} + (r-s)\mathbf{b}')\big )dr: 0\le s\le T)$$ is the Snell envelope of the process
\begin{aligned}&\bigg (\int _0^s\psi _{\mathbf{b}'}\left( r,X_r,R((z+(s-t)\mathbf{b})\wedge \delta ^{\mathbf{b}'} + (r-s)\mathbf{b}')\right) dr \\&\quad +\mathbb {1}_{[s=T]}h_{\mathbf{b}}\left( X_T,R((z+(T-t)\mathbf{b})\wedge \delta ^{\mathbf{b}'})\right) \\&\quad + \mathbb {1}_{[s < T]}\max _{\beta \in {\mathcal {J}}^{-\mathbf{b}}}\left\{ -c^{\mathbf{b}'}_{\beta }(s)+Y^{s,(z+(s-t)\mathbf{b})^+\wedge \delta ^{\mathbf{b}'}\wedge \delta ^{\beta },\beta ,k}_s\right\} :\, 0\le s\le T\bigg ), \end{aligned}
(c) holds for $$k+1$$. But then (a)–(c) hold for $$k+1$$ as well. By an induction argument the proposition now follows. $$\square$$
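The Snell-envelope characterisation used throughout the proof has a simple discrete-time analogue, which may help fix ideas. The following is a minimal sketch, assuming a recombining binomial tree with up-probability `p_up`; the tree, the function name `snell_envelope` and the sample payoffs are illustrative assumptions and are not part of the continuous-time construction above.

```python
def snell_envelope(payoff, p_up=0.5):
    """Backward induction S_k = max(U_k, E[S_{k+1} | F_k]) on a binomial tree.

    payoff[k][i] is the reward U at step k after i up-moves (0 <= i <= k).
    The tree structure and p_up are illustrative assumptions.
    """
    n_steps = len(payoff) - 1
    envelope = [list(level) for level in payoff]  # S_N = U_N at the horizon
    for k in range(n_steps - 1, -1, -1):
        for i in range(k + 1):
            # Conditional expectation of the envelope one step ahead.
            continuation = (p_up * envelope[k + 1][i + 1]
                            + (1.0 - p_up) * envelope[k + 1][i])
            # Smallest value that dominates both the reward and the continuation.
            envelope[k][i] = max(payoff[k][i], continuation)
    return envelope


if __name__ == "__main__":
    rewards = [[0.0], [5.0, 0.0], [0.0, 2.0, 4.0]]
    print(snell_envelope(rewards)[0][0])
```

In this toy setting it is optimal to stop at the first node where the envelope equals the reward, which mirrors the role of the stopping times $$\tau ^*_j$$ in the verification theorem.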

Next we show that the limit family, $$\lim _{k\rightarrow \infty }((Y^{t,z,\mathbf{b},k}_s)_{0\le s\le T}: (t,z,\mathbf{b})\in {\mathcal {D}}_{\zeta })$$, exists and satisfies the verification theorem.

### Theorem 2

The limit $$(({\tilde{Y}}^{t,z,\mathbf{b}}_s)_{0\le s\le T}: (t,z,\mathbf{b})\in {\mathcal {D}}_{\zeta }):=$$ $$\lim _{k\rightarrow \infty }((Y^{t,z,\mathbf{b},k}_s)_{0\le s\le T}: (t,z,\mathbf{b})\in {\mathcal {D}}_{\zeta })$$ exists $${\mathbb {P}}$$-a.s. as a pointwise limit. Furthermore, the limit family $$(({\tilde{Y}}^{t,z,\mathbf{b}}_s)_{0\le s\le T}: (t,z,\mathbf{b})\in {\mathcal {D}}_{\zeta })$$ satisfies the verification theorem.

### Proof

We need to show that the limit family $$(({\tilde{Y}}^{t,z,\mathbf{b}}_s)_{0\le s\le T}: (t,z,\mathbf{b})\in {\mathcal {D}}_{\zeta })$$ exists as a member of $${\mathcal {S}}^2$$, that it is continuous in (t, z) and that it satisfies (13). This is done in four steps as follows.

(i) Convergence. Since $${\mathcal {U}}^k_t\subset {\mathcal {U}}^{k+1}_t$$ we have that, $${\mathbb {P}}$$-a.s.,
\begin{aligned} Y^{t,z,\mathbf{b},k}_s&\le Y^{t,z,\mathbf{b},k+1}_s\le {\mathbb {E}}\bigg [\int _0^T \max _{\mathbf{b}\in {\mathcal {J}}}\max _{p\in D^{\mathbf{b}}_p} |\psi _{\mathbf{b}}(r,X_r,p)|dr \\&\quad + \max _{\mathbf{b}\in {\mathcal {J}}}\max _{p\in D^{\mathbf{b}}_p} |h_{\mathbf{b}}(X_T,p)| \Big | {\mathcal {F}}_s\bigg ], \end{aligned}
where the right hand side is bounded $${\mathbb {P}}$$-a.s. by the estimates of Sect. 2. Hence, the sequence $$((Y^{t,z,\mathbf{b},k}_s)_{0\le s\le T}: (t,z,\mathbf{b})\in {\mathcal {D}}_{\zeta })$$ is increasing and $${\mathbb {P}}$$-a.s. bounded, thus, it converges $${\mathbb {P}}$$-a.s. for all $$s\in [0,T]$$.
(ii) Limit satisfies (13). Applying the convergence result to the right hand side of (19) and using (iv) of Proposition 2 in Djehiche et al. (2009) we find that
\begin{aligned} {\tilde{Y}}^{t,z,\mathbf{b}}_s&:=\mathop {{\mathrm{ess}}\sup }\limits _{\tau \in {\mathcal {T}}_{s}} {\mathbb {E}}\bigg [\int _s^{\tau \wedge T}\psi _{\mathbf{b}}\left( r,X_r,R(z+(r-t)\mathbf{b})\right) dr \\&\quad +\mathbb {1}_{[\tau \ge T]}h_{\mathbf{b}}\left( X_T,R(z+(T-t)\mathbf{b})\right) \\&\quad +\mathbb {1}_{[\tau < T]}\max _{\beta \in {\mathcal {J}}^{-\mathbf{b}}}\left\{ -c^{\mathbf{b}}_{\beta }(\tau )+{\tilde{Y}}^{\tau ,(z+(\tau -t)\mathbf{b})\wedge \delta ^{\beta },\beta }_\tau \right\} \Big | {\mathcal {F}}_s\bigg ] \end{aligned}
(iii) Limit in $${\mathcal {S}}^2$$. Using the same reasoning as above we find that there exists a constant $$C>0$$ such that
\begin{aligned}&{\mathbb {E}}\bigg [\sup _{s\in [0,T]}|{\tilde{Y}}^{t,z,\mathbf{b}}_s|^2\bigg ] \\&\quad \le C{\mathbb {E}}\bigg [2T\int _0^T \max _{\mathbf{b}\in {\mathcal {J}}}\max _{p\in D^{\mathbf{b}}_p}|\psi _{\mathbf{b}}(r,X_r,p)|^2 dr + 2\max _{\mathbf{b}\in {\mathcal {J}}}\max _{p\in D^{\mathbf{b}}_p} |h_{\mathbf{b}}(X_T,p)|^2 \bigg ], \end{aligned}
which is bounded by the estimates of Sect. 2. To prove continuity in s we note that $${\tilde{Y}}^{t,z,\mathbf{b}}_s+\int _0^s \psi _{\mathbf{b}}(r,X_r,R((z+(s-t)\mathbf{b})^+\wedge \delta ^\mathbf{b}))dr$$ is the limit of an increasing sequence of continuous supermartingales and is thus càdlàg (Karatzas and Shreve 1991). Now, for each $$\mathbf{b}\in {\mathcal {J}}$$ and each $$(t,z,\mathbf{b})\in {\mathcal {D}}_{\zeta }$$ the processes $$\Big (\int _0^s \psi _{\mathbf{b}}(r,X_r,R((z+(s-t)\mathbf{b})^+\wedge \delta ^\mathbf{b}))dr: 0\le s\le T\Big )$$ are continuous. Hence, by the properties of the Snell envelope, if $${\tilde{Y}}^{t,z,\mathbf{b}}_s$$ has a (necessarily negative) jump at $$s_1\in [0,T]$$, then, for some $$\beta _1\in {\mathcal {J}}^{-\mathbf{b}}$$, $${\tilde{Y}}^{s_1,(z+(s_1-t)\mathbf{b})^+\wedge \delta ^{\beta _1},\beta _1}_s$$ also has a jump at $$s_1$$ and $${\tilde{Y}}^{t,z,\mathbf{b}}_{s_1-}=-c^{\mathbf{b}}_{\beta _1}(s_1)+{\tilde{Y}}^{s_1,(z+(s_1-t)\mathbf{b})^+\wedge \delta ^{\beta _1},\beta _1}_{s_1-}$$. But, if $${\tilde{Y}}^{s_1,(z+(s_1-t)\mathbf{b})^+\wedge \delta ^{\beta _1},\beta _1}_s$$ has a (negative) jump at $$s_1$$, then for some $${\beta _2}\in {\mathcal {J}}^{-\beta _1}$$, the process $${\tilde{Y}}^{s_1,(z+(s_1-t)\mathbf{b})^+\wedge \delta ^{\beta _1} \wedge \delta ^{\beta _2},\beta _2}_{s}$$ will have a negative jump at $$s_1$$ and
\begin{aligned} {\tilde{Y}}^{s_1,(z+(s_1-t)\mathbf{b})^+\wedge \delta ^{\beta _1},\beta _1}_{s_1-}=-c^{\beta _1}_{\beta _2}(s_1)+{\tilde{Y}}^{s_1,(z+(s_1-t)\mathbf{b})^+\wedge \delta ^{\beta _1} \wedge \delta ^{\beta _2},\beta _2}_{s_1-}. \end{aligned}
Repeating this argument we get a sequence $$(\beta _k)_{k\ge 0}$$, with $$\beta _0=\mathbf{b}$$ and $$\beta _k\in {\mathcal {J}}^{-\beta _{k-1}}$$ for $$k\ge 1$$, such that for any $$j>k\ge 0$$ we have
\begin{aligned} {\tilde{Y}}^{s_1,(z+(s_1-t)\mathbf{b})^+\wedge \delta ^{\beta _1} \wedge \ldots \wedge \delta ^{\beta _k},\beta _k}_{s_1-}&=-c^{\beta _k}_{\beta _{k+1}}(s_1)-\cdots -c^{\beta _{j-1}}_{\beta _{j}}(s_1) \\&\quad +{\tilde{Y}}^{s_1,(z+(s_1-t)\mathbf{b})^+\wedge \delta ^{\beta _1} \wedge \ldots \wedge \delta ^{\beta _j},\beta _j}_{s_1-}. \end{aligned}
Now, since $$(\bigwedge _{l=1}^k \delta ^{\beta _l})_{k\ge 1}$$ is a decreasing sequence that takes values in a finite set and $${\mathcal {J}}$$ is a finite set, there are $$j>k\ge 0$$ such that $$\bigwedge _{l=1}^j \delta ^{\beta _l}=\bigwedge _{l=1}^k \delta ^{\beta _l}$$ and $$\beta _j = \beta _k$$. But then
\begin{aligned} 0=-c^{\beta _k}_{\beta _{k+1}}(s_1)-\cdots -c^{\beta _{j-1}}_{\beta _{j}}(s_1) \end{aligned}
contradicting the fact that $$\min _{t\in [0,T]}c^{0}_{i}(t)+\min _{t\in [0,T]}c^{1}_{i}(t)>0$$ for all $$i\in \{1,\ldots ,n\}$$. Hence, $${\tilde{Y}}^{t,z,\mathbf{b}}_s$$ must be continuous and thus belongs to $${\mathcal {S}}^2$$.
(iv) Limit continuous in (t, z). By the dominated convergence theorem we have
\begin{aligned}&\lim _{(t',z')\rightarrow (t,z)}{\mathbb {E}}\bigg [\sup _{s\in [0,T]}|Y^{t',z',\mathbf{b}}_s-Y^{t,z,\mathbf{b}}_s|\bigg ] \\&\quad =\lim _{(t',z')\rightarrow (t,z)}{\mathbb {E}}\bigg [\sup _{s\in [0,T]}\lim _{k\rightarrow \infty }|Y^{t',z',\mathbf{b},k}_s-Y^{t,z,\mathbf{b},k}_s|\bigg ] \\&\quad =\lim _{k\rightarrow \infty }\lim _{(t',z')\rightarrow (t,z)}{\mathbb {E}}\bigg [\sup _{s\in [0,T]}|Y^{t',z',\mathbf{b},k}_s-Y^{t,z,\mathbf{b},k}_s|\bigg ] = 0. \end{aligned}
This finishes the proof. $$\square$$

We have thus far derived a verification theorem for the solution of Problem 1, and shown that there exists a (unique) family of processes satisfying the verification theorem. To finish the solution of Problem 1 we show that the families of processes in the verification theorem define continuous value functions.

### 3.3 Value function representation

We first extend the definition of the families of processes in the verification theorem to a full state-feedback form by introducing general initial conditions as follows. For all $$(r,x)\in [0,T]\times {\mathbb {R}}^m$$ we let $$((Y^{r,x,t,z,\mathbf{b}}_s)_{0\le s\le T}: (t,z,\mathbf{b})\in {\mathcal {D}}_{\zeta })$$ be the family of processes that satisfies the verification theorem for the process $$(X_s^{r,x}:{0\le s\le T})$$ and let $$((Y^{r,x,t,z,\mathbf{b},k}_s)_{0\le s\le T}: (t,z,\mathbf{b})\in {\mathcal {D}}_{\zeta })_{k\ge 0}$$ be the corresponding versions of $$((Y^{t,z,\mathbf{b},k}_s)_{0\le s\le T}: (t,z,\mathbf{b})\in {\mathcal {D}}_{\zeta })_{k\ge 0}$$ defined by (18) and (19) with X replaced by $$X^{r,x}$$. The following estimates hold:

### Proposition 3

There exists $$C^Y_1>0$$ such that, for each $$\mathbf{b}\in {\mathcal {J}}$$, we have
\begin{aligned} {\mathbb {E}}\left[ \sup _{s\in [0,T]}|Y^{r,x,t,z,\mathbf{b}}_s|^2\right] \le C_1^Y(1+|x|^{2\gamma }), \quad \forall \,(r,t,x,z)\in [0,T]^2\times {\mathbb {R}}^m\times D_\zeta ^\mathbf{b}. \end{aligned}
Furthermore, $$Y^{r,x,t,z,\mathbf{b}}_r$$ is deterministic and
\begin{aligned} |Y^{r,x,t,z,\mathbf{b}}_r-Y^{r',x',t',z',\mathbf{b}}_{r'}|\rightarrow 0, \quad \text {as } (r',x',t',z')\rightarrow (r,x,t,z). \end{aligned}

### Proof

For the first part we note that, again using Doob’s maximal inequality, there exists a $$C>0$$ such that
\begin{aligned}&{\mathbb {E}}\bigg [\sup _{s\in [0,T]}|Y^{r,x,t,z,\mathbf{b}}_s|^2\bigg ] \\&\quad \le C{\mathbb {E}}\bigg [\int _0^T \max _{\mathbf{b}\in {\mathcal {J}}}\max _{p\in D_p^{\mathbf{b}}}|\psi _{\mathbf{b}}(v,X^{r,x}_v,p)|^2 dv+\max _{\mathbf{b}\in {\mathcal {J}}}\max _{p\in D_p^{\mathbf{b}}}|h_{\mathbf{b}}(X^{r,x}_T,p)|^2\bigg ] \\&\quad \le C(C^\psi _1 T+C^h_1)(1+|x|^{2\gamma }). \end{aligned}
For the second part we pick any control $$u=(\tau _1,\ldots ,\tau _{N};\beta _1,\ldots ,\beta _N)\in {\mathcal {U}}_r$$ and let $$u'=(\tau _{l}\vee r',\tau _{l+1},\ldots ,\tau _{N};\beta _{l},\beta _{l+1},\ldots ,\beta _N)$$, where $$l:=\max \{j\ge 1:\tau _j\le r'\}\vee 1$$ with $$\max \{\emptyset \}=0$$. Then $$u'\in {\mathcal {U}}_{r'}$$ and we have
\begin{aligned} \sup _{s\in [0, T]}|R(\zeta ^{t,z,\mathbf{b}}_s(u))-R(\zeta ^{t',z',\mathbf{b}}_s(u'))|\le n C^R \left( |z-z'|+|r-r'|+|t-t'|\right) , \end{aligned}
$${\mathbb {P}}$$-a.s. and, by Lipschitz continuity of the $$c_i^{0}$$ and $$c_{i}^{1}$$, the switching costs obey
\begin{aligned} {\mathbb {E}}\left[ \sum _{j=1}^{N'}c^{\beta '_{j-1}}_{\beta '_j}(\tau '_j) - \sum _{j=1}^{N}c^{\beta _{j-1}}_{\beta _j}(\tau _j)\right]= & {} {\mathbb {E}}\bigg [\mathbb {1}_{[N>0]}c^{\mathbf{b}}_{\beta _l}(\tau '_1) - \sum _{j=1}^{l}c^{\beta _{j-1}}_{\beta _j}(\tau _j)\bigg ]\\\le & {} C^c|r-r'| \end{aligned}
for some $$C^c>0$$. Hence, since u was arbitrary we have
\begin{aligned}&Y^{r,x,t,z,\mathbf{b}}_r-Y^{r',x',t',z',\mathbf{b}}_{r'} \\&\quad \le \sup _{u\in {\mathcal {U}}}{\mathbb {E}}\bigg [\int _r^{r\vee r'}\psi _{\xi _s}\left( s,X_s^{r,x},R(\zeta ^{t,z,\mathbf{b}}_s(u))\right) ds \\&\qquad -\int _{r'}^{r\vee r'}\psi _{\xi _s}\left( s,X_s^{r',x'},R(\zeta ^{t',z',\mathbf{b}}_s(u'))\right) ds \\&\qquad +\int _{r\vee r'}^T \left\{ \psi _{\xi _s}\left( s,X_s^{r,x},R(\zeta ^{t,z,\mathbf{b}}_s(u))\right) -\psi _{\xi _s}\left( s,X_s^{r',x'},R(\zeta ^{t',z',\mathbf{b}}_s(u'))\right) \right\} ds \\&\qquad +h_{\xi _T}\left( X_T^{r,x},R(\zeta ^{t,z,\mathbf{b}}_T(u))\right) -h_{\xi _T}\left( X_T^{r',x'},R(\zeta ^{t',z',\mathbf{b}}_T(u'))\right) \bigg ] \\&\qquad +C^c|r-r'|. \end{aligned}
Considering (5) we see that the first two integrals on the right hand side go to zero as $$r'\rightarrow r$$. By arguing as in the proof of Proposition 2 we find that the remainder goes to 0 as $$(r',x',t',z')\rightarrow (r,x,t,z)$$. Now, by symmetry this applies to $$Y^{r',x',t',z',\mathbf{b}}_{r'}-Y^{r,x,t,z,\mathbf{b}}_r$$ as well and the second statement follows. $$\square$$
Repeated use of Theorem 8.5 in El-Karoui et al. (1997) shows that for $$k\ge 0$$, there exist functions $$(v_\mathbf{b}^{k})_{\mathbf{b}\in {\mathcal {J}}}$$ of polynomial growth, with $$v_\mathbf{b}^{k}:[0,T]\times {\mathbb {R}}^m \times [0,T]\times D_\zeta ^{\mathbf{b}}\rightarrow {\mathbb {R}}$$ such that
\begin{aligned} Y_s^{r,x,t,z,\mathbf{b},k}=v_\mathbf{b}^{k}(s,X^{r,x}_s,t,z),\quad r\le s\le T. \end{aligned}
Furthermore, by Theorem 8.5 in El-Karoui et al. (1997) and Proposition 2 the functions $$v_\mathbf{b}^{k}$$ are continuous. Repeating the steps in the proof of Theorem 2 we find that the sequences $$(v_\mathbf{b}^{k})^{k\ge 0}_{\mathbf{b}\in {\mathcal {J}}}$$ converge pointwise to functions $$v_\mathbf{b}:[0,T]\times {\mathbb {R}}^m \times [0,T]\times D_\zeta ^{\mathbf{b}}\rightarrow {\mathbb {R}}$$ and that
\begin{aligned} Y_s^{r,x,t,z,\mathbf{b}}=v_{\mathbf{b}}(s,X^{r,x}_s,t,z),\quad r\le s\le T. \end{aligned}
Now, by Proposition 3 the functions $$v_{\mathbf{b}}$$ are continuous and of polynomial growth. Finally, the verification theorem implies that the functions $$v_{\mathbf{b}}$$ are value functions for the stochastic control problem posed in Problem 1 in the sense that
\begin{aligned}&v_{\mathbf{b}}(r,x,t,z)=\sup _{\tau \in {\mathcal {T}}_r}{\mathbb {E}}\bigg [\int _r^{\tau \wedge T}\psi _{\mathbf{b}}\left( s,X_s^{r,x},R(z+(s-t)\mathbf{b})\right) ds \nonumber \\&\quad +\mathbb {1}_{[\tau \ge T]}h_{\mathbf{b}}\left( X_T^{r,x},R(z+(T-t)\mathbf{b})\right) \nonumber \\&\quad +\mathbb {1}_{[\tau < T]}\max _{\beta \in {\mathcal {J}}^{-\mathbf{b}}}\left\{ -c^{\mathbf{b}}_{\beta }(\tau ) + v_{\beta }\left( \tau ,X_\tau ^{r,x},\tau ,(z+(\tau -t)\mathbf{b})^+\wedge \delta ^{\beta }\right) \right\} \bigg ]. \end{aligned}
(21)

## 4 Limited feedback

When searching for a numerical solution to Problem 1, by means of a lattice or a Monte Carlo approximation of the value function in (21), the curse of dimensionality will generally become apparent through an explosion in the computational burden as the number of units increases. To limit this effect we present an alternative, sub-optimal, scheme where only a part of the available state information is considered when making decisions. More precisely, any intervention at time t will be based on the assumption that $$\zeta _t=\delta ^{\xi _t}$$. This reduces the state back to the pre-augmentation state $$(t,X_t,\xi _t)$$ while retaining the Markov property, thus avoiding the most prominent factor behind the computational intractability of (21).
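To get a rough feel for the reduction in state dimension, the sketch below counts the nodes of a tensor-product lattice with and without the $$\zeta$$-component. It is only an order-of-magnitude illustration: the grid sizes, the function name `lattice_nodes`, and the assumption that each unit that is "on" carries its own ramp-clock axis with `n_zeta` nodes (while an "off" unit has its clock pinned at zero) are hypothetical choices and are not taken from the numerical example of the paper.

```python
def lattice_nodes(n_units, n_time, n_x, m=1, n_zeta=None):
    """Crude node count for a lattice over the state space.

    n_zeta=None  -> limited-feedback state (t, X_t, xi_t).
    n_zeta=int   -> augmented state with a zeta-axis of n_zeta nodes for
                    every unit that is on (assumption: off units sit at 0).
    """
    if n_zeta is None:
        # One node per operating mode: 2**n_units modes in total.
        mode_term = 2 ** n_units
    else:
        # Sum over modes b of n_zeta**(number of on-units in b)
        # equals (1 + n_zeta)**n_units by the binomial theorem.
        mode_term = (1 + n_zeta) ** n_units
    return n_time * n_x ** m * mode_term


if __name__ == "__main__":
    limited = lattice_nodes(3, n_time=100, n_x=50)
    full = lattice_nodes(3, n_time=100, n_x=50, n_zeta=20)
    print(limited, full, full / limited)
```

Under these assumptions the ratio between the two counts grows like $$((1+n_\zeta )/2)^n$$, which is the kind of growth in n that the limited-feedback scheme is meant to avoid.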

Assume that, at time t, the system is operated in mode $$\mathbf{b}\in {\mathcal {J}}$$, with $$\zeta _t=\delta ^{\mathbf{b}}$$, when an intervention on one or more units gives the new mode $$\mathbf{b}'\in {\mathcal {J}}^{-\mathbf{b}}$$. The production in the period [t, T] can then be written $$p^{\mathbf{b}'}(\cdot ,u)-{\tilde{R}}_{\mathbf{b},\mathbf{b}'}(t,\cdot ,u)$$ where $$u\in {\mathcal {U}}_t$$ is the control applied in [t, T],
\begin{aligned} p_i^{\mathbf{b}'}(r,u)&:=\,\mathbb {1}_{[b'_i=1]} \bigg \{\mathbb {1}_{[t,\tau ^i_1)}(r){\bar{p}}_i+\sum _{j=1}^{\lfloor N_i/2 \rfloor }\mathbb {1}_{[\tau ^i_{2j},\tau ^i_{2j+1})}(r)R_i\left( (r-\tau ^i_{2j})\wedge \delta _i\right) \bigg \} \\&\quad +\mathbb {1}_{[b'_i=0]} \sum _{j=1}^{\lceil N_i/2 \rceil }\mathbb {1}_{[\tau ^i_{2j-1},\tau ^i_{2j})}(r)R_i\left( (r-\tau ^i_{2j-1})\wedge \delta _i\right) \end{aligned}
is the production in Unit i at time $$r\ge t$$ assuming that $$(\xi _t,\zeta _t)=(\mathbf{b}',\delta ^{\mathbf{b}'})$$, and
\begin{aligned} \left( {\tilde{R}}_{\mathbf{b},\mathbf{b}'}(t,r,u)\right) _i&:=\,\mathbb {1}_{[b'_i>b_i]}\mathbb {1}_{[t,\tau ^i_1)}(r)\{{\bar{p}}_i-R_{i}((r-t)\wedge \delta _i)\} \nonumber \\&=\,\mathbb {1}_{[b'_i>b_i]}\mathbb {1}_{[t,\tau ^i_1)}(r){\tilde{R}}_i(r-t) \end{aligned}
(22)
is the error made when assuming that $$\zeta _t=\delta ^{\mathbf{b}'}$$ instead of $$\delta ^{\mathbf{b}\wedge \mathbf{b}'}$$ for $$i=1,\ldots ,n$$. The revenue without switching costs for the same period, for $$X^{t,x}$$, can then be written
\begin{aligned}&{\mathbb {E}}\bigg [\int _t^T\psi _{\xi _r}\left( r,X^{t,x}_r,p^{\mathbf{b}'}(r,u)-{\tilde{R}}_{\mathbf{b},\mathbf{b}'}(t,r,u)\right) dr \nonumber \\&\quad + h_{\xi _T}\left( X^{t,x}_T,p^{\mathbf{b}'}(T,u)-{\tilde{R}}_{\mathbf{b},\mathbf{b}'}(t,T,u)\right) \bigg ]. \end{aligned}
(23)
If we define the delay revenue
\begin{aligned} {\varGamma }^{\mathbf{b}}_{\mathbf{b}'}(t,x,u)&:={\mathbb {E}}\bigg [\int _t^T\{\psi _{\xi _r}(r,X_r^{t,x},p^{\mathbf{b}'}(r,u)-{\tilde{R}}_{\mathbf{b},\mathbf{b}'}(t,r,u)) \\&\quad \;-\psi _{\xi _r}(r,X_r^{t,x},p^{\mathbf{b}'}(r,u))\}dr \\&\quad +h_{\xi _T}(X_T^{t,x},p^{\mathbf{b}'}(T,u)-{\tilde{R}}_{\mathbf{b},\mathbf{b}'}(t,T,u))-h_{\xi _T}(X_T^{t,x},p^{\mathbf{b}'}(T,u))\bigg ] \end{aligned}
then (23) can be written
\begin{aligned} {\mathbb {E}}\bigg [\int _t^T\psi _{\xi _r}\left( r,X^{t,x}_r,p^{\mathbf{b}'}(r,u)\right) dr + h_{\xi _T}\left( X^{t,x}_T,p^{\mathbf{b}'}(T,u)\right) +{\varGamma }^{\mathbf{b}}_{\mathbf{b}'}(t,x,u)\bigg ]. \end{aligned}
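The correction term in (22) is easy to evaluate for a single unit. The following minimal sketch assumes a linear ramp $$R_i(s)={\bar{p}}_i s/\delta _i$$, which is only one admissible choice of a Lipschitz ramp function; the helper names `ramp` and `ramp_error` are hypothetical.

```python
def ramp(s, p_bar, delta):
    """Hypothetical linear ramp with R(0) = 0 and R(delta) = p_bar."""
    s = min(max(s, 0.0), delta)  # clamp the ramp clock to [0, delta]
    return p_bar * s / delta


def ramp_error(r, t, tau_1, b_old, b_new, p_bar, delta):
    """Component i of the error term in (22) at time r.

    t      : time of the intervention that produced the new mode
    tau_1  : first later intervention on this unit
    b_old, b_new : mode of the unit (0 or 1) before and after time t
    """
    switched_on = b_new > b_old          # indicator [b'_i > b_i]
    before_next_switch = t <= r < tau_1  # indicator of [t, tau^i_1)
    if not (switched_on and before_next_switch):
        return 0.0
    # Installed capacity minus the output actually reached on the ramp.
    return p_bar - ramp(r - t, p_bar, delta)


if __name__ == "__main__":
    # Halfway through a unit-length ramp of a 10-unit generator.
    print(ramp_error(0.5, 0.0, 2.0, 0, 1, p_bar=10.0, delta=1.0))
```

The error vanishes once the ramp has completed, for units that were already on, and after the next intervention on the unit, which is why the delay revenue $${\varGamma }^{\mathbf{b}}_{\mathbf{b}'}$$ only collects contributions from freshly started units.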
A naive approach would now be to define a sequence of processes $$(({\tilde{Y}}^{t,x,\mathbf{b},k}_s)_{0\le s\le T})_{k\ge 0}$$ recursively as
\begin{aligned} {\tilde{Y}}^{t,x,\mathbf{b},0}_s:={\mathbb {E}}\bigg [\int _s^{T}\psi _{\mathbf{b}}\left( r,X_r^{t,x},p^\mathbf{b}\right) dr+h_{\mathbf{b}}\left( X_T^{t,x},p^\mathbf{b}\right) \Big |{\mathcal {F}}_s\bigg ], \end{aligned}
and
\begin{aligned} {\tilde{Y}}^{t,x,\mathbf{b},k}_s&:=\mathop {{\mathrm{ess}}\sup }\limits _{\tau \in {\mathcal {T}}_s} {\mathbb {E}}\bigg [\int _s^{\tau \wedge T}\psi _{\mathbf{b}}(r,X_r^{t,x},p^\mathbf{b})dr+\mathbb {1}_{[\tau \ge T]}h_{\mathbf{b}}(X_T^{t,x},p^\mathbf{b}) \\&\quad +\mathbb {1}_{[\tau < T]}\max _{\beta \in {\mathcal {J}}^{-\mathbf{b}}} \Big \{-c^{\mathbf{b}}_{\beta }(\tau )+{\varGamma }^{\mathbf{b}}_{\beta } \left( \tau ,X^{t,x}_{\tau },{\tilde{u}}^{k-1}_{\tau ,X^{t,x}_{\tau },\beta }\right) +{\tilde{Y}}^{t,x,\beta ,k-1}_{\tau }\Big \}\Big |{\mathcal {F}}_s\bigg ], \end{aligned}
where $${\tilde{u}}^{k-1}_{\tau ,X^{t,x}_{\tau },\beta }$$ is the control resulting from the recursion starting with $${\tilde{Y}}^{t,x,\beta ,k-1}_{\tau }$$. As the following example shows, this sequence may fail to converge even for a very simple problem.

### Example 1

Consider the deterministic optimal switching problem
\begin{aligned} \max _{u}-\int _0^1 12(\zeta (t)-1/2)^2dt-N/2 \end{aligned}
where $$n=1$$ and $$\delta _1=1$$. The optimal solution to this problem starting in $$(\zeta (t),\xi (t))=(0,0)$$ and $$(\zeta (t),\xi (t))=(\delta _1,1)$$ is
\begin{aligned} Y^{t,0,0}_t&=-3(1-t)+\mathbb {1}_{[t<0.6736]}(-4(1-t)^3+6(1-t)^2-0.5), \\ Y^{t,\delta _1,1}_t&=-3(1-t)+\mathbb {1}_{[t<0.5]}(-4(1-t)^3+6(1-t)^2-1), \end{aligned}
with corresponding controls
\begin{aligned} u^{*,0}_t=\mathbb {1}_{[t<0.6736]}(t;1) \quad \mathrm{and}\quad u^{*,1}_t=\mathbb {1}_{[t<0.5]}(t,t;0,1). \end{aligned}
Applying the usual Picard iteration results in the sequence $${\tilde{Y}}^{t,0,0}_t={\tilde{Y}}^{t,1,0}_t={\tilde{Y}}^{t,1,1}_t=-3(1-t)$$ with corresponding controls $${\tilde{u}}^{0,0}_t={\tilde{u}}^{1,0}_t={\tilde{u}}^{1,1}_t=\emptyset$$, then $${\tilde{Y}}^{0,1}_t={\tilde{Y}}^{0,2}_t=Y^{t,0,0}_t$$ and $${\tilde{Y}}^{1,2}_t=Y^{t,\delta _1,1}_t$$ with the corresponding optimal controls. Now,
\begin{aligned} {\tilde{Y}}^{0,3}_t&=-3(1-t)+\mathbb {1}_{[t<0.3264]}(-4(1-t)^3+6(1-t)^2-1.5) \\&\quad +\mathbb {1}_{[0.5\le t<0.6736]}(-4(1-t)^3+6(1-t)^2-0.5) \end{aligned}
and $${\tilde{Y}}^{1,3}_t=Y^{t,\delta _1,1}_t$$. Moving on we have $${\tilde{Y}}^{0,4}_t={\tilde{Y}}^{0,3}_t$$ and $${\tilde{Y}}^{1,4}_t=-3(1-t)$$, after which we start over with $${\tilde{Y}}^{0,5}_t=Y^{t,0,0}_t$$, $${\tilde{Y}}^{1,5}_t=-3(1-t)$$, $${\tilde{Y}}^{0,6}_t={\tilde{Y}}^{0,3}_t$$, $${\tilde{Y}}^{1,6}_t=Y^{t,\delta _1,1}_t$$, $${\tilde{Y}}^{0,7}_t={\tilde{Y}}^{1,6}_t$$, $${\tilde{Y}}^{1,7}_t=-3(1-t)$$. We thus find that for $$k=0,1,2,\ldots$$, we have
• $${\tilde{Y}}^{0,4k+1}_t=Y^{t,0,0}_t$$ and $${\tilde{Y}}^{1,4k+1}_t=-3(1-t)$$

• $${\tilde{Y}}^{0,4k+2}_t=Y^{t,0,0}_t$$ and $${\tilde{Y}}^{1,4k+2}_t=Y^{t,\delta _1,1}_t$$

• $${\tilde{Y}}^{0,4k+3}_t={\tilde{Y}}^{0,3}_t$$ and $${\tilde{Y}}^{1,4k+3}_t=Y^{t,\delta _1,1}_t$$

• $${\tilde{Y}}^{0,4k+4}_t={\tilde{Y}}^{0,3}_t$$ and $${\tilde{Y}}^{1,4k+4}_t=-3(1-t)$$

We note that the sequence $$({\tilde{Y}}_t^{0,k},{\tilde{Y}}_t^{1,k})_{k\ge 0}$$ does not converge for all $$t\in [0,T]$$.
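The switching thresholds 0.6736, 0.5 and 0.3264 that appear in Example 1 can be checked numerically. The following is a minimal sketch, assuming that each threshold is the time at which the cubic gain $$-4(1-t)^3+6(1-t)^2-c$$ changes sign, with $$c\in \{0.5,1,1.5\}$$ the constant appearing in the corresponding indicator term. The function names `switch_gain` and `threshold` are illustrative and do not come from the paper.

```python
def switch_gain(s: float, cost: float) -> float:
    """Gain -4 s^3 + 6 s^2 - cost from Example 1, with s = 1 - t the time remaining."""
    return -4.0 * s**3 + 6.0 * s**2 - cost


def threshold(cost: float, tol: float = 1e-12) -> float:
    """Return the time t = 1 - s at which switch_gain(s, cost) changes sign.

    The gain is increasing in s on [0, 1] (its derivative is 12 s (1 - s)),
    so bisection on [0, 1] finds the unique root whenever 0 < cost < 2.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if switch_gain(mid, cost) < 0.0:
            lo = mid          # gain still negative: root lies to the right
        else:
            hi = mid          # gain non-negative: root lies to the left
    return 1.0 - 0.5 * (lo + hi)


for cost in (0.5, 1.0, 1.5):
    print(f"cost {cost}: the cubic gain is positive for t < {threshold(cost):.4f}")
```

Bisection is used here only because the gain is monotone in the remaining time; any scalar root finder would serve equally well.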
In the above example, prohibiting a switch from “on” to “off” whenever an immediate switch from “off” to “on” would follow would effectively prevent the cyclic behavior. More generally, we define an ordering $$\prec$$ of the set $${\mathcal {J}}$$ (with $$\mathbf{b}\nprec \mathbf{b}$$) and prohibit a switch from $$\mathbf{b}$$ to $$\mathbf{b}'$$, whenever $$\mathbf{b}'\prec \mathbf{b}$$ and an immediate sequence of switches would follow, taking us back to the state $$\mathbf{b}$$. We thus define the sequence of processes $$(({\hat{Y}}^{t,x,\mathbf{b},k}_s)_{0\le s\le T})_{k\ge 0}$$ recursively as
\begin{aligned} {\hat{Y}}^{t,x,\mathbf{b},0}_s := {\mathbb {E}}\bigg [\int _s^{T}\psi _{\mathbf{b}}\left( r,X_r^{t,x},p^\mathbf{b}\right) dr + h_{\mathbf{b}}\left( X_T^{t,x},p^\mathbf{b}\right) \Big |{\mathcal {F}}_s\bigg ], \end{aligned}
and
\begin{aligned}&{\hat{Y}}^{t,x,\mathbf{b},k}_s:=\mathop {{\mathrm{ess}}\sup }\limits _{\tau \in {\mathcal {T}}_s} {\mathbb {E}}\bigg [\int _s^{\tau \wedge T}\psi _{\mathbf{b}}(r,X_r^{t,x},p^\mathbf{b})dr+\mathbb {1}_{[\tau \ge T]}h_{\mathbf{b}}(X_T^{t,x},p^\mathbf{b}) \\&\quad +\mathbb {1}_{[\tau < T]}\max _{\beta \in A_{\tau ,X^{t,x}_{\tau },\mathbf{b}}^{k-1}}\Big \{- c^{\mathbf{b}}_{\beta }(\tau )+{\varGamma }^{\mathbf{b}}_{\beta } \left( \tau ,X^{t,x}_{\tau },u^{\diamond ,k-1}_{\tau ,X^{t,x}_{\tau },\beta }\right) +{\hat{Y}}^{t,x,\beta ,k-1}_{\tau }\Big \}\Big |{\mathcal {F}}_s\bigg ], \end{aligned}
where $$A_{t,x,\mathbf{b}}^k:={\mathcal {J}}^{-\mathbf{b}}\setminus \{\mathbf{b}'\in {\mathcal {J}}^{-\mathbf{b}}: \exists j\in \{1,\ldots ,k\},\, \mathbf{b}'\prec \mathbf{b},\beta ^{t,x,\mathbf{b}',k}_j=\mathbf{b},\tau ^{t,x,\mathbf{b}',k}_j=t\}$$. The controls $$u^{\diamond ,k}_{t,x,\mathbf{b}}:=(\tau ^{t,x,\mathbf{b},k}_1,\ldots ,\tau ^{t,x,\mathbf{b},k}_N;\beta ^{t,x,\mathbf{b},k}_1,\ldots ,\beta ^{t,x,\mathbf{b},k}_N)\in {\mathcal {U}}_t^k$$ are defined as follows. For each $${\mathbb {F}}$$-stopping time $$\tau$$ we let
\begin{aligned} D_\tau ^{t,x,\mathbf{b},k}&:=\inf \Big \{s\ge \tau : \, {\hat{Y}}^{t,x,\mathbf{b},k}_s \\&=\max _{\beta \in A_{{s,X^{t,x}_s,\mathbf{b}}}^{k-1}}\Big \{ -c^{\mathbf{b}}_{\beta }(s) +{\varGamma }^{\mathbf{b}}_{\beta }\left( s,X^{t,x}_{s},u^{\diamond ,k-1}_{s,X^{t,x}_{s},\beta }\right) +{\hat{Y}}^{t,x,\beta ,k-1}_{s}\Big \}\Big \}. \end{aligned}
We define the intervention times $$\tau ^{t,x,\mathbf{b},k}_1,\ldots ,\tau ^{t,x,\mathbf{b},k}_N$$ and the corresponding sequence of active units $$\beta ^{t,x,\mathbf{b},k}_1,\ldots ,\beta ^{t,x,\mathbf{b},k}_N$$ as $$\tau ^{t,x,\mathbf{b},k}_1(=\tau _1):=D_t^{t,x,\mathbf{b},k}$$,
\begin{aligned} \beta ^{t,x,\mathbf{b},k}_1(=\beta _1)\in \mathop {\arg \max }_{\beta \in A_{\tau _1,X^{t,x}_{\tau _1},\mathbf{b}}^{k-1}}\Big \{-c^{\mathbf{b}}_{\beta }(\tau _1) + {\varGamma }^{\mathbf{b}}_{\beta }\left( \tau _1,X^{t,x}_{\tau _1}, u^{\diamond ,k-1}_{\tau _1,X^{t,x}_{\tau _1},\beta }\right) + {\hat{Y}}^{t,x,\beta ,k-1}_{\tau _1}\Big \} \end{aligned}
and continue with
\begin{aligned} \tau ^{t,x,\mathbf{b},k}_j:=\tau ^{\tau _1,X^{t,x}_{\tau _1},\beta _1,k-1}_{j-1} \quad \text {and} \quad \beta ^{t,x,\mathbf{b},k}_j:=\beta ^{\tau _1,X^{t,x}_{\tau _1},\beta _1,k-1}_{j-1} \end{aligned}
for $$j=2,\ldots ,N$$.

### Example 2

Returning to the problem from Example 1 we find that $${\hat{Y}}_t^{0,0}={\hat{Y}}_t^{1,0}=-3(1-t)$$ and $${\hat{Y}}_t^{0,1}=Y^{t,0,0}_t$$, with $$u^{\diamond ,1}_{t,0}=\mathbb {1}_{[t<0.6736]}(t;1)$$ and $${\hat{Y}}_t^{1,1}=-3(1-t)$$. As in the previous example, $${\hat{Y}}_t^{0,2}=Y^{t,0,0}_t$$. However, for all $$t<0.6736$$ we have $$A^{1,1}_t=\emptyset$$ and thus $${\hat{Y}}_t^{1,2}=-3(1-t)$$. This pattern continues and we find that
• $${\hat{Y}}^{0,k}_t=Y^{t,0,0}_t$$ and $${\hat{Y}}^{1,k}_t=-3(1-t)$$

for all $$k\ge 1$$.
For each $$\mathbf{b}\in {\mathcal {J}}$$ and each $$(t,x,z)\in [0,T]\times {\mathbb {R}}^m \times D_\zeta ^{\mathbf{b}}$$ we define the cost-to-go when applying the control $$u\in {\mathcal {U}}_t$$, as an extension of J in the formulation of Problem 1 (see footnote 4),
\begin{aligned} J^{\mathbf{b}}(t,x,z;u)&:={\mathbb {E}}\bigg [\int _t^T\psi _{\xi ^{\mathbf{b}}_s}\left( s,X^{t,x}_s,R(\zeta ^{t,z,\mathbf{b}}_s)\right) ds \\&\quad + h_{\xi ^{\mathbf{b}}_T}\left( X^{t,x}_T,R(\zeta ^{t,z,\mathbf{b}}_T)\right) -\sum _{j=1}^{N}c^{\beta _{j-1}}_{\beta _j}(\tau _j)\bigg ]. \end{aligned}

### Proposition 4

For $$(t,x,\mathbf{b})\in [0,T]\times {\mathbb {R}}^m\times {\mathcal {J}}$$ we have
\begin{aligned} {\hat{Y}}^{t,x,\mathbf{b},k}_t=J^{\mathbf{b}}\left( t,x,\delta ^{\mathbf{b}};u^{\diamond ,k}_{t,x,\mathbf{b}}\right) \end{aligned}
for all $$k\ge 0$$.

### Proof

From the proof of Theorem 1 in Djehiche et al. (2009) and the Markov property we get
\begin{aligned} {\hat{Y}}^{t,x,\mathbf{b},k}_s&=\mathop {{\mathrm{ess}}\sup }\limits _{\tau \in {\mathcal {T}}_s} {\mathbb {E}}\bigg [\int _s^{\tau \wedge T}\psi _{\mathbf{b}}\left( r,X_r^{t,x},p^\mathbf{b}\right) dr+\mathbb {1}_{[\tau \ge T]}h_{\mathbf{b}}\left( X_T^{t,x},p^\mathbf{b}\right) \\&\quad +\mathbb {1}_{[\tau< T]}\max _{\beta \in A_{\tau ,X^{t,x}_{\tau },\mathbf{b}}^{k-1}}\Big \{- c^{\mathbf{b}}_{\beta }(\tau )+{\varGamma }^{\mathbf{b}}_{\beta }\left( \tau ,X^{t,x}_{\tau },u^{\diamond ,k-1}_{\tau ,X^{t,x}_{\tau },\beta }\right) +{\hat{Y}}^{t,x,\beta ,k-1}_{\tau }\Big \}\Big |{\mathcal {F}}_s\bigg ] \\&={\mathbb {E}}\bigg [\int _s^{\tau ^{t,x,\mathbf{b},k}_1\wedge T}\psi _{\mathbf{b}}\left( r,X_r^{t,x},p^\mathbf{b}\right) dr \\&\quad +\mathbb {1}_{[\tau ^{t,x,\mathbf{b},k}_1\ge T]}h_{\mathbf{b}}\left( X_T^{t,x},p^\mathbf{b}\right) +\mathbb {1}_{[\tau ^{t,x,\mathbf{b},k}_1< T]}\Big \{- c^{\mathbf{b}}_{\beta ^{t,x,\mathbf{b},k}_1}(\tau ^{t,x,\mathbf{b},k}_1) \\&\quad \left. \left. \left. 
+{\varGamma }^{\mathbf{b}}_{\beta ^{t,x,\mathbf{b},k}_1}\left( \tau ^{t,x,\mathbf{b},k}_1,X^{t,x}_{\tau ^{t,x,\mathbf{b},k}_1}, u^{\diamond ,k-1}_{\tau ^{t,x,\mathbf{b},k}_1,X^{t,x}_{\tau ^{t,x,\mathbf{b},k}_1},\beta ^{t,x,\mathbf{b},k}_1}\right) +{\hat{Y}}^{t,x,\beta ^{t,x,\mathbf{b},k}_1,k-1}_{\tau ^{t,x,\mathbf{b},k}_1}\right\} \right| {\mathcal {F}}_s\right] \\&=\ldots = \\&={\mathbb {E}}\bigg [\int _s^{T}\sum _{j=0}^N \mathbb {1}_{[\tau ^{t,x,\mathbf{b},k}_{j}\le r < \tau ^{t,x,\mathbf{b},k}_{j+1}]}\psi _{\beta ^{t,x,\mathbf{b},k}_{j}}\left( r,X_r^{t,x},p^{\beta ^{t,x,\mathbf{b},k}_{j}}\right) dr \\&\quad + h_{\beta ^{t,x,\mathbf{b},k}_{N}}\left( X_T^{t,x},p^{\beta ^{t,x,\mathbf{b},k}_{N}}\right) + \sum _{j=1}^N\Big \{- c^{\beta ^{t,x,\mathbf{b},k}_{j-1}}_{\beta ^{t,x,\mathbf{b},k}_j}(\tau ^{t,x,\mathbf{b},k}_j) \\&\quad + {\varGamma }^{\beta ^{t,x,\mathbf{b},k}_{j-1}}_{\beta ^{t,x,\mathbf{b},k}_j} \Big (\tau ^{t,x,\mathbf{b},k}_j,X^{t,x}_{\tau ^{t,x,\mathbf{b},k}_j}, u^{\diamond ,k-j}_{\tau ^{t,x,\mathbf{b},k}_j,X^{t,x}_{\tau ^{t,x,\mathbf{b},k}_j},\beta ^{t,x,\mathbf{b},k}_j}\Big )\Big \}\Big |{\mathcal {F}}_s\bigg ] \\&=J^{\mathbf{b}}\left( s,X_s^{t,x},\delta ^{\mathbf{b}};u^{\diamond ,k}_{s,X_s^{t,x},\mathbf{b}}\right) , \end{aligned}
with $$(\tau ^{t,x,\mathbf{b},k}_{0},\beta ^{t,x,\mathbf{b},k}_{0})=(0,\mathbf{b})$$. $$\square$$

Consider now the problem of finding the control $$u^*$$ that maximizes (1) over all controls in $${\mathcal {U}}$$ when, for each $$\mathbf{b}\in {\mathcal {J}}$$, $$\psi _\mathbf{b}:[0,T]\times {\mathbb {R}}^m\times D_p\rightarrow {\mathbb {R}}$$ and $$h_\mathbf{b}$$ are polynomials of degree two in p, i.e.
\begin{aligned} \psi _{\mathbf{b}}(t,x,p)&=\psi _{\mathbf{b}}^0(t,x)+(\psi _{\mathbf{b}}^1(t,x))^\top p+p^\top \psi _{\mathbf{b}}^2(t,x) p, \\ h_{\mathbf{b}}(x,p)&=h_{\mathbf{b}}^0(x)+(h_{\mathbf{b}}^1(x))^\top p+p^\top h_{\mathbf{b}}^2(x) p, \end{aligned}
where $$a^\top$$ is the transpose of the vector a and for all $$\mathbf{b}\in {\mathcal {J}}$$ the functions $$\psi _{\mathbf{b}}^0:[0,T]\times {\mathbb {R}}^m\rightarrow {\mathbb {R}}$$ and $$h_{\mathbf{b}}^0:{\mathbb {R}}^m\rightarrow {\mathbb {R}}$$ and the components of $$\psi _{\mathbf{b}}^1:[0,T]\times {\mathbb {R}}^m\rightarrow {\mathbb {R}}^n$$, $$\psi _{\mathbf{b}}^2:[0,T]\times {\mathbb {R}}^m\rightarrow {\mathbb {R}}^{n\times n}$$, $$h_{\mathbf{b}}^1:{\mathbb {R}}^m\rightarrow {\mathbb {R}}^n$$ and $$h_{\mathbf{b}}^2:{\mathbb {R}}^m\rightarrow {\mathbb {R}}^{n\times n}$$ are all locally Lipschitz continuous and of polynomial growth. Furthermore, we assume that the matrices $$\psi _{\mathbf{b}}^2(t,x)$$ and $$h_{\mathbf{b}}^2(x)$$ are both symmetric for all $$(t,x)\in [0,T]\times {\mathbb {R}}^m$$. The delay revenue can then be written
\begin{aligned} {\varGamma }^{\mathbf{b}}_{\mathbf{b}'}(t,x,u)&={\mathbb {E}}\bigg [\int _t^T\Big \{-\Big ((\psi _{\xi ^{\mathbf{b}'}_r}^1(r,X_r^{t,x}))^\top \nonumber \\&\quad +2(p^{\mathbf{b}'}(r,u))^\top \psi _{\xi ^{\mathbf{b}'}_r}^2(r,X_r^{t,x})\Big ){\tilde{R}}_{\mathbf{b},\mathbf{b}'}(t,r,u)\nonumber \\&\quad +({\tilde{R}}_{\mathbf{b},\mathbf{b}'}(t,r,u))^\top \psi ^2_{\xi ^{\mathbf{b}'}_r}(r,X_r^{t,x}){\tilde{R}}_{\mathbf{b},\mathbf{b}'}(t,r,u)\Big \}dr\nonumber \\&\quad \;-\left( h^1_{\xi ^{\mathbf{b}'}_T}(X_T^{t,x})+2(p^{\mathbf{b}'}(T,u))^\top h^2_{\xi ^{\mathbf{b}'}_T}(X_T^{t,x})\right) {\tilde{R}}_{\mathbf{b},\mathbf{b}'}(t,T,u)\nonumber \\&\quad +({\tilde{R}}_{\mathbf{b},\mathbf{b}'}(t,T,u))^\top h^2_{\xi ^{\mathbf{b}'}_T}(X_T^{t,x}){\tilde{R}}_{\mathbf{b},\mathbf{b}'}(t,T,u)\bigg ]. \end{aligned}
(24)
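The integrand in (24) rests on the identity $$\psi (p-{\tilde{R}})-\psi (p)=-((\psi ^1)^\top +2p^\top \psi ^2){\tilde{R}}+{\tilde{R}}^\top \psi ^2{\tilde{R}}$$, which holds for a symmetric $$\psi ^2$$. The sketch below checks this identity numerically for random data. The names `quadratic` and `delay_integrand`, and the generic coefficients `c0`, `c1`, `c2` standing in for $$\psi ^0,\psi ^1,\psi ^2$$ at a fixed $$(t,x)$$, are illustrative assumptions.

```python
import numpy as np


def quadratic(p, c0, c1, c2):
    """Evaluate c0 + c1^T p + p^T c2 p."""
    return c0 + c1 @ p + p @ c2 @ p


def delay_integrand(p, r, c1, c2):
    """Pattern of the integrand in (24): -(c1^T + 2 p^T c2) r + r^T c2 r."""
    return -(c1 + 2.0 * (p @ c2)) @ r + r @ c2 @ r


rng = np.random.default_rng(0)
n = 4
c0 = rng.normal()
c1 = rng.normal(size=n)
a = rng.normal(size=(n, n))
c2 = 0.5 * (a + a.T)                 # symmetric, as assumed for psi^2 and h^2
p = rng.uniform(0.0, 1.0, size=n)    # output vector without the ramping correction
r = rng.uniform(0.0, 1.0, size=n)    # ramping correction, plays the role of R-tilde

lhs = quadratic(p - r, c0, c1, c2) - quadratic(p, c0, c1, c2)
rhs = delay_integrand(p, r, c1, c2)
print(np.isclose(lhs, rhs))
```

Without the symmetry assumption the cross term would read $$p^\top (\psi ^2+(\psi ^2)^\top ){\tilde{R}}$$ instead of $$2p^\top \psi ^2{\tilde{R}}$$.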
Exploiting the simple structure of this formulation, we will in what follows show that efficient numerical algorithms can be built to approximate the expected revenue and the corresponding control $$u^\diamond$$.

### 4.2 Numerical solution scheme

In this section we present a numerical scheme that approximates $$J^{\mathbf{b}}\left( t,x,\delta ^{\mathbf{b}};u^{\diamond ,k}_{t,x,\mathbf{b}}\right)$$ when the $$\psi _{\mathbf{b}}$$ and the $$h_{\mathbf{b}}$$ are quadratic polynomials in p.

We start by going from continuous to discrete time by introducing the grid $${\varPi }=\{t_0,t_1,\ldots ,t_{N_{{\varPi }}}\}$$, with $$t_l= l{\varDelta } t$$ for $$l=0,\ldots , N_{{\varPi }}$$, where $${\varDelta } t=T/N_{{\varPi }}$$. To get a discrete time problem we apply a Bermudan options approximation and reduce the set of stopping times in the admissible controls by restricting interventions to grid points, i.e. for all discretized intervention times $${\bar{\tau }}_j$$ we have $${\bar{\tau }}_j\in {\varPi }$$.

Let $${\bar{u}}^{\diamond }_{t,x,\mathbf{b}}:=({\bar{\tau }}^{t,x,\mathbf{b}}_1,\ldots ,{\bar{\tau }}^{t,x,\mathbf{b}}_N;\beta ^{t,x,\mathbf{b}}_1,\ldots ,\beta ^{t,x,\mathbf{b}}_N)$$, be the discrete-time version of the limited feedback control proposed above and let $$({\bar{\xi }}^{t,x,\mathbf{b}}_{s}:s\in {\varPi }\cap [t,T])$$ be the corresponding evolution of the operating mode. We define the discrete time value function
\begin{aligned} {\hat{v}}^{{\varPi }}_\mathbf{b}(t_l,x)&:={\mathbb {E}}\bigg [\sum _{k=l}^{N_{\varPi }-1} \psi _{{\bar{\xi }}^{t_l,x,\mathbf{b}}_{t_k}}\left( t_k,X_{t_k}^{t_l,x},p^\mathbf{b}(t_k,{\bar{u}}^{\diamond }_{t_l,x,\mathbf{b}})\right) {\varDelta } t \\&\quad +h_{{\bar{\xi }}^{t_l,x,\mathbf{b}}_{T}}\left( X_{T}^{t_l,x},p^\mathbf{b}(T,{\bar{u}}^{\diamond }_{t_l,x,\mathbf{b}})\right) -\sum _{j=1}^N c^{\beta ^{t_l,x,\mathbf{b}}_{j-1}}_{\beta ^{t_l,x,\mathbf{b}}_j}({\bar{\tau }}^{t_l,x,\mathbf{b}}_j)\bigg ], \end{aligned}
with $$({\bar{\tau }}^{t_l,x,\mathbf{b}}_{0},\beta ^{t_l,x,\mathbf{b}}_0)=(0,\mathbf{b})$$. Then the functions $${\hat{v}}^{{\varPi }}_\mathbf{b}:{\varPi }\times {\mathbb {R}}^m\rightarrow {\mathbb {R}}$$ satisfy the recursion
\begin{aligned} {\hat{v}}^{{\varPi }}_\mathbf{b}(T,x)&=h_{\mathbf{b}}(x,p^{\mathbf{b}}), \end{aligned}
(25)
\begin{aligned} {\hat{v}}^{{\varPi }}_\mathbf{b}(t_l,x)&=\max \limits _{\beta \in A_{t_l,x,\mathbf{b}}\cup \{\mathbf{b}\}}\Big \{\psi _{\beta }(t_l,x,p^{\mathbf{b}}\wedge p^{\beta }){\varDelta } t-c^{\mathbf{b}}_{\beta }(t_{l})+{\mathbb {E}}\Big [{\hat{{\varGamma }}}^{\mathbf{b}}_{\beta }(t_{l+1},X^{t_l,x}_{t_{l+1}})\nonumber \\&\quad +{\hat{v}}^{{\varPi }}_\beta (t_{l+1},X^{t_l,x}_{t_{l+1}}) \Big ]\Big \}, \end{aligned}
(26)
where, for $$\mathbf{b},\mathbf{b}'\in {\mathcal {J}}$$, $${\hat{{\varGamma }}}^{\mathbf{b}}_{\mathbf{b}'}:{\varPi }\times {\mathbb {R}}^m \rightarrow {\mathbb {R}}$$ is the discrete-time delay revenue and $$A_{t_l,x,\mathbf{b}}$$ is the switching set.
At each time step, starting at $$t_l=T$$ and moving backwards, we obtain the expected revenue to-go by solving (25) and (26). This gives us the $${\mathcal {F}}_{t_l}$$-measurable optimal actions as a selection of
\begin{aligned} \nonumber&\beta ^*_{t_l,x,\mathbf{b}}\in \mathop {\arg \max } \limits _{\beta \in A_{t_l,x,\mathbf{b}}\cup \{\mathbf{b}\}}\Big \{\psi _{\beta }(t_l,x,p^{\mathbf{b}}\wedge p^{\beta }){\varDelta } t-c^{\mathbf{b}}_{\beta }(t_{l})\\&\quad +{\mathbb {E}}\Big [{\hat{{\varGamma }}}^{\mathbf{b}}_{\beta }(t_{l+1},X^{t_l,x}_{t_{l+1}})+{\hat{v}}^{{\varPi }}_\beta (t_{l+1},X^{t_l,x}_{t_{l+1}}) \Big ]\Big \}. \end{aligned}
(27)
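To illustrate the structure of (25)–(27), the following sketch runs a backward recursion of the same form on a finite state grid. It is a deliberately simplified version and not the scheme of this section: the delay revenue $${\hat{{\varGamma }}}$$ is set to zero, the switching set is taken to be all modes, and the rewards do not depend on the output p. The array layout and the function name `backward_switching` are assumptions made for the example.

```python
import numpy as np


def backward_switching(psi, h, cost, trans, dt):
    """Simplified Bellman recursion in the spirit of (25)-(27).

    psi[l, b, x]   running reward in mode b at step l and state index x
    h[b, x]        terminal reward
    cost[b, beta]  switching cost from b to beta (zero on the diagonal)
    trans[x, y]    one-step transition matrix of the state chain
    Assumptions: no delay revenue, no restriction of the switching set.
    """
    n_steps, n_modes, n_states = psi.shape
    v = np.empty((n_steps + 1, n_modes, n_states))
    policy = np.empty((n_steps, n_modes, n_states), dtype=int)
    v[n_steps] = h                                  # terminal condition, cf. (25)
    for l in range(n_steps - 1, -1, -1):
        cont = v[l + 1] @ trans.T                   # E[v_beta(t_{l+1}, X_{t_{l+1}}) | X_{t_l} = x]
        gain = psi[l] * dt + cont                   # value of operating in beta from t_l on
        # candidates[b, beta, x] = gain[beta, x] - cost[b, beta]
        candidates = gain[None, :, :] - cost[:, :, None]
        policy[l] = candidates.argmax(axis=1)       # selection of a maximizer, cf. (27)
        v[l] = candidates.max(axis=1)               # recursion, cf. (26)
    return v, policy


# Toy data: mode 1 earns 1 per unit time, mode 0 earns nothing, switching costs 0.2.
n_steps, dt = 10, 0.1
psi = np.zeros((n_steps, 2, 3))
psi[:, 1, :] = 1.0
h = np.zeros((2, 3))
cost = np.array([[0.0, 0.2], [0.2, 0.0]])
trans = np.full((3, 3), 1.0 / 3.0)
v, policy = backward_switching(psi, h, cost, trans, dt)
print(v[0, :, 0], policy[0, :, 0])
```

In the toy data a switch from mode 0 to mode 1 pays off only while the remaining reward exceeds the switching cost, so the computed policy switches early and stays put near the horizon.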
When defining the sets $$A_{t_l,x,\mathbf{b}}$$ we need to be careful as the optimal action at time $$t_l$$ depends on the $$A_{t_l,x,\mathbf{b}}$$ and vice versa. We let $$\mathbf{b}_1,\ldots ,\mathbf{b}_{2^n}$$ be an ordered sequence of the elements of $${\mathcal {J}}$$, i.e. $$\mathbf{b}_i\prec \mathbf{b}_j$$ whenever $$i<j$$. First, it is obvious that $$A_{t_l,x,\mathbf{b}_1}={\mathcal {J}}^{-\mathbf{b}_1}$$. For $$i>1$$ we will build the sets $$A_{t_l,x,\mathbf{b}_i}$$ by removing appropriate subsets from $${\mathcal {J}}^{-\mathbf{b}_i}$$. The sets to be removed will be defined in a recursive manner. We start with $$A^c_{t_l,x,\mathbf{b}_1,1}:=\{\beta ^*_{t_l,x,\mathbf{b}_1}\}$$ and for each $$j> 1$$ we define the sets $$A^c_{t_l,x,\mathbf{b}_i,j}$$ recursively for all $$1\le i<j$$ as
\begin{aligned} A^c_{t_l,x,\mathbf{b}_i,j}:=\left\{ \begin{array}{ll} A^c_{t_l,x,\mathbf{b}_i,j-1}, &{} \quad \mathrm{if}\, \mathbf{b}_j\notin A^c_{t_l,x,\mathbf{b}_i,j-1}, \\ A^c_{t_l,x,\mathbf{b}_i,j-1} \cup \{\beta ^*_{t_l,x,\mathbf{b}_j}\},&{}\quad \mathrm{if}\, \mathbf{b}_j\in A^c_{t_l,x,\mathbf{b}_i,j-1},\mathbf{b}_j\prec \beta ^*_{t_l,x,\mathbf{b}_j} \\ A^c_{t_l,x,\mathbf{b}_i,j-1} \cup A^c_{t_l,x,\beta ^*_{t_l,x,\mathbf{b}_j},j-1},&{}\quad \mathrm{if}\, \mathbf{b}_j\in A^c_{t_l,x,\mathbf{b}_i,j-1},\beta ^*_{t_l,x,\mathbf{b}_j}\prec \mathbf{b}_j \end{array}\right. \end{aligned}
and let $$A^c_{t_l,x,\mathbf{b}_j,j}:=\{\beta ^*_{t_l,x,\mathbf{b}_j}\}$$. The set $$A^c_{t_l,x,\mathbf{b}_i,j}$$ then contains all members of $$\{\mathbf{b}_1,\ldots ,\mathbf{b}_j\}$$ that it is optimal to switch to from $$\mathbf{b}_i$$ at time $$t_l$$ when $$X_{t_l}=x$$, without passing through the set $$\{\mathbf{b}_{j+1},\ldots ,\mathbf{b}_{2^n}\}$$. This immediately gives $$A_{t_l,x,\mathbf{b}_j}:={\mathcal {J}}^{-\mathbf{b}_j}\setminus \{\mathbf{b}_i: \mathbf{b}_j\in A^c_{t_l,x,\mathbf{b}_i,j-1}\}$$.
We move on to the discrete-time delay revenue $${\hat{{\varGamma }}}$$. For each $$\mathbf{b},\mathbf{b}'\in {\mathcal {J}}$$ we have
\begin{aligned} {\hat{{\varGamma }}}^{\mathbf{b}}_{\mathbf{b}'}(t_l,x)&=\sum _{k={l}}^{N_{\varPi }}\left\{ -\sum _{i\in {\mathcal {I}}(\mathbf{b}')\setminus {\mathcal {I}}(\mathbf{b})} {\varLambda }^1_{\mathbf{b}',i}(t_{l-1},x;t_k){\tilde{R}}_i(t_k-t_{l-1}) \right. \nonumber \\&\quad \left. + \sum _{i,j\in {\mathcal {I}}(\mathbf{b}')\setminus {\mathcal {I}}(\mathbf{b})}{\varLambda }^2_{\mathbf{b}',i,j}(t_{l-1},x;t_k){\tilde{R}}_i(t_k-t_{l-1}){\tilde{R}}_j(t_k-t_{l-1})\right\} {\varDelta } t \end{aligned}
(28)
with, for all $$i\in {\mathcal {I}}(\mathbf{b}')$$,
\begin{aligned} {\varLambda }^1_{\mathbf{b}',i}(t_l,x;t_k)&:={\mathbb {E}}\left[ \mathbb {1}_{[{\bar{\tau }}_1^{i,t_l,x,\mathbf{b}'}>t_k]}\left\{ \left( \psi _{{\bar{\xi }}^{t_l,x,\mathbf{b}'}_{t_k}}^1\left( t_k,X_{t_k}^{t_l,x}\right) \right) _i \right. \right. \nonumber \\&\quad \left. \left. +2\left( \left( p^{\mathbf{b}'}\left( t_k,{\bar{u}}^{\diamond }_{t_l,x,\mathbf{b}'}\right) \right) ^\top \psi ^2_{{\bar{\xi }}^{t_l,x,\mathbf{b}'}_{t_k}}\left( t_k,X_{t_k}^{t_l,x}\right) \right) _i\right\} \right] , \end{aligned}
(29)
\begin{aligned} {\varLambda }^1_{\mathbf{b}',i}(t_l,x;T)&:={\mathbb {E}}\Big [\mathbb {1}_{[{\bar{\tau }}_1^{i,t_l,x,\mathbf{b}'}>T]}\big \{(h_{{\bar{\xi }}^{t_l,x,\mathbf{b}'}_{T}}^1(X_{T}^{t_l,x}))_i \nonumber \\&\quad \left. \left. +2\left( \left( p^{\mathbf{b}'}\left( T,{\bar{u}}^{\diamond }_{t_l,x,\mathbf{b}'}\right) \right) ^\top h^2_{{\bar{\xi }}^{t_l,x,\mathbf{b}'}_{T}}\left( X_{T}^{t_l,x}\right) \right) _i\right\} \right] \end{aligned}
(30)
and, for all $$i,j\in {\mathcal {I}}(\mathbf{b}')$$,
\begin{aligned} {\varLambda }^2_{\mathbf{b}',i,j}(t_l,x;t_k):= & {} {\mathbb {E}}\left[ \mathbb {1}_{[{\bar{\tau }}_1^{i,t_l,x,\mathbf{b}'}>t_k]} \mathbb {1}_{[{\bar{\tau }}_1^{j,t_l,x,\mathbf{b}'}>t_k]} \left\{ \left( \psi ^2_{{\bar{\xi }}^{t_l,x,\mathbf{b}'}_{t_k}}(t_k,X_{t_k}^{t_l,x})\right) _{i,j}\right\} \right] ,\nonumber \\ \end{aligned}
(31)
\begin{aligned} {\varLambda }^2_{\mathbf{b}',i,j}(t_l,x;T):= & {} {\mathbb {E}}\left[ \mathbb {1}_{[{\bar{\tau }}_1^{i,t_l,x,\mathbf{b}'}>T]} \mathbb {1}_{[{\bar{\tau }}_1^{j,t_l,x,\mathbf{b}'}>T]} \left\{ \left( h^2_{{\bar{\xi }}^{t_l,x,\mathbf{b}'}_{T}}(X_T^{t_l,x})\right) _{i,j}\right\} \right] . \end{aligned}
(32)
We note that $$(\psi _{\mathbf{b}'}(t_l,x,p^{\mathbf{b}}\wedge p^{\mathbf{b}'})-\psi _{\mathbf{b}'}(t_l,x,p^{\mathbf{b}'})){\varDelta } t+{\mathbb {E}}\left[ {\hat{{\varGamma }}}^{\mathbf{b}}_{\mathbf{b}'}(t_{l+1},X^{t_l,x}_{t_{l+1}})\right]$$ is the discrete time version of $${\varGamma }^{\mathbf{b}}_{\mathbf{b}'}(t_l,x,{\bar{u}}^{\diamond }_{t_l,x,\mathbf{b}'})$$. The first term on the right hand side of (28) represents the part of (24) that is linear in $${\tilde{R}}_{\mathbf{b},\mathbf{b}'}$$ and the second term of (28) represents the part of (24) that is quadratic in $${\tilde{R}}_{\mathbf{b},\mathbf{b}'}$$. Equations (29)–(32) then follow immediately from the definition of $${\tilde{R}}_{\mathbf{b},\mathbf{b}'}$$ in (22).

In each $$t_l\in {\varPi }\setminus \{T\}$$ the recursion (26) evaluates the optimal action to take at the present time, takes the action and moves to the next time $$t_{l+1}\in {\varPi }$$, where the process is repeated. Having arrived at the conclusion that $$\beta \in {\mathcal {J}}$$ is the optimal action, we know that the present production is $$p^{\mathbf{b}}\wedge p^\beta$$, as turnoffs are immediate while increasing the output requires ramping. This is why $$\psi$$ is evaluated at $$p^{\mathbf{b}}\wedge p^\beta$$ and the delay revenue $${\hat{{\varGamma }}}$$ starts at time $$t_{l+1}$$.

#### 4.2.1 Recursions for $${\varLambda }^1_{\mathbf{b},i}$$ and $${\varLambda }^2_{\mathbf{b},i,j}$$

From (27) we deduce that the intervention times satisfy $$\{{\bar{\tau }}_1^{i,t_l,x,\mathbf{b}}>t_l\}=\{(\beta ^*_{t_l,x,\mathbf{b}})_i= b_i\}$$ for $$i=1,\ldots ,n$$. As we will see, knowledge of whether these events occur is enough to compute $${\varLambda }^1_{\mathbf{b},i}$$ and $${\varLambda }^2_{\mathbf{b},i,j}$$ in a recursive manner.

Let us start with the simpler $${\varLambda }^2_{\mathbf{b},i,j}$$, where $$\mathbf{b}\in {\mathcal {J}}$$ and $$i,j\in {\mathcal {I}}(\mathbf{b})$$. First, if either of the events $$\{{\bar{\tau }}_1^{i,t_l,x,\mathbf{b}}=t_l\}$$ or $$\{{\bar{\tau }}_1^{j,t_l,x,\mathbf{b}}=t_l\}$$ occurs, then (31) and (32) immediately give $${\varLambda }^2_{\mathbf{b},i,j}(t_l,x;t_k)=0$$ for all $$t_k\in {\varPi }$$, with $$t_k\ge t_l$$.

Assume instead that $${\bar{\tau }}_1^{i,t_l,x,\mathbf{b}}>t_l$$ and $${\bar{\tau }}_1^{j,t_l,x,\mathbf{b}}>t_l$$, $${\mathbb {P}}$$-a.s. Then (see footnote 5)
\begin{aligned} {\varLambda }^2_{\mathbf{b},i,j}(t_l,x;t_l)=\left\{ \begin{array}{ll} (\psi ^2_{\beta ^*}(t_l,x))_{i,j}, &{} \quad \text {for }t_l\in {\varPi }\setminus \{T\}, \\ (h^2_{\mathbf{b}}(x))_{i,j}, &{} \quad \text {for }t_l=T, \end{array}\right. \end{aligned}
and
\begin{aligned} {\varLambda }^2_{\mathbf{b},i,j}(t_l,x;t_k)={\mathbb {E}}\big [{\varLambda }^2_{\beta ^*,i,j}(t_{l+1},X_{t_{l+1}}^{t_l,x};t_k)\big ],\quad \text {for }t_k>t_l. \end{aligned}
For $${\varLambda }^1_{\mathbf{b},i}$$ the situation is just slightly more involved, as these depend on future values of the optimal output vector $$p^{\mathbf{b}}(t_k,{\bar{u}}^{\diamond }_{t_l,x,\mathbf{b}})$$. As above we note that, whenever $${\bar{\tau }}_1^{i,t_l,x,\mathbf{b}}=t_l$$, Eqs. (29) and (30) give $${\varLambda }^1_{\mathbf{b},i}(t_l,x;t_k)=0$$ for all $$t_k\ge t_l$$.
Let us thus assume that $${\bar{\tau }}_1^{i,t_l,x,\mathbf{b}}>t_l$$. From (29) and (30) we note that we must have
\begin{aligned} {\varLambda }^1_{\mathbf{b},i}(t_l,x;t_l)=\left\{ \begin{array}{ll} (\psi _{\beta ^*}^1(t_l,x) + 2(p^{\mathbf{b}}\wedge p^{\beta ^*})^\top \psi ^2_{\beta ^*}(t_l,x))_i, &{} \quad \text {for }t_l\in {\varPi }\setminus \{T\}, \\ (h_{\mathbf{b}}^1(x) + 2(p^{\mathbf{b}})^\top h^2_{\mathbf{b}}(x))_i, &{}\quad \text {for }t_l=T. \end{array}\right. \end{aligned}
In the recursion for $${\varLambda }^1_{\mathbf{b},i}(t_l,x;t_k)$$ we have to consider the fact that $$p^{\beta ^*}(t_k,{\bar{u}}^{\diamond }_{t_l,x,\beta ^*})$$ depends on future control actions. In particular $$(p^{\mathbf{b}}(t_k,{\bar{u}}^{\diamond }_{t_l,x,\mathbf{b}}))_j=(p^{\beta ^*}(t_k,{\bar{u}}^{\diamond }_{t_l,x,\beta ^*}))_j-\mathbb {1}_{[{\bar{\tau }}_1^{j,t_l,x,\beta ^*}>t_k]}{\tilde{R}}_j(t_k-t_l)$$ for all $$j\in {\mathcal {I}}(\beta ^*)\setminus {\mathcal {I}}(\mathbf{b})$$. Hence,
\begin{aligned} {\varLambda }^1_{\mathbf{b},i}(t_l,x;t_k)&={\mathbb {E}}\big [{\varLambda }^1_{\beta ^*,i}(t_{l+1},X_{t_{l+1}}^{t_l,x};t_k)\big ] \\&\quad \;-2\sum _{j\in {\mathcal {I}}(\beta ^*)\setminus {\mathcal {I}}(\mathbf{b})}{\varLambda }^2_{\beta ^*,i,j}(t_l,x;t_k){\tilde{R}}_j(t_k-t_l), \end{aligned}
for $$t_k>t_l$$.

### Remark 2

The choice of a cost function that is quadratic in p leads to terms of the type $${\varLambda }^2_{\mathbf{b},i,j}$$, giving us a computational complexity that is $${\mathcal {O}}(n^22^n)$$. Including higher order terms in the cost function would lead to higher computational complexity. A third order polynomial would for example give terms of the type $${\varLambda }^3_{\mathbf{b},i,j,k}$$ with a resulting computational complexity $${\mathcal {O}}(n^32^n)$$.
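To give a feel for the growth rates mentioned in Remark 2, the short sketch below tabulates $$2^n$$ and $$n^q2^n$$ for a few values of n. These are growth proxies taken directly from the stated orders of complexity, not exact operation counts, and the function names are illustrative.

```python
def n_modes(n: int) -> int:
    """Number of on/off configurations of n units, 2^n."""
    return 2 ** n


def lambda_terms(n: int, order: int = 2) -> int:
    """Growth proxy n^order * 2^n for the Lambda terms of a degree-`order` cost."""
    return n ** order * n_modes(n)


for n in (2, 3, 6, 12):
    print(n, n_modes(n), lambda_terms(n, 2), lambda_terms(n, 3))
```

For six units this gives 64 modes and a proxy of 2304 for the quadratic case, which grows to 13824 for a cubic cost.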

## 5 Numerical example

In the numerical example we will consider a tracking problem where an operator wants to minimize
\begin{aligned} J(u)&:={\mathbb {E}}\left[ \int _0^T \left( f_{\textit{pen}}(X_t-\sum _{i=1}^n p_i(t))^2+(c^f)^{\top } p(t)\right) dt \right. \nonumber \\&\quad \; \left. +f_{\textit{pen},T}\left( X_T-\sum _{i=1}^n p_i(T)\right) ^2+\sum _{j=1}^N c^{\beta _{j-1}}_{\beta _j}\right] \end{aligned}
(33)
over a period of $$T=24$$ h, where $$f_{\textit{pen}},f_{\textit{pen},T}>0$$ are penalization coefficients, $$c^f\in {\mathbb {R}}_+^n$$ is the vector of marginal production costs in the different units and the switching costs are constant. The signal $$(X_t:0\le t\le T)$$ to be tracked is given by the sum $$X_t=d(t)+Z_t$$ of a deterministic forecast $$(d(t):0\le t\le T)$$ and an Ornstein–Uhlenbeck process that solves the SDE
\begin{aligned} dZ_t&=-a Z_t dt + \sigma dW_t,\quad \text {for } t\in [0,T] \\ Z_0&=x_0-d(0), \end{aligned}
where $$a=0.01$$ and $$\sigma =10$$. We will investigate the performance of the limited feedback control $$u^\diamond$$ for three different shapes of the forecast d(t),
\begin{aligned} d_1(t)&:=100+20t,\\ d_2(t)&:=500\left( 1-\tfrac{2}{T}|t-T/2|\right) ,\\ d_3(t)&:=250(1+\sin (2\pi t/T)), \end{aligned}
that are depicted in Fig. 1.
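Sample paths of the tracked signal can be generated as follows. This is a minimal sketch that uses the exact Gaussian transition of the Ornstein–Uhlenbeck process on a uniform grid; it is a simulation aid and not the Markov-chain approximation used for the numerical solution. The function name `simulate_signal` and the choice of 240 steps are assumptions.

```python
import numpy as np

T = 24.0                  # horizon in hours
A, SIGMA = 0.01, 10.0     # mean-reversion rate a and volatility sigma


def d1(t):
    return 100.0 + 20.0 * t


def d2(t):
    return 500.0 * (1.0 - 2.0 / T * np.abs(t - T / 2.0))


def d3(t):
    return 250.0 * (1.0 + np.sin(2.0 * np.pi * t / T))


def simulate_signal(d, x0, n_steps, rng):
    """Sample X_t = d(t) + Z_t on a uniform grid with the exact OU transition."""
    dt = T / n_steps
    t = np.linspace(0.0, T, n_steps + 1)
    decay = np.exp(-A * dt)
    # Conditional standard deviation of Z_{t+dt} given Z_t.
    std = SIGMA * np.sqrt((1.0 - decay**2) / (2.0 * A))
    z = np.empty(n_steps + 1)
    z[0] = x0 - d(0.0)                      # initial condition Z_0 = x_0 - d(0)
    for l in range(n_steps):
        z[l + 1] = decay * z[l] + std * rng.standard_normal()
    return t, d(t) + z


t, x = simulate_signal(d1, 100.0, 240, np.random.default_rng(0))
print(t[:3], x[:3])
```

With $$a=0.01$$ the mean reversion over 24 h is weak, so the simulated deviations from the forecast behave almost like a scaled Brownian motion.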
We assume that the operator has at her disposal a set of six production units whose data is summarized in Table 1, where $$c^f_i$$ is the marginal production cost in Unit i and the associated ramp function is defined through the constants $$0\le \delta '_i<\delta _i$$ and $${\bar{p}}_i$$ as
\begin{aligned} R_i(s)=1_{[\delta '_i,\delta _i]}(s)\frac{s-\delta '_i}{\delta _i-\delta '_i}{\bar{p}}_i,\quad \text {for }i=1,\ldots ,6. \end{aligned}
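Table 1 Data for the production units in the example

| i | $${\bar{p}}_i$$ | $$c_i^{0}$$ | $$c_i^{1}$$ | $$c_i^f$$ | $$\delta '_i$$ | $$\delta _i$$ |
|---|---|---|---|---|---|---|
| 1 | 150 | 5000 | 200 | 3 | 0.5 | 7 |
| 2 | 125 | 3000 | 200 | 4 | 1 | 6 |
| 3 | 100 | 2000 | 150 | 4 | 0.5 | 5 |
| 4 | 75 | 1500 | 100 | 4 | 1 | 4 |
| 5 | 50 | 750 | 100 | 5 | 0.5 | 3 |
| 6 | 25 | 500 | 100 | 7 | 1 | 2 |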
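The ramp functions can be evaluated directly from the data in Table 1. The sketch below stores $$({\bar{p}}_i,\delta '_i,\delta _i)$$ per unit and implements $$R_i$$ as given above. Clamping the argument to $$[0,\delta _i]$$ is an added assumption, made so that the function returns the installed capacity once the ramp is complete; the names `UNITS` and `ramp` are illustrative.

```python
# (p_bar, delta_prime, delta) for Units 1 to 6, taken from Table 1.
UNITS = {
    1: (150.0, 0.5, 7.0),
    2: (125.0, 1.0, 6.0),
    3: (100.0, 0.5, 5.0),
    4: (75.0, 1.0, 4.0),
    5: (50.0, 0.5, 3.0),
    6: (25.0, 1.0, 2.0),
}


def ramp(s: float, p_bar: float, delta_prime: float, delta: float) -> float:
    """Output s time units after a switch to "on": zero up to delta_prime, then linear."""
    s = min(max(s, 0.0), delta)      # assumption: clamp to the domain [0, delta]
    if s < delta_prime:
        return 0.0
    return (s - delta_prime) / (delta - delta_prime) * p_bar


for i, data in UNITS.items():
    print(i, [ramp(s, *data) for s in (0.0, 1.0, 2.0, 4.0, 8.0)])
```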

Equation (33) can be written
\begin{aligned} J(u)&={\mathbb {E}}\bigg [\int _0^T \Big ([X_t\,\, (p(t))^{\top }]Q\left[ \begin{array}{c} X_t \\ p(t) \end{array}\right] +(c^f)^{\top } p(t)\Big )dt \\&\quad +[X_T\, \, (p(T))^{\top }]M\left[ \begin{array}{c}X_T \\ p(T) \end{array}\right] +\sum _{j=1}^N c^{\beta _{j-1}}_{\beta _j}\bigg ], \end{aligned}
where Q and M are symmetric matrices. Hence, the problem of finding an efficient control scheme fits in the quadratic setting described in Sect. 4.1.
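One way to see that such a symmetric Q exists is to write the tracking penalty as $$f_{\textit{pen}}(w^\top z)^2$$ with $$z=[X_t\;\,p(t)^\top ]^\top$$ and $$w=[1,-1,\ldots ,-1]^\top$$, so that $$Q=f_{\textit{pen}}ww^\top$$. The sketch below builds this matrix and checks it on illustrative numbers; the function name `tracking_matrix` and the sample values are assumptions. The matrix M is obtained in the same way with $$f_{\textit{pen},T}$$ in place of $$f_{\textit{pen}}$$.

```python
import numpy as np


def tracking_matrix(n: int, f_pen: float) -> np.ndarray:
    """Return Q with [X, p^T] Q [X, p^T]^T = f_pen * (X - sum_i p_i)^2."""
    w = np.concatenate(([1.0], -np.ones(n)))
    return f_pen * np.outer(w, w)


Q = tracking_matrix(3, 0.1)
z = np.array([120.0, 50.0, 25.0, 30.0])    # [X, p_1, p_2, p_3], illustrative values
print(z @ Q @ z, 0.1 * (120.0 - 105.0) ** 2)
```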
We solve the problem for constants $$f_{\textit{pen}}=0.1$$ and $$f_{\textit{pen},T}=0.3$$ and using three different sets of available units $$F_1:=\{3,5\}$$, $$F_2:=\{2,4,6\}$$ and $$F_3:=\{1,2,3,4,5,6\}$$ for each of the three different forecasts.

The problem is numerically solved by means of a Markov-Chain approximation of the process $$(X_t:0\le t\le T)$$ as prescribed in Kushner and Dupuis (2001). We use a time-discretization with $$N_{{\varPi }}=241$$ points and discretize the state space of $$(X_t:0\le t\le T)$$ using 201 grid-points.

With this discretization, the limited feedback algorithm obtained the numerical solution in 4, 18 and 720 s for $$F_1$$, $$F_2$$ and $$F_3$$, respectively. For the fully augmented solution method the first two settings, with two and three units, were solved in around 220 and 12,000 s, respectively (it seemed computationally impossible to obtain a solution with the full system of six units).

Figures 2, 3 and 4 show the expected operation costs at time zero for the limited feedback approach (solid blue lines) and the corresponding minimal operation costs obtained by state space augmentation (dashed magenta lines) for the three different forecasts. In all cases the expected operation costs decreased with more units, in particular the expected operation cost with units $$\{3,5\}$$ was always higher than the expected operation cost with units $$\{2,4,6\}$$.

In Figs. 5, 6 and 7 the relative error of the limited feedback approximation is plotted for the three different forecasts. Here, we define the relative error as the function
\begin{aligned} e_{rel}(x):=100\left( \frac{{\hat{v}}_\mathbf{0}^{{\varPi }}(0,x)}{v_{\mathbf{0}}^{{\varPi }}(0,x,0,0)}-1\right) , \end{aligned}
where $$v_{\mathbf{0}}^{{\varPi }}$$ is the discretized version of $$v_{\mathbf{0}}$$. In the figures the blue lines are the relative errors with units $$\{3,5\}$$ and the green lines are the relative errors with units $$\{2,4,6\}$$.

Note that the level of sub-optimality induced by the limited feedback approximation depends on the properties of the process $$(X_t: 0\le t\le T)$$ but also on the available units. The seemingly higher error with three units ($$F_2$$) than with two units ($$F_1$$) can, however, be partially explained by the lower operation cost for $$F_2$$, which gives the absolute error a higher weight in the relative error.

## 6 Conclusion

In this paper we consider the problem of optimizing the operation of n production units that can be operated in two modes, “off” and “on”, in the presence of uncertainties. What distinguishes the investigated problem from many other problems treated in the literature is the fact that we assume that following each switch from mode “off” to mode “on” there is a delay period during which the output ramps up.

First we use a probabilistic approach to show that the problem has a unique solution that can be represented by a set of value functions.

As the first formulation suffers from the curse of dimensionality often encountered in optimal control, we then develop an approximation routine based on limiting the information used in the decision process. In a numerical example the limited feedback approach shows a considerable improvement of the computational tractability of the problem.

The main computational issue of the limited feedback approach now resides in the exploding number of switching modes, $$|{\mathcal {J}}|$$, that have to be analyzed when solving the Bellman equation (25)–(26). Already for the small example in Sect. 5 with $$n=6$$ units we get $$|{\mathcal {J}}|=2^6=64$$ modes. In a system with, for example, $$n=12$$ units we would get 4096 switching modes. A future line of research could thus be to try to sort out unnecessary states that the optimal control is unlikely to reach in order to reduce the number of switching modes.

## Footnotes

1. $$N_i$$ is the (random) number of interventions on Unit i.

2. If Unit i is “on” at time $$t\in [0,T]$$ then $$(\zeta _t)_i$$ is the minimum of $$\delta _i$$ and the time that has passed since the most recent switch from “off” to “on” of Unit i.

3. For $$s\ge t$$, if Unit i is “on” at time s then $$(\zeta _s^{t,z,\mathbf{b}})_i$$ is the minimum of $$\delta _i$$ and the time that has passed since the most recent switch from “off” to “on” of Unit i, given that at time t operation was in mode $$\mathbf{b}$$ with ramp-time z.

4. For each $$u\in {\mathcal {U}}$$, we have $$J(u)=J^\mathbf{0}(0,x_0,0;u)$$.

5. To make notation less heavy we have dropped the subscript in $$\beta ^*_{t_l,x,\mathbf{b}}$$.

## References

1. Aïd R, Campi L, Langrené N, Pham H (2014) A probabilistic numerical method for optimal multiple switching problems in high dimension. SIAM J Financ Math 5(1):191–231
2. Aïd R, Federico S, Pham H, Villeneuve B (2015) Explicit investment rules with time-to-build and uncertainty. J Econ Dyn Control 51:240–256
3. Bar-Ilan A, Sulem A (1995) Explicit solution of inventory problems with delivery lags. Math Oper Res 20(3):709–720
4. Bar-Ilan A, Sulem A, Zanello A (2002) Time-to-build and capacity choice. J Econ Dyn Control 26(1):69–98
5. Bertsekas DP (2005) Dynamic programming and optimal control, vol 1, 3rd edn. Athena Scientific, Belmont
6. Bruder B, Pham H (2009) Impulse control problem on finite horizon with execution delay. Stoch Process Appl 119:1436–1469
7. Carmona R, Ludkovski M (2008) Pricing asset scheduling flexibility using optimal switching. Appl Math Financ 15:405–447
8. Deng J, Xia Z (2005) Pricing and hedging electric supply contracts: a case with tolling agreements. Preprint, Georgia Institute of Technology, Atlanta
9. Djehiche B, Hamadène S, Popier A (2009) A finite horizon optimal multiple switching problem. SIAM J Control Optim 47(4):2751–2770
10. El Asri B, Hamadène S (2009) The finite horizon optimal multi-modes switching problem: the viscosity solution approach. Appl Math Optim 60:213–235
11. El-Karoui N, Kapoudjian C, Pardoux E, Peng S, Quenez MC (1997) Reflected solutions of backward SDEs and related obstacle problems for PDEs. Ann Probab 25(2):702–737
12. Gut A (2005) Probability: a graduate course. Springer, New York
13. Karatzas I, Shreve SE (1991) Brownian motion and stochastic calculus. Springer, New York
14. Karatzas I, Shreve SE (1998) Methods of mathematical finance. Springer, New York
15. Kushner HJ, Dupuis P (2001) Numerical methods for stochastic control problems in continuous time, 2nd edn. Springer, New York
16. Longstaff FA, Schwartz ES (2001) Valuing American options by simulation: a simple least-squares approach. Rev Financ Stud 14:113–148
17. Øksendal B, Sulem A (2008) Optimal stochastic impulse control with delayed reaction. Appl Math Optim 58:243–255
18. Perninge M (2015) A control-variable regression Monte Carlo technique for short-term electricity generation planning. arXiv:1512.08880v1
19. Perninge M, Söder L (2014) Irreversible investments with delayed reaction: an application to generation re-dispatch in power system operation. Math Methods Oper Res 79:195–224
20. Yong J, Zhou XY (1999) Stochastic controls: Hamiltonian systems and HJB equations. Springer, New York