Abstract
In this paper, we investigate an optimization problem for continuous-time Markov decision processes with both impulsive and continuous controls. We consider the so-called constrained problem where the objective of the controller is to minimize a total expected discounted optimality criterion associated with a cost rate function while keeping other performance criteria of the same form, but associated with different cost rate functions, below some given bounds. Our model allows multiple impulses at the same time moment. The main objective of this work is to study the associated linear program defined on a space of measures including the occupation measures of the controlled process and to provide sufficient conditions to ensure the existence of an optimal control.
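The linear program over occupation measures studied in this paper has a well-known finite, discrete-time analogue that may help fix ideas. The sketch below is only an illustration: all model data (states, actions, transition matrices, costs, the discount factor, and the constraint budget) are hypothetical, and the finite discrete-time setting is a stand-in for the continuous-time model with impulses. It minimizes one expected discounted cost subject to a bound on a second one by optimizing directly over the discounted occupation measure \(\eta(x,a)\):

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical finite discrete-time analogue: 2 states, 2 actions,
# discount factor beta, initial state x0. All numbers are illustrative.
n_s, n_a, beta = 2, 2, 0.9
x0 = 0

# P[a][x, y] = transition probability from x to y under action a
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # action 0
    [[0.5, 0.5], [0.6, 0.4]],   # action 1
])
c0 = np.array([[1.0, 2.0], [4.0, 0.5]])   # c0[x, a]: cost to minimize
c1 = np.array([[0.0, 1.0], [0.0, 1.0]])   # c1[x, a]: constrained cost
budget = 2.0                               # bound on the discounted c1-cost

# Decision variable eta[x, a], flattened with index x * n_a + a
idx = lambda x, a: x * n_a + a
n_var = n_s * n_a

# Balance equations characterizing discounted occupation measures:
# sum_a eta(y,a) - beta * sum_{x,a} P(y|x,a) eta(x,a) = 1{y = x0}
A_eq = np.zeros((n_s, n_var))
b_eq = np.zeros(n_s)
for y in range(n_s):
    for a in range(n_a):
        A_eq[y, idx(y, a)] += 1.0
    for x in range(n_s):
        for a in range(n_a):
            A_eq[y, idx(x, a)] -= beta * P[a][x, y]
b_eq[x0] = 1.0

# Constraint: sum_{x,a} c1(x,a) eta(x,a) <= budget
A_ub = c1.reshape(1, -1)
b_ub = np.array([budget])

res = linprog(c0.reshape(-1), A_ub=A_ub, b_ub=b_ub,
              A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n_var)
eta = res.x.reshape(n_s, n_a)
print("optimal constrained discounted cost:", res.fun)
print("total mass (should be 1/(1-beta)):", eta.sum())
```

Summing the balance equations over \(y\) shows that any feasible \(\eta\) has total mass \(1/(1-\beta)\); a stationary policy can then be read off from the optimal \(\eta\) by normalizing each row.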
References
Arapostathis, A., Borkar, V.S., Ghosh, M.K.: Ergodic Control of Diffusion Processes. Encyclopedia of Mathematics and Its Applications, vol. 143. Cambridge University Press, Cambridge (2012)
Bhatt, A.G., Borkar, V.S.: Occupation measures for controlled Markov processes: characterization and optimality. Ann. Probab. 24(3), 1531–1562 (1996)
Buckdahn, R., Goreac, D., Quincampoix, M.: Stochastic optimal control and linear programming approach. Appl. Math. Optim. 63(2), 257–276 (2011)
Christensen, S.: On the solution of general impulse control problems using superharmonic functions. Stoch. Process. Appl. 124(1), 709–729 (2014)
Costa, O.L.V., Dufour, F.: A linear programming formulation for constrained discounted continuous control for piecewise deterministic Markov processes. J. Math. Anal. Appl. 424(2), 892–914 (2015)
Davis, M.H.A.: Markov Models and Optimization. Monographs on Statistics and Applied Probability, vol. 49. Chapman & Hall, London (1993)
Dufour, F., Horiguchi, M., Piunovskiy, A.B.: The expected total cost criterion for Markov decision processes under constraints: a convex analytic approach. Adv. Appl. Probab. 44(3), 774–793 (2012)
Dufour, F., Piunovskiy, A.B.: Impulsive control for continuous-time Markov decision processes. Adv. Appl. Probab. 47(1), 106–127 (2015)
Guo, X., Hernández-Lerma, O.: Continuous-Time Markov Decision Processes: Theory and Applications. Stochastic Modelling and Applied Probability, vol. 62. Springer, Berlin (2009)
Guo, X., Hernández-Lerma, O., Prieto-Rumeau, T.: A survey of recent results on continuous-time Markov decision processes. Top 14(2), 177–261 (2006). With comments and a rejoinder by the authors
Guo, X., Piunovskiy, A.: Discounted continuous-time Markov decision processes with constraints: unbounded transition and loss rates. Math. Oper. Res. 36(1), 105–132 (2011)
Helmes, K., Stockbridge, R.H.: Linear programming approach to the optimal stopping of singular stochastic processes. Stochastics 79(3–4), 309–335 (2007)
Hernández-Lerma, O., Lasserre, J.B.: Discrete-Time Markov Control Processes. Applications of Mathematics, vol. 30. Springer, New York (1996)
Hernández-Lerma, O., Lasserre, J.B.: Further Topics on Discrete-Time Markov Control Processes. Applications of Mathematics, vol. 42. Springer, New York (1999)
Hordijk, A., van der Duyn Schouten, F.A.: Average optimal policies in Markov decision drift processes with applications to a queueing and a replacement model. Adv. Appl. Probab. 15(2), 274–303 (1983)
Hordijk, A., van der Duyn Schouten, F.A.: Markov decision drift processes: conditions for optimality obtained by discretization. Math. Oper. Res. 10(1), 160–173 (1985)
Jacod, J.: Multivariate point processes: predictable projection, Radon–Nikodým derivatives, representation of martingales. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 31, 235–253 (1974/75)
Jacod, J.: Calcul Stochastique et Problèmes de Martingales. Lecture Notes in Mathematics, vol. 714. Springer, Berlin (1979)
Kitaev, M.Y., Rykov, V.V.: Controlled Queueing Systems. CRC Press, Boca Raton (1995)
Kurtz, T.G., Stockbridge, R.H.: Existence of Markov controls and characterization of optimal Markov controls. SIAM J. Control Optim. 36(2), 609–653 (1998)
Kushner, H.J., Dupuis, P.G.: Numerical Methods for Stochastic Control Problems in Continuous Time. Applications of Mathematics (New York), vol. 24. Springer, New York (1992)
Kushner, H.J., Martins, L.F.: Numerical methods for stochastic singular control problems. SIAM J. Control Optim. 29(6), 1443–1475 (1991)
Last, G., Brandt, A.: Marked Point Processes on the Real Line. Probability and Its Applications (New York). Springer, New York (1995)
Piunovskiy, A., Zhang, Y.: Discounted continuous-time Markov decision processes with unbounded rates: the convex analytic approach. SIAM J. Control Optim. 49(5), 2032–2061 (2011)
Piunovskiy, A.B.: Multicriteria impulsive control of jump Markov processes. Math. Methods Oper. Res. 60(1), 125–144 (2004)
Prieto-Rumeau, T., Hernández-Lerma, O.: Ergodic control of continuous-time Markov chains with pathwise constraints. SIAM J. Control Optim. 47(4), 1888–1908 (2008)
Prieto-Rumeau, T., Hernández-Lerma, O.: Selected Topics on Continuous-Time Controlled Markov Chains and Markov Games. ICP Advanced Texts in Mathematics, vol. 5. Imperial College Press, London (2012)
Stockbridge, R.H.: Time-average control of martingale problems: a linear programming formulation. Ann. Probab. 18(1), 206–217 (1990)
Yushkevich, A.A.: Continuous time Markov decision processes with interventions. Stochastics 9(4), 235–274 (1983)
Yushkevich, A.A.: Markov decision processes with both continuous and impulsive control. In: Stochastic optimization (Kiev, 1984). Lecture Notes in Control and Information Sciences, vol. 81, pp. 234–246. Springer, Berlin (1986)
Yushkevich, A.A.: Bellman inequalities in Markov decision deterministic drift processes. Stochastics 23(1), 25–77 (1987)
Yushkevich, A.A.: Verification theorems for Markov decision processes with controllable deterministic drift, gradual and impulse controls. Teor. Veroyatnost. i Primenen. 34(3), 528–551 (1989)
Acknowledgments
This study has been carried out with financial support from the French State, managed by the French National Research Agency (ANR) in the frame of the “Investments for the future” Programme IdEx Bordeaux - CPU (ANR-10-IDEX-03-02).
Appendices
Appendix 1: Proofs of Propositions 4.3 and 4.4
The next result provides a sufficient condition in terms of the finiteness of the occupation measure to ensure that the process is not explosive.
Lemma A.1
For any \(u\in \mathcal {U}\), \(\displaystyle \eta ^i_u({\mathbf {X}}\times \{\Delta \})\le 1+\frac{1}{\alpha } \int _{\mathbb {K}^i} \overline{q}(\mathbf {X}|x,a)\eta ^g_u(dx,da)+\eta ^i_u({\mathbf {X}}\times \mathbf {A}^i).\) If \(\eta ^{i}_{u}(\mathbf {X}\times \mathbf {A}^{i}_{\Delta })<\infty \) then \(\mu ^{i}_{u}(\mathbf {Y})<\infty \) and \(\mathbb {P}^{u}_{x_{0}}(T_{\infty }<\infty )=0\).
Proof
Note that
Since \(\nu =\nu _0+\nu _1\) is the predictable projection of \(\mu \) and \(\nu _1(ds,\cdot )\) is concentrated on \({\mathbf {Y}}^*\), we see that
and
Finally, since \( \{\mathbf {y}\in \mathbf {Y} : \mathbf {y}_{j}\in \mathbf {X}\times \mathbf {A}^{i}\}=\bigcup _{k=j}^\infty {\mathbf {Y}}_k\),
showing the first part of the result. To prove the last statement, observe first that for any \(j\in \mathbb {N}^*\), we have
Consequently, \(\eta ^{i}_{u}(\mathbf {X}\times \mathbf {A}^{i}_{\Delta })=\widetilde{\mu }_{u}^{i}(\mathbf {X}\times \mathbf {A}^{i}_{\Delta }) =\sum _{j\in \mathbb {N}} (j+1) \mu _{u}^{i}(\mathbf {Y}_{j})\ge \mu _{u}^{i}(\mathbf {Y})\). Now, we have that \(\displaystyle \mathbb {E}^{u}_{x_{0}} \Bigg [ \sum _{n=2}^{\infty } e^{-\alpha T_{n}} I_{\{T_{n}<T_{\infty }\}} \Bigg ] \le \mu _{u}^{i}(\mathbf {Y}) < \infty \), showing the last part of the result. \(\Box \)
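The final implication in Lemma A.1 can be spelled out in one line (a sketch, using only the quantities appearing in the lemma): on the event \(\{T_{\infty}<\infty\}\) the jump times \(T_{n}\) increase to the finite limit \(T_{\infty}\), so each discount factor is bounded below and the series diverges:

```latex
\sum_{n=2}^{\infty} e^{-\alpha T_{n}} I_{\{T_{n}<T_{\infty}\}}
  \;\ge\; \sum_{n=2}^{\infty} e^{-\alpha T_{\infty}}
  \;=\; \infty
  \qquad \text{on } \{T_{\infty}<\infty\}.
```

Hence the finiteness of \(\mathbb{E}^{u}_{x_{0}}\big[\sum_{n=2}^{\infty} e^{-\alpha T_{n}} I_{\{T_{n}<T_{\infty}\}}\big]\) forces \(\mathbb{P}^{u}_{x_{0}}(T_{\infty}<\infty)=0\).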
Proof of Proposition 4.3 Consider \(\Gamma \in \mathcal {B}(\mathbf {X})\). From Lemma A.1, \(\mathbb {P}^{u}_{x_{0}}(T_{\infty }=+\infty )=1\) and so, by using the product formula for functions of bounded variation
Therefore, combining the bounded convergence theorem with the fact that \(\mu ^{i}_{u}(\mathbf {Y})<\infty \) (see Lemma A.1), we have
Recalling the definition of \(\mu _{u}^{i}\) (see Eq. 9) and the fact that \(\nu \) is the predictable projection of \(\mu \), we obtain by using Lemma 3.2
and so,
By Fubini's theorem, we have that
Moreover, observe that
Combining the last three equations, it follows that
Finally, remark that \(I_{\Gamma }(\bar{x}(y))= \sum _{j=1}^{\infty } I_{\Gamma \times \{\Delta \}} (y_{j})\) for any \(y=\big (y_{1},y_{2},\ldots ,y_{j},\ldots \big )\in \mathbf {Y}\). Therefore, \(\displaystyle \int _{\mathbf {Y}} I_{\Gamma }(\bar{x}(y)) \mu _{u}^{i}(dy)= \eta _{u}^{i}(\Gamma \times \{\Delta \})\) showing the result. \(\square \)
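The indicator identity used at the end of the proof can be unpacked as follows (a sketch, consistent with the identity \(\{\mathbf {y}\in \mathbf {Y}: \mathbf {y}_{j}\in \mathbf {X}\times \{\Delta \}\}=\mathbf {Y}_{j-1}\) appearing in the proof of Lemma B.1): a trajectory \(y\in \mathbf {Y}_{k}\) has exactly one component carrying the action \(\Delta\), namely \(y_{k+1}\), and its state component is \(\bar{x}(y)\). Hence

```latex
\sum_{j=1}^{\infty} I_{\Gamma \times \{\Delta\}}(y_{j})
  \;=\; I_{\Gamma \times \{\Delta\}}(y_{k+1})
  \;=\; I_{\Gamma}\big(\bar{x}(y)\big)
  \qquad \text{for } y \in \mathbf{Y}_{k},
```

and integrating against \(\mu_{u}^{i}\) over \(\mathbf{Y}=\bigcup_{k}\mathbf{Y}_{k}\) yields the stated equality \(\int_{\mathbf{Y}} I_{\Gamma}(\bar{x}(y))\, \mu_{u}^{i}(dy)=\eta_{u}^{i}(\Gamma \times \{\Delta\})\).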
Lemma A.2
Consider a strategy \(u=(u_{n})_{n\in \mathbb {N}}\in \mathcal {U}\) fixed with \(u_{n}=\big ( \psi _{n},\pi _{n},\gamma ^0_{n},\gamma ^1_{n} \big )\) for \(n\in \mathbb {N}^{*}\).
Then, for any \(n\in \mathbb {N}^{*}\), \(\Gamma \in \mathcal {B}(\mathbf {X}_{\Delta })\), \(t\in \mathbb {R}_{+}\), \(x\in \mathbf {X}\) and \(h_{n}\in \mathbf {H}_{n}\)
where \(h_{n}=(y_0,\theta _1,y_1,\ldots ,\theta _{n},y_{n})\in \mathbf {H}_{n}\). Similarly, for any \(\Gamma \in \mathcal {B}(\mathbf {X}_{\Delta })\) and \(x\in \mathbf {X}\)
Proof
This lemma is a straightforward consequence of Lemma 9.4.3 in [14]. \(\square \)
Proof of Proposition 4.4 By using the fact that \(\nu \) is the predictable projection of \(\mu \) and Lemma 3.2, we have
Now, by using Lemma A.2, it follows that
Consequently
showing the result. \(\square \)
Appendix 2: Proof of Proposition 4.8
This appendix is dedicated to the proof of Proposition 4.8. We first need to derive some technical results. Throughout this section, we fix \(\pi \in \mathcal {P}^{g}(\mathbf {A}^{g}|\mathbf {X})\) and \(\varphi \in \mathcal {P}^{i}(\mathbf {A}^{i}_{\Delta }|\mathbf {X}_{\Delta })\). Let us introduce the stochastic kernel \(G_{\pi ,\varphi }\) on \(\mathbb {R}^{*}_{+}\times \mathbf {Y}\) given \(\mathbf {Y}\)
and the stochastic kernel \(L_{\pi }\) on \(\mathbf {X}\) given \(\mathbf {Y}\)
For notational convenience, we denote
Lemma B.1
Let \(\gamma \in \mathcal {P}(\mathbf {Y})\). Then \(\widetilde{\gamma }\) is supported on \(\mathbb {K}^{i}_{\Delta }\) and \(\widetilde{\gamma }(\mathbf {X}\times \{\Delta \})=1\). Consider \(x\in \mathbf {X}\) and a randomized stationary policy \(\varphi \) for the model \(\mathcal {M}^{i}\); then \(\widetilde{P}^{\varphi }(\mathbf {X}\times \{\Delta \}|x)\le 1\). Moreover, \(\widetilde{P}^{\varphi }(\mathbf {X}\times \{\Delta \}|x)=1\) if and only if \(P^{\varphi }(\mathbf {Y}|x)=1\).
Proof
Let \(j\in \mathbb {N}^{*}\). Observe that \(\{\mathbf {y}\in \mathbf {Y}: \mathbf {y}_{j}\in \mathbf {X}\times \{\Delta \}\}=\mathbf {Y}_{j-1}\), so the first assertion is clear. Regarding the second claim, we have \(P^{\varphi } \Big ( \big \{\mathbf {y}\in (\mathbb {K}^{i}_{\Delta })^{\infty }: \mathbf {y}_{j}\in \mathbf {X}\times \{\Delta \} \big \} |x \Big ) =P^{\varphi } \Big ( \mathbf {Y}_{j-1} |x \Big )\) for \(x\in \mathbf {X}\), since \(P^{\varphi }\) is the strategic measure for the model \(\mathcal {M}^{i}\) generated by \(\varphi \), showing the last part of the result. \(\square \)
Lemma B.2
For any \(\Upsilon \in \mathcal {B}(\mathbf {Y})\) and \(n\in \mathbb {N}^{*}\), we have
Proof
Let us show the result by induction. Clearly, this equation holds for \(n=1\). Now, assume that Eq. (53) holds for n. Consider \(\Upsilon \in \mathcal {B}(\mathbf {Y})\). Then,
showing the result. \(\square \)
Proposition B.3
The following three equalities hold:
and
Proof
From the definition of \(\mu _{u^{\pi ,\varphi }}^{i}\) (see Eq. 9) and Lemma B.2, we have
Observe that \(\widetilde{H}_{\pi ,\varphi }= L_{\pi } \bar{q}^{\pi } \widetilde{R}^{\varphi }\). Since \(\eta _{u^{\pi ,\varphi }}^{i}(dx,da)=\widetilde{\mu }_{u^{\pi ,\varphi }}^{i}(dx,da)\), we obtain easily Eq. (54).
Moreover, we have
and so by using the definition of \(L_{\pi }\) and Lemma B.2, we obtain (55).
Now, from Eq. (55) we get
Recalling (54), we have Eq. (56), showing the result. \(\square \)
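The sums \(\sum_{n=1}^{\infty} R^{\varphi} H^{n-1}_{\pi,\varphi} L_{\pi}\) appearing in Proposition B.3 have the structure of a Neumann series. The toy numerical check below (the two matrices are made up; a finite-state stand-in for the kernels \(H_{\pi,\varphi}\) and \(L_{\pi}\)) illustrates that, when \(H\) is substochastic, the partial sums of \(\sum_{n\ge 1} H^{n-1} L\) converge to the closed form \((I-H)^{-1}L\):

```python
import numpy as np

# Illustrative finite-state kernels: H substochastic (row sums < 1),
# L a stochastic kernel standing in for L_pi.
H = np.array([[0.3, 0.2],
              [0.1, 0.4]])          # row sums 0.5 < 1
L = np.array([[0.6, 0.4],
              [0.5, 0.5]])

# Partial sums of the Neumann series  sum_{n>=1} H^{n-1} L
series = np.zeros_like(L)
power = np.eye(2)
for n in range(200):
    series += power @ L             # add H^n L
    power = power @ H

closed_form = np.linalg.solve(np.eye(2) - H, L)
print(np.max(np.abs(series - closed_form)))   # negligible difference
```

This closed form is what lies behind resumming such series term by term, which is legitimate precisely when the corresponding sums are finite, as used in the proof of Proposition 4.8.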
Proof of Proposition 4.8 Observe that \(\widetilde{H}_{\pi ,\varphi }= L_{\pi } \bar{q}^{\pi } \widetilde{R}^{\varphi }\), and so
and with (55) we get (19). The measure \(\widehat{\eta }_{u^{\pi ,\varphi }}^{g}\) is finite by definition and so, by using Proposition B.3 and Assumption (A1), we have that for any \(\Gamma \in \mathcal {B}(\mathbf {X})\)
Now, from Eq. (55) and the definition of \(H_{\pi ,\varphi }\) (see Eq. 52), we have that for any \(\Gamma \in \mathcal {B}(\mathbf {X})\)
Moreover, observe that \(\displaystyle \frac{\bar{q}^{\pi }(\mathbf {X}|\bar{x}(y))}{\alpha +\bar{q}^{\pi } (\mathbf {X}|\bar{x}(y))} = 1 - \frac{\alpha }{\alpha +\bar{q}^{\pi } (\mathbf {X}|\bar{x}(y))}\) and so, for \(n\ge 2\)
and
Consequently, by using the expression of \(\widehat{\eta }_{u^{\pi ,\varphi }}^{g}\) in (55) and Eqs. (60)–(61)
Note that the above calculations are possible since the quantities \(\sum _{n=1}^{\infty } R^{\varphi } H^{n-1}_{\pi ,\varphi } L_{\pi }(\Gamma |x_{0})\) and \(\sum _{n=1}^{\infty } R^{\varphi } H^{n-1}_{\pi ,\varphi }\widetilde{H}_{\pi ,\varphi }(\Gamma \times \{\Delta \}|x_{0})\) are finite (see inequalities (58) and (58)). Recalling (55) and combining Eqs. (59) and (62) we obtain that
showing the result. \(\square \)
Dufour, F., Piunovskiy, A.B. Impulsive Control for Continuous-Time Markov Decision Processes: A Linear Programming Approach. Appl Math Optim 74, 129–161 (2016). https://doi.org/10.1007/s00245-015-9310-8
Keywords
- Impulsive control
- Continuous control
- Continuous-time Markov decision process
- Linear programming approach
- Discounted cost