Abstract
In this paper, we investigate an optimization problem for continuous-time Markov decision processes with both impulsive and continuous controls. We consider the so-called constrained problem where the objective of the controller is to minimize a total expected discounted optimality criterion associated with a cost rate function while keeping other performance criteria of the same form, but associated with different cost rate functions, below some given bounds. Our model allows multiple impulses at the same time moment. The main objective of this work is to study the associated linear program defined on a space of measures including the occupation measures of the controlled process and to provide sufficient conditions to ensure the existence of an optimal control.
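The linear program over occupation measures studied in this paper has a well-known finite, discrete-time analogue that may help fix ideas. The sketch below is only an illustration: all model data (states, actions, transition matrices, costs, the discount factor, and the constraint budget) are hypothetical, and the finite discrete-time setting is a stand-in for the continuous-time model with impulses. It minimizes one expected discounted cost subject to a bound on a second one by optimizing directly over the discounted occupation measure \(\eta(x,a)\):

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical finite discrete-time analogue: 2 states, 2 actions,
# discount factor beta, initial state x0. All numbers are illustrative.
n_s, n_a, beta = 2, 2, 0.9
x0 = 0

# P[a][x, y] = transition probability from x to y under action a
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # action 0
    [[0.5, 0.5], [0.6, 0.4]],   # action 1
])
c0 = np.array([[1.0, 2.0], [4.0, 0.5]])   # c0[x, a]: cost to minimize
c1 = np.array([[0.0, 1.0], [0.0, 1.0]])   # c1[x, a]: constrained cost
budget = 2.0                               # bound on the discounted c1-cost

# Decision variable eta[x, a], flattened with index x * n_a + a
idx = lambda x, a: x * n_a + a
n_var = n_s * n_a

# Balance equations characterizing discounted occupation measures:
# sum_a eta(y,a) - beta * sum_{x,a} P(y|x,a) eta(x,a) = 1{y = x0}
A_eq = np.zeros((n_s, n_var))
b_eq = np.zeros(n_s)
for y in range(n_s):
    for a in range(n_a):
        A_eq[y, idx(y, a)] += 1.0
    for x in range(n_s):
        for a in range(n_a):
            A_eq[y, idx(x, a)] -= beta * P[a][x, y]
b_eq[x0] = 1.0

# Constraint: sum_{x,a} c1(x,a) eta(x,a) <= budget
A_ub = c1.reshape(1, -1)
b_ub = np.array([budget])

res = linprog(c0.reshape(-1), A_ub=A_ub, b_ub=b_ub,
              A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n_var)
eta = res.x.reshape(n_s, n_a)
print("optimal constrained discounted cost:", res.fun)
print("total mass (should be 1/(1-beta)):", eta.sum())
```

Summing the balance equations over \(y\) shows that any feasible \(\eta\) has total mass \(1/(1-\beta)\); a stationary policy can then be read off from the optimal \(\eta\) by normalizing each row.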
References
Arapostathis, A., Borkar, V.S., Ghosh, M.K.: Ergodic Control of Diffusion Processes. Encyclopedia of Mathematics and Its Applications, vol. 143. Cambridge University Press, Cambridge (2012)
Bhatt, A.G., Borkar, V.S.: Occupation measures for controlled Markov processes: characterization and optimality. Ann. Probab. 24(3), 1531–1562 (1996)
Buckdahn, R., Goreac, D., Quincampoix, M.: Stochastic optimal control and linear programming approach. Appl. Math. Optim. 63(2), 257–276 (2011)
Christensen, S.: On the solution of general impulse control problems using superharmonic functions. Stoch. Process. Appl. 124(1), 709–729 (2014)
Costa, O.L.V., Dufour, F.: A linear programming formulation for constrained discounted continuous control for piecewise deterministic Markov processes. J. Math. Anal. Appl. 424(2), 892–914 (2015)
Davis, M.H.A.: Markov Models and Optimization. Monographs on Statistics and Applied Probability, vol. 49. Chapman & Hall, London (1993)
Dufour, F., Horiguchi, M., Piunovskiy, A.B.: The expected total cost criterion for Markov decision processes under constraints: a convex analytic approach. Adv. Appl. Probab. 44(3), 774–793 (2012)
Dufour, F., Piunovskiy, A.B.: Impulsive control for continuous-time Markov decision processes. Adv. Appl. Probab. 47(1), 106–127 (2015)
Guo, X., Hernández-Lerma, O.: Continuous-Time Markov Decision Processes: Theory and Applications. Stochastic Modelling and Applied Probability, vol. 62. Springer, Berlin (2009)
Guo, X., Hernández-Lerma, O., Prieto-Rumeau, T.: A survey of recent results on continuous-time Markov decision processes. Top 14(2), 177–261 (2006). With comments and a rejoinder by the authors
Guo, X., Piunovskiy, A.: Discounted continuous-time Markov decision processes with constraints: unbounded transition and loss rates. Math. Oper. Res. 36(1), 105–132 (2011)
Helmes, K., Stockbridge, R.H.: Linear programming approach to the optimal stopping of singular stochastic processes. Stochastics 79(3–4), 309–335 (2007)
Hernández-Lerma, O., Lasserre, J.B.: Discrete-Time Markov Control Processes. Applications of Mathematics, vol. 30. Springer, New York (1996)
Hernández-Lerma, O., Lasserre, J.B.: Further Topics on Discrete-Time Markov Control Processes. Applications of Mathematics, vol. 42. Springer, New York (1999)
Hordijk, A., van der Duyn Schouten, F.A.: Average optimal policies in Markov decision drift processes with applications to a queueing and a replacement model. Adv. Appl. Probab. 15(2), 274–303 (1983)
Hordijk, A., van der Duyn Schouten, F.A.: Markov decision drift processes: conditions for optimality obtained by discretization. Math. Oper. Res. 10(1), 160–173 (1985)
Jacod, J.: Multivariate point processes: predictable projection, Radon–Nikodým derivatives, representation of martingales. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 31, 235–253 (1974/75)
Jacod, J.: Calcul Stochastique et Problèmes de Martingales. Lecture Notes in Mathematics, vol. 714. Springer, Berlin (1979)
Kitaev, M.Y., Rykov, V.V.: Controlled Queueing Systems. CRC Press, Boca Raton (1995)
Kurtz, T.G., Stockbridge, R.H.: Existence of Markov controls and characterization of optimal Markov controls. SIAM J. Control Optim. 36(2), 609–653 (1998)
Kushner, H.J., Dupuis, P.G.: Numerical Methods for Stochastic Control Problems in Continuous Time. Applications of Mathematics (New York), vol. 24. Springer, New York (1992)
Kushner, H.J., Martins, L.F.: Numerical methods for stochastic singular control problems. SIAM J. Control Optim. 29(6), 1443–1475 (1991)
Last, G., Brandt, A.: Marked Point Processes on the Real Line. Probability and Its Applications (New York). Springer, New York (1995)
Piunovskiy, A., Zhang, Y.: Discounted continuous-time Markov decision processes with unbounded rates: the convex analytic approach. SIAM J. Control Optim. 49(5), 2032–2061 (2011)
Piunovskiy, A.B.: Multicriteria impulsive control of jump Markov processes. Math. Methods Oper. Res. 60(1), 125–144 (2004)
Prieto-Rumeau, T., Hernández-Lerma, O.: Ergodic control of continuous-time Markov chains with pathwise constraints. SIAM J. Control Optim. 47(4), 1888–1908 (2008)
Prieto-Rumeau, T., Hernández-Lerma, O.: Selected Topics on Continuous-Time Controlled Markov Chains and Markov Games. ICP Advanced Texts in Mathematics, vol. 5. Imperial College Press, London (2012)
Stockbridge, R.H.: Time-average control of martingale problems: a linear programming formulation. Ann. Probab. 18(1), 206–217 (1990)
Yushkevich, A.A.: Continuous time Markov decision processes with interventions. Stochastics 9(4), 235–274 (1983)
Yushkevich, A.A.: Markov decision processes with both continuous and impulsive control. In: Stochastic optimization (Kiev, 1984). Lecture Notes in Control and Information Sciences, vol. 81, pp. 234–246. Springer, Berlin (1986)
Yushkevich, A.A.: Bellman inequalities in Markov decision deterministic drift processes. Stochastics 23(1), 25–77 (1987)
Yushkevich, A.A.: Verification theorems for Markov decision processes with controllable deterministic drift, gradual and impulse controls. Teor. Veroyatnost. i Primenen. 34(3), 528–551 (1989)
Acknowledgments
This study has been carried out with financial support from the French State, managed by the French National Research Agency (ANR) in the frame of the “Investments for the future” Programme IdEx Bordeaux - CPU (ANR-10-IDEX-03-02).
Appendices
Appendix 1: Proofs of Propositions 4.3 and 4.4
The next result provides a sufficient condition in terms of the finiteness of the occupation measure to ensure that the process is not explosive.
Lemma A.1
For any \(u\in \mathcal {U}\), \(\displaystyle \eta ^i_u({\mathbf {X}}\times \{\Delta \})\le 1+\frac{1}{\alpha } \int _{\mathbb {K}^i} \overline{q}(\mathbf {X}|x,a)\eta ^g_u(dx,da)+\eta ^i_u({\mathbf {X}}\times \mathbf {A}^i).\) If \(\eta ^{i}_{u}(\mathbf {X}\times \mathbf {A}^{i}_{\Delta })<\infty \) then \(\mu ^{i}_{u}(\mathbf {Y})<\infty \) and \(\mathbb {P}^{u}_{x_{0}}(T_{\infty }<\infty )=0\).
Proof
Note that
Since \(\nu =\nu _0+\nu _1\) is the predictable projection of \(\mu \) and \(\nu _1(ds,\cdot )\) is concentrated on \({\mathbf {Y}}^*\), we see that
and
Finally, since \( \{\mathbf {y}\in \mathbf {Y} : \mathbf {y}_{j}\in \mathbf {X}\times \mathbf {A}^{i}\}=\bigcup _{k=j}^\infty {\mathbf {Y}}_k\),
showing the first part of the result. To prove the last statement, observe first that for any \(j\in \mathbb {N}^*\), we have
Consequently, \(\eta ^{i}_{u}(\mathbf {X}\times \mathbf {A}^{i}_{\Delta })=\widetilde{\mu }_{u}^{i}(\mathbf {X}\times \mathbf {A}^{i}_{\Delta }) =\sum _{j\in \mathbb {N}} (j+1) \mu _{u}^{i}(\mathbf {Y}_{j})\ge \mu _{u}^{i}(\mathbf {Y})\). Now, we have that \(\displaystyle \mathbb {E}^{u}_{x_{0}} \Bigg [ \sum _{n=2}^{\infty } e^{-\alpha T_{n}} I_{\{T_{n}<T_{\infty }\}} \Bigg ] \le \mu _{u}^{i}(\mathbf {Y}) < \infty \), showing the last part of the result. \(\Box \)
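The final implication in Lemma A.1 can be spelled out in one line (a sketch, using only the quantities appearing in the lemma): on the event \(\{T_{\infty}<\infty\}\) the jump times \(T_{n}\) increase to the finite limit \(T_{\infty}\), so each discount factor is bounded below and the series diverges:

```latex
\sum_{n=2}^{\infty} e^{-\alpha T_{n}} I_{\{T_{n}<T_{\infty}\}}
  \;\ge\; \sum_{n=2}^{\infty} e^{-\alpha T_{\infty}}
  \;=\; \infty
  \qquad \text{on } \{T_{\infty}<\infty\}.
```

Hence the finiteness of \(\mathbb{E}^{u}_{x_{0}}\big[\sum_{n=2}^{\infty} e^{-\alpha T_{n}} I_{\{T_{n}<T_{\infty}\}}\big]\) forces \(\mathbb{P}^{u}_{x_{0}}(T_{\infty}<\infty)=0\).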
Proof of Proposition 4.3 Consider \(\Gamma \in \mathcal {B}(\mathbf {X})\). From Lemma A.1, \(\mathbb {P}^{u}_{x_{0}}(T_{\infty }=+\infty )=1\) and so, by using the product formula for functions of bounded variation
Therefore, combining the bounded convergence theorem with the fact that \(\mu ^{i}_{u}(\mathbf {Y})<\infty \) (see Lemma A.1), we have
Recalling the definition of \(\mu _{u}^{i}\) (see Eq. 9) and the fact that \(\nu \) is the predictable projection of \(\mu \), we obtain by using Lemma 3.2
and so,
By Fubini's theorem, we have that
Moreover, observe that
Combining the last three equations, it follows that
Finally, remark that \(I_{\Gamma }(\bar{x}(y))= \sum _{j=1}^{\infty } I_{\Gamma \times \{\Delta \}} (y_{j})\) for any \(y=\big (y_{1},y_{2},\ldots ,y_{j},\ldots \big )\in \mathbf {Y}\). Therefore, \(\displaystyle \int _{\mathbf {Y}} I_{\Gamma }(\bar{x}(y)) \mu _{u}^{i}(dy)= \eta _{u}^{i}(\Gamma \times \{\Delta \})\) showing the result. \(\square \)
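The indicator identity used at the end of the proof can be unpacked as follows (a sketch, consistent with the identity \(\{\mathbf {y}\in \mathbf {Y}: \mathbf {y}_{j}\in \mathbf {X}\times \{\Delta \}\}=\mathbf {Y}_{j-1}\) appearing in the proof of Lemma B.1): a trajectory \(y\in \mathbf {Y}_{k}\) has exactly one component carrying the action \(\Delta\), namely \(y_{k+1}\), and its state component is \(\bar{x}(y)\). Hence

```latex
\sum_{j=1}^{\infty} I_{\Gamma \times \{\Delta\}}(y_{j})
  \;=\; I_{\Gamma \times \{\Delta\}}(y_{k+1})
  \;=\; I_{\Gamma}\big(\bar{x}(y)\big)
  \qquad \text{for } y \in \mathbf{Y}_{k},
```

and integrating against \(\mu_{u}^{i}\) over \(\mathbf{Y}=\bigcup_{k}\mathbf{Y}_{k}\) yields the stated equality \(\int_{\mathbf{Y}} I_{\Gamma}(\bar{x}(y))\, \mu_{u}^{i}(dy)=\eta_{u}^{i}(\Gamma \times \{\Delta\})\).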
Lemma A.2
Consider a strategy \(u=(u_{n})_{n\in \mathbb {N}}\in \mathcal {U}\) fixed with \(u_{n}=\big ( \psi _{n},\pi _{n},\gamma ^0_{n},\gamma ^1_{n} \big )\) for \(n\in \mathbb {N}^{*}\).
Then, for any \(n\in \mathbb {N}^{*}\), \(\Gamma \in \mathcal {B}(\mathbf {X}_{\Delta })\), \(t\in \mathbb {R}_{+}\), \(x\in \mathbf {X}\) and \(h_{n}\in \mathbf {H}_{n}\)
where \(h_{n}=(y_0,\theta _1,y_1,\ldots ,\theta _{n},y_{n})\in \mathbf {H}_{n}\). Similarly, for any \(\Gamma \in \mathcal {B}(\mathbf {X}_{\Delta })\) and \(x\in \mathbf {X}\)
Proof
This lemma is a straightforward consequence of Lemma 9.4.3 in [14]. \(\square \)
Proof of Proposition 4.4 By using the fact that \(\nu \) is the predictable projection of \(\mu \) and Lemma 3.2, we have
Now, by using Lemma A.2, it follows that
Consequently
showing the result. \(\square \)
Appendix 2: Proof of Proposition 4.8
This appendix is dedicated to the proof of Proposition 4.8. We first need to derive some technical results. Throughout this section, we fix \(\pi \in \mathcal {P}^{g}(\mathbf {A}^{g}|\mathbf {X})\) and \(\varphi \in \mathcal {P}^{i}(\mathbf {A}^{i}_{\Delta }|\mathbf {X}_{\Delta })\). Let us introduce the stochastic kernel \(G_{\pi ,\varphi }\) on \(\mathbb {R}^{*}_{+}\times \mathbf {Y}\) given \(\mathbf {Y}\)
and the stochastic kernel \(L_{\pi }\) on \(\mathbf {X}\) given \(\mathbf {Y}\)
For notational convenience, we denote
Lemma B.1
Let \(\gamma \in \mathcal {P}(\mathbf {Y})\). Then \(\widetilde{\gamma }\) is supported on \(\mathbb {K}^{i}_{\Delta }\) and \(\widetilde{\gamma }(\mathbf {X}\times \{\Delta \})=1\). Consider \(x\in \mathbf {X}\) and a randomized stationary policy \(\varphi \) for the model \(\mathcal {M}^{i}\); then \(\widetilde{P}^{\varphi }(\mathbf {X}\times \{\Delta \}|x)\le 1\). Moreover, \(\widetilde{P}^{\varphi }(\mathbf {X}\times \{\Delta \}|x)=1\) if and only if \(P^{\varphi }(\mathbf {Y}|x)=1\).
Proof
Let \(j\in \mathbb {N}^{*}\). Observe that \(\{\mathbf {y}\in \mathbf {Y}: \mathbf {y}_{j}\in \mathbf {X}\times \{\Delta \}\}=\mathbf {Y}_{j-1}\), so the first assertion is clear. Regarding the second claim, we have \(P^{\varphi } \Big ( \big \{\mathbf {y}\in (\mathbb {K}^{i}_{\Delta })^{\infty }: \mathbf {y}_{j}\in \mathbf {X}\times \{\Delta \} \big \} |x \Big ) =P^{\varphi } \Big ( \mathbf {Y}_{j-1} |x \Big )\) for \(x\in \mathbf {X}\), since \(P^{\varphi }\) is the strategic measure for the model \(\mathcal {M}^{i}\) generated by \(\varphi \), showing the last part of the result. \(\square \)
Lemma B.2
For any \(\Upsilon \in \mathcal {B}(\mathbf {Y})\) and \(n\in \mathbb {N}^{*}\), we have
Proof
Let us show the result by induction. Clearly, this equation holds for \(n=1\). Now, assume that Eq. (53) holds for n. Consider \(\Upsilon \in \mathcal {B}(\mathbf {Y})\). Then,
showing the result. \(\square \)
Proposition B.3
The following three equalities hold:
and
Proof
From the definition of \(\mu _{u^{\pi ,\varphi }}^{i}\) (see Eq. 9) and Lemma B.2, we have
Observe that \(\widetilde{H}_{\pi ,\varphi }= L_{\pi } \bar{q}^{\pi } \widetilde{R}^{\varphi }\). Since \(\eta _{u^{\pi ,\varphi }}^{i}(dx,da)=\widetilde{\mu }_{u^{\pi ,\varphi }}^{i}(dx,da)\), we obtain easily Eq. (54).
Moreover, we have
and so by using the definition of \(L_{\pi }\) and Lemma B.2, we obtain (55).
Now, from Eq. (55) we get
Recalling (54), we have Eq. (56), showing the result. \(\square \)
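The sums \(\sum_{n=1}^{\infty} R^{\varphi} H^{n-1}_{\pi,\varphi} L_{\pi}\) appearing in Proposition B.3 have the structure of a Neumann series. The toy numerical check below (the two matrices are made up; a finite-state stand-in for the kernels \(H_{\pi,\varphi}\) and \(L_{\pi}\)) illustrates that, when \(H\) is substochastic, the partial sums of \(\sum_{n\ge 1} H^{n-1} L\) converge to the closed form \((I-H)^{-1}L\):

```python
import numpy as np

# Illustrative finite-state kernels: H substochastic (row sums < 1),
# L a stochastic kernel standing in for L_pi.
H = np.array([[0.3, 0.2],
              [0.1, 0.4]])          # row sums 0.5 < 1
L = np.array([[0.6, 0.4],
              [0.5, 0.5]])

# Partial sums of the Neumann series  sum_{n>=1} H^{n-1} L
series = np.zeros_like(L)
power = np.eye(2)
for n in range(200):
    series += power @ L             # add H^n L
    power = power @ H

closed_form = np.linalg.solve(np.eye(2) - H, L)
print(np.max(np.abs(series - closed_form)))   # negligible difference
```

This closed form is what lies behind resumming such series term by term, which is legitimate precisely when the corresponding sums are finite, as used in the proof of Proposition 4.8.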
Proof of Proposition 4.8 Observe that \(\widetilde{H}_{\pi ,\varphi }= L_{\pi } \bar{q}^{\pi } \widetilde{R}^{\varphi }\), and so
and with (55) we get (19). The measure \(\widehat{\eta }_{u^{\pi ,\varphi }}^{g}\) is finite by definition and so, by using Proposition B.3 and Assumption (A1), we have that for any \(\Gamma \in \mathcal {B}(\mathbf {X})\)
Now, from Eq. (55) and the definition of \(H_{\pi ,\varphi }\) (see Eq. 52), we have that for any \(\Gamma \in \mathcal {B}(\mathbf {X})\)
Moreover, observe that \(\displaystyle \frac{\bar{q}^{\pi }(\mathbf {X}|\bar{x}(y))}{\alpha +\bar{q}^{\pi } (\mathbf {X}|\bar{x}(y))} = 1 - \frac{\alpha }{\alpha +\bar{q}^{\pi } (\mathbf {X}|\bar{x}(y))}\) and so, for \(n\ge 2\)
and
Consequently, by using the expression of \(\widehat{\eta }_{u^{\pi ,\varphi }}^{g}\) in (55) and Eqs. (60)–(61)
Note that the above calculations are possible since the quantities \(\sum _{n=1}^{\infty } R^{\varphi } H^{n-1}_{\pi ,\varphi } L_{\pi }(\Gamma |x_{0})\) and \(\sum _{n=1}^{\infty } R^{\varphi } H^{n-1}_{\pi ,\varphi }\widetilde{H}_{\pi ,\varphi }(\Gamma \times \{\Delta \}|x_{0})\) are finite (see inequalities (58) and (58)). Recalling (55) and combining Eqs. (59) and (62) we obtain that
showing the result. \(\square \)
Dufour, F., Piunovskiy, A.B. Impulsive Control for Continuous-Time Markov Decision Processes: A Linear Programming Approach. Appl Math Optim 74, 129–161 (2016). https://doi.org/10.1007/s00245-015-9310-8
Keywords
- Impulsive control
- Continuous control
- Continuous-time Markov decision process
- Linear programming approach
- Discounted cost