Abstract
The main goal of this paper is to apply the so-called policy iteration algorithm (PIA) to the long run average continuous control problem of piecewise deterministic Markov processes (PDMPs) taking values in a general Borel space and with a compact action space depending on the state variable. To do so, we first derive some important properties of a pseudo-Poisson equation associated with the problem. We then show that, under some classical hypotheses, the PIA converges to a solution satisfying the optimality equation, and that this optimal solution yields an optimal control strategy in feedback form for the average control problem of the continuous-time PDMP.
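For readers unfamiliar with policy iteration under the average cost criterion, the following is a minimal sketch of the classical finite-state, discrete-time version of the idea. It is not the PDMP algorithm of the paper: the two-state model, the transition table `P`, the cost table `c`, and all function names are hypothetical and chosen only for illustration, and the chain is assumed to be unichain so that the Poisson equation has a unique solution once `h(0) = 0` is fixed. Each iteration solves the Poisson equation for the current policy and then improves the policy state by state, with the available actions depending on the state.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def evaluate(policy, P, c):
    """Policy evaluation: solve rho + h(s) = c(s,a) + sum_t P(t|s,a) h(t), h(0) = 0."""
    n = len(policy)
    A, b = [], []
    for s in range(n):
        a = policy[s]
        # Unknown vector is (rho, h(1), ..., h(n-1)); h(0) is pinned to 0.
        row = [1.0] + [(1.0 if t == s else 0.0) - P[s][a][t] for t in range(1, n)]
        A.append(row)
        b.append(c[s][a])
    x = solve_linear(A, b)
    return x[0], [0.0] + x[1:]


def improve(policy, h, P, c, tol=1e-12):
    """Policy improvement: pick a minimizing action in each state."""
    new_policy = []
    for s in range(len(policy)):
        def q(a):
            return c[s][a] + sum(p * hv for p, hv in zip(P[s][a], h))
        best = min(range(len(c[s])), key=q)
        # Keep the current action on ties so the iteration cannot cycle.
        keep = q(policy[s]) <= q(best) + tol
        new_policy.append(policy[s] if keep else best)
    return new_policy


def policy_iteration(P, c, max_iter=100):
    """Alternate evaluation and improvement until the policy is stable."""
    policy = [0] * len(c)
    for _ in range(max_iter):
        rho, h = evaluate(policy, P, c)
        new_policy = improve(policy, h, P, c)
        if new_policy == policy:
            return policy, rho, h
        policy = new_policy
    raise RuntimeError("policy iteration did not stabilize")


# Hypothetical model: state 0 has two actions, state 1 has one.
P = [[[0.5, 0.5], [0.9, 0.1]],   # P[s][a][t] = transition probability
     [[0.5, 0.5]]]
c = [[1.0, 1.2],                  # c[s][a] = one-step cost
     [3.0]]

policy, rho, h = policy_iteration(P, c)
print(policy, rho, h)
```

In this toy model the initial policy has average cost 2, and one improvement step switches state 0 to its second action, which lowers the average cost to 1.5. The pair `(rho, h)` returned at termination satisfies the discrete-time average cost optimality equation, which plays the role that the optimality equation plays in the abstract.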
Additional information
O.L.V. Costa received financial support from CNPq (Brazilian National Research Council), grant 301067/09-0.
F. Dufour was supported by the ARPEGE program of the French National Agency of Research (ANR), project “FAUTOCOES”, number ANR-09-SEGI-004.
About this article
Cite this article
Costa, O.L.V., Dufour, F. The Policy Iteration Algorithm for Average Continuous Control of Piecewise Deterministic Markov Processes. Appl Math Optim 62, 185–204 (2010). https://doi.org/10.1007/s00245-010-9099-4
DOI: https://doi.org/10.1007/s00245-010-9099-4