Abstract
Continuous-time Markov decision processes are an important class of models in applications ranging from cyber-physical systems to synthetic biology. A central problem is how to devise a policy that controls the system so as to maximise the probability of satisfying a set of temporal logic specifications. Here we present a novel approach based on statistical model checking and an unbiased estimate of a functional gradient in the space of possible policies. The statistical approach has several advantages over conventional approaches based on uniformisation: it can be applied even when the model is available only as a black box, and it does not suffer from state-space explosion. The use of a stochastic gradient to guide the search considerably improves the efficiency of learning policies. We demonstrate the method on a proof-of-principle non-linear population model, showing strong performance on a non-trivial task.
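The gradient-guided search described in the abstract can be sketched in miniature. The following is our own illustrative toy, not the authors' algorithm: a controlled birth-death process simulated with Gillespie's algorithm, a two-parameter sigmoid policy choosing between a high and a low birth rate at each jump, a Monte-Carlo (statistical model checking) estimate of the time-bounded reachability probability, and a score-function (likelihood-ratio) gradient estimate used for stochastic gradient ascent. All rates, thresholds, and function names are assumptions made for this sketch.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def simulate(theta, T=2.0, target=8, x0=1, seed=None):
    """One Gillespie trajectory of a toy controlled birth-death process.

    At each jump the policy picks action a=1 (high birth rate) with
    probability sigmoid(theta[0] + theta[1]*x).  Returns the reward
    (1.0 if the population reaches `target` before time T, else 0.0)
    and the accumulated gradient of the log-probability of the actions
    taken (the score function used for the gradient estimate)."""
    rng = random.Random(seed)
    x, t = x0, 0.0
    g = [0.0, 0.0]
    while t < T and x > 0:
        p1 = sigmoid(theta[0] + theta[1] * x)
        a = 1 if rng.random() < p1 else 0
        # d/dz log pi(a | x): (1 - p1) if a == 1, else -p1
        coeff = (1.0 - p1) if a == 1 else -p1
        g[0] += coeff
        g[1] += coeff * x
        birth = 5.0 if a == 1 else 0.5
        death = 0.3 * x
        total = birth + death
        t += rng.expovariate(total)
        if t >= T:
            break
        x += 1 if rng.random() < birth / total else -1
        if x >= target:
            return 1.0, g
    return 0.0, g

def train(episodes=500, lr=0.05, seed=0):
    """Doubly-stochastic gradient ascent: both the reward and its
    gradient are estimated from a single simulated trajectory."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    baseline = 0.0
    for _ in range(episodes):
        r, g = simulate(theta, seed=rng.randrange(10**9))
        baseline += 0.05 * (r - baseline)  # running baseline reduces variance
        for j in range(2):
            theta[j] += lr * (r - baseline) * g[j]
    return theta

def estimate(theta, n=400, seed=1):
    """Statistical model checking: Monte-Carlo estimate of the
    time-bounded reachability probability under the given policy."""
    rng = random.Random(seed)
    return sum(simulate(theta, seed=rng.randrange(10**9))[0]
               for _ in range(n)) / n
```

Under this toy model, a policy that consistently picks the high birth rate should satisfy the reachability property far more often than one that avoids it, which is the signal the gradient ascent exploits.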
Notes
1. Kernel functions typically also have an amplitude parameter, which we consider to be equal to 1.
Acknowledgements
L.B. acknowledges partial support from the EU-FET project QUANTICOL (nr. 600708) and from FRA-UniTS. G.S. and D.M. acknowledge support from the European Research Council under grant MLCS306999. T.B. is supported by the Czech Science Foundation, grant No. 15-17564S. E.B. acknowledges the partial support of the Austrian National Research Network S 11405-N23 (RiSE/SHiNE) of the Austrian Science Fund (FWF), the ICT COST Action IC1402 Runtime Verification beyond Monitoring (ARVI), and the IKT der Zukunft Austrian FFG project HARMONIA (nr. 845631).
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this paper
Bartocci, E., Bortolussi, L., Brázdil, T., Milios, D., Sanguinetti, G. (2016). Policy Learning for Time-Bounded Reachability in Continuous-Time Markov Decision Processes via Doubly-Stochastic Gradient Ascent. In: Agha, G., Van Houdt, B. (eds) Quantitative Evaluation of Systems. QEST 2016. Lecture Notes in Computer Science(), vol 9826. Springer, Cham. https://doi.org/10.1007/978-3-319-43425-4_17
Print ISBN: 978-3-319-43424-7
Online ISBN: 978-3-319-43425-4