Simple Strategies in Multi-Objective MDPs

Part of the Lecture Notes in Computer Science book series (LNTCS, volume 12078)

Abstract

We consider the verification of multiple expected reward objectives at once on Markov decision processes (MDPs). This enables a trade-off analysis among multiple objectives by obtaining a Pareto front. We focus on strategies that are easy to employ and implement, that is, strategies that are pure (no randomization) and have bounded memory. We show that checking whether a point is achievable by a pure stationary strategy is NP-complete, even for two objectives, and we provide an MILP encoding to solve the corresponding problem. The bounded memory case is treated by a product construction. Experimental results using Storm and Gurobi show the feasibility of our algorithms.
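To make the achievability question concrete, the following minimal sketch enumerates all pure stationary strategies of a tiny MDP and checks whether one of them reaches a given point. Everything in it is an assumption made for illustration: the toy MDP, the names `MDP`, `expected_rewards`, and `achievable`, the restriction to acyclic models with two maximized total-reward objectives, and the brute-force search itself, which is not the MILP encoding of the paper and is exponential in the number of states.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy MDP (not from the paper), assumed acyclic:
# state -> action -> (reward vector, successor distribution).
MDP = {
    "s0": {
        "a": ((1, 0), {"s1": Fraction(1, 2), "goal": Fraction(1, 2)}),
        "b": ((0, 2), {"goal": Fraction(1)}),
    },
    "s1": {
        "c": ((4, 0), {"goal": Fraction(1)}),
        "d": ((0, 4), {"goal": Fraction(1)}),
    },
    "goal": {},  # absorbing target state without actions
}


def expected_rewards(strategy, state):
    """Expected total reward vector from `state` under a pure stationary strategy."""
    if not MDP[state]:
        return (Fraction(0), Fraction(0))
    reward, successors = MDP[state][strategy[state]]
    total = [Fraction(r) for r in reward]
    for succ, prob in successors.items():
        value = expected_rewards(strategy, succ)  # terminates since the MDP is acyclic
        for i in range(len(total)):
            total[i] += prob * value[i]
    return tuple(total)


def pure_stationary_strategies():
    """Yield every mapping that fixes one action per non-absorbing state."""
    states = [s for s in MDP if MDP[s]]
    for choice in product(*(MDP[s] for s in states)):
        yield dict(zip(states, choice))


def achievable(point, initial="s0"):
    """Return a strategy whose value is componentwise >= point, or None."""
    for strategy in pure_stationary_strategies():
        value = expected_rewards(strategy, initial)
        if all(v >= p for v, p in zip(value, point)):
            return strategy
    return None


print(achievable((1, 2)))
print(achievable((2, 1)))
```

In this toy model the point (1, 2) is reached by choosing `a` in `s0` and `d` in `s1`, whereas (2, 1) is reached by no pure stationary strategy, although randomizing uniformly between `c` and `d` in `s1` would yield exactly (2, 1). This is the kind of gap that arises when strategies are restricted to be pure.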

currently affiliated with Vrije Universiteit Brussel.

Research partially supported by F.R.S.-FNRS Grant n° F.4520.18 (ManySynth). Mickael Randour is an F.R.S.-FNRS Research Associate.

Author information

Corresponding author

Correspondence to Tim Quatmann.

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Copyright information

© 2020 The Author(s)

About this paper

Cite this paper

Delgrange, F., Katoen, JP., Quatmann, T., Randour, M. (2020). Simple Strategies in Multi-Objective MDPs. In: Biere, A., Parker, D. (eds) Tools and Algorithms for the Construction and Analysis of Systems. TACAS 2020. Lecture Notes in Computer Science, vol 12078. Springer, Cham. https://doi.org/10.1007/978-3-030-45190-5_19

  • DOI: https://doi.org/10.1007/978-3-030-45190-5_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-45189-9

  • Online ISBN: 978-3-030-45190-5

  • eBook Packages: Computer Science, Computer Science (R0)