Model Checking for Safe Navigation Among Humans

  • Sebastian Junges
  • Nils Jansen
  • Joost-Pieter Katoen
  • Ufuk Topcu
  • Ruohan Zhang
  • Mary Hayhoe
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11024)


We investigate the use of probabilistic model checking to synthesise optimal strategies for autonomous systems that operate among uncontrollable agents such as humans. To formally assess such uncontrollable behaviour, we use models obtained from reinforcement learning. These behaviour models are based, for instance, on data collected from experiments in which humans execute dynamic tasks in a virtual environment. We first describe a method to translate such behaviour models into Markov decision processes (MDPs). The composition of these MDPs with models of (controllable) autonomous systems gives rise to stochastic games (SGs). MDPs and SGs are amenable to probabilistic model checking, which enables the synthesis of strategies that provably adhere to formal specifications such as probabilistic temporal logic constraints. Experiments with a prototype provide (1) systematic insights into the credibility and the characteristics of behavioural models and (2) methods for the automated synthesis of strategies satisfying guarantees on their required characteristics in the presence of humans.
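The pipeline the abstract describes — compose a stochastic model of human behaviour with a controllable agent and synthesise a strategy maximising the probability of a temporal-logic objective — can be illustrated on a toy example. The sketch below is an assumption-laden stand-in, not the paper's tool chain (which builds on probabilistic model checkers such as PRISM and Storm): the corridor world, the random-walk human model, and the reach-avoid objective are all invented for illustration. The robot must reach the end of a corridor without ever sharing a cell with the human; value iteration on the product MDP computes the maximal probability of satisfying this objective, i.e. the value of the synthesised strategy.

```python
import itertools

N = 5     # corridor cells 0..N-1
GOAL = 4  # robot target cell

def human_step(h):
    """Stochastic human model (a stand-in for one learned via RL):
    moves left or right with equal probability, staying put at walls."""
    return [(0.5, max(h - 1, 0)), (0.5, min(h + 1, N - 1))]

ACTIONS = [-1, 0, 1]  # robot actions: left / stay / right

def robot_step(r, a):
    return min(max(r + a, 0), N - 1)

# Product states: (robot cell, human cell).
# Objective: reach GOAL before a collision (same cell as the human).
states = list(itertools.product(range(N), range(N)))

def value_iteration(eps=1e-9):
    """V[s] = maximal probability of reaching GOAL collision-free."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for (r, h) in states:
            if r == h:        # collision: objective violated
                continue
            if r == GOAL:     # goal reached: objective satisfied
                V[(r, h)] = 1.0
                continue
            best = 0.0
            for a in ACTIONS:  # robot maximises over its actions
                r2 = robot_step(r, a)
                # Expectation over the human's stochastic move.
                p = sum(pr * (0.0 if r2 == h2 else
                              (1.0 if r2 == GOAL else V[(r2, h2)]))
                        for pr, h2 in human_step(h))
                best = max(best, p)
            delta = max(delta, abs(best - V[(r, h)]))
            V[(r, h)] = best
        if delta < eps:
            return V

V = value_iteration()
print(f"max P(reach goal, no collision) from (robot=0, human=3): {V[(0, 3)]:.4f}")
```

Because the human here follows a fixed stochastic policy, the product is an MDP; with an adversarial rather than fixed human policy, the same construction yields a stochastic game and the inner expectation becomes a minimisation over the human's moves.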



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Sebastian Junges (1)
  • Nils Jansen (2)
  • Joost-Pieter Katoen (1)
  • Ufuk Topcu (3)
  • Ruohan Zhang (3)
  • Mary Hayhoe (3)
  1. RWTH Aachen University, Aachen, Germany
  2. Radboud University, Nijmegen, The Netherlands
  3. The University of Texas at Austin, Austin, USA
