International Workshop on Formal Aspects of Component Software

OnPlan: A Framework for Simulation-Based Online Planning

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9539)

Abstract

This paper proposes the OnPlan framework for modeling autonomous systems operating in domains with large probabilistic state spaces and high branching factors. The framework defines components for acting and deliberation, and specifies their interactions. It comprises a mathematical specification of requirements for autonomous systems. We discuss the role of such a specification in the context of simulation-based online planning. We also consider two instantiations of the framework: Monte Carlo Tree Search for discrete domains, and Cross Entropy Open Loop Planning for continuous state and action spaces. The framework’s ability to provide system autonomy is illustrated empirically on a robotic rescue example.
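The abstract names the core mechanism, simulation-based online planning, without spelling it out. As a rough, non-authoritative illustration, the Python sketch below interleaves deliberation (a budget of simulations run against a domain model) with acting in the real environment. All names here (Simulation, plan, env_step) are hypothetical placeholders, not the OnPlan API, and the action selection is plain flat Monte Carlo rather than the paper's MCTS or Cross Entropy instantiations.

    import random

    class Simulation:
        """Hypothetical domain model interface (illustrative, not OnPlan's API)."""
        def actions(self, state):
            raise NotImplementedError
        def sample_next(self, state, action):
            # Sample a successor state from the probabilistic model.
            raise NotImplementedError
        def reward(self, state, action, next_state):
            raise NotImplementedError

    def rollout(sim, state, depth):
        """Estimate the value of a state by a random rollout of bounded depth."""
        total = 0.0
        for _ in range(depth):
            acts = sim.actions(state)
            if not acts:
                break
            a = random.choice(acts)
            nxt = sim.sample_next(state, a)
            total += sim.reward(state, a, nxt)
            state = nxt
        return total

    def plan(sim, state, budget=200, depth=20):
        """Flat Monte Carlo deliberation: spend the simulation budget on the
        currently applicable actions and return the one with the best estimate."""
        acts = sim.actions(state)
        if not acts:
            return None
        estimates = {a: 0.0 for a in acts}
        counts = {a: 0 for a in acts}
        for i in range(budget):
            a = acts[i % len(acts)]
            nxt = sim.sample_next(state, a)
            ret = sim.reward(state, a, nxt) + rollout(sim, nxt, depth)
            counts[a] += 1
            estimates[a] += (ret - estimates[a]) / counts[a]
        return max(acts, key=lambda a: estimates[a])

    def online_planning_loop(sim, env_step, state, horizon=100):
        """Interleave deliberation against the model with acting in the environment."""
        for _ in range(horizon):
            action = plan(sim, state)        # deliberation: simulate against the model
            if action is None:
                break
            state = env_step(state, action)  # acting: execute in the real environment
        return state

Roughly, plan() stands in for the deliberation component and env_step for acting; the paper's instantiations replace the flat Monte Carlo selection with Monte Carlo Tree Search for discrete domains and Cross Entropy Open Loop Planning for continuous state and action spaces.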

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Institut für Informatik, Ludwig-Maximilians-Universität München, Munich, Germany
