Autonomous Robots, Volume 41, Issue 8, pp 1589–1607

Planning using hierarchical constrained Markov decision processes

  • Seyedshams Feyzabadi
  • Stefano Carpin


Constrained Markov decision processes offer a principled method to determine policies for sequential stochastic decision problems where multiple costs are concurrently considered. Although they could be very valuable in numerous robotic applications, to date their use has been quite limited. Among the reasons for their limited adoption is their computational complexity, since policy computation requires the solution of constrained linear programs with an extremely large number of variables. To overcome this limitation, we propose a hierarchical method to solve large problem instances. States are clustered into macro states and the parameters defining the dynamic behavior and the costs of the clustered model are determined using a Monte Carlo approach. We show that the algorithm we propose to create clustered states maintains valuable properties of the original model, like the existence of a solution for the problem. Our algorithm is validated in various planning problems in simulation and on a mobile robot platform, and we experimentally show that the clustered approach significantly outperforms the non-hierarchical solution while experiencing only moderate losses in terms of objective functions.
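The clustering step described above can be illustrated with a small sketch. It is not the authors' algorithm: the abstract states only that the macro-state parameters are determined with a Monte Carlo approach, so the function name `estimate_macro_model`, the uniform choice of start state inside a cluster, the fixed policy followed within a cluster, and the toy six-state chain are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def estimate_macro_model(transitions, cost, cluster_of, policy,
                         n_samples=2000, max_steps=100, seed=0):
    """Monte Carlo estimate of macro-state transition probabilities and costs.

    transitions[s][a] is a list of (next_state, probability) pairs,
    cost[s][a] is the cost paid when action a is taken in state s,
    cluster_of[s] is the macro state that contains s,
    policy[s] is the (assumed) action followed while inside a cluster.
    """
    rng = random.Random(seed)
    members = defaultdict(list)
    for s, c in cluster_of.items():
        members[c].append(s)

    macro_p, macro_cost = {}, {}
    for c, states in members.items():
        exits = defaultdict(int)   # how often each neighboring cluster is reached
        total_cost = 0.0           # cost accumulated over completed rollouts
        completed = 0              # rollouts that actually left the cluster
        for _ in range(n_samples):
            s = rng.choice(states)  # assumption: uniform start inside the cluster
            acc = 0.0
            for _ in range(max_steps):
                a = policy[s]
                acc += cost[s][a]
                nxt, probs = zip(*transitions[s][a])
                s = rng.choices(nxt, weights=probs)[0]
                if cluster_of[s] != c:
                    exits[cluster_of[s]] += 1
                    total_cost += acc
                    completed += 1
                    break
        if completed:  # clusters that are never left (e.g. the goal) are skipped
            macro_p[c] = {d: n / completed for d, n in exits.items()}
            macro_cost[c] = total_cost / completed
    return macro_p, macro_cost


# Toy chain 0..5: "right" succeeds with probability 0.8, otherwise the state is kept.
transitions = {s: {"right": [(s + 1, 0.8), (s, 0.2)]} for s in range(5)}
transitions[5] = {"right": [(5, 1.0)]}  # absorbing goal state
cost = {s: {"right": 1.0} for s in range(6)}
policy = {s: "right" for s in range(6)}
cluster_of = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "G"}

macro_p, macro_cost = estimate_macro_model(transitions, cost, cluster_of, policy)
```

In this toy model every rollout that leaves cluster `A` enters `B`, and the estimated cost of crossing `A` approximates the expected number of attempted moves. A constrained model would carry one such estimate per cost function, and the resulting smaller model would then be passed to a constrained linear program solver.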


Keywords: Constrained Markov decision processes · Planning · Uncertainty



This paper extends preliminary results presented in Feyzabadi and Carpin (2015). This work is supported by the National Institute of Standards and Technology under cooperative agreement 70NANB12H143. Any opinions, findings, and conclusions or recommendations expressed in these materials are those of the authors and should not be interpreted as representing the official policies, either expressly or implied, of the funding agencies of the U.S. Government.


  1. Altman, E. (1999). Constrained Markov decision processes. Boca Raton: CRC Press.
  2. Bai, A., Wu, F., & Chen, X. (2012). Online planning for large MDPs with MAXQ decomposition. In Proceedings of the 11th international conference on autonomous agents and multiagent systems (Vol. 3, pp. 1215–1216).
  3. Barry, J., Kaelbling, L. P., & Lozano-Pérez, T. (2010). Hierarchical solution of large Markov decision processes. Technical report, MIT.
  4. Barry, J. L., Kaelbling, L. P., & Lozano-Pérez, T. T. (2011). DetH*: Approximate hierarchical solution of large Markov decision processes. In International joint conference on artificial intelligence (IJCAI).
  5. Bertsekas, D. P. (2005). Dynamic programming and optimal control (Vol. 1, 2). Belmont, MA: Athena Scientific.
  6. Bouvrie, J., & Maggioni, M. (2012). Efficient solution of Markov decision problems with multiscale representations. In 2012 50th annual Allerton conference on communication, control, and computing (Allerton) (pp. 474–481). IEEE.
  7. Carpin, S., Pavone, M., & Sadler, B. M. (2014). Rapid multirobot deployment with time constraints. In Proceedings of the IEEE/RSJ international conference on intelligent robots and systems (pp. 1147–1154).
  8. Chow, Y.-L., Pavone, M., Sadler, B. M., & Carpin, S. (2015). Trading safety versus performance: rapid deployment of robotic swarms with robust performance constraints. ASME Journal of Dynamic Systems, Measurement and Control, 137(3), 031005-1–031005-11.
  9. Dai, P., & Goldsmith, J. (2007). Topological value iteration algorithm for Markov decision processes. In Proceedings of the international joint conference on artificial intelligence (pp. 1860–1865).
  10. Dai, P., Mausam, M., & Weld, D. S. (2009). Focused topological value iteration. In International conference on automated planning and scheduling.
  11. Dai, P., Mausam, M., Weld, D. S., & Goldsmith, J. (2011). Topological value iteration algorithms. Journal of Artificial Intelligence Research, 42(1), 181–209.
  12. Ding, X. C., Englot, B., Pinto, A., Speranzon, A., & Surana, A. (2014). Hierarchical multi-objective planning: From mission specifications to contingency management. In 2014 IEEE international conference on robotics and automation (ICRA) (pp. 3735–3742). IEEE.
  13. Ding, X. C., Pinto, A., & Surana, A. (2013). Strategic planning under uncertainties via constrained Markov decision processes. In Proceedings of the IEEE international conference on robotics and automation (pp. 4568–4575).
  14. El Chamie, M., & Açikmeşe, B. (2016). Convex synthesis of optimal policies for Markov decision processes with sequentially-observed transitions. In Proceedings of the American control conference (pp. 3862–3867).
  15. Feyzabadi, S., & Carpin, S. (2014). Risk aware path planning using hierarchical constrained Markov decision processes. In Proceedings of the IEEE international conference on automation science and engineering (pp. 297–303).
  16. Feyzabadi, S., & Carpin, S. (2015). HCMDP: A hierarchical solution to constrained Markov decision processes. In Proceedings of the IEEE international conference on robotics and automation (pp. 3791–3798).
  17. Grisetti, G., Stachniss, C., & Burgard, W. (2007). Improved techniques for grid mapping with Rao-Blackwellized particle filters. IEEE Transactions on Robotics, 23(1), 36–46.
  18. Hauskrecht, M., Meuleau, N., Kaelbling, L. P., Dean, T., & Boutilier, C. (1998). Hierarchical solution of Markov decision processes using macro-actions. In Proceedings of the fourteenth conference on uncertainty in artificial intelligence (pp. 220–229). Morgan Kaufmann Publishers.
  19. Hoey, J., St-Aubin, R., Hu, A. J., & Boutilier, C. C. (1999). SPUDD: Stochastic planning using decision diagrams. In Proceedings of uncertainty in artificial intelligence (pp. 279–288).
  20. Karaman, S., & Frazzoli, E. (2011). Sampling-based algorithms for optimal motion planning. International Journal of Robotics Research, 30(7), 846–894.
  21. Kavraki, L. E., Švestka, P., Latombe, J. C., & Overmars, M. H. (1996). Probabilistic roadmaps for path planning in high-dimensional configuration spaces. IEEE Transactions on Robotics and Automation, 12(4), 566–580.
  22. Kochenderfer, M. J. (2015). Decision making under uncertainty: Theory and application. Cambridge: MIT Press.
  23. LaValle, S. M. (2006). Planning algorithms. Cambridge: Cambridge University Press.
  24. LaValle, S. M., & Kuffner, J. J. (2001). Randomized kinodynamic planning. International Journal of Robotics Research, 20(5), 378–400.
  25. Moldovan, T. M., & Abbeel, P. (2012). Risk aversion in Markov decision processes via near optimal Chernoff bounds. In NIPS (pp. 3140–3148).
  26. Pineau, J., Roy, N., & Thrun, S. (2001). A hierarchical approach to POMDP planning and execution. In Workshop on hierarchy and memory in reinforcement learning (ICML) (Vol. 65, p. 51).
  27. Puterman, M. L. (2005). Markov decision processes: Discrete stochastic dynamic programming. Hoboken: Wiley-Interscience.
  28. Thrun, S., Burgard, W., & Fox, D. (2006). Probabilistic robotics. Cambridge: MIT Press.
  29. Vien, N. A., & Toussaint, M. (2015). Hierarchical Monte-Carlo planning. In AAAI (pp. 3613–3619).

Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. School of Engineering, University of California, Merced, Merced, USA
