# Planning using hierarchical constrained Markov decision processes


## Abstract

Constrained Markov decision processes offer a principled method for determining policies in sequential stochastic decision problems where multiple costs are considered concurrently. Although they could be valuable in numerous robotic applications, to date their use has been limited, in large part because of their computational complexity: policy computation requires solving constrained linear programs with an extremely large number of variables. To overcome this limitation, we propose a hierarchical method for solving large problem instances. States are clustered into macro states, and the parameters defining the dynamics and costs of the clustered model are determined using a Monte Carlo approach. We show that the algorithm we propose to create clustered states preserves valuable properties of the original model, such as the existence of a solution to the problem. We validate our algorithm on various planning problems in simulation and on a mobile robot platform, and show experimentally that the clustered approach significantly outperforms the non-hierarchical solution while incurring only moderate losses in the objective functions.
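The Monte Carlo step mentioned above can be illustrated with a short sketch. This is not the authors' implementation: the base-model interfaces `step`, `cluster_of`, and `policy` are hypothetical placeholders standing in for the low-level MDP dynamics, the state-to-cluster assignment, and an intra-cluster action choice. The idea is simply to roll out the low-level model from states of a cluster until the trajectory exits, and record which macro state is entered and at what accumulated cost.

```python
import random
from collections import defaultdict

def estimate_macro_parameters(states, step, cluster_of, policy,
                              n_rollouts=1000, max_steps=200):
    """Monte Carlo estimate of macro-level transition probabilities and
    expected exit costs for a clustered MDP (illustrative sketch).

    states     -- base states used as rollout starting points
    step       -- step(s, a) -> (s_next, cost), samples low-level dynamics
    cluster_of -- dict mapping each base state to its macro state (cluster)
    policy     -- policy(s) -> action used while inside a cluster
    """
    counts = defaultdict(lambda: defaultdict(int))  # counts[c][c2]: exits c -> c2
    costs = defaultdict(list)                       # costs[c]: sampled exit costs
    for s0 in states:
        c0 = cluster_of[s0]
        for _ in range(n_rollouts):
            s, total_cost = s0, 0.0
            for _ in range(max_steps):
                s, cost = step(s, policy(s))
                total_cost += cost
                if cluster_of[s] != c0:             # trajectory left the macro state
                    counts[c0][cluster_of[s]] += 1
                    costs[c0].append(total_cost)
                    break
    # Normalize counts into empirical transition probabilities.
    probs = {c: {c2: n / sum(row.values()) for c2, n in row.items()}
             for c, row in counts.items()}
    mean_costs = {c: sum(v) / len(v) for c, v in costs.items() if v}
    return probs, mean_costs
```

With multiple cost functions, the same rollouts would accumulate one total per cost, so a single pass yields all the parameters the clustered constrained model needs.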

## Keywords

Constrained Markov decision processes · Planning · Uncertainty

## Notes

### Acknowledgements

This paper extends preliminary results presented in Feyzabadi and Carpin (2015). This work is supported by the National Institute of Standards and Technology under cooperative agreement 70NANB12H143. Any opinions, findings, and conclusions or recommendations expressed in these materials are those of the authors and should not be interpreted as representing the official policies, either expressly or implied, of the funding agencies of the U.S. Government.

## References

- Altman, E. (1999). *Constrained Markov decision processes*. Boca Raton: CRC Press.
- Bai, A., Wu, F., & Chen, X. (2012). Online planning for large MDPs with MAXQ decomposition. In *Proceedings of the 11th international conference on autonomous agents and multiagent systems* (Vol. 3, pp. 1215–1216).
- Barry, J., Kaelbling, L. P., & Lozano-Pérez, T. (2010). *Hierarchical solution of large Markov decision processes*. Technical report, MIT.
- Barry, J. L., Kaelbling, L. P., & Lozano-Pérez, T. T. (2011). DetH*: Approximate hierarchical solution of large Markov decision processes. In *International joint conference on artificial intelligence (IJCAI)*.
- Bertsekas, D. P. (2005). *Dynamic programming and optimal control* (Vols. 1–2). Belmont, MA: Athena Scientific.
- Bouvrie, J., & Maggioni, M. (2012). Efficient solution of Markov decision problems with multiscale representations. In *2012 50th annual Allerton conference on communication, control, and computing (Allerton)* (pp. 474–481). IEEE.
- Carpin, S., Pavone, M., & Sadler, B. M. (2014). Rapid multirobot deployment with time constraints. In *Proceedings of the IEEE/RSJ international conference on intelligent robots and systems* (pp. 1147–1154).
- Chow, Y.-L., Pavone, M., Sadler, B. M., & Carpin, S. (2015). Trading safety versus performance: Rapid deployment of robotic swarms with robust performance constraints. *ASME Journal of Dynamic Systems, Measurement and Control*, *137*(3), 031005-1–031005-11.
- Dai, P., & Goldsmith, J. (2007). Topological value iteration algorithm for Markov decision processes. In *Proceedings of the international joint conference on artificial intelligence* (pp. 1860–1865).
- Dai, P., Mausam, M., & Weld, D. S. (2009). Focused topological value iteration. In *International conference on automated planning and scheduling*.
- Dai, P., Mausam, M., Weld, D. S., & Goldsmith, J. (2011). Topological value iteration algorithms. *Journal of Artificial Intelligence Research*, *42*(1), 181–209.
- Ding, X. C., Englot, B., Pinto, A., Speranzon, A., & Surana, A. (2014). Hierarchical multi-objective planning: From mission specifications to contingency management. In *2014 IEEE international conference on robotics and automation (ICRA)* (pp. 3735–3742). IEEE.
- Ding, X. C., Pinto, A., & Surana, A. (2013). Strategic planning under uncertainties via constrained Markov decision processes. In *Proceedings of the IEEE international conference on robotics and automation* (pp. 4568–4575).
- El Chamie, M., & Açikmeşe, B. (2016). Convex synthesis of optimal policies for Markov decision processes with sequentially-observed transitions. In *Proceedings of the American control conference* (pp. 3862–3867).
- Feyzabadi, S., & Carpin, S. (2014). Risk aware path planning using hierarchical constrained Markov decision processes. In *Proceedings of the IEEE international conference on automation science and engineering* (pp. 297–303).
- Feyzabadi, S., & Carpin, S. (2015). HCMDP: A hierarchical solution to constrained Markov decision processes. In *Proceedings of the IEEE international conference on robotics and automation* (pp. 3791–3798).
- Grisetti, G., Stachniss, C., & Burgard, W. (2007). Improved techniques for grid mapping with Rao-Blackwellized particle filters. *IEEE Transactions on Robotics*, *23*(1), 36–46.
- Hauskrecht, M., Meuleau, N., Kaelbling, L. P., Dean, T., & Boutilier, C. (1998). Hierarchical solution of Markov decision processes using macro-actions. In *Proceedings of the fourteenth conference on uncertainty in artificial intelligence* (pp. 220–229). Morgan Kaufmann Publishers.
- Hoey, J., St-Aubin, R., Hu, A. J., & Boutilier, C. (1999). SPUDD: Stochastic planning using decision diagrams. In *Proceedings of uncertainty in artificial intelligence* (pp. 279–288).
- Karaman, S., & Frazzoli, E. (2011). Sampling-based algorithms for optimal motion planning. *International Journal of Robotics Research*, *30*(7), 846–894.
- Kavraki, L. E., Švetska, P., Latombe, J. C., & Overmars, M. H. (1996). Probabilistic roadmaps for path planning in high-dimensional configuration spaces. *IEEE Transactions on Robotics and Automation*, *12*(4), 566–580.
- Kochenderfer, M. J. (2015). *Decision making under uncertainty: Theory and application*. Cambridge: MIT Press.
- LaValle, S. M. (2006). *Planning algorithms*. Cambridge: Cambridge University Press.
- LaValle, S. M., & Kuffner, J. J. (2001). Randomized kinodynamic planning. *International Journal of Robotics Research*, *20*(5), 378–400.
- Moldovan, T. M., & Abbeel, P. (2012). Risk aversion in Markov decision processes via near optimal Chernoff bounds. In *NIPS* (pp. 3140–3148).
- Pineau, J., Roy, N., & Thrun, S. (2001). A hierarchical approach to POMDP planning and execution. In *Workshop on hierarchy and memory in reinforcement learning (ICML)* (Vol. 65, p. 51).
- Puterman, M. L. (2005). *Markov decision processes: Discrete stochastic dynamic programming*. Hoboken: Wiley-Interscience.
- Thrun, S., Burgard, W., & Fox, D. (2006). *Probabilistic robotics*. Cambridge: MIT Press.
- Vien, N. A., & Toussaint, M. (2015). Hierarchical Monte-Carlo planning. In *AAAI* (pp. 3613–3619).