Cooperative Dynamic Domain Reduction
Unmanned vehicles (UxVs) are increasingly deployed in a wide range of challenging scenarios, including disaster response, surveillance, and search and rescue. This paper is motivated by scenarios where a heterogeneous swarm of UxVs is tasked with completing a variety of objectives that may require cooperation among vehicles of varying capabilities. Our goal is to develop an approach that enables vehicles to aid each other in the service of these objectives in a distributed and autonomous fashion. To address this problem, we build on Dynamic domain reduction for multi-agent planning (DDRP), a framework that combines model-based hierarchical reinforcement learning with spatial state abstractions tailored to robotic planning. Our strategy for tackling the exponential complexity of reasoning over the joint action space of the multi-agent system is to have agents reason over single-agent trajectories, evaluate the result as a function of the cooperative objectives that can be completed, and use simulated annealing to refine the search for the best set of joint trajectories. We term the resulting algorithm Cooperative dynamic domain reduction for multi-agent planning (CDDRP). Our analysis characterizes its long-term convergence in probability to the optimal set of trajectories. We provide simulations to evaluate the performance of CDDRP in the context of swarm deployment.
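The search strategy described above can be illustrated with a minimal sketch: a simulated annealing loop over joint assignments of single-agent trajectories, where each local move re-plans one agent's trajectory and acceptance follows the usual Boltzmann criterion. This is not the paper's implementation; the candidate-trajectory lists and the reward function (a stand-in for scoring completed cooperative objectives) are hypothetical placeholders.

```python
import math
import random

def anneal_joint_trajectories(candidate_trajs, reward, iters=2000, t0=1.0, seed=0):
    """Search for a high-reward joint trajectory assignment.

    candidate_trajs: one list of candidate single-agent trajectories per agent
        (here, arbitrary hashable labels standing in for real trajectories).
    reward: maps a joint assignment (one trajectory per agent) to a scalar,
        e.g. a score for the cooperative objectives that assignment completes.
    """
    rng = random.Random(seed)
    # Start from an arbitrary joint assignment.
    current = tuple(rng.choice(c) for c in candidate_trajs)
    cur_val = reward(current)
    best, best_val = current, cur_val
    for k in range(1, iters + 1):
        temp = t0 / k  # simple cooling schedule
        # Local move: re-plan a single agent's trajectory.
        i = rng.randrange(len(candidate_trajs))
        proposal = list(current)
        proposal[i] = rng.choice(candidate_trajs[i])
        proposal = tuple(proposal)
        prop_val = reward(proposal)
        # Accept improvements always; worse moves with Boltzmann probability.
        if prop_val >= cur_val or rng.random() < math.exp((prop_val - cur_val) / temp):
            current, cur_val = proposal, prop_val
            if cur_val > best_val:
                best, best_val = current, cur_val
    return best, best_val

# Toy example: two agents, reward 2 if their trajectories rendezvous at the
# same region (a stand-in for completing a cooperative objective together).
trajs = [["A", "B", "C"], ["A", "B", "D"]]
score = lambda joint: 2.0 if joint[0] == joint[1] else 0.0
best, val = anneal_joint_trajectories(trajs, score)
```

Because each move touches only one agent's trajectory, the per-iteration cost stays linear in the number of agents rather than exponential in the joint action space, which is the point of the single-agent decomposition.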
This work was supported by ONR Award N00014-16-1-2836.
- 1. Ma, A., Ouimet, M., Cortés, J.: Dynamic domain reduction for multi-agent planning. In: International Symposium on Multi-Robot and Multi-Agent Systems, pp. 142–149, Los Angeles, CA (2017)
- 3. Bullo, F., Cortés, J., Martínez, S.: Distributed Control of Robotic Networks. Applied Mathematics Series. Princeton University Press (2009). Electronically available at http://coordinationbook.info
- 4. Mesbahi, M., Egerstedt, M.: Graph Theoretic Methods in Multiagent Networks. Applied Mathematics Series. Princeton University Press (2010)
- 9. Broz, F., Nourbakhsh, I., Simmons, R.: Planning for human-robot interaction using time-state aggregated POMDPs. In: AAAI, vol. 8, pp. 1339–1344 (2008)
- 10. Puterman, M.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley (2014)
- 11. Howard, R.: Dynamic Programming and Markov Processes. M.I.T. Press (1960)
- 12. Bertsekas, D.P.: Dynamic Programming and Optimal Control. Athena Scientific (1995)
- 13. Kocsis, L., Szepesvári, C.: Bandit based Monte-Carlo planning. In: ECML, vol. 6, pp. 282–293. Springer (2006)
- 14. Parr, R., Russell, S.: Hierarchical control and learning for Markov decision processes. Ph.D. dissertation, University of California, Berkeley, CA (1998)
- 15. Bai, A., Srivastava, S., Russell, S.: Markovian state and action abstractions for MDPs via hierarchical MCTS. In: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI, pp. 3029–3039, New York, NY (2016)
- 17. Boutilier, C.: Sequential optimality and coordination in multiagent systems. In: Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI), vol. 1, pp. 478–485 (1999)
- 19. Laarhoven, P., Aarts, E.: Simulated annealing. In: Simulated Annealing: Theory and Applications, pp. 7–15. Springer (1987)