Cooperative Dynamic Domain Reduction

  • Aaron MaEmail author
  • Michael Ouimet
  • Jorge Cortés
Conference paper
Part of the Springer Proceedings in Advanced Robotics book series (SPAR, volume 9)


Unmanned vehicles (UxVs) are increasingly deployed in a wide range of challenging scenarios, including disaster response, surveillance, and search and rescue. This paper is motivated by scenarios where a heterogeneous swarm of UxVs is tasked with completing a variety of different objectives that possibly require cooperation from vehicles of varying capabilities. Our goal is to develop an approach that enables vehicles to aid each other in the services of these objectives in a distributed and autonomous fashion. To address this problem, we build on Dynamic domain reduction for multi-agent planning (DDRP), which is a framework that utilizes model-based hierarchical reinforcement learning and spatial state abstractions crafted for robotic planning. Our strategy to tackle the exponential complexity of reasoning over the joint action space of the multi-agent system is to have agents reason over single-agent trajectories, evaluate the result as a function of the cooperative objectives that can be completed, and use simulated annealing to refine the search for the best set of joint trajectories. The resulting algorithm is termed Cooperative dynamic domain reduction for multi-agent planning (CDDRP). Our analysis characterizes the long-term convergence in probability to the optimal set of trajectories. We provide simulations to estimate the performance of CDDRP in the context of swarm deployment.



This work was supported by ONR Award N00014-16-1-2836.


  1. 1.
    Ma, A., Ouimet, M., Cortés, J.: Dynamic domain reduction for multi-agent planning. In: International Symposium on Multi-Robot and Multi-Agent Systems, pp. 142–149, Los Angeles, CA (2017)Google Scholar
  2. 2.
    Gerkey, B.P., Mataric, M.J.: A formal analysis and taxonomy of task allocation in multi-robot systems. Int. J. Robot. Res. 23(9), 939–954 (2004)CrossRefGoogle Scholar
  3. 3.
    Bullo, F., Cortés, J., Martínez, S.: Distributed Control of Robotic Networks. Applied Mathematics Series. Princeton University Press (2009). Electronically available at
  4. 4.
    Mesbahi, M., Egerstedt, M.: Graph Theoretic Methods in Multiagent Networks. Applied Mathematics Series. Princeton University Press (2010)Google Scholar
  5. 5.
    Dunbabin, M., Marques, L.: Robots for environmental monitoring: significant advancements and applications. IEEE Robot. Autom. Mag. 19(1), 24–39 (2012)CrossRefGoogle Scholar
  6. 6.
    Das, J., Py, F., Harvey, J.B.J., Ryan, J.P., Gellene, A., Graham, R., Caron, D.A., Rajan, K., Sukhatme, G.S.: Data-driven robotic sampling for marine ecosystem monitoring. Int. J. Robot. Res. 34(12), 1435–1452 (2015)CrossRefGoogle Scholar
  7. 7.
    Cortés, J., Egerstedt, M.: Coordinated control of multi-robot systems: a survey. SICE J. Control Meas. Syst. Integr. 10(6), 495–503 (2017)CrossRefGoogle Scholar
  8. 8.
    Sutton, R., Precup, D., Singh, S.: Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artif. Intell. 112(1–2), 181–211 (1999)MathSciNetCrossRefGoogle Scholar
  9. 9.
    Broz, F., Nourbakhsh, I., Simmons, R.: Planning for human-robot interaction using time-state aggregated POMDPs. In: AAAI, vol. 8, pp. 1339–1344 (2008)Google Scholar
  10. 10.
    Puterman, M.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley (2014)Google Scholar
  11. 11.
    Howard, R.: Dynamic Programming and Markov Processes. M.I.T. Press (1960)Google Scholar
  12. 12.
    Bertsekas, D.P.: Dynamic Programming and Optimal Control. Athena Scientific (1995)Google Scholar
  13. 13.
    Kocsis, L., Szepesvári, C.: Bandit based Monte-Carlo planning. In: ECML, vol. 6, pp. 282–293. Springer (2006)Google Scholar
  14. 14.
    Parr, R., Russell, S.: Hierarchical control and learning for Markov decision processes, University of California, Berkeley, Berkeley, CA (1998)Google Scholar
  15. 15.
    Bai, A., Srivastava, S., Russell, S.: Markovian state and action abstractions for MDPs via hierarchical MCTS. In: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI, pp. 3029–3039, New York, NY (2016)Google Scholar
  16. 16.
    Barto, A., Mahadevan, S.: Recent advances in hierarchical reinforcement learning. Discrete Event Dyn. Syst. 13(4), 341–379 (2003)MathSciNetCrossRefGoogle Scholar
  17. 17.
    Boutilier, C.: Sequential optimality and coordination in multiagent systems. In: Proceedings of the 16th International Joint Conference on Artifical Intelligence (IJCAI), vol. 1, pp. 478–485 (1999)Google Scholar
  18. 18.
    Kirkpatrick, S., Gelatt, C., Vecchi, M.: Optimization by simulated annealing. Science 220(4598), 671–680 (1983)MathSciNetCrossRefGoogle Scholar
  19. 19.
    Laarhoven, P., Aarts, E.: Simulated annealing. In: Simulated Annealing: Theory and Applications, pp. 7–15. Springer (1987)Google Scholar
  20. 20.
    Malek, M., Guruswamy, M., Pandya, M., Owens, H.: Serial and parallel simulated annealing and tabu search algorithms for the traveling salesman problem. Ann. Oper. Res. 21(1), 59–84 (1989)MathSciNetCrossRefGoogle Scholar
  21. 21.
    Hajek, B.: Cooling schedules for optimal annealing. Math. Oper. Res. 13(2), 311–329 (1988)MathSciNetCrossRefGoogle Scholar
  22. 22.
    Suman, B., Kumar, P.: A survey of simulated annealing as a tool for single and multiobjective optimization. J. Oper. Res. Soc. 57(10), 1143–1160 (2006)CrossRefGoogle Scholar

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. 1.Department of Mechanical and Aerospace EngineeringUniversity of CaliforniaSan DiegoUSA
  2. 2.SPAWAR Systems Center PacificSan DiegoUSA

Personalised recommendations