ALAN: adaptive learning for multi-agent navigation

  • Julio Godoy
  • Tiannan Chen
  • Stephen J. Guy
  • Ioannis Karamouzas
  • Maria Gini
Part of the following topical collections:
  1. Special Issue on Distributed Robotics: From Fundamentals to Applications


In multi-agent navigation, agents need to move towards their goal locations while avoiding collisions with other agents and obstacles, often without communication. Existing methods compute motions that are locally optimal but do not account for the aggregated motions of all agents, producing inefficient global behavior, especially when agents move in a crowded space. In this work, we develop a method that allows agents to dynamically adapt their behavior to their local conditions. We formulate the multi-agent navigation problem as an action-selection problem and propose an approach, ALAN, that allows agents to compute time-efficient and collision-free motions. ALAN is highly scalable because each agent makes its own decisions on how to move, using a set of velocities optimized for a variety of navigation tasks. Experimental results show that agents using ALAN generally reach their destinations faster than agents using ORCA, a state-of-the-art collision avoidance framework, or two other navigation models.
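To make the action-selection formulation concrete, the following is a minimal sketch of one agent choosing among a fixed set of candidate velocities and learning online which one works best. It is an illustration under stated assumptions, not ALAN's actual learning rule: the epsilon-greedy selector is a generic stand-in, and the names `candidate_velocities`, `goal_progress` and `ActionSelector`, the evenly spaced action set, and the goal-progress reward are all hypothetical. No collision avoidance is modeled.

```python
import math
import random


def candidate_velocities(goal_dir, speed=1.0, n=8):
    """Return n equal-speed velocities, evenly spaced in angle.

    Assumption: the action set is a fan of directions whose first
    entry points straight at the goal. The abstract only says the
    velocities are optimized for a variety of navigation tasks.
    """
    base = math.atan2(goal_dir[1], goal_dir[0])
    return [
        (speed * math.cos(base + 2 * math.pi * k / n),
         speed * math.sin(base + 2 * math.pi * k / n))
        for k in range(n)
    ]


def goal_progress(velocity, goal_dir):
    """Hypothetical reward: velocity component along the goal direction."""
    norm = math.hypot(goal_dir[0], goal_dir[1])
    return (velocity[0] * goal_dir[0] + velocity[1] * goal_dir[1]) / norm


class ActionSelector:
    """Epsilon-greedy selection with running-average value estimates.

    A generic bandit-style stand-in for an online learner; each agent
    would own one instance and decide independently of other agents.
    """

    def __init__(self, n_actions, epsilon=0.1, rng=None):
        self.values = [0.0] * n_actions
        self.counts = [0] * n_actions
        self.epsilon = epsilon
        self.rng = rng or random.Random(0)

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, action, reward):
        # Incremental mean of the rewards observed for this action.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


if __name__ == "__main__":
    goal_dir = (1.0, 0.0)
    actions = candidate_velocities(goal_dir, speed=1.5)
    selector = ActionSelector(len(actions))
    for _ in range(200):
        a = selector.select()
        # In open space the reward is just progress toward the goal; a
        # crowded scene would use the progress actually achieved instead.
        selector.update(a, goal_progress(actions[a], goal_dir))
    print("preferred velocity:", actions[selector.select()])
```

Because each agent keeps only its own value estimates and action set, the per-agent cost does not grow with the number of agents, which is the sense in which a decentralized scheme of this kind scales.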


Multi-agent navigation; Online learning; Action selection; Multi-agent coordination



This work was partially funded by the University of Minnesota Informatics Institute, the CONICYT PFCHA/DOCTORADO BECAS CHILE/2009 - 72100243 and the NSF through grants #CHS-1526693, #CNS-1544887, #IIS-1748541 and #IIP-1439728.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science, Universidad de Concepcion, Concepcion, Chile
  2. Department of Computer Science and Engineering, University of Minnesota, Minneapolis, USA
  3. School of Computing, Clemson University, Clemson, USA
