ALAN: adaptive learning for multi-agent navigation


In multi-agent navigation, agents need to move towards their goal locations while avoiding collisions with other agents and obstacles, often without communication. Existing methods compute motions that are locally optimal but do not account for the aggregated motions of all agents. This produces inefficient global behavior, especially when agents move in a crowded space. In this work, we develop a method that allows agents to dynamically adapt their behavior to their local conditions. We formulate the multi-agent navigation problem as an action-selection problem and propose an approach, ALAN, that allows agents to compute time-efficient and collision-free motions. ALAN is highly scalable because each agent makes its own decisions on how to move, using a set of velocities optimized for a variety of navigation tasks. Experimental results show that agents using ALAN generally reach their destinations faster than agents using ORCA, a state-of-the-art collision avoidance framework, or either of two other navigation models.
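The abstract frames navigation as action selection: each agent repeatedly picks one velocity from a fixed candidate set and learns online which choice works best under its current conditions. The sketch below illustrates that idea under several assumptions that do not come from the paper. It uses UCB1 (Auer, Cesa-Bianchi & Fischer, 2002) as the selection rule, a hypothetical set of five directions relative to the goal (`ACTION_ANGLES`), and a hypothetical reward equal to progress along the goal direction (`goal_progress`). It is a generic bandit illustration, not ALAN's actual action set, reward, or selection strategy. It also omits the collision-avoidance step (for example ORCA) that would turn a chosen velocity into a collision-free one among other agents.

```python
import math

# Hypothetical action set: headings (radians) relative to the goal direction.
# These values are illustrative assumptions, not the velocities used by ALAN.
ACTION_ANGLES = [0.0, math.pi / 4, -math.pi / 4, math.pi / 2, -math.pi / 2]


class UCB1Selector:
    """Generic UCB1 bandit over a fixed, finite set of actions."""

    def __init__(self, n_actions: int) -> None:
        self.counts = [0] * n_actions    # times each action was chosen
        self.values = [0.0] * n_actions  # running mean reward per action
        self.total = 0                   # total number of updates

    def select(self) -> int:
        # Try every action once before applying the UCB1 index.
        for a, c in enumerate(self.counts):
            if c == 0:
                return a
        # UCB1 index: mean reward plus an exploration bonus that shrinks
        # as an action is tried more often. Ties go to the lowest index.
        return max(
            range(len(self.counts)),
            key=lambda a: self.values[a]
            + math.sqrt(2.0 * math.log(self.total) / self.counts[a]),
        )

    def update(self, action: int, reward: float) -> None:
        self.total += 1
        self.counts[action] += 1
        # Incremental mean of the rewards observed for this action.
        self.values[action] += (reward - self.values[action]) / self.counts[action]


def preferred_velocity(goal_dir, angle, speed=1.0):
    """Rotate the unit goal direction by `angle` and scale it to `speed`."""
    gx, gy = goal_dir
    c, s = math.cos(angle), math.sin(angle)
    return (speed * (c * gx - s * gy), speed * (s * gx + c * gy))


def goal_progress(velocity, goal_dir):
    """Hypothetical reward: component of the velocity along the goal direction."""
    return velocity[0] * goal_dir[0] + velocity[1] * goal_dir[1]


def run_free_space(steps=1000):
    """Toy loop: one agent, no neighbors, so the realized velocity equals the chosen one."""
    goal_dir = (1.0, 0.0)
    bandit = UCB1Selector(len(ACTION_ANGLES))
    for _ in range(steps):
        a = bandit.select()
        v = preferred_velocity(goal_dir, ACTION_ANGLES[a])
        bandit.update(a, goal_progress(v, goal_dir))
    return bandit.counts


if __name__ == "__main__":
    print(run_free_space())
```

In this toy free-space run the goal-directed action has the highest mean reward, so UCB1 ends up choosing it most often. Among other agents, the realized velocity, and therefore the reward of each action, would depend on local conditions, which is the kind of adaptation the abstract describes.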


Figs. 1–19


  1. Videos highlighting our work can be found in




This work was partially funded by the University of Minnesota Informatics Institute, the CONICYT PFCHA/DOCTORADO BECAS CHILE/2009 - 72100243 and the NSF through grants #CHS-1526693, #CNS-1544887, #IIS-1748541 and #IIP-1439728.

Author information



Corresponding author

Correspondence to Julio Godoy.

Additional information

This is one of several papers published in Autonomous Robots comprising the “Special Issue on Distributed Robotics: From Fundamentals to Applications”.


About this article


Cite this article

Godoy, J., Chen, T., Guy, S.J. et al. ALAN: adaptive learning for multi-agent navigation. Auton Robot 42, 1543–1562 (2018).



Keywords

  • Multi-agent navigation
  • Online learning
  • Action selection
  • Multi-agent coordination