Cooperative Multi-agent Deep Reinforcement Learning in a 2 Versus 2 Free-Kick Task

  • Jim Martin Catacora Ocana (corresponding author)
  • Francesco Riccio
  • Roberto Capobianco
  • Daniele Nardi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11531)

Abstract

In multi-robot reinforcement learning, the goal is to enable a team of robots to learn a coordinated behavior from direct interaction with the environment. Here, we compare the two main approaches to this challenge: independent learners (IL) and joint-action learners (JAL). IL suits highly scalable domains, but it faces non-stationarity issues, since each robot perceives its learning teammates as part of the environment. JAL, in contrast, overcomes non-stationarity and can generate highly coordinated behaviors, but it suffers from scalability issues because the joint-action space grows rapidly with the number of learners. We implement and evaluate both methods in a new cooperative and adversarial multi-robot soccer scenario, called the 2 versus 2 free-kick task, where the scalability issues affecting JAL are less relevant given the small number of learners. We deploy these methods on a team of simulated NAO humanoid robots, describe the implementation details of our scenario, and show that both approaches achieve satisfactory solutions. Notably, we observe that joint-action learners outperform independent learners in terms of success rate and quality of the learned policies. Finally, we discuss the results and draw conclusions from our findings.
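The IL/JAL distinction can be made concrete with a minimal tabular Q-learning sketch. This sketch is an illustrative assumption on our part: the paper itself uses deep reinforcement learning, and the class names, hyperparameters, and tabular setting below are hypothetical, not the authors' implementation.

    import random
    from collections import defaultdict

    ALPHA, GAMMA, EPS = 0.1, 0.99, 0.1  # learning rate, discount, exploration rate

    class IndependentLearner:
        """IL: each agent keeps its own Q-table over its OWN action space.
        Teammates are treated as part of the environment, so transitions
        look non-stationary from this agent's point of view."""
        def __init__(self, actions):
            self.actions = actions
            self.q = defaultdict(float)  # (state, action) -> value

        def act(self, state):
            if random.random() < EPS:
                return random.choice(self.actions)
            return max(self.actions, key=lambda a: self.q[(state, a)])

        def update(self, s, a, r, s2):
            best_next = max(self.q[(s2, a2)] for a2 in self.actions)
            self.q[(s, a)] += ALPHA * (r + GAMMA * best_next - self.q[(s, a)])

    class JointActionLearner:
        """JAL: a single Q-table over the JOINT action space of both agents.
        The learning problem is stationary, but the action set grows
        multiplicatively with each added agent."""
        def __init__(self, actions_a, actions_b):
            self.joint = [(a, b) for a in actions_a for b in actions_b]
            self.q = defaultdict(float)  # (state, joint_action) -> value

        def act(self, state):
            if random.random() < EPS:
                return random.choice(self.joint)
            return max(self.joint, key=lambda j: self.q[(state, j)])

        def update(self, s, j, r, s2):
            best_next = max(self.q[(s2, j2)] for j2 in self.joint)
            self.q[(s, j)] += ALPHA * (r + GAMMA * best_next - self.q[(s, j)])

Under IL, each robot maximizes over its own few actions while its learning teammate shifts the environment dynamics underneath it; under JAL, the learner maximizes over all action pairs, which restores stationarity at the cost of a search space that multiplies with every additional agent.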


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Jim Martin Catacora Ocana (1) (corresponding author)
  • Francesco Riccio (1)
  • Roberto Capobianco (1)
  • Daniele Nardi (1)

  1. Department of Computer, Control and Management Engineering, Sapienza University of Rome, Rome, Italy