Cooperative Multi-agent Deep Reinforcement Learning in a 2 Versus 2 Free-Kick Task
Abstract
In multi-robot reinforcement learning, the goal is to enable a team of robots to learn coordinated behavior from direct interaction with the environment. Here, we provide a comparison of the two main approaches to this challenge: independent learners (IL) and joint-action learners (JAL). IL scales well to large domains, but it suffers from non-stationarity, since each agent perceives its learning teammates as part of a changing environment. JAL overcomes non-stationarity and can generate highly coordinated behaviors, but it presents scalability issues because the joint action space grows exponentially with the number of agents. We implement and evaluate both methods on a team of simulated NAO humanoid robots in a new cooperative and adversarial multi-robot soccer scenario, the 2 versus 2 free-kick task, where the small number of learners makes the scalability issues affecting JAL less relevant. We describe the implementation details of our scenario and show that both approaches achieve satisfactory solutions. Notably, we observe that joint-action learners outperform independent learners in terms of success rate and quality of the learned policies. Finally, we discuss the results and draw conclusions from our findings.
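To make the IL/JAL distinction concrete, the following is a minimal tabular sketch of the two update schemes for two agents, not the paper's actual deep RL implementation: the state discretization, per-agent action set, reward, and hyperparameters are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy setup: 2 cooperating agents, each with 3 discrete actions.
# All names and sizes here are illustrative assumptions, not the paper's task.
N_STATES = 10   # assumed discretized state space
N_ACTIONS = 3   # assumed per-agent action count

rng = np.random.default_rng(0)

# Independent learners (IL): one Q-table per agent over its OWN actions.
# Each agent treats its teammate as part of the environment, which makes
# that environment non-stationary from the agent's point of view.
q_il = [np.zeros((N_STATES, N_ACTIONS)) for _ in range(2)]

# Joint-action learners (JAL): a single Q-table over the JOINT action, so the
# search space grows multiplicatively (3 * 3 = 9 joint actions here, and
# exponentially as more agents are added).
q_jal = np.zeros((N_STATES, N_ACTIONS * N_ACTIONS))

def il_update(s, actions, r, s_next, alpha=0.1, gamma=0.99):
    """Each agent updates its own Q-value, ignoring the teammate's action."""
    for i, a in enumerate(actions):
        td_target = r + gamma * q_il[i][s_next].max()
        q_il[i][s, a] += alpha * (td_target - q_il[i][s, a])

def jal_update(s, actions, r, s_next, alpha=0.1, gamma=0.99):
    """A single update over the joint action (a0, a1)."""
    joint = actions[0] * N_ACTIONS + actions[1]  # flatten the joint action
    td_target = r + gamma * q_jal[s_next].max()
    q_jal[s, joint] += alpha * (td_target - q_jal[s, joint])

# One illustrative transition with a random joint action and a unit reward.
s, s_next = 0, 1
actions = (rng.integers(N_ACTIONS), rng.integers(N_ACTIONS))
il_update(s, actions, 1.0, s_next)
jal_update(s, actions, 1.0, s_next)
print("IL tables:", [q.shape for q in q_il], "JAL table:", q_jal.shape)
```

In deep variants such as those compared in the paper, the tables above would be replaced by neural Q-function approximators, but the structural difference is the same: IL keeps one learner per agent over individual actions, while JAL trains a single learner over the joint action space.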