Multi-agent Reinforcement Learning: An Overview

Buşoniu, Lucian; Babuška, Robert; De Schutter, Bart

doi:10.1007/978-3-642-14435-6_7

Lucian Buşoniu⁴,
Robert Babuška⁴ &
Bart De Schutter⁵

Part of the book series: Studies in Computational Intelligence ((SCI,volume 310))

8130 Accesses
224 Citations
6 Altmetric

Abstract

Multi-agent systems can be used to address problems in a variety of domains, including robotics, distributed control, telecommunications, and economics. The complexity of many tasks arising in these domains makes them difficult to solve with preprogrammed agent behaviors. The agents must instead discover a solution on their own, using learning. A significant part of the research on multi-agent learning concerns reinforcement learning techniques. This chapter reviews a representative selection of multi-agent reinforcement learning algorithms for fully cooperative, fully competitive, and more general (neither cooperative nor competitive) tasks. The benefits and challenges of multi-agent reinforcement learning are described. A central challenge in the field is the formal statement of a multi-agent learning goal; this chapter reviews the learning goals proposed in the literature. The problem domains where multi-agent reinforcement learning techniques have been applied are briefly discussed. Several multi-agent reinforcement learning algorithms are applied to an illustrative example involving the coordinated transportation of an object by two cooperative robots. In an outlook for the multi-agent reinforcement learning field, a set of important open issues are identified, and promising research directions to address these issues are outlined.

Portions reprinted, with permission, from [20], ‘A Comprehensive Survey of Multiagent Reinforcement Learning’, by Lucian Buşoniu, Robert Babuška, and Bart De Schutter, IEEE Transactions on Systems, Man, and Cybernetics—Part C: Applications and Reviews, vol. 38, no. 2, March 2008, pages 156–172. © 2008 IEEE.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 129.00; Price excludes VAT (USA)

Softcover Book: USD 169.99; Price excludes VAT (USA)

Hardcover Book: USD 169.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Abul, O., Polat, F., Alhajj, R.: Multiagent reinforcement learning using function approximation. IEEE Transactions on Systems, Man, and Cybernetics—Part C: Applications and Reviews 4(4), 485–497 (2000)
Article Google Scholar
Bäck, T.: Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms. Oxford University Press, Oxford (1996)
MATH Google Scholar
Başar, T., Olsder, G.J.: Dynamic Noncooperative Game Theory, 2nd edn. Society for Industrial and Applied Mathematics, SIAM (1999)
Google Scholar
Bakker, B., Steingrover, M., Schouten, R., Nijhuis, E., Kester, L.: Cooperative multi-agent reinforcement learning of traffic lights. In: Workshop on Cooperative Multi-Agent Learning, 16th European Conference on Machine Learning (ECML-2005), Porto, Portugal (2005)
Google Scholar
Banerjee, B., Peng, J.: Adaptive policy gradient in multiagent learning. In: Proceedings 2nd International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2003), Melbourne, Australia, pp. 686–692 (2003)
Google Scholar
Barto, A.G., Sutton, R.S., Anderson, C.W.: Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics 13(5), 833–846 (1983)
Google Scholar
Berenji, H.R., Vengerov, D.: A convergent actor-critic-based FRL algorithm with application to power management of wireless transmitters. IEEE Transactions on Fuzzy Systems 11(4), 478–485 (2003)
Article Google Scholar
Bertsekas, D.P.: Dynamic Programming and Optimal Control, 3rd edn., vol. 2. Athena Scientific (2007)
Google Scholar
Bertsekas, D.P., Tsitsiklis, J.N.: Neuro-Dynamic Programming. Athena Scientific (1996)
Google Scholar
Borkar, V.: An actor-critic algorithm for constrained Markov decision processes. Systems & Control Letters 54(3), 207–213 (2005)
Article MATH MathSciNet Google Scholar
Boutilier, C.: Planning, learning and coordination in multiagent decision processes. In: Proceedings 6th Conference on Theoretical Aspects of Rationality and Knowledge (TARK-1996), pp. 195–210. De Zeeuwse Stromen, The Netherlands (1996)
Google Scholar
Bowling, M.: Convergence problems of general-sum multiagent reinforcement learning. In: Proceedings 17th International Conference on Machine Learning (ICML-2000), Stanford University, US, pp. 89–94 (2000)
Google Scholar
Bowling, M.: Multiagent learning in the presence of agents with limitations. Ph.D. thesis, Computer Science Dept., Carnegie Mellon University, Pittsburgh, US (2003)
Google Scholar
Bowling, M.: Convergence and no-regret in multiagent learning. In: Saul, L.K., Weiss, Y., Bottou, L. (eds.) Advances in Neural Information Processing Systems 17, pp. 209–216. MIT Press, Cambridge (2005)
Google Scholar
Bowling, M., Veloso, M.: An analysis of stochastic game theory for multiagent reinforcement learning. Tech. rep., Computer Science Dept., Carnegie Mellon University, Pittsburgh, US (2000), http://www.cs.ualberta.ca/~bowling/papers/00tr.pdf
Bowling, M., Veloso, M.: Rational and convergent learning in stochastic games. In: Proceedings 17th International Conference on Artificial Intelligence (IJCAI-2001), San Francisco, US, pp. 1021–1026 (2001)
Google Scholar
Bowling, M., Veloso, M.: Multiagent learning using a variable learning rate. Artificial Intelligence 136(2), 215–250 (2002)
Article MATH MathSciNet Google Scholar
Boyan, J.A., Littman, M.L.: Packet routing in dynamically changing networks: A reinforcement learning approach. In: Moody, J. (ed.) Advances in Neural Information Processing Systems 6, pp. 671–678. Morgan Kaufmann, San Francisco (1994)
Google Scholar
Brown, G.W.: Iterative solutions of games by fictitious play. In: Koopmans, T.C. (ed.) Activitiy Analysis of Production and Allocation, ch. XXIV, pp. 374–376. Wiley, Chichester (1951)
Google Scholar
Buşoniu, L., Babuška, R., De Schutter, B.: A comprehensive survey of multi-agent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics. Part C: Applications and Reviews 38(2), 156–172 (2008)
Article Google Scholar
Buşoniu, L., De Schutter, B., Babuška, R.: Multiagent reinforcement learning with adaptive state focus. In: Proceedings 17th Belgian-Dutch Conference on Artificial Intelligence (BNAIC-2005), Brussels, Belgium, pp. 35–42 (2005)
Google Scholar
Buşoniu, L., De Schutter, B., Babuška, R.: Decentralized reinforcement learning control of a robotic manipulator. In: Proceedings 9th International Conference of Control, Automation, Robotics, and Vision (ICARCV-2006), Singapore, pp. 1347–1352 (2006)
Google Scholar
Buşoniu, L., De Schutter, B., Babuška, R.: Approximate dynamic programming and reinforcement learning. In: Babuška, R., Groen, F.C.A. (eds.) Interactive Collaborative Information Systems. Studies in Computational Intelligence, vol. 281, pp. 3–44. Springer, Heidelberg (2010)
Google Scholar
Buffet, O., Dutech, A., Charpillet, F.: Shaping multi-agent systems with gradient reinforcement learning. Autonomous Agents and Multi-Agent Systems 15(2), 197–220 (2007)
Article Google Scholar
Carmel, D., Markovitch, S.: Opponent modeling in multi-agent systems. In: Weiß, G., Sen, S. (eds.) Adaptation and Learning in Multi-Agent Systems, ch. 3, pp. 40–52. Springer, Heidelberg (1996)
Google Scholar
Chalkiadakis, G.: Multiagent reinforcement learning: Stochastic games with multiple learning players. Tech. rep., Dept. of Computer Science, University of Toronto, Canada (2003), http://www.cs.toronto.edu/~gehalk/DepthReport/DepthReport.ps
Cherkassky, V., Mulier, F.: Learning from Data: Concepts, Theory, And Methods. Wiley, Chichester (1998)
MATH Google Scholar
Choi, S.P.M., Yeung, D.Y.: Predictive Q-routing: A memory-based reinforcement learning approach to adaptive traffic control. In: Touretzky, D.S., Mozer, M., Hasselmo, M.E. (eds.) Advances in Neural Information Processing Systems 8, pp. 945–951. MIT Press, Cambridge (1996)
Google Scholar
Claus, C., Boutilier, C.: The dynamics of reinforcement learning in cooperative multiagent systems. In: Proceedings 15th National Conference on Artificial Intelligence and 10th Conference on Innovative Applications of Artificial Intelligence (AAAI/IAAI-1998), Madison, US, pp. 746–752 (1998)
Google Scholar
Clouse, J.: Learning from an automated training agent. In: Working Notes Workshop on Agents that Learn from Other Agents, 12th International Conference on Machine Learning (ICML-1995), Tahoe City, US (1995)
Google Scholar
Conitzer, V., Sandholm, T.: AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents. In: Proceedings 20th International Conference on Machine Learning (ICML-2003), Washington, US, pp. 83–90 (2003)
Google Scholar
Crites, R.H., Barto, A.G.: Improving elevator performance using reinforcement learning. In: Touretzky, D.S., Mozer, M.C., Hasselmo, M.E. (eds.) Advances in Neural Information Processing Systems 8, pp. 1017–1023. MIT Press, Cambridge (1996)
Google Scholar
Crites, R.H., Barto, A.G.: Elevator group control using multiple reinforcement learning agents. Machine Learning 33(2–3), 235–262 (1998)
Article MATH Google Scholar
Ernst, D., Geurts, P., Wehenkel, L.: Tree-based batch mode reinforcement learning. Journal of Machine Learning Research 6, 503–556 (2005)
MathSciNet Google Scholar
Fernández, F., Parker, L.E.: Learning in large cooperative multi-robot systems. International Journal of Robotics and Automation, Special Issue on Computational Intelligence Techniques in Cooperative Robots 16(4), 217–226 (2001)
Google Scholar
Ficici, S.G., Pollack, J.B.: A game-theoretic approach to the simple coevolutionary algorithm. In: Deb, K., Rudolph, G., Lutton, E., Merelo, J.J., Schoenauer, M., Schwefel, H.-P., Yao, X. (eds.) PPSN 2000. LNCS, vol. 1917, pp. 467–476. Springer, Heidelberg (2000)
Chapter Google Scholar
Fischer, F., Rovatsos, M., Weiss, G.: Hierarchical reinforcement learning in communication-mediated multiagent coordination. In: Proceedings 3rd International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS-2004), New York, US, pp. 1334–1335 (2004)
Google Scholar
Fitch, R., Hengst, B., Suc, D., Calbert, G., Scholz, J.B.: Structural abstraction experiments in reinforcement learning. In: Zhang, S., Jarvis, R.A. (eds.) AI 2005. LNCS (LNAI), vol. 3809, pp. 164–175. Springer, Heidelberg (2005)
Chapter Google Scholar
Fudenberg, D., Levine, D.K.: The Theory of Learning in Games. MIT Press, Cambridge (1998)
MATH Google Scholar
Ghavamzadeh, M., Mahadevan, S., Makar, R.: Hierarchical multi-agent reinforcement learning. Autonomous Agents and Multi-Agent Systems 13(2), 197–229 (2006)
Article Google Scholar
Glorennec, P.Y.: Reinforcement learning: An overview. In: Proceedings European Symposium on Intelligent Techniques (ESIT-2000), Aachen, Germany, pp. 17–35 (2000)
Google Scholar
Greenwald, A., Hall, K.: Correlated-Q learning. In: Proceedings 20th International Conference on Machine Learning (ICML-2003), Washington, US, pp. 242–249 (2003)
Google Scholar
Guestrin, C., Lagoudakis, M.G., Parr, R.: Coordinated reinforcement learning. In: Proceedings 19th International Conference on Machine Learning (ICML-2002), Sydney, Australia, pp. 227–234 (2002)
Google Scholar
Hansen, E.A., Bernstein, D.S., Zilberstein, S.: Dynamic programming for partially observable stochastic games. In: Proceedings 19th National Conference on Artificial Intelligence (AAAI-2004), San Jose, US, pp. 709–715 (2004)
Google Scholar
Haynes, T., Wainwright, R., Sen, S., Schoenefeld, D.: Strongly typed genetic programming in evolving cooperation strategies. In: Proceedings 6th International Conference on Genetic Algorithms (ICGA-1995), Pittsburgh, US, pp. 271–278 (1995)
Google Scholar
Ho, F., Kamel, M.: Learning coordination strategies for cooperative multiagent systems. Machine Learning 33(2–3), 155–177 (1998)
Article MATH Google Scholar
Horiuchi, T., Fujino, A., Katai, O., Sawaragi, T.: Fuzzy interpolation-based Q-learning with continuous states and actions. In: Proceedings 5th IEEE International Conference on Fuzzy Systems (FUZZ-IEEE-1996), New Orleans, US, pp. 594–600 (1996)
Google Scholar
Hsu, W.T., Soo, V.W.: Market performance of adaptive trading agents in synchronous double auctions. In: Yuan, S.-T., Yokoo, M. (eds.) PRIMA 2001. LNCS (LNAI), vol. 2132, pp. 108–121. Springer, Heidelberg (2001)
Chapter Google Scholar
Hu, J., Wellman, M.P.: Multiagent reinforcement learning: Theoretical framework and an algorithm. In: Proceedings 15th International Conference on Machine Learning (ICML-1998), Madison, US, pp. 242–250 (1998)
Google Scholar
Hu, J., Wellman, M.P.: Nash Q-learning for general-sum stochastic games. Journal of Machine Learning Research 4, 1039–1069 (2003)
Article MathSciNet Google Scholar
Ishii, S., Fujita, H., Mitsutake, M., Yamazaki, T., Matsuda, J., Matsuno, Y.: A reinforcement learning scheme for a partially-observable multi-agent game. Machine Learning 59(1–2), 31–54 (2005)
Article MATH Google Scholar
Ishiwaka, Y., Sato, T., Kakazu, Y.: An approach to the pursuit problem on a heterogeneous multiagent system using reinforcement learning. Robotics and Autonomous Systems 43(4), 245–256 (2003)
Article Google Scholar
Jaakkola, T., Jordan, M.I., Singh, S.P.: On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation 6(6), 1185–1201 (1994)
Article MATH Google Scholar
Jafari, A., Greenwald, A.R., Gondek, D., Ercal, G.: On no-regret learning, fictitious play, and Nash equilibrium. In: Proceedings 18th International Conference on Machine Learning (ICML-2001), pp. 226–233. Williams College, Williamstown, US (2001)
Google Scholar
Jong, K.D.: Evolutionary Computation: A Unified Approach. MIT Press, Cambridge (2005)
Google Scholar
Jouffe, L.: Fuzzy inference system learning by reinforcement methods. IEEE Transactions on Systems, Man, and Cybernetics—Part C: Applications and Reviews 28(3), 338–355 (1998)
Article Google Scholar
Jung, T., Polani, D.: Kernelizing LSPE(λ). In: Proceedings 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL-2007), Honolulu, US, pp. 338–345 (2007)
Google Scholar
Kaelbling, L.P., Littman, M.L., Moore, A.W.: Reinforcement learning: A survey. Journal of Artificial Intelligence Research 4, 237–285 (1996)
Google Scholar
Kapetanakis, S., Kudenko, D.: Reinforcement learning of coordination in cooperative multi-agent systems. In: Proceedings 18th National Conference on Artificial Intelligence and 14th Conference on Innovative Applications of Artificial Intelligence (AAAI/IAAI-2002), Menlo Park, US, pp. 326–331 (2002)
Google Scholar
Kok, J.R., ’t Hoen, P.J., Bakker, B., Vlassis, N.: Utile coordination: Learning interdependencies among cooperative agents. In: Proceedings IEEE Symposium on Computational Intelligence and Games (CIG 2005), Colchester, United Kingdom, pp. 29–36 (2005)
Google Scholar
Kok, J.R., Spaan, M.T.J., Vlassis, N.: Non-communicative multi-robot coordination in dynamic environment. Robotics and Autonomous Systems 50(2–3), 99–114 (2005)
Article Google Scholar
Kok, J.R., Vlassis, N.: Sparse cooperative Q-learning. In: Proceedings 21st International Conference on Machine Learning (ICML-2004), Banff, Canada, pp. 481–488 (2004)
Google Scholar
Konda, V.R., Tsitsiklis, J.N.: On actor-critic algorithms. SIAM Journal on Control and Optimization 42(4), 1143–1166 (2003)
Article MATH MathSciNet Google Scholar
Könönen, V.: Asymmetric multiagent reinforcement learning. In: Proceedings IEEE/WIC International Conference on Intelligent Agent Technology (IAT-2003), Halifax, Canada, pp. 336–342 (2003)
Google Scholar
Könönen, V.: Gradient based method for symmetric and asymmetric multiagent reinforcement learning. In: Liu, J., Cheung, Y.-m., Yin, H. (eds.) IDEAL 2003. LNCS, vol. 2690, pp. 68–75. Springer, Heidelberg (2003)
Google Scholar
Lagoudakis, M.G., Parr, R.: Least-squares policy iteration. Journal of Machine Learning Research 4, 1107–1149 (2003)
Article MathSciNet Google Scholar
Lauer, M., Riedmiller, M.: An algorithm for distributed reinforcement learning in cooperative multi-agent systems. In: Proceedings 17th International Conference on Machine Learning (ICML-2000), Stanford University, US, pp. 535–542 (2000)
Google Scholar
Lee, J.-W., Jang Min, O.: A multi-agent Q-learning framework for optimizing stock trading systems. In: Hameurlain, A., Cicchetti, R., Traunmüller, R. (eds.) DEXA 2002. LNCS, vol. 2453, pp. 153–162. Springer, Heidelberg (2002)
Chapter Google Scholar
Littman, M.L.: Markov games as a framework for multi-agent reinforcement learning. In: Proceedings 11th International Conference on Machine Learning (ICML-1994), New Brunswick, US, pp. 157–163 (1994)
Google Scholar
Littman, M.L.: Value-function reinforcement learning in Markov games. Journal of Cognitive Systems Research 2(1), 55–66 (2001)
Article Google Scholar
Littman, M.L., Stone, P.: Implicit negotiation in repeated games. In: Meyer, J.-J.C., Tambe, M. (eds.) ATAL 2001. LNCS (LNAI), vol. 2333, pp. 96–105. Springer, Heidelberg (2002)
Chapter Google Scholar
Lovejoy, W.S.: Computationally feasible bounds for partially observed Markov decision processes. Operations Research 39(1), 162–175 (1991)
Article MATH MathSciNet Google Scholar
Matarić, M.J.: Reward functions for accelerated learning. In: Proceedings 11th International Conference on Machine Learning (ICML-1994), New Brunswick, US, pp. 181–189 (1994)
Google Scholar
Matarić, M.J.: Learning in multi-robot systems. In: Weiß, G., Sen, S. (eds.) Adaptation and Learning in Multi–Agent Systems, ch. 10, pp. 152–163. Springer, Heidelberg (1996)
Google Scholar
Matarić, M.J.: Reinforcement learning in the multi-robot domain. Autonomous Robots 4(1), 73–83 (1997)
Article Google Scholar
Melo, F.S., Meyn, S.P., Ribeiro, M.I.: An analysis of reinforcement learning with function approximation. In: Proceedings 25th International Conference on Machine Learning (ICML-2008), Helsinki, Finland, pp. 664–671 (2008)
Google Scholar
Merke, A., Riedmiller, M.A.: Karlsruhe brainstormers - A reinforcement learning approach to robotic soccer. In: Birk, A., Coradeschi, S., Tadokoro, S. (eds.) RoboCup 2001. LNCS (LNAI), vol. 2377, pp. 435–440. Springer, Heidelberg (2002)
Chapter Google Scholar
Miconi, T.: When evolving populations is better than coevolving individuals: The blind mice problem. In: Proceedings 18th International Joint Conference on Artificial Intelligence (IJCAI 2003), Acapulco, Mexico, pp. 647–652 (2003)
Google Scholar
Moore, A.W., Atkeson, C.G.: Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning 13, 103–130 (1993)
Google Scholar
Munos, R., Szepesvári, C.: Finite time bounds for fitted value iteration. Journal of Machine Learning Research 9, 815–857 (2008)
Google Scholar
Nagendra Prasad, M.V., Lesser, V.R., Lander, S.E.: Learning organizational roles for negotiated search in a multiagent system. International Journal of Human-Computer Studies 48(1), 51–67 (1998)
Article Google Scholar
Nash, S., Sofer, A.: Linear and Nonlinear Programming. McGraw-Hill, New York (1996)
Google Scholar
Nedić, A., Bertsekas, D.P.: Least-squares policy evaluation algorithms with linear function approximation. Discrete Event Dynamic Systems: Theory and Applications 13(1–2), 79–110 (2003)
MATH MathSciNet Google Scholar
Negenborn, R.R., De Schutter, B., Hellendoorn, H.: Multi-agent model predictive control for transportation networks: Serial versus parallel schemes. Engineering Applications of Artificial Intelligence 21(3), 353–366 (2008)
Article Google Scholar
Ormoneit, D., Sen, S.: Kernel-based reinforcement learning. Machine Learning 49(2–3), 161–178 (2002)
Article MATH Google Scholar
Panait, L., Luke, S.: Cooperative multi-agent learning: The state of the art. Autonomous Agents and Multi-Agent Systems 11(3), 387–434 (2005)
Article Google Scholar
Panait, L., Wiegand, R.P., Luke, S.: Improving coevolutionary search for optimal multiagent behaviors. In: Proceedings 18th International Joint Conference on Artificial Intelligence (IJCAI-2003), Acapulco, Mexico, pp. 653–660 (2003)
Google Scholar
Parunak, H.V.D.: Industrial and practical applications of DAI. In: Weiss, G. (ed.) Multi–Agent Systems: A Modern Approach to Distributed Artificial Intelligence, ch. 9, pp. 377–412. MIT Press, Cambridge (1999)
Google Scholar
Peng, J., Williams, R.J.: Incremental multi-step Q-learning. Machine Learning 22(1–3), 283–290 (1996)
Google Scholar
Peters, J., Schaal, S.: Natural actor-critic. Neurocomputing 71(7–9), 1180–1190 (2008)
Article Google Scholar
Potter, M.A., Jong, K.A.D.: A cooperative coevolutionary approach to function optimization. In: Davidor, Y., Männer, R., Schwefel, H.-P. (eds.) PPSN 1994. LNCS, vol. 866, pp. 249–257. Springer, Heidelberg (1994)
Google Scholar
Powell, W.B.: Approximate Dynamic Programming: Solving the Curses of Dimensionality. Wiley, Chichester (2007)
Book MATH Google Scholar
Powers, R., Shoham, Y.: New criteria and a new algorithm for learning in multi-agent systems. In: Saul, L.K., Weiss, Y., Bottou, L. (eds.) Advances in Neural Information Processing Systems 17, pp. 1089–1096. MIT Press, Cambridge (2005)
Google Scholar
Price, B., Boutilier, C.: Implicit imitation in multiagent reinforcement learning. In: Proceedings 16th International Conference on Machine Learning (ICML-1999), Bled, Slovenia, pp. 325–334 (1999)
Google Scholar
Price, B., Boutilier, C.: Accelerating reinforcement learning through implicit imitation. Journal of Artificial Intelligence Research 19, 569–629 (2003)
MATH Google Scholar
Puterman, M.L.: Markov Decision Processes—Discrete Stochastic Dynamic Programming. Wiley, Chichester (1994)
MATH Google Scholar
Pynadath, D.V., Tambe, M.: The communicative multiagent team decision problem: Analyzing teamwork theories and models. Journal of Artificial Intelligence Research 16, 389–423 (2002)
MATH MathSciNet Google Scholar
Raju, C., Narahari, Y., Ravikumar, K.: Reinforcement learning applications in dynamic pricing of retail markets. In: Proceedings 2003 IEEE International Conference on E-Commerce (CEC-2003), Newport Beach, US, pp. 339–346 (2003)
Google Scholar
Riedmiller, M.: Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) ECML 2005. LNCS (LNAI), vol. 3720, pp. 317–328. Springer, Heidelberg (2005)
Chapter Google Scholar
Riedmiller, M.A., Moore, A.W., Schneider, J.G.: Reinforcement learning for cooperating and communicating reactive agents in electrical power grids. In: Hannebauer, M., Wendler, J., Pagello, E. (eds.) Balancing Reactivity and Social Deliberation in Multi-Agent Systems, pp. 137–149. Springer, Heidelberg (2000)
Google Scholar
Salustowicz, R., Wiering, M., Schmidhuber, J.: Learning team strategies: Soccer case studies. Machine Learning 33(2–3), 263–282 (1998)
Article MATH Google Scholar
Schaerf, A., Shoham, Y., Tennenholtz, M.: Adaptive load balancing: A study in multi-agent learning. Journal of Artificial Intelligence Research 2, 475–500 (1995)
MATH Google Scholar
Schmidhuber, J.: A general method for incremental self-improvement and multi-agent learning. In: Yao, X. (ed.) Evolutionary Computation: Theory and Applications, ch. 3, pp. 81–123. World Scientific, Singapore (1999)
Google Scholar
Sejnowski, T.J., Hinton, G.E. (eds.): Unsupervised Learning: Foundations of Neural Computation. MIT Press, Cambridge (1999)
Google Scholar
Sen, S., Sekaran, M., Hale, J.: Learning to coordinate without sharing information. In: Proceedings 12th National Conference on Artificial Intelligence (AAAI-1994), Seattle, US, pp. 426–431 (1994)
Google Scholar
Sen, S., Weiss, G.: Learning in multiagent systems. In: Weiss, G. (ed.) Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence, ch. 6, pp. 259–298. MIT Press, Cambridge (1999)
Google Scholar
Shoham, Y., Leyton-Brown, K.: Multiagent Systems: Algorithmic, Game Theoretic and Logical Foundations. Cambridge University Press, Cambridge (2008)
Google Scholar
Shoham, Y., Powers, R., Grenager, T.: If multi-agent learning is the answer, what is the question? Artificial Intelligence 171(7), 365–377 (2007)
Article MATH MathSciNet Google Scholar
Singh, S., Kearns, M., Mansour, Y.: Nash convergence of gradient dynamics in general-sum games. In: Proceedings 16th Conference on Uncertainty in Artificial Intelligence (UAI 2000), San Francisco, US, pp. 541–548 (2000)
Google Scholar
Singh, S.P., Jaakkola, T., Jordan, M.I.: Reinforcement learning with soft state aggregation. In: Tesauro, G., Touretzky, D.S., Leen, T.K. (eds.) Advances in Neural Information Processing Systems 7, pp. 361–368. MIT Press, Cambridge (1995)
Google Scholar
Smith, J.M.: Evolution and the Theory of Games. Cambridge University Press, Cambridge (1982)
MATH Google Scholar
Spaan, M.T.J., Vlassis, N., Groen, F.C.A.: High level coordination of agents based on multiagent Markov decision processes with roles. In: Workshop on Cooperative Robotics, 2002 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS-2002), Lausanne, Switzerland, pp. 66–73 (2002)
Google Scholar
Stephan, V., Debes, K., Gross, H.M., Wintrich, F., Wintrich, H.: A reinforcement learning based neural multi-agent-system for control of a combustion process. In: Proceedings IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN-2000), Como, Italy, pp. 6217–6222 (2000)
Google Scholar
Stone, P., Veloso, M.: Team-partitioned, opaque-transition reinforcement learning. In: Proceedings 3rd International Conference on Autonomous Agents (Agents-1999), Seattle, US, pp. 206–212 (1999)
Google Scholar
Stone, P., Veloso, M.: Multiagent systems: A survey from the machine learning perspective. Autonomous Robots 8(3), 345–383 (2000)
Article Google Scholar
Suematsu, N., Hayashi, A.: A multiagent reinforcement learning algorithm using extended optimal response. In: Proceedings 1st International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS-2002), Bologna, Italy, pp. 370–377 (2002)
Google Scholar
Sueyoshi, T., Tadiparthi, G.R.: An agent-based decision support system for wholesale electricity markets. Decision Support Systems 44, 425–446 (2008)
Article Google Scholar
Sutton, R.S.: Learning to predict by the method of temporal differences. Machine Learning 3, 9–44 (1988)
Google Scholar
Sutton, R.S.: Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In: Proceedings 7th International Conference on Machine Learning (ICML-1990), Austin, US, pp. 216–224 (1990)
Google Scholar
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
Google Scholar
Szepesvári, C., Smart, W.D.: Interpolation-based Q-learning. In: Proceedings 21st International Conference on Machine Learning (ICML-2004), Bannf, Canada, pp. 791–798 (2004)
Google Scholar
Tamakoshi, H., Ishii, S.: Multiagent reinforcement learning applied to a chase problem in a continuous world. Artificial Life and Robotics 5(4), 202–206 (2001)
Article Google Scholar
Tan, M.: Multi-agent reinforcement learning: Independent vs. cooperative agents. In: Proceedings 10th International Conference on Machine Learning (ICML 1993), Amherst, US, pp. 330–337 (1993)
Google Scholar
Tesauro, G.: Extending Q-learning to general adaptive multi-agent systems. In: Thrun, S., Saul, L.K., Schölkopf, B. (eds.) Advances in Neural Information Processing Systems 16, MIT Press, Cambridge (2004)
Google Scholar
Tesauro, G., Kephart, J.O.: Pricing in agent economies using multi-agent Q-learning. Autonomous Agents and Multi-Agent Systems 5(3), 289–304 (2002)
Article Google Scholar
Tillotson, P., Wu, Q., Hughes, P.: Multi-agent learning for routing control within an Internet environment. Engineering Applications of Artificial Intelligence 17(2), 179–185 (2004)
Article Google Scholar
Touzet, C.F.: Neural reinforcement learning for behaviour synthesis. Robotics and Autonomous Systems 22(3–4), 251–281 (1997)
Article Google Scholar
Touzet, C.F.: Robot awareness in cooperative mobile robot learning. Autonomous Robots 8(1), 87–97 (2000)
Article Google Scholar
Tsitsiklis, J.N.: Asynchronous stochastic approximation and Q-learning. Machine Learning 16(1), 185–202 (1994)
MATH Google Scholar
Tuyls, K., ’t Hoen, P.J., Vanschoenwinkel, B.: An evolutionary dynamical analysis of multi-agent learning in iterated games. Autonomous Agents and Multi-Agent Systems 12(1), 115–153 (2006)
Google Scholar
Tuyls, K., Maes, S., Manderick, B.: Q-learning in simulated robotic soccer – large state spaces and incomplete information. In: Proceedings 2002 International Conference on Machine Learning and Applications (ICMLA-2002), Las Vegas, US, pp. 226–232 (2002)
Google Scholar
Tuyls, K., Nowé, A.: Evolutionary game theory and multi-agent reinforcement learning. The Knowledge Engineering Review 20(1), 63–90 (2005)
Article Google Scholar
Uther, W.T., Veloso, M.: Adversarial reinforcement learning. Tech. rep., School of Computer Science, Carnegie Mellon University, Pittsburgh, US (1997), http://www.cs.cmu.edu/afs/cs/user/will/www/papers/Uther97a.ps
Vidal, J.M.: Learning in multiagent systems: An introduction from a game-theoretic perspective. In: Alonso, E., Kudenko, D., Kazakov, D. (eds.) AAMAS 2000 and AAMAS 2002. LNCS (LNAI), vol. 2636, pp. 202–215. Springer, Heidelberg (2003)
Chapter Google Scholar
Vlassis, N.: A Concise Introduction to Multiagent Systems and Distributed Artificial Intelligence. Synthesis Lectures in Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers (2007)
Google Scholar
Wang, X., Sandholm, T.: Reinforcement learning to play an optimal Nash equilibrium in team Markov games. In: Becker, S., Thrun, S., Obermayer, K. (eds.) Advances in Neural Information Processing Systems 15, pp. 1571–1578. MIT Press, Cambridge (2003)
Google Scholar
Watkins, C.J.C.H., Dayan, P.: Q-learning. Machine Learning 8, 279–292 (1992)
MATH Google Scholar
Weinberg, M., Rosenschein, J.S.: Best-response multiagent learning in non-stationary environments. In: Proceedings 3rd International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS-2004), New York, US, pp. 506–513 (2004)
Google Scholar
Weiss, G. (ed.): Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence. MIT Press, Cambridge (1999)
Google Scholar
Wellman, M.P., Greenwald, A.R., Stone, P., Wurman, P.R.: The 2001 Trading Agent Competition. Electronic Markets 13(1) (2003)
Google Scholar
Wiering, M.: Multi-agent reinforcement learning for traffic light control. In: Proceedings 17th International Conference on Machine Learning (ICML-2000), pp. 1151–1158. Stanford University, US (2000)
Google Scholar
Wiering, M., Salustowicz, R., Schmidhuber, J.: Reinforcement learning soccer teams with incomplete world models. Autonomous Robots 7(1), 77–88 (1999)
Article Google Scholar
Zapechelnyuk, A.: Limit behavior of no-regret dynamics. Discussion Papers 21, Kyiv School of Economics, Kyiv, Ucraine (2009)
Google Scholar
Zinkevich, M.: Online convex programming and generalized infinitesimal gradient ascent. In: Proceedings 20th International Conference on Machine Learning (ICML-2003), Washington, US, pp. 928–936 (2003)
Google Scholar

Download references

Author information

Authors and Affiliations

Center for Systems and Control, Delft University of Technology, The Netherlands
Lucian Buşoniu & Robert Babuška
Center for Systems and Control & Marine and Transport Technology Department, Delft University of Technology, The Netherlands
Bart De Schutter

Authors

Lucian Buşoniu
View author publications
You can also search for this author in PubMed Google Scholar
Robert Babuška
View author publications
You can also search for this author in PubMed Google Scholar
Bart De Schutter
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Department of Electrical & Computer Engineering, National University of Singapore, 10 Kent Ridge Crescent, 119260, Singapore
Dipti Srinivasan
School of Electrical and Information Engineering, University of South Australia, Adelaide Mawson Lakes Campus, South Australia, Australia
Lakhmi C. Jain

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Buşoniu, L., Babuška, R., De Schutter, B. (2010). Multi-agent Reinforcement Learning: An Overview. In: Srinivasan, D., Jain, L.C. (eds) Innovations in Multi-Agent Systems and Applications - 1. Studies in Computational Intelligence, vol 310. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14435-6_7

Download citation

DOI: https://doi.org/10.1007/978-3-642-14435-6_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14434-9
Online ISBN: 978-3-642-14435-6
eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics