Abstract
Multi-agent systems can be used to address problems in a variety of domains, including robotics, distributed control, telecommunications, and economics. The complexity of many tasks arising in these domains makes them difficult to solve with preprogrammed agent behaviors. The agents must instead discover a solution on their own, using learning. A significant part of the research on multi-agent learning concerns reinforcement learning techniques. This chapter reviews a representative selection of multi-agent reinforcement learning algorithms for fully cooperative, fully competitive, and more general (neither cooperative nor competitive) tasks. The benefits and challenges of multi-agent reinforcement learning are described. A central challenge in the field is the formal statement of a multi-agent learning goal; this chapter reviews the learning goals proposed in the literature. The problem domains where multi-agent reinforcement learning techniques have been applied are briefly discussed. Several multi-agent reinforcement learning algorithms are applied to an illustrative example involving the coordinated transportation of an object by two cooperative robots. In an outlook for the multi-agent reinforcement learning field, a set of important open issues is identified, and promising research directions to address these issues are outlined.
Portions reprinted, with permission, from [20], ‘A Comprehensive Survey of Multiagent Reinforcement Learning’, by Lucian Buşoniu, Robert Babuška, and Bart De Schutter, IEEE Transactions on Systems, Man, and Cybernetics—Part C: Applications and Reviews, vol. 38, no. 2, March 2008, pages 156–172. © 2008 IEEE.
References
Abul, O., Polat, F., Alhajj, R.: Multiagent reinforcement learning using function approximation. IEEE Transactions on Systems, Man, and Cybernetics—Part C: Applications and Reviews 30(4), 485–497 (2000)
Bäck, T.: Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms. Oxford University Press, Oxford (1996)
Başar, T., Olsder, G.J.: Dynamic Noncooperative Game Theory, 2nd edn. Society for Industrial and Applied Mathematics, SIAM (1999)
Bakker, B., Steingrover, M., Schouten, R., Nijhuis, E., Kester, L.: Cooperative multi-agent reinforcement learning of traffic lights. In: Workshop on Cooperative Multi-Agent Learning, 16th European Conference on Machine Learning (ECML-2005), Porto, Portugal (2005)
Banerjee, B., Peng, J.: Adaptive policy gradient in multiagent learning. In: Proceedings 2nd International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2003), Melbourne, Australia, pp. 686–692 (2003)
Barto, A.G., Sutton, R.S., Anderson, C.W.: Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics 13(5), 833–846 (1983)
Berenji, H.R., Vengerov, D.: A convergent actor-critic-based FRL algorithm with application to power management of wireless transmitters. IEEE Transactions on Fuzzy Systems 11(4), 478–485 (2003)
Bertsekas, D.P.: Dynamic Programming and Optimal Control, 3rd edn., vol. 2. Athena Scientific (2007)
Bertsekas, D.P., Tsitsiklis, J.N.: Neuro-Dynamic Programming. Athena Scientific (1996)
Borkar, V.: An actor-critic algorithm for constrained Markov decision processes. Systems & Control Letters 54(3), 207–213 (2005)
Boutilier, C.: Planning, learning and coordination in multiagent decision processes. In: Proceedings 6th Conference on Theoretical Aspects of Rationality and Knowledge (TARK-1996), pp. 195–210. De Zeeuwse Stromen, The Netherlands (1996)
Bowling, M.: Convergence problems of general-sum multiagent reinforcement learning. In: Proceedings 17th International Conference on Machine Learning (ICML-2000), Stanford University, US, pp. 89–94 (2000)
Bowling, M.: Multiagent learning in the presence of agents with limitations. Ph.D. thesis, Computer Science Dept., Carnegie Mellon University, Pittsburgh, US (2003)
Bowling, M.: Convergence and no-regret in multiagent learning. In: Saul, L.K., Weiss, Y., Bottou, L. (eds.) Advances in Neural Information Processing Systems 17, pp. 209–216. MIT Press, Cambridge (2005)
Bowling, M., Veloso, M.: An analysis of stochastic game theory for multiagent reinforcement learning. Tech. rep., Computer Science Dept., Carnegie Mellon University, Pittsburgh, US (2000), http://www.cs.ualberta.ca/~bowling/papers/00tr.pdf
Bowling, M., Veloso, M.: Rational and convergent learning in stochastic games. In: Proceedings 17th International Joint Conference on Artificial Intelligence (IJCAI-2001), San Francisco, US, pp. 1021–1026 (2001)
Bowling, M., Veloso, M.: Multiagent learning using a variable learning rate. Artificial Intelligence 136(2), 215–250 (2002)
Boyan, J.A., Littman, M.L.: Packet routing in dynamically changing networks: A reinforcement learning approach. In: Moody, J. (ed.) Advances in Neural Information Processing Systems 6, pp. 671–678. Morgan Kaufmann, San Francisco (1994)
Brown, G.W.: Iterative solutions of games by fictitious play. In: Koopmans, T.C. (ed.) Activity Analysis of Production and Allocation, ch. XXIV, pp. 374–376. Wiley, Chichester (1951)
Buşoniu, L., Babuška, R., De Schutter, B.: A comprehensive survey of multi-agent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics. Part C: Applications and Reviews 38(2), 156–172 (2008)
Buşoniu, L., De Schutter, B., Babuška, R.: Multiagent reinforcement learning with adaptive state focus. In: Proceedings 17th Belgian-Dutch Conference on Artificial Intelligence (BNAIC-2005), Brussels, Belgium, pp. 35–42 (2005)
Buşoniu, L., De Schutter, B., Babuška, R.: Decentralized reinforcement learning control of a robotic manipulator. In: Proceedings 9th International Conference of Control, Automation, Robotics, and Vision (ICARCV-2006), Singapore, pp. 1347–1352 (2006)
Buşoniu, L., De Schutter, B., Babuška, R.: Approximate dynamic programming and reinforcement learning. In: Babuška, R., Groen, F.C.A. (eds.) Interactive Collaborative Information Systems. Studies in Computational Intelligence, vol. 281, pp. 3–44. Springer, Heidelberg (2010)
Buffet, O., Dutech, A., Charpillet, F.: Shaping multi-agent systems with gradient reinforcement learning. Autonomous Agents and Multi-Agent Systems 15(2), 197–220 (2007)
Carmel, D., Markovitch, S.: Opponent modeling in multi-agent systems. In: Weiß, G., Sen, S. (eds.) Adaptation and Learning in Multi-Agent Systems, ch. 3, pp. 40–52. Springer, Heidelberg (1996)
Chalkiadakis, G.: Multiagent reinforcement learning: Stochastic games with multiple learning players. Tech. rep., Dept. of Computer Science, University of Toronto, Canada (2003), http://www.cs.toronto.edu/~gehalk/DepthReport/DepthReport.ps
Cherkassky, V., Mulier, F.: Learning from Data: Concepts, Theory, And Methods. Wiley, Chichester (1998)
Choi, S.P.M., Yeung, D.Y.: Predictive Q-routing: A memory-based reinforcement learning approach to adaptive traffic control. In: Touretzky, D.S., Mozer, M., Hasselmo, M.E. (eds.) Advances in Neural Information Processing Systems 8, pp. 945–951. MIT Press, Cambridge (1996)
Claus, C., Boutilier, C.: The dynamics of reinforcement learning in cooperative multiagent systems. In: Proceedings 15th National Conference on Artificial Intelligence and 10th Conference on Innovative Applications of Artificial Intelligence (AAAI/IAAI-1998), Madison, US, pp. 746–752 (1998)
Clouse, J.: Learning from an automated training agent. In: Working Notes Workshop on Agents that Learn from Other Agents, 12th International Conference on Machine Learning (ICML-1995), Tahoe City, US (1995)
Conitzer, V., Sandholm, T.: AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents. In: Proceedings 20th International Conference on Machine Learning (ICML-2003), Washington, US, pp. 83–90 (2003)
Crites, R.H., Barto, A.G.: Improving elevator performance using reinforcement learning. In: Touretzky, D.S., Mozer, M.C., Hasselmo, M.E. (eds.) Advances in Neural Information Processing Systems 8, pp. 1017–1023. MIT Press, Cambridge (1996)
Crites, R.H., Barto, A.G.: Elevator group control using multiple reinforcement learning agents. Machine Learning 33(2–3), 235–262 (1998)
Ernst, D., Geurts, P., Wehenkel, L.: Tree-based batch mode reinforcement learning. Journal of Machine Learning Research 6, 503–556 (2005)
Fernández, F., Parker, L.E.: Learning in large cooperative multi-robot systems. International Journal of Robotics and Automation, Special Issue on Computational Intelligence Techniques in Cooperative Robots 16(4), 217–226 (2001)
Ficici, S.G., Pollack, J.B.: A game-theoretic approach to the simple coevolutionary algorithm. In: Deb, K., Rudolph, G., Lutton, E., Merelo, J.J., Schoenauer, M., Schwefel, H.-P., Yao, X. (eds.) PPSN 2000. LNCS, vol. 1917, pp. 467–476. Springer, Heidelberg (2000)
Fischer, F., Rovatsos, M., Weiss, G.: Hierarchical reinforcement learning in communication-mediated multiagent coordination. In: Proceedings 3rd International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS-2004), New York, US, pp. 1334–1335 (2004)
Fitch, R., Hengst, B., Suc, D., Calbert, G., Scholz, J.B.: Structural abstraction experiments in reinforcement learning. In: Zhang, S., Jarvis, R.A. (eds.) AI 2005. LNCS (LNAI), vol. 3809, pp. 164–175. Springer, Heidelberg (2005)
Fudenberg, D., Levine, D.K.: The Theory of Learning in Games. MIT Press, Cambridge (1998)
Ghavamzadeh, M., Mahadevan, S., Makar, R.: Hierarchical multi-agent reinforcement learning. Autonomous Agents and Multi-Agent Systems 13(2), 197–229 (2006)
Glorennec, P.Y.: Reinforcement learning: An overview. In: Proceedings European Symposium on Intelligent Techniques (ESIT-2000), Aachen, Germany, pp. 17–35 (2000)
Greenwald, A., Hall, K.: Correlated-Q learning. In: Proceedings 20th International Conference on Machine Learning (ICML-2003), Washington, US, pp. 242–249 (2003)
Guestrin, C., Lagoudakis, M.G., Parr, R.: Coordinated reinforcement learning. In: Proceedings 19th International Conference on Machine Learning (ICML-2002), Sydney, Australia, pp. 227–234 (2002)
Hansen, E.A., Bernstein, D.S., Zilberstein, S.: Dynamic programming for partially observable stochastic games. In: Proceedings 19th National Conference on Artificial Intelligence (AAAI-2004), San Jose, US, pp. 709–715 (2004)
Haynes, T., Wainwright, R., Sen, S., Schoenefeld, D.: Strongly typed genetic programming in evolving cooperation strategies. In: Proceedings 6th International Conference on Genetic Algorithms (ICGA-1995), Pittsburgh, US, pp. 271–278 (1995)
Ho, F., Kamel, M.: Learning coordination strategies for cooperative multiagent systems. Machine Learning 33(2–3), 155–177 (1998)
Horiuchi, T., Fujino, A., Katai, O., Sawaragi, T.: Fuzzy interpolation-based Q-learning with continuous states and actions. In: Proceedings 5th IEEE International Conference on Fuzzy Systems (FUZZ-IEEE-1996), New Orleans, US, pp. 594–600 (1996)
Hsu, W.T., Soo, V.W.: Market performance of adaptive trading agents in synchronous double auctions. In: Yuan, S.-T., Yokoo, M. (eds.) PRIMA 2001. LNCS (LNAI), vol. 2132, pp. 108–121. Springer, Heidelberg (2001)
Hu, J., Wellman, M.P.: Multiagent reinforcement learning: Theoretical framework and an algorithm. In: Proceedings 15th International Conference on Machine Learning (ICML-1998), Madison, US, pp. 242–250 (1998)
Hu, J., Wellman, M.P.: Nash Q-learning for general-sum stochastic games. Journal of Machine Learning Research 4, 1039–1069 (2003)
Ishii, S., Fujita, H., Mitsutake, M., Yamazaki, T., Matsuda, J., Matsuno, Y.: A reinforcement learning scheme for a partially-observable multi-agent game. Machine Learning 59(1–2), 31–54 (2005)
Ishiwaka, Y., Sato, T., Kakazu, Y.: An approach to the pursuit problem on a heterogeneous multiagent system using reinforcement learning. Robotics and Autonomous Systems 43(4), 245–256 (2003)
Jaakkola, T., Jordan, M.I., Singh, S.P.: On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation 6(6), 1185–1201 (1994)
Jafari, A., Greenwald, A.R., Gondek, D., Ercal, G.: On no-regret learning, fictitious play, and Nash equilibrium. In: Proceedings 18th International Conference on Machine Learning (ICML-2001), pp. 226–233. Williams College, Williamstown, US (2001)
Jong, K.D.: Evolutionary Computation: A Unified Approach. MIT Press, Cambridge (2005)
Jouffe, L.: Fuzzy inference system learning by reinforcement methods. IEEE Transactions on Systems, Man, and Cybernetics—Part C: Applications and Reviews 28(3), 338–355 (1998)
Jung, T., Polani, D.: Kernelizing LSPE(λ). In: Proceedings 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL-2007), Honolulu, US, pp. 338–345 (2007)
Kaelbling, L.P., Littman, M.L., Moore, A.W.: Reinforcement learning: A survey. Journal of Artificial Intelligence Research 4, 237–285 (1996)
Kapetanakis, S., Kudenko, D.: Reinforcement learning of coordination in cooperative multi-agent systems. In: Proceedings 18th National Conference on Artificial Intelligence and 14th Conference on Innovative Applications of Artificial Intelligence (AAAI/IAAI-2002), Menlo Park, US, pp. 326–331 (2002)
Kok, J.R., ’t Hoen, P.J., Bakker, B., Vlassis, N.: Utile coordination: Learning interdependencies among cooperative agents. In: Proceedings IEEE Symposium on Computational Intelligence and Games (CIG 2005), Colchester, United Kingdom, pp. 29–36 (2005)
Kok, J.R., Spaan, M.T.J., Vlassis, N.: Non-communicative multi-robot coordination in dynamic environment. Robotics and Autonomous Systems 50(2–3), 99–114 (2005)
Kok, J.R., Vlassis, N.: Sparse cooperative Q-learning. In: Proceedings 21st International Conference on Machine Learning (ICML-2004), Banff, Canada, pp. 481–488 (2004)
Konda, V.R., Tsitsiklis, J.N.: On actor-critic algorithms. SIAM Journal on Control and Optimization 42(4), 1143–1166 (2003)
Könönen, V.: Asymmetric multiagent reinforcement learning. In: Proceedings IEEE/WIC International Conference on Intelligent Agent Technology (IAT-2003), Halifax, Canada, pp. 336–342 (2003)
Könönen, V.: Gradient based method for symmetric and asymmetric multiagent reinforcement learning. In: Liu, J., Cheung, Y.-m., Yin, H. (eds.) IDEAL 2003. LNCS, vol. 2690, pp. 68–75. Springer, Heidelberg (2003)
Lagoudakis, M.G., Parr, R.: Least-squares policy iteration. Journal of Machine Learning Research 4, 1107–1149 (2003)
Lauer, M., Riedmiller, M.: An algorithm for distributed reinforcement learning in cooperative multi-agent systems. In: Proceedings 17th International Conference on Machine Learning (ICML-2000), Stanford University, US, pp. 535–542 (2000)
Lee, J.-W., Jang Min, O.: A multi-agent Q-learning framework for optimizing stock trading systems. In: Hameurlain, A., Cicchetti, R., Traunmüller, R. (eds.) DEXA 2002. LNCS, vol. 2453, pp. 153–162. Springer, Heidelberg (2002)
Littman, M.L.: Markov games as a framework for multi-agent reinforcement learning. In: Proceedings 11th International Conference on Machine Learning (ICML-1994), New Brunswick, US, pp. 157–163 (1994)
Littman, M.L.: Value-function reinforcement learning in Markov games. Journal of Cognitive Systems Research 2(1), 55–66 (2001)
Littman, M.L., Stone, P.: Implicit negotiation in repeated games. In: Meyer, J.-J.C., Tambe, M. (eds.) ATAL 2001. LNCS (LNAI), vol. 2333, pp. 96–105. Springer, Heidelberg (2002)
Lovejoy, W.S.: Computationally feasible bounds for partially observed Markov decision processes. Operations Research 39(1), 162–175 (1991)
Matarić, M.J.: Reward functions for accelerated learning. In: Proceedings 11th International Conference on Machine Learning (ICML-1994), New Brunswick, US, pp. 181–189 (1994)
Matarić, M.J.: Learning in multi-robot systems. In: Weiß, G., Sen, S. (eds.) Adaptation and Learning in Multi–Agent Systems, ch. 10, pp. 152–163. Springer, Heidelberg (1996)
Matarić, M.J.: Reinforcement learning in the multi-robot domain. Autonomous Robots 4(1), 73–83 (1997)
Melo, F.S., Meyn, S.P., Ribeiro, M.I.: An analysis of reinforcement learning with function approximation. In: Proceedings 25th International Conference on Machine Learning (ICML-2008), Helsinki, Finland, pp. 664–671 (2008)
Merke, A., Riedmiller, M.A.: Karlsruhe brainstormers - A reinforcement learning approach to robotic soccer. In: Birk, A., Coradeschi, S., Tadokoro, S. (eds.) RoboCup 2001. LNCS (LNAI), vol. 2377, pp. 435–440. Springer, Heidelberg (2002)
Miconi, T.: When evolving populations is better than coevolving individuals: The blind mice problem. In: Proceedings 18th International Joint Conference on Artificial Intelligence (IJCAI 2003), Acapulco, Mexico, pp. 647–652 (2003)
Moore, A.W., Atkeson, C.G.: Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning 13, 103–130 (1993)
Munos, R., Szepesvári, C.: Finite time bounds for fitted value iteration. Journal of Machine Learning Research 9, 815–857 (2008)
Nagendra Prasad, M.V., Lesser, V.R., Lander, S.E.: Learning organizational roles for negotiated search in a multiagent system. International Journal of Human-Computer Studies 48(1), 51–67 (1998)
Nash, S., Sofer, A.: Linear and Nonlinear Programming. McGraw-Hill, New York (1996)
Nedić, A., Bertsekas, D.P.: Least-squares policy evaluation algorithms with linear function approximation. Discrete Event Dynamic Systems: Theory and Applications 13(1–2), 79–110 (2003)
Negenborn, R.R., De Schutter, B., Hellendoorn, H.: Multi-agent model predictive control for transportation networks: Serial versus parallel schemes. Engineering Applications of Artificial Intelligence 21(3), 353–366 (2008)
Ormoneit, D., Sen, S.: Kernel-based reinforcement learning. Machine Learning 49(2–3), 161–178 (2002)
Panait, L., Luke, S.: Cooperative multi-agent learning: The state of the art. Autonomous Agents and Multi-Agent Systems 11(3), 387–434 (2005)
Panait, L., Wiegand, R.P., Luke, S.: Improving coevolutionary search for optimal multiagent behaviors. In: Proceedings 18th International Joint Conference on Artificial Intelligence (IJCAI-2003), Acapulco, Mexico, pp. 653–660 (2003)
Parunak, H.V.D.: Industrial and practical applications of DAI. In: Weiss, G. (ed.) Multi–Agent Systems: A Modern Approach to Distributed Artificial Intelligence, ch. 9, pp. 377–412. MIT Press, Cambridge (1999)
Peng, J., Williams, R.J.: Incremental multi-step Q-learning. Machine Learning 22(1–3), 283–290 (1996)
Peters, J., Schaal, S.: Natural actor-critic. Neurocomputing 71(7–9), 1180–1190 (2008)
Potter, M.A., Jong, K.A.D.: A cooperative coevolutionary approach to function optimization. In: Davidor, Y., Männer, R., Schwefel, H.-P. (eds.) PPSN 1994. LNCS, vol. 866, pp. 249–257. Springer, Heidelberg (1994)
Powell, W.B.: Approximate Dynamic Programming: Solving the Curses of Dimensionality. Wiley, Chichester (2007)
Powers, R., Shoham, Y.: New criteria and a new algorithm for learning in multi-agent systems. In: Saul, L.K., Weiss, Y., Bottou, L. (eds.) Advances in Neural Information Processing Systems 17, pp. 1089–1096. MIT Press, Cambridge (2005)
Price, B., Boutilier, C.: Implicit imitation in multiagent reinforcement learning. In: Proceedings 16th International Conference on Machine Learning (ICML-1999), Bled, Slovenia, pp. 325–334 (1999)
Price, B., Boutilier, C.: Accelerating reinforcement learning through implicit imitation. Journal of Artificial Intelligence Research 19, 569–629 (2003)
Puterman, M.L.: Markov Decision Processes—Discrete Stochastic Dynamic Programming. Wiley, Chichester (1994)
Pynadath, D.V., Tambe, M.: The communicative multiagent team decision problem: Analyzing teamwork theories and models. Journal of Artificial Intelligence Research 16, 389–423 (2002)
Raju, C., Narahari, Y., Ravikumar, K.: Reinforcement learning applications in dynamic pricing of retail markets. In: Proceedings 2003 IEEE International Conference on E-Commerce (CEC-2003), Newport Beach, US, pp. 339–346 (2003)
Riedmiller, M.: Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) ECML 2005. LNCS (LNAI), vol. 3720, pp. 317–328. Springer, Heidelberg (2005)
Riedmiller, M.A., Moore, A.W., Schneider, J.G.: Reinforcement learning for cooperating and communicating reactive agents in electrical power grids. In: Hannebauer, M., Wendler, J., Pagello, E. (eds.) Balancing Reactivity and Social Deliberation in Multi-Agent Systems, pp. 137–149. Springer, Heidelberg (2000)
Salustowicz, R., Wiering, M., Schmidhuber, J.: Learning team strategies: Soccer case studies. Machine Learning 33(2–3), 263–282 (1998)
Schaerf, A., Shoham, Y., Tennenholtz, M.: Adaptive load balancing: A study in multi-agent learning. Journal of Artificial Intelligence Research 2, 475–500 (1995)
Schmidhuber, J.: A general method for incremental self-improvement and multi-agent learning. In: Yao, X. (ed.) Evolutionary Computation: Theory and Applications, ch. 3, pp. 81–123. World Scientific, Singapore (1999)
Sejnowski, T.J., Hinton, G.E. (eds.): Unsupervised Learning: Foundations of Neural Computation. MIT Press, Cambridge (1999)
Sen, S., Sekaran, M., Hale, J.: Learning to coordinate without sharing information. In: Proceedings 12th National Conference on Artificial Intelligence (AAAI-1994), Seattle, US, pp. 426–431 (1994)
Sen, S., Weiss, G.: Learning in multiagent systems. In: Weiss, G. (ed.) Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence, ch. 6, pp. 259–298. MIT Press, Cambridge (1999)
Shoham, Y., Leyton-Brown, K.: Multiagent Systems: Algorithmic, Game Theoretic and Logical Foundations. Cambridge University Press, Cambridge (2008)
Shoham, Y., Powers, R., Grenager, T.: If multi-agent learning is the answer, what is the question? Artificial Intelligence 171(7), 365–377 (2007)
Singh, S., Kearns, M., Mansour, Y.: Nash convergence of gradient dynamics in general-sum games. In: Proceedings 16th Conference on Uncertainty in Artificial Intelligence (UAI 2000), San Francisco, US, pp. 541–548 (2000)
Singh, S.P., Jaakkola, T., Jordan, M.I.: Reinforcement learning with soft state aggregation. In: Tesauro, G., Touretzky, D.S., Leen, T.K. (eds.) Advances in Neural Information Processing Systems 7, pp. 361–368. MIT Press, Cambridge (1995)
Smith, J.M.: Evolution and the Theory of Games. Cambridge University Press, Cambridge (1982)
Spaan, M.T.J., Vlassis, N., Groen, F.C.A.: High level coordination of agents based on multiagent Markov decision processes with roles. In: Workshop on Cooperative Robotics, 2002 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS-2002), Lausanne, Switzerland, pp. 66–73 (2002)
Stephan, V., Debes, K., Gross, H.M., Wintrich, F., Wintrich, H.: A reinforcement learning based neural multi-agent-system for control of a combustion process. In: Proceedings IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN-2000), Como, Italy, pp. 6217–6222 (2000)
Stone, P., Veloso, M.: Team-partitioned, opaque-transition reinforcement learning. In: Proceedings 3rd International Conference on Autonomous Agents (Agents-1999), Seattle, US, pp. 206–212 (1999)
Stone, P., Veloso, M.: Multiagent systems: A survey from the machine learning perspective. Autonomous Robots 8(3), 345–383 (2000)
Suematsu, N., Hayashi, A.: A multiagent reinforcement learning algorithm using extended optimal response. In: Proceedings 1st International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS-2002), Bologna, Italy, pp. 370–377 (2002)
Sueyoshi, T., Tadiparthi, G.R.: An agent-based decision support system for wholesale electricity markets. Decision Support Systems 44, 425–446 (2008)
Sutton, R.S.: Learning to predict by the method of temporal differences. Machine Learning 3, 9–44 (1988)
Sutton, R.S.: Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In: Proceedings 7th International Conference on Machine Learning (ICML-1990), Austin, US, pp. 216–224 (1990)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
Szepesvári, C., Smart, W.D.: Interpolation-based Q-learning. In: Proceedings 21st International Conference on Machine Learning (ICML-2004), Banff, Canada, pp. 791–798 (2004)
Tamakoshi, H., Ishii, S.: Multiagent reinforcement learning applied to a chase problem in a continuous world. Artificial Life and Robotics 5(4), 202–206 (2001)
Tan, M.: Multi-agent reinforcement learning: Independent vs. cooperative agents. In: Proceedings 10th International Conference on Machine Learning (ICML 1993), Amherst, US, pp. 330–337 (1993)
Tesauro, G.: Extending Q-learning to general adaptive multi-agent systems. In: Thrun, S., Saul, L.K., Schölkopf, B. (eds.) Advances in Neural Information Processing Systems 16. MIT Press, Cambridge (2004)
Tesauro, G., Kephart, J.O.: Pricing in agent economies using multi-agent Q-learning. Autonomous Agents and Multi-Agent Systems 5(3), 289–304 (2002)
Tillotson, P., Wu, Q., Hughes, P.: Multi-agent learning for routing control within an Internet environment. Engineering Applications of Artificial Intelligence 17(2), 179–185 (2004)
Touzet, C.F.: Neural reinforcement learning for behaviour synthesis. Robotics and Autonomous Systems 22(3–4), 251–281 (1997)
Touzet, C.F.: Robot awareness in cooperative mobile robot learning. Autonomous Robots 8(1), 87–97 (2000)
Tsitsiklis, J.N.: Asynchronous stochastic approximation and Q-learning. Machine Learning 16(1), 185–202 (1994)
Tuyls, K., ’t Hoen, P.J., Vanschoenwinkel, B.: An evolutionary dynamical analysis of multi-agent learning in iterated games. Autonomous Agents and Multi-Agent Systems 12(1), 115–153 (2006)
Tuyls, K., Maes, S., Manderick, B.: Q-learning in simulated robotic soccer – large state spaces and incomplete information. In: Proceedings 2002 International Conference on Machine Learning and Applications (ICMLA-2002), Las Vegas, US, pp. 226–232 (2002)
Tuyls, K., Nowé, A.: Evolutionary game theory and multi-agent reinforcement learning. The Knowledge Engineering Review 20(1), 63–90 (2005)
Uther, W.T., Veloso, M.: Adversarial reinforcement learning. Tech. rep., School of Computer Science, Carnegie Mellon University, Pittsburgh, US (1997), http://www.cs.cmu.edu/afs/cs/user/will/www/papers/Uther97a.ps
Vidal, J.M.: Learning in multiagent systems: An introduction from a game-theoretic perspective. In: Alonso, E., Kudenko, D., Kazakov, D. (eds.) AAMAS 2000 and AAMAS 2002. LNCS (LNAI), vol. 2636, pp. 202–215. Springer, Heidelberg (2003)
Vlassis, N.: A Concise Introduction to Multiagent Systems and Distributed Artificial Intelligence. Synthesis Lectures in Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers (2007)
Wang, X., Sandholm, T.: Reinforcement learning to play an optimal Nash equilibrium in team Markov games. In: Becker, S., Thrun, S., Obermayer, K. (eds.) Advances in Neural Information Processing Systems 15, pp. 1571–1578. MIT Press, Cambridge (2003)
Watkins, C.J.C.H., Dayan, P.: Q-learning. Machine Learning 8, 279–292 (1992)
Weinberg, M., Rosenschein, J.S.: Best-response multiagent learning in non-stationary environments. In: Proceedings 3rd International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS-2004), New York, US, pp. 506–513 (2004)
Weiss, G. (ed.): Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence. MIT Press, Cambridge (1999)
Wellman, M.P., Greenwald, A.R., Stone, P., Wurman, P.R.: The 2001 Trading Agent Competition. Electronic Markets 13(1) (2003)
Wiering, M.: Multi-agent reinforcement learning for traffic light control. In: Proceedings 17th International Conference on Machine Learning (ICML-2000), pp. 1151–1158. Stanford University, US (2000)
Wiering, M., Salustowicz, R., Schmidhuber, J.: Reinforcement learning soccer teams with incomplete world models. Autonomous Robots 7(1), 77–88 (1999)
Zapechelnyuk, A.: Limit behavior of no-regret dynamics. Discussion Papers 21, Kyiv School of Economics, Kyiv, Ukraine (2009)
Zinkevich, M.: Online convex programming and generalized infinitesimal gradient ascent. In: Proceedings 20th International Conference on Machine Learning (ICML-2003), Washington, US, pp. 928–936 (2003)
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Buşoniu, L., Babuška, R., De Schutter, B. (2010). Multi-agent Reinforcement Learning: An Overview. In: Srinivasan, D., Jain, L.C. (eds) Innovations in Multi-Agent Systems and Applications - 1. Studies in Computational Intelligence, vol 310. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14435-6_7
DOI: https://doi.org/10.1007/978-3-642-14435-6_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14434-9
Online ISBN: 978-3-642-14435-6
eBook Packages: Engineering, Engineering (R0)