Multi-agent Reinforcement Learning: An Overview

  • Lucian Buşoniu
  • Robert Babuška
  • Bart De Schutter
Part of the Studies in Computational Intelligence book series (SCI, volume 310)


Multi-agent systems can be used to address problems in a variety of domains, including robotics, distributed control, telecommunications, and economics. The complexity of many tasks arising in these domains makes them difficult to solve with preprogrammed agent behaviors. The agents must instead discover a solution on their own, using learning. A significant part of the research on multi-agent learning concerns reinforcement learning techniques. This chapter reviews a representative selection of multi-agent reinforcement learning algorithms for fully cooperative, fully competitive, and more general (neither cooperative nor competitive) tasks. The benefits and challenges of multi-agent reinforcement learning are described. A central challenge in the field is the formal statement of a multi-agent learning goal; this chapter reviews the learning goals proposed in the literature. The problem domains where multi-agent reinforcement learning techniques have been applied are briefly discussed. Several multi-agent reinforcement learning algorithms are applied to an illustrative example involving the coordinated transportation of an object by two cooperative robots. In an outlook for the multi-agent reinforcement learning field, a set of important open issues are identified, and promising research directions to address these issues are outlined.


Nash Equilibrium Reinforcement Learning Multiagent System Markov Decision Process Reward Function 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.


  1. 1.
    Abul, O., Polat, F., Alhajj, R.: Multiagent reinforcement learning using function approximation. IEEE Transactions on Systems, Man, and Cybernetics—Part C: Applications and Reviews 4(4), 485–497 (2000)CrossRefGoogle Scholar
  2. 2.
    Bäck, T.: Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms. Oxford University Press, Oxford (1996)zbMATHGoogle Scholar
  3. 3.
    Başar, T., Olsder, G.J.: Dynamic Noncooperative Game Theory, 2nd edn. Society for Industrial and Applied Mathematics, SIAM (1999)Google Scholar
  4. 4.
    Bakker, B., Steingrover, M., Schouten, R., Nijhuis, E., Kester, L.: Cooperative multi-agent reinforcement learning of traffic lights. In: Workshop on Cooperative Multi-Agent Learning, 16th European Conference on Machine Learning (ECML-2005), Porto, Portugal (2005)Google Scholar
  5. 5.
    Banerjee, B., Peng, J.: Adaptive policy gradient in multiagent learning. In: Proceedings 2nd International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2003), Melbourne, Australia, pp. 686–692 (2003)Google Scholar
  6. 6.
    Barto, A.G., Sutton, R.S., Anderson, C.W.: Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics 13(5), 833–846 (1983)Google Scholar
  7. 7.
    Berenji, H.R., Vengerov, D.: A convergent actor-critic-based FRL algorithm with application to power management of wireless transmitters. IEEE Transactions on Fuzzy Systems 11(4), 478–485 (2003)CrossRefGoogle Scholar
  8. 8.
    Bertsekas, D.P.: Dynamic Programming and Optimal Control, 3rd edn., vol. 2. Athena Scientific (2007)Google Scholar
  9. 9.
    Bertsekas, D.P., Tsitsiklis, J.N.: Neuro-Dynamic Programming. Athena Scientific (1996)Google Scholar
  10. 10.
    Borkar, V.: An actor-critic algorithm for constrained Markov decision processes. Systems & Control Letters 54(3), 207–213 (2005)zbMATHCrossRefMathSciNetGoogle Scholar
  11. 11.
    Boutilier, C.: Planning, learning and coordination in multiagent decision processes. In: Proceedings 6th Conference on Theoretical Aspects of Rationality and Knowledge (TARK-1996), pp. 195–210. De Zeeuwse Stromen, The Netherlands (1996)Google Scholar
  12. 12.
    Bowling, M.: Convergence problems of general-sum multiagent reinforcement learning. In: Proceedings 17th International Conference on Machine Learning (ICML-2000), Stanford University, US, pp. 89–94 (2000)Google Scholar
  13. 13.
    Bowling, M.: Multiagent learning in the presence of agents with limitations. Ph.D. thesis, Computer Science Dept., Carnegie Mellon University, Pittsburgh, US (2003)Google Scholar
  14. 14.
    Bowling, M.: Convergence and no-regret in multiagent learning. In: Saul, L.K., Weiss, Y., Bottou, L. (eds.) Advances in Neural Information Processing Systems 17, pp. 209–216. MIT Press, Cambridge (2005)Google Scholar
  15. 15.
    Bowling, M., Veloso, M.: An analysis of stochastic game theory for multiagent reinforcement learning. Tech. rep., Computer Science Dept., Carnegie Mellon University, Pittsburgh, US (2000),
  16. 16.
    Bowling, M., Veloso, M.: Rational and convergent learning in stochastic games. In: Proceedings 17th International Conference on Artificial Intelligence (IJCAI-2001), San Francisco, US, pp. 1021–1026 (2001)Google Scholar
  17. 17.
    Bowling, M., Veloso, M.: Multiagent learning using a variable learning rate. Artificial Intelligence 136(2), 215–250 (2002)zbMATHCrossRefMathSciNetGoogle Scholar
  18. 18.
    Boyan, J.A., Littman, M.L.: Packet routing in dynamically changing networks: A reinforcement learning approach. In: Moody, J. (ed.) Advances in Neural Information Processing Systems 6, pp. 671–678. Morgan Kaufmann, San Francisco (1994)Google Scholar
  19. 19.
    Brown, G.W.: Iterative solutions of games by fictitious play. In: Koopmans, T.C. (ed.) Activitiy Analysis of Production and Allocation, ch. XXIV, pp. 374–376. Wiley, Chichester (1951)Google Scholar
  20. 20.
    Buşoniu, L., Babuška, R., De Schutter, B.: A comprehensive survey of multi-agent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics. Part C: Applications and Reviews 38(2), 156–172 (2008)CrossRefGoogle Scholar
  21. 21.
    Buşoniu, L., De Schutter, B., Babuška, R.: Multiagent reinforcement learning with adaptive state focus. In: Proceedings 17th Belgian-Dutch Conference on Artificial Intelligence (BNAIC-2005), Brussels, Belgium, pp. 35–42 (2005)Google Scholar
  22. 22.
    Buşoniu, L., De Schutter, B., Babuška, R.: Decentralized reinforcement learning control of a robotic manipulator. In: Proceedings 9th International Conference of Control, Automation, Robotics, and Vision (ICARCV-2006), Singapore, pp. 1347–1352 (2006)Google Scholar
  23. 23.
    Buşoniu, L., De Schutter, B., Babuška, R.: Approximate dynamic programming and reinforcement learning. In: Babuška, R., Groen, F.C.A. (eds.) Interactive Collaborative Information Systems. Studies in Computational Intelligence, vol. 281, pp. 3–44. Springer, Heidelberg (2010)Google Scholar
  24. 24.
    Buffet, O., Dutech, A., Charpillet, F.: Shaping multi-agent systems with gradient reinforcement learning. Autonomous Agents and Multi-Agent Systems 15(2), 197–220 (2007)CrossRefGoogle Scholar
  25. 25.
    Carmel, D., Markovitch, S.: Opponent modeling in multi-agent systems. In: Weiß, G., Sen, S. (eds.) Adaptation and Learning in Multi-Agent Systems, ch. 3, pp. 40–52. Springer, Heidelberg (1996)Google Scholar
  26. 26.
    Chalkiadakis, G.: Multiagent reinforcement learning: Stochastic games with multiple learning players. Tech. rep., Dept. of Computer Science, University of Toronto, Canada (2003),
  27. 27.
    Cherkassky, V., Mulier, F.: Learning from Data: Concepts, Theory, And Methods. Wiley, Chichester (1998)zbMATHGoogle Scholar
  28. 28.
    Choi, S.P.M., Yeung, D.Y.: Predictive Q-routing: A memory-based reinforcement learning approach to adaptive traffic control. In: Touretzky, D.S., Mozer, M., Hasselmo, M.E. (eds.) Advances in Neural Information Processing Systems 8, pp. 945–951. MIT Press, Cambridge (1996)Google Scholar
  29. 29.
    Claus, C., Boutilier, C.: The dynamics of reinforcement learning in cooperative multiagent systems. In: Proceedings 15th National Conference on Artificial Intelligence and 10th Conference on Innovative Applications of Artificial Intelligence (AAAI/IAAI-1998), Madison, US, pp. 746–752 (1998)Google Scholar
  30. 30.
    Clouse, J.: Learning from an automated training agent. In: Working Notes Workshop on Agents that Learn from Other Agents, 12th International Conference on Machine Learning (ICML-1995), Tahoe City, US (1995)Google Scholar
  31. 31.
    Conitzer, V., Sandholm, T.: AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents. In: Proceedings 20th International Conference on Machine Learning (ICML-2003), Washington, US, pp. 83–90 (2003)Google Scholar
  32. 32.
    Crites, R.H., Barto, A.G.: Improving elevator performance using reinforcement learning. In: Touretzky, D.S., Mozer, M.C., Hasselmo, M.E. (eds.) Advances in Neural Information Processing Systems 8, pp. 1017–1023. MIT Press, Cambridge (1996)Google Scholar
  33. 33.
    Crites, R.H., Barto, A.G.: Elevator group control using multiple reinforcement learning agents. Machine Learning 33(2–3), 235–262 (1998)zbMATHCrossRefGoogle Scholar
  34. 34.
    Ernst, D., Geurts, P., Wehenkel, L.: Tree-based batch mode reinforcement learning. Journal of Machine Learning Research 6, 503–556 (2005)MathSciNetGoogle Scholar
  35. 35.
    Fernández, F., Parker, L.E.: Learning in large cooperative multi-robot systems. International Journal of Robotics and Automation, Special Issue on Computational Intelligence Techniques in Cooperative Robots 16(4), 217–226 (2001)Google Scholar
  36. 36.
    Ficici, S.G., Pollack, J.B.: A game-theoretic approach to the simple coevolutionary algorithm. In: Deb, K., Rudolph, G., Lutton, E., Merelo, J.J., Schoenauer, M., Schwefel, H.-P., Yao, X. (eds.) PPSN 2000. LNCS, vol. 1917, pp. 467–476. Springer, Heidelberg (2000)CrossRefGoogle Scholar
  37. 37.
    Fischer, F., Rovatsos, M., Weiss, G.: Hierarchical reinforcement learning in communication-mediated multiagent coordination. In: Proceedings 3rd International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS-2004), New York, US, pp. 1334–1335 (2004)Google Scholar
  38. 38.
    Fitch, R., Hengst, B., Suc, D., Calbert, G., Scholz, J.B.: Structural abstraction experiments in reinforcement learning. In: Zhang, S., Jarvis, R.A. (eds.) AI 2005. LNCS (LNAI), vol. 3809, pp. 164–175. Springer, Heidelberg (2005)CrossRefGoogle Scholar
  39. 39.
    Fudenberg, D., Levine, D.K.: The Theory of Learning in Games. MIT Press, Cambridge (1998)zbMATHGoogle Scholar
  40. 40.
    Ghavamzadeh, M., Mahadevan, S., Makar, R.: Hierarchical multi-agent reinforcement learning. Autonomous Agents and Multi-Agent Systems 13(2), 197–229 (2006)CrossRefGoogle Scholar
  41. 41.
    Glorennec, P.Y.: Reinforcement learning: An overview. In: Proceedings European Symposium on Intelligent Techniques (ESIT-2000), Aachen, Germany, pp. 17–35 (2000)Google Scholar
  42. 42.
    Greenwald, A., Hall, K.: Correlated-Q learning. In: Proceedings 20th International Conference on Machine Learning (ICML-2003), Washington, US, pp. 242–249 (2003)Google Scholar
  43. 43.
    Guestrin, C., Lagoudakis, M.G., Parr, R.: Coordinated reinforcement learning. In: Proceedings 19th International Conference on Machine Learning (ICML-2002), Sydney, Australia, pp. 227–234 (2002)Google Scholar
  44. 44.
    Hansen, E.A., Bernstein, D.S., Zilberstein, S.: Dynamic programming for partially observable stochastic games. In: Proceedings 19th National Conference on Artificial Intelligence (AAAI-2004), San Jose, US, pp. 709–715 (2004)Google Scholar
  45. 45.
    Haynes, T., Wainwright, R., Sen, S., Schoenefeld, D.: Strongly typed genetic programming in evolving cooperation strategies. In: Proceedings 6th International Conference on Genetic Algorithms (ICGA-1995), Pittsburgh, US, pp. 271–278 (1995)Google Scholar
  46. 46.
    Ho, F., Kamel, M.: Learning coordination strategies for cooperative multiagent systems. Machine Learning 33(2–3), 155–177 (1998)zbMATHCrossRefGoogle Scholar
  47. 47.
    Horiuchi, T., Fujino, A., Katai, O., Sawaragi, T.: Fuzzy interpolation-based Q-learning with continuous states and actions. In: Proceedings 5th IEEE International Conference on Fuzzy Systems (FUZZ-IEEE-1996), New Orleans, US, pp. 594–600 (1996)Google Scholar
  48. 48.
    Hsu, W.T., Soo, V.W.: Market performance of adaptive trading agents in synchronous double auctions. In: Yuan, S.-T., Yokoo, M. (eds.) PRIMA 2001. LNCS (LNAI), vol. 2132, pp. 108–121. Springer, Heidelberg (2001)CrossRefGoogle Scholar
  49. 49.
    Hu, J., Wellman, M.P.: Multiagent reinforcement learning: Theoretical framework and an algorithm. In: Proceedings 15th International Conference on Machine Learning (ICML-1998), Madison, US, pp. 242–250 (1998)Google Scholar
  50. 50.
    Hu, J., Wellman, M.P.: Nash Q-learning for general-sum stochastic games. Journal of Machine Learning Research 4, 1039–1069 (2003)CrossRefMathSciNetGoogle Scholar
  51. 51.
    Ishii, S., Fujita, H., Mitsutake, M., Yamazaki, T., Matsuda, J., Matsuno, Y.: A reinforcement learning scheme for a partially-observable multi-agent game. Machine Learning 59(1–2), 31–54 (2005)zbMATHCrossRefGoogle Scholar
  52. 52.
    Ishiwaka, Y., Sato, T., Kakazu, Y.: An approach to the pursuit problem on a heterogeneous multiagent system using reinforcement learning. Robotics and Autonomous Systems 43(4), 245–256 (2003)CrossRefGoogle Scholar
  53. 53.
    Jaakkola, T., Jordan, M.I., Singh, S.P.: On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation 6(6), 1185–1201 (1994)zbMATHCrossRefGoogle Scholar
  54. 54.
    Jafari, A., Greenwald, A.R., Gondek, D., Ercal, G.: On no-regret learning, fictitious play, and Nash equilibrium. In: Proceedings 18th International Conference on Machine Learning (ICML-2001), pp. 226–233. Williams College, Williamstown, US (2001)Google Scholar
  55. 55.
    Jong, K.D.: Evolutionary Computation: A Unified Approach. MIT Press, Cambridge (2005)Google Scholar
  56. 56.
    Jouffe, L.: Fuzzy inference system learning by reinforcement methods. IEEE Transactions on Systems, Man, and Cybernetics—Part C: Applications and Reviews 28(3), 338–355 (1998)CrossRefGoogle Scholar
  57. 57.
    Jung, T., Polani, D.: Kernelizing LSPE(λ). In: Proceedings 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL-2007), Honolulu, US, pp. 338–345 (2007)Google Scholar
  58. 58.
    Kaelbling, L.P., Littman, M.L., Moore, A.W.: Reinforcement learning: A survey. Journal of Artificial Intelligence Research 4, 237–285 (1996)Google Scholar
  59. 59.
    Kapetanakis, S., Kudenko, D.: Reinforcement learning of coordination in cooperative multi-agent systems. In: Proceedings 18th National Conference on Artificial Intelligence and 14th Conference on Innovative Applications of Artificial Intelligence (AAAI/IAAI-2002), Menlo Park, US, pp. 326–331 (2002)Google Scholar
  60. 60.
    Kok, J.R., ’t Hoen, P.J., Bakker, B., Vlassis, N.: Utile coordination: Learning interdependencies among cooperative agents. In: Proceedings IEEE Symposium on Computational Intelligence and Games (CIG 2005), Colchester, United Kingdom, pp. 29–36 (2005)Google Scholar
  61. 61.
    Kok, J.R., Spaan, M.T.J., Vlassis, N.: Non-communicative multi-robot coordination in dynamic environment. Robotics and Autonomous Systems 50(2–3), 99–114 (2005)CrossRefGoogle Scholar
  62. 62.
    Kok, J.R., Vlassis, N.: Sparse cooperative Q-learning. In: Proceedings 21st International Conference on Machine Learning (ICML-2004), Banff, Canada, pp. 481–488 (2004)Google Scholar
  63. 63.
    Konda, V.R., Tsitsiklis, J.N.: On actor-critic algorithms. SIAM Journal on Control and Optimization 42(4), 1143–1166 (2003)zbMATHCrossRefMathSciNetGoogle Scholar
  64. 64.
    Könönen, V.: Asymmetric multiagent reinforcement learning. In: Proceedings IEEE/WIC International Conference on Intelligent Agent Technology (IAT-2003), Halifax, Canada, pp. 336–342 (2003)Google Scholar
  65. 65.
    Könönen, V.: Gradient based method for symmetric and asymmetric multiagent reinforcement learning. In: Liu, J., Cheung, Y.-m., Yin, H. (eds.) IDEAL 2003. LNCS, vol. 2690, pp. 68–75. Springer, Heidelberg (2003)Google Scholar
  66. 66.
    Lagoudakis, M.G., Parr, R.: Least-squares policy iteration. Journal of Machine Learning Research 4, 1107–1149 (2003)CrossRefMathSciNetGoogle Scholar
  67. 67.
    Lauer, M., Riedmiller, M.: An algorithm for distributed reinforcement learning in cooperative multi-agent systems. In: Proceedings 17th International Conference on Machine Learning (ICML-2000), Stanford University, US, pp. 535–542 (2000)Google Scholar
  68. 68.
    Lee, J.-W., Jang Min, O.: A multi-agent Q-learning framework for optimizing stock trading systems. In: Hameurlain, A., Cicchetti, R., Traunmüller, R. (eds.) DEXA 2002. LNCS, vol. 2453, pp. 153–162. Springer, Heidelberg (2002)CrossRefGoogle Scholar
  69. 69.
    Littman, M.L.: Markov games as a framework for multi-agent reinforcement learning. In: Proceedings 11th International Conference on Machine Learning (ICML-1994), New Brunswick, US, pp. 157–163 (1994)Google Scholar
  70. 70.
    Littman, M.L.: Value-function reinforcement learning in Markov games. Journal of Cognitive Systems Research 2(1), 55–66 (2001)CrossRefGoogle Scholar
  71. 71.
    Littman, M.L., Stone, P.: Implicit negotiation in repeated games. In: Meyer, J.-J.C., Tambe, M. (eds.) ATAL 2001. LNCS (LNAI), vol. 2333, pp. 96–105. Springer, Heidelberg (2002)CrossRefGoogle Scholar
  72. 72.
    Lovejoy, W.S.: Computationally feasible bounds for partially observed Markov decision processes. Operations Research 39(1), 162–175 (1991)zbMATHCrossRefMathSciNetGoogle Scholar
  73. 73.
    Matarić, M.J.: Reward functions for accelerated learning. In: Proceedings 11th International Conference on Machine Learning (ICML-1994), New Brunswick, US, pp. 181–189 (1994)Google Scholar
  74. 74.
    Matarić, M.J.: Learning in multi-robot systems. In: Weiß, G., Sen, S. (eds.) Adaptation and Learning in Multi–Agent Systems, ch. 10, pp. 152–163. Springer, Heidelberg (1996)Google Scholar
  75. 75.
    Matarić, M.J.: Reinforcement learning in the multi-robot domain. Autonomous Robots 4(1), 73–83 (1997)CrossRefGoogle Scholar
  76. 76.
    Melo, F.S., Meyn, S.P., Ribeiro, M.I.: An analysis of reinforcement learning with function approximation. In: Proceedings 25th International Conference on Machine Learning (ICML-2008), Helsinki, Finland, pp. 664–671 (2008)Google Scholar
  77. 77.
    Merke, A., Riedmiller, M.A.: Karlsruhe brainstormers - A reinforcement learning approach to robotic soccer. In: Birk, A., Coradeschi, S., Tadokoro, S. (eds.) RoboCup 2001. LNCS (LNAI), vol. 2377, pp. 435–440. Springer, Heidelberg (2002)CrossRefGoogle Scholar
  78. 78.
    Miconi, T.: When evolving populations is better than coevolving individuals: The blind mice problem. In: Proceedings 18th International Joint Conference on Artificial Intelligence (IJCAI 2003), Acapulco, Mexico, pp. 647–652 (2003)Google Scholar
  79. 79.
    Moore, A.W., Atkeson, C.G.: Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning 13, 103–130 (1993)Google Scholar
  80. 80.
    Munos, R., Szepesvári, C.: Finite time bounds for fitted value iteration. Journal of Machine Learning Research 9, 815–857 (2008)Google Scholar
  81. 81.
    Nagendra Prasad, M.V., Lesser, V.R., Lander, S.E.: Learning organizational roles for negotiated search in a multiagent system. International Journal of Human-Computer Studies 48(1), 51–67 (1998)CrossRefGoogle Scholar
  82. 82.
    Nash, S., Sofer, A.: Linear and Nonlinear Programming. McGraw-Hill, New York (1996)Google Scholar
  83. 83.
    Nedić, A., Bertsekas, D.P.: Least-squares policy evaluation algorithms with linear function approximation. Discrete Event Dynamic Systems: Theory and Applications 13(1–2), 79–110 (2003)zbMATHMathSciNetGoogle Scholar
  84. 84.
    Negenborn, R.R., De Schutter, B., Hellendoorn, H.: Multi-agent model predictive control for transportation networks: Serial versus parallel schemes. Engineering Applications of Artificial Intelligence 21(3), 353–366 (2008)CrossRefGoogle Scholar
  85. 85.
    Ormoneit, D., Sen, S.: Kernel-based reinforcement learning. Machine Learning 49(2–3), 161–178 (2002)zbMATHCrossRefGoogle Scholar
  86. 86.
    Panait, L., Luke, S.: Cooperative multi-agent learning: The state of the art. Autonomous Agents and Multi-Agent Systems 11(3), 387–434 (2005)CrossRefGoogle Scholar
  87. 87.
    Panait, L., Wiegand, R.P., Luke, S.: Improving coevolutionary search for optimal multiagent behaviors. In: Proceedings 18th International Joint Conference on Artificial Intelligence (IJCAI-2003), Acapulco, Mexico, pp. 653–660 (2003)Google Scholar
  88. 88.
    Parunak, H.V.D.: Industrial and practical applications of DAI. In: Weiss, G. (ed.) Multi–Agent Systems: A Modern Approach to Distributed Artificial Intelligence, ch. 9, pp. 377–412. MIT Press, Cambridge (1999)Google Scholar
  89. 89.
    Peng, J., Williams, R.J.: Incremental multi-step Q-learning. Machine Learning 22(1–3), 283–290 (1996)Google Scholar
  90. 90.
    Peters, J., Schaal, S.: Natural actor-critic. Neurocomputing 71(7–9), 1180–1190 (2008)CrossRefGoogle Scholar
  91. 91.
    Potter, M.A., Jong, K.A.D.: A cooperative coevolutionary approach to function optimization. In: Davidor, Y., Männer, R., Schwefel, H.-P. (eds.) PPSN 1994. LNCS, vol. 866, pp. 249–257. Springer, Heidelberg (1994)Google Scholar
  92. 92.
    Powell, W.B.: Approximate Dynamic Programming: Solving the Curses of Dimensionality. Wiley, Chichester (2007)zbMATHCrossRefGoogle Scholar
  93. 93.
    Powers, R., Shoham, Y.: New criteria and a new algorithm for learning in multi-agent systems. In: Saul, L.K., Weiss, Y., Bottou, L. (eds.) Advances in Neural Information Processing Systems 17, pp. 1089–1096. MIT Press, Cambridge (2005)Google Scholar
  94. 94.
    Price, B., Boutilier, C.: Implicit imitation in multiagent reinforcement learning. In: Proceedings 16th International Conference on Machine Learning (ICML-1999), Bled, Slovenia, pp. 325–334 (1999)Google Scholar
  95. 95.
    Price, B., Boutilier, C.: Accelerating reinforcement learning through implicit imitation. Journal of Artificial Intelligence Research 19, 569–629 (2003)zbMATHGoogle Scholar
  96. 96.
    Puterman, M.L.: Markov Decision Processes—Discrete Stochastic Dynamic Programming. Wiley, Chichester (1994)zbMATHGoogle Scholar
  97. 97.
    Pynadath, D.V., Tambe, M.: The communicative multiagent team decision problem: Analyzing teamwork theories and models. Journal of Artificial Intelligence Research 16, 389–423 (2002)zbMATHMathSciNetGoogle Scholar
  98. 98.
    Raju, C., Narahari, Y., Ravikumar, K.: Reinforcement learning applications in dynamic pricing of retail markets. In: Proceedings 2003 IEEE International Conference on E-Commerce (CEC-2003), Newport Beach, US, pp. 339–346 (2003)Google Scholar
  99. 99.
    Riedmiller, M.: Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) ECML 2005. LNCS (LNAI), vol. 3720, pp. 317–328. Springer, Heidelberg (2005)CrossRefGoogle Scholar
  100. 100.
    Riedmiller, M.A., Moore, A.W., Schneider, J.G.: Reinforcement learning for cooperating and communicating reactive agents in electrical power grids. In: Hannebauer, M., Wendler, J., Pagello, E. (eds.) Balancing Reactivity and Social Deliberation in Multi-Agent Systems, pp. 137–149. Springer, Heidelberg (2000)Google Scholar
  101. 101.
    Salustowicz, R., Wiering, M., Schmidhuber, J.: Learning team strategies: Soccer case studies. Machine Learning 33(2–3), 263–282 (1998)zbMATHCrossRefGoogle Scholar
  102. 102.
    Schaerf, A., Shoham, Y., Tennenholtz, M.: Adaptive load balancing: A study in multi-agent learning. Journal of Artificial Intelligence Research 2, 475–500 (1995)zbMATHGoogle Scholar
  103. 103.
    Schmidhuber, J.: A general method for incremental self-improvement and multi-agent learning. In: Yao, X. (ed.) Evolutionary Computation: Theory and Applications, ch. 3, pp. 81–123. World Scientific, Singapore (1999)Google Scholar
  104. 104.
    Sejnowski, T.J., Hinton, G.E. (eds.): Unsupervised Learning: Foundations of Neural Computation. MIT Press, Cambridge (1999)Google Scholar
  105. 105.
    Sen, S., Sekaran, M., Hale, J.: Learning to coordinate without sharing information. In: Proceedings 12th National Conference on Artificial Intelligence (AAAI-1994), Seattle, US, pp. 426–431 (1994)Google Scholar
  106. 106.
    Sen, S., Weiss, G.: Learning in multiagent systems. In: Weiss, G. (ed.) Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence, ch. 6, pp. 259–298. MIT Press, Cambridge (1999)Google Scholar
  107. 107.
    Shoham, Y., Leyton-Brown, K.: Multiagent Systems: Algorithmic, Game Theoretic and Logical Foundations. Cambridge University Press, Cambridge (2008)Google Scholar
  108. 108.
    Shoham, Y., Powers, R., Grenager, T.: If multi-agent learning is the answer, what is the question? Artificial Intelligence 171(7), 365–377 (2007)zbMATHCrossRefMathSciNetGoogle Scholar
  109. 109.
    Singh, S., Kearns, M., Mansour, Y.: Nash convergence of gradient dynamics in general-sum games. In: Proceedings 16th Conference on Uncertainty in Artificial Intelligence (UAI 2000), San Francisco, US, pp. 541–548 (2000)Google Scholar
  110. 110.
    Singh, S.P., Jaakkola, T., Jordan, M.I.: Reinforcement learning with soft state aggregation. In: Tesauro, G., Touretzky, D.S., Leen, T.K. (eds.) Advances in Neural Information Processing Systems 7, pp. 361–368. MIT Press, Cambridge (1995)Google Scholar
  111. 111.
    Smith, J.M.: Evolution and the Theory of Games. Cambridge University Press, Cambridge (1982)zbMATHGoogle Scholar
  112. 112.
    Spaan, M.T.J., Vlassis, N., Groen, F.C.A.: High level coordination of agents based on multiagent Markov decision processes with roles. In: Workshop on Cooperative Robotics, 2002 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS-2002), Lausanne, Switzerland, pp. 66–73 (2002)Google Scholar
  113. 113.
    Stephan, V., Debes, K., Gross, H.M., Wintrich, F., Wintrich, H.: A reinforcement learning based neural multi-agent-system for control of a combustion process. In: Proceedings IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN-2000), Como, Italy, pp. 6217–6222 (2000)Google Scholar
  114. 114.
    Stone, P., Veloso, M.: Team-partitioned, opaque-transition reinforcement learning. In: Proceedings 3rd International Conference on Autonomous Agents (Agents-1999), Seattle, US, pp. 206–212 (1999)Google Scholar
  115. 115.
    Stone, P., Veloso, M.: Multiagent systems: A survey from the machine learning perspective. Autonomous Robots 8(3), 345–383 (2000)CrossRefGoogle Scholar
  116. 116.
    Suematsu, N., Hayashi, A.: A multiagent reinforcement learning algorithm using extended optimal response. In: Proceedings 1st International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS-2002), Bologna, Italy, pp. 370–377 (2002)Google Scholar
  117. 117.
    Sueyoshi, T., Tadiparthi, G.R.: An agent-based decision support system for wholesale electricity markets. Decision Support Systems 44, 425–446 (2008)CrossRefGoogle Scholar
  118. 118.
    Sutton, R.S.: Learning to predict by the method of temporal differences. Machine Learning 3, 9–44 (1988)Google Scholar
  119. 119.
    Sutton, R.S.: Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In: Proceedings 7th International Conference on Machine Learning (ICML-1990), Austin, US, pp. 216–224 (1990)Google Scholar
  120. 120.
    Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)Google Scholar
  121. 121.
    Szepesvári, C., Smart, W.D.: Interpolation-based Q-learning. In: Proceedings 21st International Conference on Machine Learning (ICML-2004), Bannf, Canada, pp. 791–798 (2004)Google Scholar
  122. 122.
    Tamakoshi, H., Ishii, S.: Multiagent reinforcement learning applied to a chase problem in a continuous world. Artificial Life and Robotics 5(4), 202–206 (2001)CrossRefGoogle Scholar
  123. 123.
    Tan, M.: Multi-agent reinforcement learning: Independent vs. cooperative agents. In: Proceedings 10th International Conference on Machine Learning (ICML 1993), Amherst, US, pp. 330–337 (1993)Google Scholar
  124. 124.
    Tesauro, G.: Extending Q-learning to general adaptive multi-agent systems. In: Thrun, S., Saul, L.K., Schölkopf, B. (eds.) Advances in Neural Information Processing Systems 16, MIT Press, Cambridge (2004)Google Scholar
  125. 125.
    Tesauro, G., Kephart, J.O.: Pricing in agent economies using multi-agent Q-learning. Autonomous Agents and Multi-Agent Systems 5(3), 289–304 (2002)CrossRefGoogle Scholar
  126. 126.
    Tillotson, P., Wu, Q., Hughes, P.: Multi-agent learning for routing control within an Internet environment. Engineering Applications of Artificial Intelligence 17(2), 179–185 (2004)CrossRefGoogle Scholar
  127. 127.
    Touzet, C.F.: Neural reinforcement learning for behaviour synthesis. Robotics and Autonomous Systems 22(3–4), 251–281 (1997)CrossRefGoogle Scholar
  128. 128.
    Touzet, C.F.: Robot awareness in cooperative mobile robot learning. Autonomous Robots 8(1), 87–97 (2000)CrossRefGoogle Scholar
  129. 129.
    Tsitsiklis, J.N.: Asynchronous stochastic approximation and Q-learning. Machine Learning 16(1), 185–202 (1994)zbMATHGoogle Scholar
  130. 130.
    Tuyls, K., ’t Hoen, P.J., Vanschoenwinkel, B.: An evolutionary dynamical analysis of multi-agent learning in iterated games. Autonomous Agents and Multi-Agent Systems 12(1), 115–153 (2006)Google Scholar
  131. 131.
    Tuyls, K., Maes, S., Manderick, B.: Q-learning in simulated robotic soccer – large state spaces and incomplete information. In: Proceedings 2002 International Conference on Machine Learning and Applications (ICMLA-2002), Las Vegas, US, pp. 226–232 (2002)Google Scholar
  132. 132.
    Tuyls, K., Nowé, A.: Evolutionary game theory and multi-agent reinforcement learning. The Knowledge Engineering Review 20(1), 63–90 (2005)CrossRefGoogle Scholar
  133. 133.
    Uther, W.T., Veloso, M.: Adversarial reinforcement learning. Tech. rep., School of Computer Science, Carnegie Mellon University, Pittsburgh, US (1997),
  134. 134.
    Vidal, J.M.: Learning in multiagent systems: An introduction from a game-theoretic perspective. In: Alonso, E., Kudenko, D., Kazakov, D. (eds.) AAMAS 2000 and AAMAS 2002. LNCS (LNAI), vol. 2636, pp. 202–215. Springer, Heidelberg (2003)CrossRefGoogle Scholar
  135. 135.
    Vlassis, N.: A Concise Introduction to Multiagent Systems and Distributed Artificial Intelligence. Synthesis Lectures in Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers (2007)Google Scholar
  136. 136.
    Wang, X., Sandholm, T.: Reinforcement learning to play an optimal Nash equilibrium in team Markov games. In: Becker, S., Thrun, S., Obermayer, K. (eds.) Advances in Neural Information Processing Systems 15, pp. 1571–1578. MIT Press, Cambridge (2003)Google Scholar
  137. 137.
    Watkins, C.J.C.H., Dayan, P.: Q-learning. Machine Learning 8, 279–292 (1992)zbMATHGoogle Scholar
  138. 138.
    Weinberg, M., Rosenschein, J.S.: Best-response multiagent learning in non-stationary environments. In: Proceedings 3rd International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS-2004), New York, US, pp. 506–513 (2004)Google Scholar
  139. 139.
    Weiss, G. (ed.): Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence. MIT Press, Cambridge (1999)Google Scholar
  140. 140.
    Wellman, M.P., Greenwald, A.R., Stone, P., Wurman, P.R.: The 2001 Trading Agent Competition. Electronic Markets 13(1) (2003)Google Scholar
  141. 141.
    Wiering, M.: Multi-agent reinforcement learning for traffic light control. In: Proceedings 17th International Conference on Machine Learning (ICML-2000), pp. 1151–1158. Stanford University, US (2000)Google Scholar
  142. 142.
    Wiering, M., Salustowicz, R., Schmidhuber, J.: Reinforcement learning soccer teams with incomplete world models. Autonomous Robots 7(1), 77–88 (1999)CrossRefGoogle Scholar
  143. 143.
    Zapechelnyuk, A.: Limit behavior of no-regret dynamics. Discussion Papers 21, Kyiv School of Economics, Kyiv, Ucraine (2009)Google Scholar
  144. 144.
    Zinkevich, M.: Online convex programming and generalized infinitesimal gradient ascent. In: Proceedings 20th International Conference on Machine Learning (ICML-2003), Washington, US, pp. 928–936 (2003)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Lucian Buşoniu
    • 1
  • Robert Babuška
    • 1
  • Bart De Schutter
    • 2
  1. 1.Center for Systems and ControlDelft University of TechnologyThe Netherlands
  2. 2.Center for Systems and Control & Marine and Transport Technology DepartmentDelft University of TechnologyThe Netherlands

Personalised recommendations