
Shaping multi-agent systems with gradient reinforcement learning

Autonomous Agents and Multi-Agent Systems (2007)

Abstract

An original reinforcement learning (RL) methodology is proposed for the design of multi-agent systems. In the realistic setting of situated agents with local perception, the task of automatically building a coordinated system is of crucial importance. To that end, we design simple reactive agents in a decentralized way, as independent learners. To cope with the difficulties RL faces in this framework, we have developed an incremental learning algorithm in which agents are confronted with a sequence of progressively more complex tasks. We illustrate this general framework with computer experiments in which agents must coordinate to reach a global goal.
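The abstract describes the method only at a high level. As a purely illustrative sketch of the general idea (decentralized independent learners, each with only local perception, trained by a policy-gradient rule on a sequence of tasks of increasing difficulty), the following Python snippet sets up a toy two-agent corridor task and a tabular REINFORCE-style learner. The task, the learning rule, and every name and parameter below are assumptions made for this sketch; they are not the authors' implementation or benchmarks.

```python
"""Minimal sketch (not the authors' code) of decentralized independent
learners trained by a policy-gradient rule on progressively harder tasks.
All environment and parameter choices are illustrative assumptions."""

import numpy as np


class CorridorTeamTask:
    """Toy cooperative task: two agents on a 1-D corridor must both stand
    on the rightmost cell to earn a shared reward. Each agent only
    perceives its own position (local perception)."""

    def __init__(self, length, horizon=50):
        self.length = length
        self.horizon = horizon

    def reset(self):
        self.pos = [0, 0]
        self.t = 0
        return tuple(self.pos)

    def step(self, actions):  # action 0 = stay, 1 = move right
        for i, a in enumerate(actions):
            if a == 1:
                self.pos[i] = min(self.pos[i] + 1, self.length - 1)
        self.t += 1
        success = all(p == self.length - 1 for p in self.pos)
        done = success or self.t >= self.horizon
        return tuple(self.pos), (1.0 if success else 0.0), done


class IndependentLearner:
    """Tabular softmax policy over local observations, updated by REINFORCE."""

    def __init__(self, n_obs, n_actions=2, lr=0.1):
        self.theta = np.zeros((n_obs, n_actions))
        self.lr = lr

    def policy(self, obs):
        z = self.theta[obs] - self.theta[obs].max()
        p = np.exp(z)
        return p / p.sum()

    def act(self, obs, rng):
        return rng.choice(self.theta.shape[1], p=self.policy(obs))

    def update(self, trajectory, ret):
        # REINFORCE: push log-probability of taken actions toward the return.
        for obs, action in trajectory:
            grad = -self.policy(obs)
            grad[action] += 1.0
            self.theta[obs] += self.lr * ret * grad


def train_incrementally(levels=(3, 5, 8), episodes_per_level=2000, seed=0):
    """Train the same pair of agents on corridors of increasing length,
    reusing the policies learned on easier tasks as the starting point."""
    rng = np.random.default_rng(seed)
    agents = [IndependentLearner(n_obs=max(levels)) for _ in range(2)]
    for length in levels:
        task = CorridorTeamTask(length)
        for _ in range(episodes_per_level):
            obs = task.reset()
            trajs = [[], []]
            ret, done = 0.0, False
            while not done:
                actions = [ag.act(o, rng) for ag, o in zip(agents, obs)]
                for i in range(2):
                    trajs[i].append((obs[i], actions[i]))
                obs, r, done = task.step(actions)
                ret += r
            for ag, tr in zip(agents, trajs):
                ag.update(tr, ret)
        print(f"corridor length {length}: training done")
    return agents


if __name__ == "__main__":
    train_incrementally()
```

The point of the sketch is the training schedule: the agents' policy parameters are carried over from one task level to the next, so behaviour learned on the easy tasks seeds learning on the harder ones, which is the shaping idea the abstract refers to.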


Abbreviations

RL: Reinforcement learning

MAS: Multi-agent system

MDP: Markov decision process

POMDP: Partially observable Markov decision process


Author information

Corresponding author

Correspondence to Olivier Buffet.

Additional information

This work was conducted in part at NICTA's Canberra laboratory.


About this article

Cite this article

Buffet, O., Dutech, A. & Charpillet, F. Shaping multi-agent systems with gradient reinforcement learning. Auton Agent Multi-Agent Syst 15, 197–220 (2007). https://doi.org/10.1007/s10458-006-9010-5

