Abstract
Topology-based multi-agent systems (TMAS), wherein agents interact with one another according to their spatial relationships in a network, are well suited for problems with topological constraints. In a TMAS, however, each agent may have a different state space, which can be rather large. Consequently, traditional approaches to multi-agent cooperative learning may not scale up with the complexity of the network topology. In this paper, we propose a cooperative learning strategy under which autonomous agents are assembled in a binary tree formation (BTF). By constraining the interaction between agents, we effectively unify the state space of individual agents and enable policy sharing across agents. Our complexity analysis indicates that multi-agent systems with the BTF have a much smaller state space and a higher level of flexibility, compared with the general form of n-ary (n > 2) tree formation. We have applied the proposed cooperative learning strategy to a class of reinforcement learning agents known as temporal difference-fusion architecture for learning and cognition (TD-FALCON). Comparative experiments based on a generic network routing problem, a typical TMAS domain, show that the TD-FALCON BTF teams outperform alternative methods, including TD-FALCON teams in single-agent and n-ary tree formation, a Q-learning method based on a table lookup mechanism, and a classical linear programming algorithm. Our study further shows that TD-FALCON BTF can adapt and function well under various scales of network complexity and traffic volume in TMAS domains.
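The TD-FALCON architecture itself is beyond an abstract-sized sketch, but the core idea the abstract describes, constraining interaction to a binary tree so that every agent shares an identical two-way action space and hence a single policy, can be illustrated with plain tabular temporal-difference learning. The names below (`BTFAgent`, `shared_q`, the state labels) are illustrative assumptions, not the authors' implementation:

```python
import random

class BTFAgent:
    """Node in a binary tree formation (BTF).

    Because each agent interacts only with its (at most) two children,
    the local action space is the same for every agent -- {0: left,
    1: right} -- which is what makes a single shared policy possible.
    """
    def __init__(self, node_id):
        self.node_id = node_id
        self.left = None   # child agent reached by action 0
        self.right = None  # child agent reached by action 1

# One Q-table shared by the whole team: (local_state, action) -> value.
shared_q = {}

def choose_action(state, epsilon=0.1):
    """Epsilon-greedy choice over the two-way branch decision."""
    if random.random() < epsilon:
        return random.choice([0, 1])
    q_left = shared_q.get((state, 0), 0.0)
    q_right = shared_q.get((state, 1), 0.0)
    return 0 if q_left >= q_right else 1

def td_update(state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """One-step temporal-difference (Q-learning) update on the shared table.

    Every agent writes into the same table, so experience gathered at one
    node immediately benefits all other nodes -- the policy-sharing effect
    the BTF constraint enables.
    """
    best_next = max(shared_q.get((next_state, a), 0.0) for a in (0, 1))
    old = shared_q.get((state, action), 0.0)
    shared_q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
```

With an n-ary formation the action space would grow with n and differ across nodes of different degree; fixing the branching factor at two is what lets the table keys stay uniform across agents.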
Cite this article
Xiao, D., Tan, AH. Cooperative reinforcement learning in topology-based multi-agent systems. Auton Agent Multi-Agent Syst 26, 86–119 (2013). https://doi.org/10.1007/s10458-011-9183-4