Cooperative reinforcement learning in topology-based multi-agent systems

Abstract

Topology-based multi-agent systems (TMAS), wherein agents interact with one another according to their spatial relationships in a network, are well suited for problems with topological constraints. In a TMAS, however, each agent may have a different state space, which can be rather large. Consequently, traditional approaches to multi-agent cooperative learning may not be able to scale up with the complexity of the network topology. In this paper, we propose a cooperative learning strategy under which autonomous agents are assembled in a binary tree formation (BTF). By constraining the interaction between agents, we effectively unify the state space of individual agents and enable policy sharing across agents. Our complexity analysis indicates that multi-agent systems with the BTF have a much smaller state space and a higher level of flexibility than those with the general n-ary (n > 2) tree formation. We have applied the proposed cooperative learning strategy to a class of reinforcement learning agents known as temporal difference-fusion architecture for learning and cognition (TD-FALCON). Comparative experiments on a generic network routing problem, a typical TMAS domain, show that TD-FALCON BTF teams outperform alternative methods, including TD-FALCON teams in single-agent and n-ary tree formations, a Q-learning method based on table lookup, and a classical linear programming algorithm. Our study further shows that TD-FALCON BTF can adapt and function well under various scales of network complexity and traffic volume in TMAS domains.
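
To make the BTF idea concrete, the sketch below (a minimal Python illustration, not the authors' TD-FALCON implementation) shows how constraining every agent to at most three neighbours, a parent and two children, gives all agents one and the same compact local state and action space, so that a single Q-table can be shared across the team. Every name in it (SharedQTable, BTFAgent, the binary-search-tree destination encoding) is a hypothetical stand-in, and plain one-step Q-learning replaces TD-FALCON's temporal-difference learning.

    import random
    from collections import defaultdict

    # Under the BTF constraint every agent has at most three neighbours
    # (one parent, two children), so all agents share one action set.
    ACTIONS = ("parent", "left", "right")

    class SharedQTable:
        """A single Q-table shared by every agent in the formation."""
        def __init__(self, alpha=0.1, gamma=0.9):
            self.q = defaultdict(float)            # keyed on (state, action)
            self.alpha, self.gamma = alpha, gamma

        def best_action(self, state):
            return max(ACTIONS, key=lambda a: self.q[(state, a)])

        def update(self, state, action, reward, next_state):
            # One-step Q-learning backup (a stand-in for TD-FALCON's
            # temporal-difference learning, which the paper actually uses).
            target = reward + self.gamma * max(self.q[(next_state, a)]
                                               for a in ACTIONS)
            self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

    class BTFAgent:
        """An agent sitting at one node of the binary tree formation."""
        def __init__(self, node_id, shared_q, epsilon=0.1):
            self.node_id, self.shared_q, self.epsilon = node_id, shared_q, epsilon

        def local_state(self, destination):
            # Hypothetical encoding: with nodes labelled as in a binary
            # search tree, the local state reduces to the direction of the
            # destination, so its size is independent of the network size.
            if destination == self.node_id:
                return "arrived"
            return "dest_left" if destination < self.node_id else "dest_right"

        def choose_action(self, destination):
            state = self.local_state(destination)
            if random.random() < self.epsilon:     # epsilon-greedy exploration
                return state, random.choice(ACTIONS)
            return state, self.shared_q.best_action(state)

    # Usage: one shared table serves a seven-node tree; an update made by
    # agent 3 immediately benefits every other agent as well.
    q = SharedQTable()
    agents = [BTFAgent(i, q) for i in range(7)]
    state, action = agents[3].choose_action(destination=6)
    q.update(state, action, reward=-1.0, next_state="dest_right")

Because every agent sees the same compact local state space, experience gathered by any one agent immediately improves the shared policy of all agents; in a general n-ary formation, by contrast, heterogeneous neighbourhoods enlarge the state space and stand in the way of sharing a single table, which is the effect the paper's complexity analysis quantifies.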

Author information

Corresponding author: Dan Xiao.

Cite this article

Xiao, D., & Tan, A.-H. Cooperative reinforcement learning in topology-based multi-agent systems. Auton Agent Multi-Agent Syst 26, 86–119 (2013). https://doi.org/10.1007/s10458-011-9183-4
