Abstract
Topology-based multi-agent systems (TMAS), wherein agents interact with one another according to their spatial relationships in a network, are well suited for problems with topological constraints. In a TMAS, however, each agent may have a different state space, which can be rather large. Consequently, traditional approaches to multi-agent cooperative learning may not scale up with the complexity of the network topology. In this paper, we propose a cooperative learning strategy under which autonomous agents are assembled in a binary tree formation (BTF). By constraining the interaction between agents, we effectively unify the state space of individual agents and enable policy sharing across agents. Our complexity analysis indicates that multi-agent systems with the BTF have a much smaller state space and a higher level of flexibility, compared with the general form of n-ary (n > 2) tree formation. We have applied the proposed cooperative learning strategy to a class of reinforcement learning agents known as temporal difference-fusion architecture for learning and cognition (TD-FALCON). Comparative experiments based on a generic network routing problem, a typical TMAS domain, show that the TD-FALCON BTF teams outperform alternative methods, including TD-FALCON teams in single-agent and n-ary tree formation, a Q-learning method based on a table lookup mechanism, and a classical linear programming algorithm. Our study further shows that TD-FALCON BTF can adapt and function well under various scales of network complexity and traffic volume in TMAS domains.
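The TD-FALCON architecture itself is beyond an abstract-sized sketch, but the core idea the abstract describes, constraining interaction to a binary tree so that every agent shares an identical two-way action space and hence a single policy, can be illustrated with plain tabular temporal-difference learning. The names below (`BTFAgent`, `shared_q`, the state labels) are illustrative assumptions, not the authors' implementation:

```python
import random

class BTFAgent:
    """Node in a binary tree formation (BTF).

    Because each agent interacts only with its (at most) two children,
    the local action space is the same for every agent -- {0: left,
    1: right} -- which is what makes a single shared policy possible.
    """
    def __init__(self, node_id):
        self.node_id = node_id
        self.left = None   # child agent reached by action 0
        self.right = None  # child agent reached by action 1

# One Q-table shared by the whole team: (local_state, action) -> value.
shared_q = {}

def choose_action(state, epsilon=0.1):
    """Epsilon-greedy choice over the two-way branch decision."""
    if random.random() < epsilon:
        return random.choice([0, 1])
    q_left = shared_q.get((state, 0), 0.0)
    q_right = shared_q.get((state, 1), 0.0)
    return 0 if q_left >= q_right else 1

def td_update(state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """One-step temporal-difference (Q-learning) update on the shared table.

    Every agent writes into the same table, so experience gathered at one
    node immediately benefits all other nodes -- the policy-sharing effect
    the BTF constraint enables.
    """
    best_next = max(shared_q.get((next_state, a), 0.0) for a in (0, 1))
    old = shared_q.get((state, action), 0.0)
    shared_q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
```

With an n-ary formation the action space would grow with n and differ across nodes of different degree; fixing the branching factor at two is what lets the table keys stay uniform across agents.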
Cite this article
Xiao, D., Tan, AH. Cooperative reinforcement learning in topology-based multi-agent systems. Auton Agent Multi-Agent Syst 26, 86–119 (2013). https://doi.org/10.1007/s10458-011-9183-4