Abstract
Sophisticated agents operating in open environments must make decisions that efficiently trade off the use of their limited resources between dynamic deliberative actions and domain actions. This is the meta-level control problem for agents operating in resource-bounded multi-agent environments. Control activities involve deciding when to invoke scheduling and coordination of domain activities, and how much effort to put into them. The focus of this paper is how to make effective meta-level control decisions. We show that meta-level control with bounded computational overhead allows complex agents to solve problems more efficiently than current approaches in dynamic open multi-agent environments. The meta-level control approach that we present is based on the decision-theoretic use of an abstract representation of the agent state. This abstraction concisely captures the critical information necessary for decision making while bounding the cost of meta-level control, and is well suited to automatically learning the meta-level control policies.
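To make the idea concrete, the following is a minimal sketch, not the paper's actual model: meta-level control cast as a small MDP whose states are a coarse abstraction of the agent's situation (here just two assumed features, whether a deadline is near and whether a new task has arrived) and whose actions are control choices that trade deliberation effort against acting in the domain. The state features, action names, and reward model below are illustrative assumptions; the policy is learned with plain tabular Q-learning.

```python
import random
from collections import defaultdict

# Abstract agent state: (deadline_near, new_task_arrived).  Keeping only
# these two features is what bounds the cost of meta-level reasoning.
STATES = [(deadline_near, new_task)
          for deadline_near in (False, True)
          for new_task in (False, True)]

# Meta-level control actions: invoke detailed scheduling, invoke a cheap
# quick scheduler, or skip deliberation and execute a domain action.
ACTIONS = ["schedule_detailed", "schedule_quick", "act"]

def reward(state, action):
    """Assumed reward model: detailed scheduling pays off only when a new
    task arrives and there is time; under a tight deadline a quick choice
    is better; with no new task the agent should just act."""
    deadline_near, new_task = state
    if new_task and not deadline_near and action == "schedule_detailed":
        return 1.0
    if new_task and deadline_near and action == "schedule_quick":
        return 1.0
    if not new_task and action == "act":
        return 1.0
    return 0.0

def learn_policy(episodes=2000, alpha=0.2, epsilon=0.1, seed=0):
    """Tabular Q-learning of a meta-level control policy (one-step episodes)."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        state = rng.choice(STATES)
        if rng.random() < epsilon:                      # explore
            action = rng.choice(ACTIONS)
        else:                                           # exploit
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        q[(state, action)] += alpha * (reward(state, action) - q[(state, action)])
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}

policy = learn_policy()
print(policy[(True, True)])    # deadline near + new task -> "schedule_quick"
print(policy[(False, True)])   # time available + new task -> "schedule_detailed"
```

The point of the sketch is the abstraction: the learned table has only |STATES| × |ACTIONS| entries, so consulting it at runtime costs a constant, which is what "bounded computational overhead" for meta-level control amounts to.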
Cite this article
Raja, A., Lesser, V. A framework for meta-level control in multi-agent systems. Auton Agent Multi-Agent Syst 15, 147–196 (2007). https://doi.org/10.1007/s10458-006-9008-z