
A framework for meta-level control in multi-agent systems

Autonomous Agents and Multi-Agent Systems

Abstract

Sophisticated agents operating in open environments must make decisions that efficiently trade off the use of their limited resources between dynamic deliberative actions and domain actions. This is the meta-level control problem for agents operating in resource-bounded multi-agent environments. Control activities involve decisions on when to invoke scheduling and coordination of domain activities and how much effort to put into them. The focus of this paper is how to make effective meta-level control decisions. We show that meta-level control with bounded computational overhead allows complex agents to solve problems more efficiently than current approaches in dynamic open multi-agent environments. The meta-level control approach that we present is based on the decision-theoretic use of an abstract representation of the agent state. This abstraction concisely captures the critical information needed for decision making while bounding the cost of meta-level control, and it is suitable for automatically learning meta-level control policies.
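To make the abstract's idea concrete, the following is a minimal sketch of decision-theoretic meta-level control with a learned policy over an abstract agent state. It is not the authors' implementation: the feature names (deadline_slack, pending_tasks, schedule_valid), the meta-action set, and the use of tabular Q-learning are all illustrative assumptions. The key point it demonstrates is that the meta-level reasons over a few coarse features rather than the full agent state, which keeps both policy evaluation and policy learning cheap.

```python
import random
from collections import defaultdict

# Meta-level actions: how much deliberation effort to invest before acting.
# (Illustrative set; the paper's control decisions concern when to invoke
# scheduling/coordination and how much effort to spend on them.)
META_ACTIONS = ("schedule_detailed", "schedule_quick", "act_on_current")

def abstract_state(agent):
    """Compress the full agent state into a small tuple of coarse,
    bucketed features. Feature names are hypothetical."""
    return (
        min(agent["deadline_slack"] // 10, 3),  # bucketed time pressure
        min(agent["pending_tasks"], 3),         # bucketed task load
        int(agent["schedule_valid"]),           # is the current schedule still viable?
    )

class MetaController:
    """Tabular Q-learning over the abstract state: a sketch of learning a
    meta-level control policy, not the paper's exact formulation."""

    def __init__(self, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = defaultdict(float)  # Q[(state, action)] -> estimated value
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, s):
        # Epsilon-greedy selection of a meta-level action.
        if random.random() < self.epsilon:
            return random.choice(META_ACTIONS)
        return max(META_ACTIONS, key=lambda a: self.q[(s, a)])

    def update(self, s, a, reward, s_next):
        # Standard one-step Q-learning backup; reward would reflect the
        # utility of completed domain activities net of deliberation cost.
        best_next = max(self.q[(s_next, a2)] for a2 in META_ACTIONS)
        self.q[(s, a)] += self.alpha * (reward + self.gamma * best_next
                                        - self.q[(s, a)])
```

Because the abstract state has only a handful of bucketed values, the Q-table stays small and each meta-level decision is a constant-time lookup, which is one way the computational overhead of meta-level control can be bounded as the abstract claims.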



Author information

Correspondence to Anita Raja.


About this article

Cite this article

Raja, A., Lesser, V. A framework for meta-level control in multi-agent systems. Auton Agent Multi-Agent Syst 15, 147–196 (2007). https://doi.org/10.1007/s10458-006-9008-z

