Autonomous Agents and Multi-Agent Systems

Volume 31, Issue 4, pp. 821–860

Can bounded and self-interested agents be teammates? Application to planning in ad hoc teams

  • Muthukumaran Chandrasekaran
  • Prashant Doshi
  • Yifeng Zeng
  • Yingke Chen

Abstract

Planning for ad hoc teamwork is challenging because it involves agents collaborating without any prior coordination or communication. The focus is on principled methods for a single agent to cooperate with others. This motivates investigating the ad hoc teamwork problem in the context of self-interested decision-making frameworks. Agents engaged in individual decision making in multiagent settings must reason about other agents’ actions, which may in turn involve reasoning about others, and so on to arbitrary depth. An established approximation that operationalizes this approach is to bound the infinite nesting from below by introducing level 0 models. For the purposes of this study, individual, self-interested decision making in multiagent settings is modeled using interactive dynamic influence diagrams (I-DIDs). These are graphical models with the benefit that they naturally offer a factored representation of the problem, allowing agents to ascribe dynamic models to others and reason about them. We demonstrate that an implication of bounded, finitely-nested reasoning by a self-interested agent is that it may not obtain optimal team solutions in cooperative settings when it is part of a team. We address this limitation by including models at level 0 whose solutions involve reinforcement learning. We show how the learning is integrated into planning in the context of I-DIDs. This facilitates optimal teammate behavior, and we demonstrate its applicability to ad hoc teamwork on several problem domains and configurations.
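To make the layered reasoning concrete, the sketch below illustrates the general idea of a level 0 model whose policy is obtained by reinforcement learning, with a level 1 agent best-responding to that learned prediction. It is a minimal illustration on a hypothetical two-action coordination task, not the paper's I-DID algorithm; the game, function names, and parameters are assumptions made only for this example.

```python
# Illustrative sketch (not the authors' I-DID method): a level-0 model solved
# by Q-learning, and a level-1 agent that best-responds to the level-0 prediction.
# The toy coordination game and all parameters are hypothetical.
import random

ACTIONS = ["left", "right"]

def team_reward(a_i, a_j):
    # Team payoff: reward only when both agents choose the same action.
    return 1.0 if a_i == a_j else 0.0

def learn_level0_policy(episodes=5000, alpha=0.1, epsilon=0.1):
    """Level-0 model: learns action values by Q-learning while treating the
    other agent as part of the environment (here, uniform random noise)."""
    q = {a: 0.0 for a in ACTIONS}
    for _ in range(episodes):
        a = random.choice(ACTIONS) if random.random() < epsilon \
            else max(q, key=q.get)
        other = random.choice(ACTIONS)   # level 0 has no model of the teammate
        r = team_reward(a, other)
        q[a] += alpha * (r - q[a])
    return max(q, key=q.get)             # greedy policy of the learned model

def level1_best_response(predicted_other_action):
    """Level-1 agent: best-responds to the action predicted by its (learned)
    level-0 model of the teammate."""
    return max(ACTIONS, key=lambda a: team_reward(a, predicted_other_action))

if __name__ == "__main__":
    level0_action = learn_level0_policy()
    my_action = level1_best_response(level0_action)
    print("predicted teammate (level 0):", level0_action)
    print("best response (level 1):", my_action,
          "-> team reward", team_reward(my_action, level0_action))
```

In this toy setting the level-1 agent coordinates with whatever behavior the learned level-0 model predicts, which hints at why placing learning-based models at level 0, rather than fixed or random ones, can recover team-optimal behavior.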

Keywords

Multiagent systems · Ad hoc teamwork · Sequential decision making and planning · Reinforcement learning


Copyright information

© The Author(s) 2016

Authors and Affiliations

  • Muthukumaran Chandrasekaran (1)
  • Prashant Doshi (1)
  • Yifeng Zeng (2)
  • Yingke Chen (3)

  1. THINC Lab, University of Georgia, Athens, USA
  2. School of Computing, Teesside University, Middlesbrough, Tees Valley, UK
  3. Sichuan University, Chengdu, China
