Autonomous Agents and Multi-Agent Systems

Volume 30, Issue 2, pp. 175–219

Ad hoc teamwork by learning teammates’ task


Abstract

This paper addresses the problem of ad hoc teamwork, in which a learning agent engages in a cooperative task with other (unknown) agents. The agent must effectively coordinate with the other agents towards the completion of the intended task, without relying on any pre-defined coordination strategy. We contribute a new perspective on the ad hoc teamwork problem and propose that, in general, the learning agent should not only identify (and coordinate with) the teammates’ strategy but also identify the task to be completed. In our approach, we represent tasks as fully cooperative matrix games. Relying exclusively on observations of the teammates’ behavior, the learning agent must identify the task at hand (namely, the corresponding payoff function) from a set of possible tasks and adapt to the teammates’ behavior. Teammates are assumed to follow a bounded-rationality best-response model and thus also adapt their behavior to that of the learning agent. We formalize the ad hoc teamwork problem as a sequential decision problem and propose two novel approaches to address it: (i) an online learning approach that weighs the different candidate tasks according to their ability to predict the behavior of the teammates; and (ii) a decision-theoretic approach that models the ad hoc teamwork problem as a partially observable Markov decision process (POMDP). We provide theoretical bounds on the performance of both approaches and evaluate their performance in several domains of different complexity.
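To make approach (i) concrete, the following is a minimal sketch, not the authors’ implementation: the agent keeps a weight over each candidate task (payoff matrix), reweights each task by the probability it assigned to the teammate’s observed action under an assumed softmax (bounded-rationality) best-response teammate model, and then best-responds to the weight-averaged prediction. All function names and the inverse-temperature parameter `beta` are illustrative assumptions.

```python
import numpy as np

def teammate_prediction(payoff, my_action, beta=5.0):
    """Predicted distribution over teammate actions, assuming the teammate
    softmax-best-responds under `payoff` to the learner's previous action."""
    logits = beta * payoff[my_action, :]   # shared payoffs given my action
    logits -= logits.max()                 # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def update_weights(weights, payoffs, my_prev_action, teammate_action):
    """Multiplicative (Bayesian-style) update: each candidate task is
    reweighted by how well it predicted the teammate's observed action."""
    likelihoods = np.array([
        teammate_prediction(P, my_prev_action)[teammate_action]
        for P in payoffs
    ])
    weights = weights * likelihoods
    return weights / weights.sum()

def choose_action(weights, payoffs, my_prev_action):
    """Best response: expected shared payoff of each of my actions,
    averaging over tasks, with the teammate predicted per task."""
    exp_payoff = np.zeros(payoffs[0].shape[0])
    for w, P in zip(weights, payoffs):
        pred = teammate_prediction(P, my_prev_action)
        exp_payoff += w * (P @ pred)       # sum_b P[a, b] * pred[b]
    return int(np.argmax(exp_payoff))

# Example: two candidate 2x2 coordination tasks; weights start uniform.
payoffs = [np.array([[1.0, 0.0], [0.0, 0.0]]),
           np.array([[0.0, 0.0], [0.0, 1.0]])]
weights = np.ones(len(payoffs)) / len(payoffs)
weights = update_weights(weights, payoffs, my_prev_action=0, teammate_action=0)
next_action = choose_action(weights, payoffs, my_prev_action=0)
```

Approach (ii) can be read as the decision-theoretic counterpart of the same idea: the unknown task (and the teammates’ adaptive state) become hidden state in a POMDP, the weight vector above plays the role of the belief, and planning in that belief space trades off actions that reveal the task against actions that exploit the current estimate.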

Keywords

Ad hoc teamwork · Online learning · POMDP

Notes

Acknowledgments

The authors gratefully acknowledge the anonymous reviewers for the many useful suggestions that greatly improved the clarity of the presentation. This work was partially supported by national funds through Fundação para a Ciência e a Tecnologia (FCT) with reference UID/CEC/50021/2013 and the Carnegie Mellon Portugal Program and its Information and Communications Technologies Institute, under Project CMUP-ERI/HCI/0051/2013.

Supplementary material

Supplementary material 1 (M4V, 14,596 KB)


Copyright information

© The Author(s) 2015

Authors and Affiliations

INESC-ID and Instituto Superior Técnico, Universidade de Lisboa, Porto Salvo, Portugal
