Rationality Assumptions and Optimality of Co-learning

  • Ron Sun
  • Dehu Qi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1881)

Abstract

This paper investigates the effect of different rationality assumptions on the performance of co-learning by multiple agents in extensive games. Extensive games involve sequences of steps and close interactions between agents, and are thus more difficult than the more commonly investigated (one-step) strategic games. Rationality assumptions may therefore have more complicated influences on learning, e.g., improving performance in some cases while hurting it in others. To test different levels of rationality assumptions, a “double estimation” method for reinforcement learning suitable for extensive games is developed, whereby an agent learns not only its own value function but also those of other agents. Experiments based on this reinforcement learning method are carried out using several typical examples of games. The results show a complex pattern of effects resulting from different levels of rationality assumptions.
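The abstract does not spell out the “double estimation” update, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes tabular Q-learning, a hypothetical two-step game in which agent 0 moves first and agent 1 moves second, payoffs that both agents can observe, and one particular rationality assumption: whoever moves next is predicted to pick the action that maximises its own estimated value. The game tree, payoffs, class name, and parameters are all illustrative.

```python
import random
from collections import defaultdict

# Hypothetical two-step extensive game (an assumption, not from the paper):
# agent 0 moves at "root", then agent 1 moves at "L" or "R".
# Payoffs are (agent 0, agent 1).
PAYOFFS = {
    ("L", "l"): (3, 1),
    ("L", "r"): (0, 0),
    ("R", "l"): (1, 2),
    ("R", "r"): (2, 3),
}
ACTIONS = {"root": ["L", "R"], "L": ["l", "r"], "R": ["l", "r"]}
MOVER = {"root": 0, "L": 1, "R": 1}


class DoubleEstimator:
    """Keeps one Q-table per agent: the learner's own table and its
    running estimate of the other agent's table."""

    def __init__(self, me, alpha=0.2, gamma=1.0):
        self.me = me
        self.alpha = alpha
        self.gamma = gamma
        self.q = [defaultdict(float), defaultdict(float)]

    def predicted_action(self, state):
        # Rationality assumption: the agent that moves at `state`
        # maximises its own (estimated) value there.
        mover = MOVER[state]
        return max(ACTIONS[state], key=lambda a: self.q[mover][(state, a)])

    def update(self, state, action, rewards, next_state):
        # Back up both tables through the action the next mover is
        # predicted to take, rather than through each agent's own max.
        nxt = None
        if next_state is not None:
            nxt = (next_state, self.predicted_action(next_state))
        for agent in (0, 1):
            target = rewards[agent]
            if nxt is not None:
                target += self.gamma * self.q[agent][nxt]
            key = (state, action)
            self.q[agent][key] += self.alpha * (target - self.q[agent][key])


def train(episodes=2000, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    learners = [DoubleEstimator(0), DoubleEstimator(1)]
    for _ in range(episodes):
        state, first = "root", None
        while state is not None:
            mover = learners[MOVER[state]]
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS[state])  # explore
            else:
                action = mover.predicted_action(state)  # exploit
            if state == "root":
                first, rewards, next_state = action, (0, 0), action
            else:
                rewards, next_state = PAYOFFS[(first, action)], None
            # Both learners observe the same transition and both payoffs.
            for learner in learners:
                learner.update(state, action, rewards, next_state)
            state = next_state
    return learners


if __name__ == "__main__":
    agents = train()
    for s in ("root", "L", "R"):
        print(s, agents[0].predicted_action(s))
```

Because agent 0 evaluates its first move through agent 1's predicted reply, its choice at the root depends on how accurate the estimate of agent 1's values is. Replacing `predicted_action` with a different opponent model would correspond to a different level of rationality assumption in this sketch.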

Keywords

Nash Tham 

Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Ron Sun (1)
  • Dehu Qi (1)
  1. CECS, University of Missouri, Columbia, USA
