An Extension of a Hierarchical Reinforcement Learning Algorithm for Multiagent Settings

  • Ioannis Lambrou
  • Vassilis Vassiliades
  • Chris Christodoulou
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7188)


This paper investigates and compares single-agent reinforcement learning (RL) algorithms on the standard and an extended taxi problem domain, as well as multiagent RL algorithms on a multiagent extension of the standard taxi problem domain that we created. In particular, we extend the Policy Hill Climbing (PHC) and Win or Learn Fast-PHC (WoLF-PHC) algorithms by combining them with the MAXQ hierarchical decomposition, and we investigate their efficiency. The results for the multiagent domain are very promising, as they indicate that these two newly-created algorithms are the most efficient of the algorithms we compared.


Keywords: Hierarchical Reinforcement Learning · Multiagent Reinforcement Learning · Taxi Domain
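The WoLF-PHC algorithm extended in this paper maintains, per state, a Q-table, a mixed policy, and an average policy; it hill-climbs the policy toward the greedy action with a small step when "winning" (expected value under the current policy exceeds that under the average policy) and a large step when "losing". A minimal sketch of the flat (non-hierarchical) WoLF-PHC update, with illustrative parameter names and values, might look like this:

```python
import random
from collections import defaultdict

class WoLFPHC:
    """Sketch of Win or Learn Fast Policy Hill Climbing (Bowling & Veloso).
    Hyperparameter values here are illustrative, not the paper's settings."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.9,
                 delta_win=0.01, delta_lose=0.04):
        self.n = n_actions
        self.alpha, self.gamma = alpha, gamma
        self.delta_win, self.delta_lose = delta_win, delta_lose
        self.Q = defaultdict(lambda: [0.0] * n_actions)
        self.pi = defaultdict(lambda: [1.0 / n_actions] * n_actions)      # current policy
        self.pi_avg = defaultdict(lambda: [1.0 / n_actions] * n_actions)  # average policy
        self.count = defaultdict(int)                                     # state visit counts

    def act(self, s):
        # Sample an action from the mixed policy pi(s, .).
        r, acc = random.random(), 0.0
        for a, p in enumerate(self.pi[s]):
            acc += p
            if r <= acc:
                return a
        return self.n - 1

    def update(self, s, a, reward, s_next):
        # Standard Q-learning step.
        self.Q[s][a] += self.alpha * (
            reward + self.gamma * max(self.Q[s_next]) - self.Q[s][a])

        # Incrementally update the average policy.
        self.count[s] += 1
        c = self.count[s]
        for i in range(self.n):
            self.pi_avg[s][i] += (self.pi[s][i] - self.pi_avg[s][i]) / c

        # Win or learn fast: small step when winning, large when losing.
        expected = sum(p * q for p, q in zip(self.pi[s], self.Q[s]))
        expected_avg = sum(p * q for p, q in zip(self.pi_avg[s], self.Q[s]))
        delta = self.delta_win if expected > expected_avg else self.delta_lose

        # Hill-climb toward the greedy action, keeping pi a distribution.
        best = max(range(self.n), key=lambda i: self.Q[s][i])
        for i in range(self.n):
            if i == best:
                continue
            step = min(delta / (self.n - 1), self.pi[s][i])
            self.pi[s][i] -= step
            self.pi[s][best] += step
```

The paper's contribution replaces this flat Q-table with a MAXQ task hierarchy, applying the PHC/WoLF-PHC policy update at each subtask node rather than over primitive actions alone.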





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Ioannis Lambrou
  • Vassilis Vassiliades
  • Chris Christodoulou

Department of Computer Science, University of Cyprus, Nicosia, Cyprus
