
Abstraction and Generalization in Reinforcement Learning: A Summary and Framework

  • Marc Ponsen
  • Matthew E. Taylor
  • Karl Tuyls
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5924)

Abstract

In this paper we survey the basics of reinforcement learning, generalization, and abstraction. We start with an introduction to the fundamentals of reinforcement learning and motivate the need for generalization and abstraction. Next, we summarize the most important techniques available to achieve both generalization and abstraction in reinforcement learning. We discuss basic function approximation techniques and then delve into hierarchical, relational, and transfer learning. All concepts and techniques are illustrated with examples.
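
To make the role of function approximation concrete, the following minimal sketch (not taken from the paper; the chain MDP, the feature map, and helper names such as chain_step, featurize, and greedy are assumptions made for this illustration) contrasts tabular Q-learning, which stores one independent value per state-action pair, with Q-learning over a linear function approximator, where nearby states share value estimates through a feature vector.

# Illustrative sketch only (not from the paper): tabular Q-learning versus
# Q-learning with linear function approximation on a small chain MDP.
import random

N_STATES = 10        # states 0..9; reaching state 9 ends the episode
ACTIONS = [-1, +1]   # move left or right along the chain
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

def chain_step(s, a):
    """One transition of the chain MDP: reward 1 only on reaching the last state."""
    s2 = max(0, min(N_STATES - 1, s + a))
    reward = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, reward, s2 == N_STATES - 1

def greedy(values):
    """Arg-max with random tie-breaking."""
    best = max(values)
    return random.choice([i for i, v in enumerate(values) if v == best])

# (a) Tabular representation: one independent value per (state, action) pair.
Q = [[0.0, 0.0] for _ in range(N_STATES)]

def tabular_episode():
    s, done = 0, False
    while not done:
        ai = random.randrange(2) if random.random() < EPS else greedy(Q[s])
        s2, r, done = chain_step(s, ACTIONS[ai])
        target = r if done else r + GAMMA * max(Q[s2])
        Q[s][ai] += ALPHA * (target - Q[s][ai])   # standard Q-learning update
        s = s2

# (b) Linear function approximation: Q(s, a) = w_a . phi(s).
# A coarse feature vector lets nearby states share value estimates,
# i.e. an update at one state generalizes to its neighbours.
def featurize(s):
    return [1.0, s / (N_STATES - 1)]              # bias + normalized position

W = [[0.0, 0.0], [0.0, 0.0]]                      # one weight vector per action

def q_hat(s, ai):
    return sum(w * f for w, f in zip(W[ai], featurize(s)))

def approx_episode():
    s, done = 0, False
    while not done:
        vals = [q_hat(s, i) for i in range(2)]
        ai = random.randrange(2) if random.random() < EPS else greedy(vals)
        s2, r, done = chain_step(s, ACTIONS[ai])
        target = r if done else r + GAMMA * max(q_hat(s2, i) for i in range(2))
        td_error = target - q_hat(s, ai)
        W[ai] = [w + ALPHA * td_error * f          # semi-gradient update
                 for w, f in zip(W[ai], featurize(s))]
        s = s2

if __name__ == "__main__":
    for _ in range(300):
        tabular_episode()
        approx_episode()
    print("Tabular Q(0, right):", round(Q[0][1], 3))
    print("Approx. Q(0, right):", round(q_hat(0, 1), 3))

In the approximated variant, generalization comes entirely from the feature map: updating the weights after visiting one state shifts the value estimates of all states with similar features. This is the basic mechanism that the later sections on hierarchical, relational, and transfer learning extend to richer representations.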

Keywords

Reinforcement Learning · Multiagent System · Markov Decision Process · Transfer Learning · Inductive Logic Programming

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Marc Ponsen (1)
  • Matthew E. Taylor (2)
  • Karl Tuyls (1)
  1. Universiteit Maastricht, Maastricht, The Netherlands
  2. The University of Southern California, Los Angeles
