Artificial Intelligence Review

Volume 21, Issue 3–4, pp 375–398

Transfer of Experience Between Reinforcement Learning Environments with Progressive Difficulty

  • Michael G. Madden
  • Tom Howley


This paper describes an extension to reinforcement learning (RL), in which a standard RL algorithm is augmented with a mechanism for transferring experience gained in one problem to new but related problems. In this approach, named Progressive RL, an agent acquires experience of operating in a simple environment through experimentation, and then engages in a period of introspection, during which it rationalises the experience gained and formulates symbolic knowledge describing how to behave in that simple environment. When subsequently experimenting in a more complex but related environment, it is guided by this knowledge until it gains direct experience. A test domain with 15 maze environments, arranged in order of difficulty, is described. A range of experiments in this domain is presented, demonstrating the benefit of Progressive RL relative to a basic RL approach in which each puzzle is solved from scratch. The experiments also analyse the knowledge formed during introspection, illustrate how domain knowledge may be incorporated, and show that Progressive RL may be used to solve complex puzzles more quickly.

Keywords: C4.5, experience transfer, Naive Bayes, PART, Progressive RL, Q-learning, reinforcement learning, rule learning
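The abstract outlines a three-phase loop: learn by trial and error in a simple environment, introspect to distil the learned behaviour into symbolic knowledge, then use that knowledge to guide exploration in a harder, related environment. A minimal Python sketch of such a loop is given below; the class name, the tabular Q-learning update, and the trivial state-to-action rule table are all illustrative assumptions, not the paper's implementation (the paper uses symbolic learners such as C4.5, Naive Bayes, and PART for the introspection step).

```python
import random
from collections import defaultdict

class ProgressiveQLearner:
    """Illustrative sketch of Progressive RL: a Q-learner whose
    exploration in a new environment is biased by rules distilled
    from a simpler, previously solved environment."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, rules=None):
        self.q = defaultdict(float)          # (state, action) -> value
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rules = rules or {}             # transferred state -> action advice

    def choose(self, state):
        if random.random() < self.epsilon:
            # When exploring, prefer the action suggested by the
            # transferred rules; fall back to a random action.
            return self.rules.get(state, random.choice(self.actions))
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, s, a, r, s2):
        # Standard one-step Q-learning update (Watkins, 1989).
        best_next = max(self.q[(s2, a2)] for a2 in self.actions)
        self.q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.q[(s, a)])

    def introspect(self):
        # "Introspection": distil the greedy policy into a symbolic
        # state -> action table; a rule learner would generalise this.
        states = {s for (s, _) in self.q}
        return {s: max(self.actions, key=lambda a: self.q[(s, a)]) for s in states}
```

In this sketch, the table returned by `introspect()` after training on one maze would be passed as `rules` when constructing the learner for the next, more difficult maze.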


References


  1. Abbott, R. (1990). Mad Mazes: Intriguing Mind Twisters for Puzzle Buffs, Game Nuts and Other Smart People. Adams Media.
  2. Andre, D. & Russell, S. J. (2002). State Abstraction for Programmable Reinforcement Learning Agents. In Proceedings of The 18th National Conference on Artificial Intelligence.
  3. Bernstein, D. S. (1999). Reusing Old Policies to Accelerate Learning on New MDPs. Technical Report 99-26, University of Massachusetts.
  4. Blockeel, H. & De Raedt, L. (1998). Top-Down Induction of First-Order Logical Decision Trees. Artificial Intelligence 101(1–2): 285–297.
  5. Bowling, M. & Veloso, M. (1998). Reusing Learned Policies Between Similar Problems. In Proceedings of The AI*IA-98 Workshop on New Trends in Robotics. Padua, Italy.
  6. Bowling, M. & Veloso, M. (1999). Bounding the Suboptimality of Reusing Subproblems. In Proceedings of The 16th International Joint Conference on AI, 1340–1347. Sweden: Morgan Kaufmann.
  7. Breisemeister, L., Scheffer, T. & Wysotzki, F. (1995). Combination of Problem Solving and Learning from Experience. Technical Report 20/95, TU Berlin.
  8. Breisemeister, L., Scheffer, T. & Wysotzki, F. (1996). A Concept Formation Based Algorithmic Model for Skill Acquisition. In Proceedings of The First European Workshop on Cognitive Modelling.
  9. Carroll, J. L., Peterson, T. S. & Owen, N. E. (2001). Memory-guided Exploration in Reinforcement Learning. In Proceedings of The International Joint Conference on Neural Networks. Washington, DC.
  10. Dietterich, T. (2000). Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition. Journal of Artificial Intelligence Research 13: 227–303.
  11. Dixon, K. R., Malak, R. J. & Khosla, P. K. (2000). Incorporating Prior Knowledge and Previously Learned Information into Reinforcement Learning Agents. Technical Report, Institute for Complex Engineered Systems, Carnegie Mellon University.
  12. Dreyfus, H. L. & Dreyfus, S. E. (1986). Mind Over Machine: The Power of Human Intuition and Expertise in the Era of the Computer. Blackwell.
  13. Džeroski, S., De Raedt, L. & Driessens, K. (2001). Relational Reinforcement Learning. Machine Learning 43: 7–52.
  14. Kaelbling, L. P., Littman, M. L. & Moore, A. W. (1996). Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research 4: 237–285.
  15. Langley, P., Iba, W. & Thompson, K. (1992). An Analysis of Bayesian Classifiers. In Proceedings of The 10th National Conference on Artificial Intelligence, 223–228.
  16. Maclin, R. & Shavlik, J. W. (1996). Creating Advice-Taking Reinforcement Learners. Machine Learning 22: 251–282.
  17. Michie, D., Bain, M. & Hayes-Michie, J. (1990). Cognitive Models from Subcognitive Skills. In McGhee, J., Grimble, M. J. & Mowforth, P. (eds.) Knowledge-Based Systems for Industrial Control, 71–99. Peter Peregrinus: London.
  18. Ng, A. Y., Harada, D. & Russell, S. J. (1999). Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping. In Proceedings of The 16th International Conference on Machine Learning.
  19. Perkins, T. J. & Precup, D. (1999). Using Options for Knowledge Transfer in Reinforcement Learning. Technical Report 99-34, University of Massachusetts.
  20. Russell, S. J. & Norvig, P. (2003). Artificial Intelligence: A Modern Approach, 2nd edn. Prentice Hall.
  21. Šuc, D. (2001). Skill Machine Reconstruction of Human Control Strategies. Ph.D. dissertation, University of Ljubljana, Slovenia.
  22. Quinlan, J. R. (1990). Learning Logical Definitions from Relations. Machine Learning 5: 239–266.
  23. Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann.
  24. Sun, R., Peterson, T. & Merrill, E. (1996). Bottom-up Skill Learning in Reactive Sequential Decision Tasks. In Proceedings of The 18th Cognitive Science Society Conference.
  25. Sun, R. & Peterson, T. (1998). Autonomous Learning of Sequential Tasks: Experiments and Analyses. IEEE Transactions on Neural Networks 9: 1217–1234.
  26. Sun, R. & Merrill, E. (2001). From Implicit Skills to Explicit Knowledge: A Bottom-Up Model of Skill Learning. Cognitive Science 25(2): 203–244.
  27. Sutton, R. S. & Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press.
  28. Sutton, R. S., McAllester, D. A., Singh, S. P. & Mansour, Y. (2000). Policy Gradient Methods for Reinforcement Learning with Function Approximation. In Advances in Neural Information Processing Systems.
  29. Tesauro, G. (1992). Practical Issues in Temporal Difference Learning. Machine Learning 8: 257–277.
  30. Thrun, S. (1996). Explanation-Based Neural Network Learning: A Lifelong Learning Approach. Kluwer Academic Publishers.
  31. Thrun, S. & Schwartz, A. (1995). Finding Structure in Reinforcement Learning. In Advances in Neural Information Processing Systems, 385–392.
  32. Utgoff, P. E. & Cohen, P. R. (1998). Applicability of Reinforcement Learning. In Proceedings of AAAI Workshop on The Methodology of Applying Machine Learning.
  33. Watkins, C. J. C. H. (1989). Learning from Delayed Rewards. Ph.D. dissertation, Cambridge University.
  34. Whitehead, S. D. (1991). A Study of Cooperative Mechanisms for Faster Reinforcement Learning. Technical Report 365, University of Rochester, New York.
  35. Witten, I. H. & Frank, E. (2000). Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann.

Copyright information

© Kluwer Academic Publishers 2004
