Abstract
In speedup-learning problems, where full descriptions of operators are known, both explanation-based learning (EBL) and reinforcement learning (RL) methods can be applied. This paper shows that both methods involve fundamentally the same process of propagating information backward from the goal toward the starting state. Most RL methods perform this propagation on a state-by-state basis, while EBL methods compute the weakest preconditions of operators, and hence perform this propagation on a region-by-region basis. Barto, Bradtke, and Singh (1995) have observed that many algorithms for reinforcement learning can be viewed as asynchronous dynamic programming. Based on this observation, this paper shows how to develop dynamic programming versions of EBL, which we call region-based dynamic programming or Explanation-Based Reinforcement Learning (EBRL). The paper compares batch and online versions of EBRL to batch and online versions of point-based dynamic programming and to standard EBL. The results show that region-based dynamic programming combines the strengths of EBL (fast learning and the ability to scale to large state spaces) with the strengths of reinforcement learning algorithms (learning of optimal policies). Results are presented for chess endgames and for synthetic maze tasks.
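To make the state-by-state versus region-by-region contrast concrete, here is a minimal Python sketch, not taken from the paper itself, of region-based backward propagation on an obstacle-free grid maze. Everything in it is an illustrative assumption: regions are represented as axis-aligned rectangles, the operators are four deterministic moves on a 10x10 grid, the per-step reward is -1 with no discounting, and the weakest precondition of a move with respect to a rectangle is simply that rectangle shifted backward by the move's effect and clipped to the grid. A point-based method would back up each cell individually; here every backup covers an entire rectangle of states at once.

from collections import namedtuple

Rect = namedtuple("Rect", "x0 y0 x1 y1")  # inclusive corner coordinates

# Deterministic operators: each action shifts the agent by (dx, dy).
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def weakest_precondition(region, action, lo=0, hi=9):
    """The set of states from which `action` leads into `region`:
    the rectangle shifted backward by the action's effect, clipped
    to the (hypothetical) 10x10 grid; None if the result is empty."""
    dx, dy = ACTIONS[action]
    r = Rect(region.x0 - dx, region.y0 - dy, region.x1 - dx, region.y1 - dy)
    r = Rect(max(r.x0, lo), max(r.y0, lo), min(r.x1, hi), min(r.y1, hi))
    return r if (r.x0 <= r.x1 and r.y0 <= r.y1) else None

def region_value_iteration(goal, steps, step_reward=-1.0):
    """Propagate values backward from the goal, region by region.
    Returns (rectangle, value) pairs; a state's value is the maximum
    over all rectangles that contain it."""
    frontier = [(goal, 0.0)]
    table = list(frontier)
    for _ in range(steps):
        next_frontier = []
        for region, v in frontier:
            for a in ACTIONS:
                pre = weakest_precondition(region, a)
                if pre is not None:
                    next_frontier.append((pre, v + step_reward))
        table.extend(next_frontier)
        frontier = next_frontier
    return table

# Three backward sweeps from a single goal cell at (9, 9):
table = region_value_iteration(Rect(9, 9, 9, 9), steps=3)

A real implementation would also subtract already-covered (dominated) regions so the value table stays compact; the sketch is only meant to show that each backup updates a whole rectangle of states in one operation, where point-based dynamic programming would touch each of those states separately.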
References
Atkeson, C. G. (1990). Using local models to control movement. In Advances in Neural Information Processing Systems, Vol. 2, pp. 316–323. Morgan Kaufmann.
Barto, A. G., Bradtke, S. J., & Singh, S. P. (1995). Learning to act using real-time dynamic programming. Artificial Intelligence, 72, 81–138.
Bellman, R. E. (1957). Dynamic Programming. Princeton University Press.
Bern, M. (1990). Hidden surface removal for rectangles. Journal of Computer and System Sciences, 40, 49–69.
Bertsekas, D. P., & Castanon, D. A. (1989). Adaptive aggregation methods for infinite horizon dynamic programming. IEEE Transactions on Automatic Control, AC-34, 589–598.
Bratko, I., & Michie, D. (1980). A representation for pattern-knowledge in chess endgames. In Clarke, M. R. (Ed.), Advances in Computer Chess, Vol. 2. Edinburgh University Press.
Chapman, D., & Kaelbling, L. P. (1991). Input generalization in delayed reinforcement learning: An algorithm and performance comparisons. In Proceedings of the Twelfth International Joint Conference on Artificial Intelligence, pp. 726–731. Morgan Kaufmann.
Christiansen, A. D. (1992). Learning to predict in uncertain continuous tasks. In Sleeman, D., & Edwards, P. (Eds.), Proceedings of the Ninth International Conference on Machine Learning, pp. 72–81, San Francisco. Morgan Kaufmann.
Cormen, T. H., Leiserson, C. E., & Rivest, R. L. (1990). Introduction to Algorithms. MIT Press.
Dijkstra, E. W. (1959). A note on two problems in connexion with graphs. Numerische Mathematik, 1, 269–271.
Edelsbrunner, H. (1983). A new approach to rectangle intersections. International Journal of Computer Mathematics, 13, 209–219.
Flann, N. S. (1992). Correct Abstraction in Counter-planning: A Knowledge Compilation Approach. Ph.D. thesis, Oregon State University.
Kaelbling, L. P., Littman, M. L., & Moore, A. W. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4, 237–285.
Laird, J., Rosenbloom, P., & Newell, A. (1986). Chunking in Soar: the anatomy of a general learning mechanism. Machine Learning, 1(1), 11–46.
Lin, L.-J. (1992). Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning, 8, 293–322.
Minton, S. (1988). Learning effective search control knowledge: An explanation-based approach. Ph.D. thesis, Carnegie-Mellon University. Technical Report CMU-CS-88-133.
Minton, S. (1990). Quantitative results concerning the utility of explanation-based learning. Artificial Intelligence, 42, 363–392.
Mitchell, T. M., Keller, R. M., & Kedar-Cabelli, S. T. (1986). Explanation-based generalization: a unifying view. Machine Learning, 1(1), 47–80.
Moore, A. W. (1993). The Parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces. In Advances in Neural Information Processing Systems, Vol. 6, pp. 711–718, San Mateo, CA. Morgan Kaufmann.
Moore, A. W., & Atkeson, C. G. (1993). Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning, 13, 103–130.
Puterman, M. L. (1994). Markov Decision Processes. J. Wiley & Sons, New York.
Quinlan, J. R. (1983). Learning efficient classification procedures and their application to chess end games. In Machine Learning: An Artificial Intelligence Approach, Vol. 1, pp. 463–482. Tioga Press, Palo Alto, CA.
Sammut, C., & Cribb, J. (1990). Is learning rate a good performance criterion for learning? In Proceedings of the Seventh International Conference on Machine Learning, pp. 170–178, San Francisco, CA. Morgan Kaufmann.
Schaeffer, J. (1991). Checkers program earns the right to play for world title. Computing Research News, 1, 12, January.
Subramanian, D., & Feldman, R. (1990). The utility of EBL in recursive domain theories. In Proceedings of the Eighth National Conference on Artificial Intelligence (AAAI-90), Menlo Park, CA. AAAI Press.
Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3(1), 9–44.
Tambe, M., Newell, A., & Rosenbloom, P. (1990). The problem of expensive chunks and its solution by restricting expressiveness. Machine Learning, 5, 299–348.
Tesauro, G. (1992). Practical issues in temporal difference learning. Machine Learning, 8, 257–278.
Thompson, K. (1986). Retrograde analysis of certain endgames. ICCA Journal, 9(3), 131–139.
Watkins, C. J. C. (1989). Learning from delayed rewards. Ph.D. thesis, King's College, Cambridge.
Watkins, C. J., & Dayan, P. (1992). Technical note: Q-learning. Machine Learning, 8, 279–292.
Yee, R. C., Saxena, S., Utgoff, P. E., & Barto, A. G. (1990). Explaining temporal differences to create useful concepts for evaluating states. In Proceedings of the Eighth National Conference on Artificial Intelligence (AAAI-90), pp. 882–888, Cambridge, MA. AAAI Press/MIT Press.
Zhang, W., & Dietterich, T. G. (1995). A reinforcement learning approach to job-shop scheduling. In Proceedings of the 1995 International Joint Conference on Artificial Intelligence (IJCAI-95), pp. 1114–1120. AAAI/MIT Press, Cambridge, MA.
Cite this article
Dietterich, T.G., Flann, N.S. Explanation-Based Learning and Reinforcement Learning: A Unified View. Machine Learning 28, 169–210 (1997). https://doi.org/10.1023/A:1007355226281