Abstract
In speedup-learning problems, where full descriptions of operators are known, both explanation-based learning (EBL) and reinforcement learning (RL) methods can be applied. This paper shows that both methods involve fundamentally the same process of propagating information backward from the goal toward the starting state. Most RL methods perform this propagation on a state-by-state basis, while EBL methods compute the weakest preconditions of operators, and hence perform this propagation on a region-by-region basis. Barto, Bradtke, and Singh (1995) have observed that many algorithms for reinforcement learning can be viewed as asynchronous dynamic programming. Based on this observation, this paper shows how to develop dynamic programming versions of EBL, which we call region-based dynamic programming or Explanation-Based Reinforcement Learning (EBRL). The paper compares batch and online versions of EBRL to batch and online versions of point-based dynamic programming and to standard EBL. The results show that region-based dynamic programming combines the strengths of EBL (fast learning and the ability to scale to large state spaces) with the strengths of reinforcement learning algorithms (learning of optimal policies). Results are presented for chess endgames and for synthetic maze tasks.
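To make the state-by-state versus region-by-region contrast concrete, here is a minimal Python sketch, not taken from the paper itself, of region-based backward propagation on an obstacle-free grid maze. Everything in it is an illustrative assumption: regions are represented as axis-aligned rectangles, the operators are four deterministic moves on a 10x10 grid, the per-step reward is -1 with no discounting, and the weakest precondition of a move with respect to a rectangle is simply that rectangle shifted backward by the move's effect and clipped to the grid. A point-based method would back up each cell individually; here every backup covers an entire rectangle of states at once.

from collections import namedtuple

Rect = namedtuple("Rect", "x0 y0 x1 y1")  # inclusive corner coordinates

# Deterministic operators: each action shifts the agent by (dx, dy).
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def weakest_precondition(region, action, lo=0, hi=9):
    """The set of states from which `action` leads into `region`:
    the rectangle shifted backward by the action's effect, clipped
    to the (hypothetical) 10x10 grid; None if the result is empty."""
    dx, dy = ACTIONS[action]
    r = Rect(region.x0 - dx, region.y0 - dy, region.x1 - dx, region.y1 - dy)
    r = Rect(max(r.x0, lo), max(r.y0, lo), min(r.x1, hi), min(r.y1, hi))
    return r if (r.x0 <= r.x1 and r.y0 <= r.y1) else None

def region_value_iteration(goal, steps, step_reward=-1.0):
    """Propagate values backward from the goal, region by region.
    Returns (rectangle, value) pairs; a state's value is the maximum
    over all rectangles that contain it."""
    frontier = [(goal, 0.0)]
    table = list(frontier)
    for _ in range(steps):
        next_frontier = []
        for region, v in frontier:
            for a in ACTIONS:
                pre = weakest_precondition(region, a)
                if pre is not None:
                    next_frontier.append((pre, v + step_reward))
        table.extend(next_frontier)
        frontier = next_frontier
    return table

# Three backward sweeps from a single goal cell at (9, 9):
table = region_value_iteration(Rect(9, 9, 9, 9), steps=3)

A real implementation would also subtract already-covered (dominated) regions so the value table stays compact; the sketch is only meant to show that each backup updates a whole rectangle of states in one operation, where point-based dynamic programming would touch each of those states separately.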
References
Atkeson, C. G. (1990). Using local models to control movement. In Advances in Neural Information Processing Systems, Vol. 2, pp. 316–323. Morgan Kaufmann.
Barto, A. G., Bradtke, S. J., & Singh, S. P. (1995). Learning to act using real-time dynamic programming. Artificial Intelligence, 72, 81–138.
Bellman, R. E. (1957). Dynamic Programming. Princeton University Press.
Bern, M. (1990). Hidden surface removal for rectangles. Journal of Computer and System Sciences, 40, 49–69.
Bertsekas, D. P., & Castanon, D. A. (1989). Adaptive aggregation methods for infinite horizon dynamic programming. IEEE Transactions on Automatic Control, AC-34, 589–598.
Bratko, I., & Michie, D. (1980). A representation for pattern-knowledge in chess endgames. In Clarke, M. R. (Ed.), Advances in Computer Chess, Vol. 2. Edinburgh University Press.
Chapman, D., & Kaelbling, L. P. (1991). Input generalization in delayed reinforcement learning: An algorithm and performance comparisons. In Proceedings of the Twelfth International Joint Conference on Artificial Intelligence, pp. 726–731. Morgan Kaufmann.
Christiansen, A. D. (1992). Learning to predict in uncertain continuous tasks. In Sleeman, D., & Edwards, P. (Eds.), Proceedings of the Ninth International Conference on Machine Learning, pp. 72–81, San Francisco. Morgan Kaufmann.
Cormen, T. H., Leiserson, C. E., & Rivest, R. L. (1990). Introduction to Algorithms. MIT Press.
Dijkstra, E. W. (1959). A note on two problems in connexion with graphs. Numerische Mathematik, 1, 269–271.
Edelsbrunner, H. (1983). A new approach to rectangle intersections. International Journal of Computer Mathematics, 13, 209–219.
Flann, N. S. (1992). Correct Abstraction in Counter-planning: A Knowledge Compilation Approach. Ph.D. thesis, Oregon State University.
Kaelbling, L. P., Littman, M. L., & Moore, A. W. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4, 237–285.
Laird, J., Rosenbloom, P., & Newell, A. (1986). Chunking in Soar: the anatomy of a general learning mechanism. Machine Learning, 1(1), 11–46.
Lin, L.-J. (1992). Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning, 8, 293–322.
Minton, S. (1988). Learning effective search control knowledge: An explanation-based approach. Ph.D. thesis, Carnegie-Mellon University. Technical Report CMU-CS-88-133.
Minton, S. (1990). Quantitative results concerning the utility of explanation-based learning. Artificial Intelligence, 42, 363–392.
Mitchell, T. M., Keller, R. M., & Kedar-Cabelli, S. T. (1986). Explanation-based generalization: a unifying view. Machine Learning, 1(1), 47–80.
Moore, A. W. (1993). The Parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces. In Advances in Neural Information Processing Systems, Vol. 6, pp. 711–718, San Mateo, CA. Morgan Kaufmann.
Moore, A. W., & Atkeson, C. G. (1993). Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning, 13, 103–130.
Puterman, M. L. (1994). Markov Decision Processes. J. Wiley & Sons, New York.
Quinlan, J. R. (1983). Learning efficient classification procedures and their application to chess end games. In Machine Learning: An Artificial Intelligence Approach, Vol. 1, pp. 463–482. Tioga Press, Palo Alto, CA.
Sammut, C., & Cribb, J. (1990). Is learning rate a good performance criterion for learning? In Proceedings of the Seventh International Conference on Machine Learning, pp. 170–178, San Francisco, CA. Morgan Kaufmann.
Schaeffer, J. (1991). Checkers program earns the right to play for world title. Computing Research News, 1, 12, January.
Subramanian, D., & Feldman, R. (1990). The utility of EBL in recursive domain theories. In Proceedings of the Eighth National Conference on Artificial Intelligence (AAAI-90), Menlo Park, CA. AAAI Press.
Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3(1), 9–44.
Tambe, M., Newell, A., & Rosenbloom, P. (1990). The problem of expensive chunks and its solution by restricting expressiveness. Machine Learning, 5, 299–348.
Tesauro, G. (1992). Practical issues in temporal difference learning. Machine Learning, 8, 257–278.
Thompson, K. (1986). Retrograde analysis of certain endgames. ICCA Journal, 9(3), 131–139.
Watkins, C. J. C. (1989). Learning from delayed rewards. Ph.D. thesis, King's College, Cambridge.
Watkins, C. J., & Dayan, P. (1992). Technical note: Q-learning. Machine Learning, 8, 279–292.
Yee, R. C., Saxena, S., Utgoff, P. E., & Barto, A. G. (1990). Explaining temporal differences to create useful concepts for evaluating states. In Proceedings of the Eighth National Conference on Artificial Intelligence (AAAI-90), pp. 882–888, Cambridge, MA. AAAI Press/MIT Press.
Zhang, W., & Dietterich, T. G. (1995). A reinforcement learning approach to job-shop scheduling. In Proceedings of the 1995 International Joint Conference on Artificial Intelligence (IJCAI-95), pp. 1114–1120. AAAI/MIT Press, Cambridge, MA.
Cite this article
Dietterich, T.G., Flann, N.S. Explanation-Based Learning and Reinforcement Learning: A Unified View. Machine Learning 28, 169–210 (1997). https://doi.org/10.1023/A:1007355226281