Abstract
An agent that must learn to act in the world by trial and error faces the reinforcement learning problem, which is quite different from standard concept learning. Although good algorithms exist for this problem in the general case, they are often quite inefficient and do not exhibit generalization. One strategy is to find restricted classes of action policies that can be learned more efficiently. This paper pursues that strategy by developing algorithms that can efficiently learn action maps that are expressible in k-DNF. The algorithms are compared with existing methods in empirical trials and are shown to have very good performance.
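To make the representation class concrete: a k-DNF formula is a disjunction of terms, each term a conjunction of at most k literals over the boolean input features. The sketch below is not the paper's learning algorithm, only a minimal illustration (with hypothetical names such as `eval_kdnf`) of how an action map expressible in k-DNF can be encoded and evaluated.

```python
# A k-DNF formula: a disjunction (OR) of terms, each term a conjunction
# (AND) of at most k literals. A literal is encoded as (index, polarity):
# polarity True means x[index] must be True, False means it must be False.

def eval_kdnf(terms, x):
    """Return True iff at least one term is satisfied by boolean input x."""
    return any(all(x[i] == pol for i, pol in term) for term in terms)

# Example over 3 features: (x0 AND NOT x2) OR x1  -- here k = 2.
formula = [[(0, True), (2, False)], [(1, True)]]

print(eval_kdnf(formula, [True, False, False]))   # True: first term holds
print(eval_kdnf(formula, [False, False, True]))   # False: no term holds
```

An associative reinforcement learner over this class would search the (polynomially sized, for fixed k) space of such terms, retaining those whose inclusion is statistically supported by the reward signal.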
Cite this article
Kaelbling, L.P. Associative Reinforcement Learning: Functions in k-DNF. Machine Learning 15, 279–298 (1994). https://doi.org/10.1023/A:1022689909846