Anderson, C.W. (1987). Strategy learning with multilayer connectionist representations. Proceedings of the Fourth International Workshop on Machine Learning (pp. 103–114).
Barto, A.G., Sutton, R.S., & Watkins, C.J.C.H. (1990). Learning and sequential decision making. In M. Gabriel & J.W. Moore (Eds.), Learning and computational neuroscience. MIT Press.
Barto, A.G., Bradtke, S.J., & Singh, S.P. (1991). Real-time learning and control using asynchronous dynamic programming (Technical Report 91-57). University of Massachusetts, Computer Science Department.
Chapman, D. & Kaelbling, L.P. (1991). Input generalization in delayed reinforcement learning: An algorithm and performance comparisons. Proceedings of IJCAI-91.
Dayan, P. (1992). The convergence of TD(λ) for general λ. Machine Learning, 8, 341–362.
Grefenstette, J.J., Ramsey, C.L., & Schultz, A.C. (1990). Learning sequential decision rules using simulation models and competition. Machine Learning, 5, 355–382.
Hinton, G.E., McClelland, J.L., & Rumelhart, D.E. (1986). Distributed representations. In Parallel distributed processing: Explorations in the microstructure of cognition, Vol. 1. Bradford Books/MIT Press.
Howard, R.A. (1960). Dynamic programming and Markov processes. New York: Wiley.
Kaelbling, L.P. (1990). Learning in embedded systems. Ph.D. Thesis, Department of Computer Science, Stanford University.
Lang, K.J. (1989). A time-delay neural network architecture for speech recognition. Ph.D. Thesis, School of Computer Science, Carnegie Mellon University.
Lin, Long-Ji. (1991a). Self-improving reactive agents: Case studies of reinforcement learning frameworks. Proceedings of the First International Conference on Simulation of Adaptive Behavior: From Animals to Animats (pp. 297–305). Also Technical Report CMU-CS-90-109, Carnegie Mellon University.
Lin, Long-Ji. (1991b). Self-improvement based on reinforcement learning, planning and teaching. Proceedings of the Eighth International Workshop on Machine Learning (pp. 323–327).
Lin, Long-Ji. (1991c). Programming robots using reinforcement learning and teaching. Proceedings of AAAI-91 (pp. 781–786).
Mahadevan, S. & Connell, J. (1991). Scaling reinforcement learning to robotics by exploiting the subsumption architecture. Proceedings of the Eighth International Workshop on Machine Learning (pp. 328–332).
Mitchell, T.M. (1982). Generalization as search. Artificial Intelligence, 18, 203–226.
Moore, A.W. (1991). Variable resolution dynamic programming: Efficiently learning action maps in multivariate real-valued state-spaces. Proceedings of the Eighth International Workshop on Machine Learning (pp. 333–337).
Mozer, M.C. (1986). RAMBOT: A connectionist expert system that learns by example (Institute for Cognitive Science Report 8610). University of California at San Diego.
Pomerleau, D.A. (1989). ALVINN: An autonomous land vehicle in a neural network (Technical Report CMU-CS-89-107). Carnegie Mellon University.
Rumelhart, D.E., Hinton, G.E., & Williams, R.J. (1986). Learning internal representations by error propagation. In Parallel distributed processing: Explorations in the microstructure of cognition, Vol. 1. Bradford Books/MIT Press.
Sutton, R.S. (1984). Temporal credit assignment in reinforcement learning. Ph.D. Thesis, Dept. of Computer and Information Science, University of Massachusetts.
Sutton, R.S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9–44.
Sutton, R.S. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. Proceedings of the Seventh International Workshop on Machine Learning (pp. 216–224).
Tan, Ming. (1991). Learning a cost-sensitive internal representation for reinforcement learning. Proceedings of the Eighth International Workshop on Machine Learning (pp. 358–362).
Thrun, S.B., Möller, K., & Linden, A. (1991). Planning with an adaptive world model. In D.S. Touretzky (Ed.), Advances in neural information processing systems 3. Morgan Kaufmann.
Thrun, S.B. & Möller, K. (1992). Active exploration in dynamic environments. To appear in D.S. Touretzky (Ed.), Advances in neural information processing systems 4. Morgan Kaufmann.
Watkins, C.J.C.H. (1989). Learning from delayed rewards. Ph.D. Thesis, King's College, Cambridge.
Williams, R.J. & Zipser, D. (1988). A learning algorithm for continually running fully recurrent neural networks (Institute for Cognitive Science Report 8805). University of California at San Diego.
Whitehead, S.D. & Ballard, D.H. (1989). A role for anticipation in reactive systems that learn. Proceedings of the Sixth International Workshop on Machine Learning (pp. 354–357).
Whitehead, S.D. & Ballard, D.H. (1991a). Learning to perceive and act by trial and error. Machine Learning, 7.
Whitehead, S.D. (1991b). Complexity and cooperation in Q-learning. Proceedings of the Eighth International Workshop on Machine Learning (pp. 363–367).