Akian, M., Chancelier, J.P. & Quadrat, J.P., (1988). Dynamic Programming Complexity and Application. In Proceedings of the 27th Conference on Decision and Control, Austin, Texas.
Arcilla, A.S., Hauser, J., Eiseman, P.R. & Thompson, J.F., (1991). Numerical Grid Generation in Computational Fluid Dynamics and Related Fields. North-Holland.
Barto, A.G., Bradtke, S.J. & Singh, S.P., (1994). Real-time Learning and Control using Asynchronous Dynamic Programming. AI Journal, to appear (also published as UMass Amherst Technical Report 91-57 in 1991).
Barto, A.G., Sutton, R.S. & Anderson, C.W., (1983). Neuronlike Adaptive elements that can learn difficult Control Problems. IEEE Trans. on Systems Man and Cybernetics, 13(5):835–846.
Bellman, R.E., (1957). Dynamic Programming. Princeton University Press, Princeton, NJ.
Bertsekas, D.P. & Tsitsiklis, J.N., (1989). Parallel and Distributed Computation. Prentice Hall.
Brooks, R.A. & Lozano-Perez, T., (1983). A Subdivision Algorithm in Configuration Space for Findpath with rotation. In Proceedings of the 8th International Conference on Artificial Intelligence.
Chapman, D. & Kaelbling, L.P., (1991). Learning from Delayed Reinforcement In a Complex Domain. Technical Report, Teleos Research.
Chow, C.S., (1990). Multigrid algorithms and complexity results for discrete-time stochastic control and related fixed-point problems. Technical report, M.I.T. Laboratory for Information and Decision Systems.
Dayan, P. & Hinton, G.E., (1993). Feudal Reinforcement Learning. In S. J. Hanson, J. D. Cowan, and C. L. Giles, editors, Advances in Neural Information Processing Systems 5. Morgan Kaufmann.
Hoppe, R. H. W., (1986). Multi-Grid Methods for Hamilton-Jacobi-Bellman Equations. Numerical Mathematics, 49.
Kaelbling, L., (1993). Hierarchical Learning in Stochastic Domains: Preliminary Results. In Machine Learning: Proceedings of the Tenth International Workshop. Morgan Kaufmann.
Kaelbling, L.P., (1990). Learning in Embedded Systems. PhD. Thesis; Technical Report No. TR-90-04, Stanford University, Department of Computer Science, June 1990.
Kambhampati, S. & Davis, L.S., (1986). Multiresolution Path Planning for Mobile Robots. IEEE Journal of Robotics and Automation, RA-2(3).
Knuth, D.E., (1973). Sorting and Searching. Addison Wesley.
Koenig, S. & Simmons, R.G., (1993). Complexity Analysis of Reinforcement Learning. In Proceedings of the Eleventh International Conference on Artificial Intelligence (AAAI-93). MIT Press.
Latombe, J., (1991). Robot Motion Planning. Kluwer.
McCormick, S.F., (1989). Multilevel Adaptive Methods for Partial Differential Equations. SIAM.
Michie, D. & Chambers, R.A., (1968). BOXES: An Experiment in Adaptive Control. In E. Dale and D. Michie, editors, Machine Intelligence 2. Oliver and Boyd.
Moore, A.W., (1991). Variable Resolution Dynamic Programming: Efficiently Learning Action Maps in Multivariate Real-valued State-spaces. In L. Birnbaum and G. Collins, editors, Machine Learning: Proceedings of the Eighth International Workshop. Morgan Kaufmann.
Moore, A.W. & Atkeson, C.G., (1993). Prioritized Sweeping: Reinforcement Learning with Less Data and Less Real Time. Machine Learning, 13.
Nilsson, N.J., (1971). Problem-solving Methods in Artificial Intelligence. McGraw Hill.
Peng, J. & Williams, R.J., (1993). Efficient Learning and Planning Within the Dyna Framework. In Proceedings of the Second International Conference on Simulation of Adaptive Behavior. MIT Press.
Sage, A.P. & White, C.C., (1977). Optimum Systems Control. Prentice Hall.
Schaal, S. & Atkeson, C.G., (1994). Assessing the Quality of Local Linear Models. In Advances in Neural Information Processing Systems 6. Morgan Kaufmann.
Simons, J., Van Brussel, H., De Schutter, J. & Verhaert, J., (1982). A Self-Learning Automaton with Variable Resolution for High Precision Assembly by Industrial Robots. IEEE Trans. on Automatic Control, 27(5):1109–1113.
Sutton, R.S., (1984). Temporal Credit Assignment in Reinforcement Learning. PhD. Thesis, University of Massachusetts, Amherst.
Sutton, R.S., (1990). Integrated Architecture for Learning, Planning, and Reacting Based on Approximating Dynamic Programming. In Proceedings of the 7th International Conference on Machine Learning. Morgan Kaufmann.
Watkins, C.J.C.H., (1989). Learning from Delayed Rewards. PhD. Thesis, King's College, University of Cambridge.