Bakshi, B. R. & Stephanopoulos, G., (1993). "Wave-Net: A Multiresolution, Hierarchical Neural Network with Localized Learning", AIChE Journal, vol, 39, no 1, pp. 57–81.

Barto, A. G., Bradtke, S. J. & Singh, S. P., (1995). "Real-time Learning and Control Using Asynchronous Dynamic Programming", Artificial Intelligence, vol. 72, pp. 81–138.

Bellman, R. E. & Dreyfus, S. E., (1959). "Functional Approximation and Dynamic Programming" Math. Tables and Other Aids Comp. Vol. 13, pp. 247–251.

Bortsekas, D. P., (1995).*Dynamic Programming and Optimal Control*, Athena Scientific, Bellmont, MA.

Bertsekas, D. P. (1994), "A Counter-Example to Temporal Differences Learning", Neural Computation, vol. 7, pp. 270–279.

Bertsekas, D. P. & Castañon, D. A., (1989), "Adaptive Aggregation for Infinite Horizon Dynamic Programming", IEEE Transactions on Automatic Control, Vol. 34, No. 6, pp. 589–598.

Bertsekas, D. P. & Tsitsiklis, J. N., (1989).*Parallel and Distributed Computation: Numerical Methods*, Prentice Hall, Englewood Cliffs, NJ.

Dayan, P. D., (1992). "The Convergence of TD(λ) for General λ", Machine Learning, vol. 8, pp. 341–362.

Gordon, G. J., (1995). "Stable Function Approximation in Dynamic Programming", Technical Report: CMUCS-95-103, Carnegie Mellon University.

Jaakola, T., Jordan M. I., & Singh, S. P., (1994). "On the Convergence of Stochastic Iterative Dynamic Programming Algorithms," Neural Computation, Vol. 6, No. 6.

Jaakola T., Singh, S. P., & Jordan, M. I., (1995). "Reinforcement Learning Algorithms for Partially Observable Markovian Decision Processes," in*Advances in Neural Information Processing Systems 7*, J. D. Cowan, G. Tesauro, and D. Touretzky, editors, Morgan Kaufmann.

Korf, R. E. (1987). "Planning as Search: A Quantitative Approach", Artificial Intelligence, vol. 33, pp. 65–88.

Lippman, R. P., Kukolich, L. & Singer, E., (1993). "LNKnet: Neural Network, Machine-Learning, and Statistical Software for Pattern Classification", The Lincoln Laboratory Journal, vol. 6, no. 2, pp. 249–268.

Morin, T. L., (1987). "Computational Advances in Dynamic Programming", in*Dynamic Programming and Its Applications*, edited by Puterman, M.L., pp. 53–90.

Poggio, T. & Girosi, F., (1990). "Networks for Approximation and Learning", Proceedings of the IEEE, vol. 78, no. 9, pp. 1481–1497.

Reetz, D., (1977). "Approximate Solutions of a Discounted Markovian Decision Process", Bonner Mathematische Schriften, vol. 98: Dynamische Optimierung, pp. 77–92.

Schweitzer, P. J., & Seidmann, A., (1985). "Generalized Polynomial Approximations in Markovian Decision Processes", Journal of Mathematical Analysis and Applications, vol. 110, pp. 568–582.

Sutton, R. S., (1988). "Learning to Predict by the Method of Temporal Differences", Machine Learning, vol. 3, pp. 9–44.

Tesauro, G., (1992). "Practical Issues in Temporal Difference Learning", Machine Learning, vol. 8, pp. 257–277.

Tsitsiklis, J. N., (1994). "Asynchronous Stochastic Approximation and Q-Learning", Machine Learning, vol. 16, pp. 185–202.

Watkina, C. J. C. H., Dayan, P., (1992). "Q-learning", Machine Learning, vol. 8, pp. 279–292.

Whitt, W., (1978). Approximations of Dynamic Programs I.*Mathematics of Operations Research*, vol. 3, pp. 231–243.