Ash, R. B. 1972. *Real Analysis and Probability*. New York: Academic Press Inc.

Bertsekas, D. P. 1995. A counterexample to temporal differences learning. *Neural Computation* 7: 270–279.

Bertsekas, D. P., and Ioffe, S. 1996. Temporal differences-based policy iteration and application in neuro-dynamic programming. *Lab. for Info. and Decision Systems Report* LIDS-P-2349. Cambridge, MA: MIT.

Bertsekas, D. P. 1999. *Nonlinear Programming*, 2nd edition. Belmont, MA: Athena Scientific.

Bertsekas, D. P. 2001. *Dynamic Programming and Optimal Control*, 2nd edition. Belmont, MA: Athena Scientific.

Bertsekas, D. P., and Tsitsiklis, J. N. 1996. *Neuro-Dynamic Programming*. Belmont, MA: Athena Scientific.

Bertsekas, D. P., and Tsitsiklis, J. N. 2000. Gradient convergence in gradient methods with errors. *SIAM J. Optim.* 10: 627–642.

Boyan, J. A. 2002. Technical update: least-squares temporal difference learning. To appear in *Machine Learning*, 49.

Bradtke, S. J., and Barto, A. G. 1996. Linear least-squares algorithms for temporal difference learning. *Machine Learning* 22: 33–57.

Dayan, P., and Sejnowski, T. J. 1994. TD(λ) converges with probability 1. *Machine Learning* 14: 295–301.

Gallager, R. G. 1995. *Discrete Stochastic Processes*. Boston, MA: Kluwer Academic Publishers.

Golub, G. H., and Van Loan, C. F. 1996. *Matrix Computations*, 3rd edition. Baltimore, MD: Johns Hopkins University Press.

Gurvits, L., Lin, L., and Hanson, S. J. 1994. *Incremental Learning of Evaluation Functions for Absorbing Markov Chains: New Methods and Theorems*. Working paper. Princeton, NJ: Siemens Corporate Research.

Jaakkola, T., Jordan, M. I., and Singh, S. P. 1994. On the convergence of stochastic iterative dynamic programming algorithms. *Neural Computation* 6: 1185–1201.

Kemeny, J. G., and Snell, J. L. 1967. *Finite Markov Chains*. New York: Van Nostrand Company.

Neveu, J. 1975. *Discrete Parameter Martingales*. Amsterdam: North-Holland.

Parzen, E. 1962. *Modern Probability Theory and Its Applications*. New York: John Wiley Inc.

Puterman, M. L. 1994. *Markov Decision Processes: Discrete Stochastic Dynamic Programming*. New York: John Wiley Inc.

Sutton, R. S. 1988. Learning to predict by the methods of temporal differences. *Machine Learning* 3: 9–44.

Tadić, V. 2001. On the convergence of temporal-difference learning with linear function approximation. *Machine Learning* 42: 241–267.

Tsitsiklis, J. N., and Van Roy, B. 1997. An analysis of temporal-difference learning with function approximation. *IEEE Transactions on Automatic Control* 42: 674–690.