Planning and Learning in Environments with Delayed Feedback

  • Thomas J. Walsh
  • Ali Nouri
  • Lihong Li
  • Michael L. Littman
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4701)


This work considers the problems of planning and learning in environments with constant observation and reward delays. We provide a hardness result for the general planning problem and positive results for several special cases with deterministic or otherwise constrained dynamics. We present an algorithm, Model Based Simulation, for planning in such environments and use model-based reinforcement learning to extend this approach to the learning setting in both finite and continuous environments. Empirical comparisons show this algorithm holds significant advantages over others for decision making in delayed environments.
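The core idea behind Model Based Simulation, as described in the abstract, can be sketched in a few lines: with a constant observation delay of k steps, the agent rolls its last k (not-yet-observed) actions forward through a learned deterministic model of the dynamics to estimate the true current state, then acts greedily as if there were no delay. The sketch below is illustrative only; the `model`, `policy`, and chain-MDP example are hypothetical stand-ins, not the paper's experimental domains.

```python
from collections import deque

def mbs_action(observed_state, pending_actions, model, policy):
    """Model Based Simulation (sketch): simulate the pending actions
    forward through a deterministic transition model to estimate the
    current, unobserved state, then act as in the undelayed problem."""
    state = observed_state
    for a in pending_actions:      # actions taken but not yet reflected
        state = model(state, a)    # one-step deterministic prediction
    return policy(state)           # greedy action for the estimated state

# Toy chain MDP with delay k = 2 (hypothetical dynamics and policy):
model = lambda s, a: s + a              # "move by a" deterministic dynamics
policy = lambda s: 1 if s < 5 else 0    # walk right until reaching state 5
pending = deque([1, 1], maxlen=2)       # the two actions still "in flight"
print(mbs_action(2, pending, model, policy))  # estimated state 4 -> action 1
```

In the learning setting, the abstract's extension corresponds to substituting a model estimated by model-based reinforcement learning for the true dynamics in this simulation step.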




Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Thomas J. Walsh¹
  • Ali Nouri¹
  • Lihong Li¹
  • Michael L. Littman¹

  1. Rutgers, The State University of New Jersey, Department of Computer Science, 110 Frelinghuysen Rd., Piscataway, NJ 08854
