Abstract
This work considers the problems of planning and learning in environments with constant observation and reward delays. We provide a hardness result for the general planning problem and positive results for several special cases with deterministic or otherwise constrained dynamics. We present an algorithm, Model Based Simulation, for planning in such environments and use model-based reinforcement learning to extend this approach to the learning setting in both finite and continuous environments. Empirical comparisons show this algorithm holds significant advantages over others for decision making in delayed environments.
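The core planning idea described above can be illustrated with a small sketch. The snippet below is a minimal, hypothetical rendering of model-based simulation under a constant observation delay of k steps, assuming deterministic dynamics: the agent rolls its learned model forward from the last observed state through the k actions it has taken since, then acts as if the resulting state estimate were the true current state. All names (`mbs_action`, the toy chain `model` and `policy`) are illustrative, not from the paper.

```python
def mbs_action(last_observed_state, action_buffer, model, policy):
    """Estimate the current (unobserved) state by simulating the learned
    model forward through the buffered actions taken since the delayed
    observation, then act with the undelayed policy on that estimate."""
    state = last_observed_state
    for a in action_buffer:          # the k actions issued since the observation
        state = model(state, a)      # deterministic one-step transition model
    return policy(state)

# Toy deterministic chain MDP on states 0..5 with the goal at state 5.
def model(s, a):
    return min(5, s + 1) if a == "right" else max(0, s - 1)

def policy(s):
    return "right" if s < 5 else "stay"

# With a delay of k=2: the agent last observed state 1 and has since
# taken two "right" actions, so the simulated current state is 3.
print(mbs_action(1, ["right", "right"], model, policy))  # prints "right"
```

With deterministic dynamics the forward simulation recovers the true current state exactly; under stochastic dynamics the same buffer-and-simulate structure yields only a state estimate, which is where the paper's hardness results and constrained special cases come into play.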
Keywords
- Optimal Policy
- Markov Decision Process
- Model Parameter Approximation
- Model-Based Simulation
- Augmented Model
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
Cite this paper
Walsh, T.J., Nouri, A., Li, L., Littman, M.L. (2007). Planning and Learning in Environments with Delayed Feedback. In: Kok, J.N., Koronacki, J., Mantaras, R.L.d., Matwin, S., Mladenič, D., Skowron, A. (eds) Machine Learning: ECML 2007. Lecture Notes in Computer Science, vol 4701. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74958-5_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74957-8
Online ISBN: 978-3-540-74958-5