Abstract
This work considers the problems of planning and learning in environments with constant observation and reward delays. We provide a hardness result for the general planning problem and positive results for several special cases with deterministic or otherwise constrained dynamics. We present an algorithm, Model Based Simulation, for planning in such environments and use model-based reinforcement learning to extend this approach to the learning setting in both finite and continuous environments. Empirical comparisons show this algorithm holds significant advantages over others for decision making in delayed environments.
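The core planning idea described above can be illustrated with a small sketch. The snippet below is a minimal, hypothetical rendering of model-based simulation under a constant observation delay of k steps, assuming deterministic dynamics: the agent rolls its learned model forward from the last observed state through the k actions it has taken since, then acts as if the resulting state estimate were the true current state. All names (`mbs_action`, the toy chain `model` and `policy`) are illustrative, not from the paper.

```python
def mbs_action(last_observed_state, action_buffer, model, policy):
    """Estimate the current (unobserved) state by simulating the learned
    model forward through the buffered actions taken since the delayed
    observation, then act with the undelayed policy on that estimate."""
    state = last_observed_state
    for a in action_buffer:          # the k actions issued since the observation
        state = model(state, a)      # deterministic one-step transition model
    return policy(state)

# Toy deterministic chain MDP on states 0..5 with the goal at state 5.
def model(s, a):
    return min(5, s + 1) if a == "right" else max(0, s - 1)

def policy(s):
    return "right" if s < 5 else "stay"

# With a delay of k=2: the agent last observed state 1 and has since
# taken two "right" actions, so the simulated current state is 3.
print(mbs_action(1, ["right", "right"], model, policy))  # prints "right"
```

With deterministic dynamics the forward simulation recovers the true current state exactly; under stochastic dynamics the same buffer-and-simulate structure yields only a state estimate, which is where the paper's hardness results and constrained special cases come into play.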
Keywords
- Optimal Policy
- Markov Decision Process
- Model Parameter Approximation
- Model-Based Simulation
- Augmented Model
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
Cite this paper
Walsh, T.J., Nouri, A., Li, L., Littman, M.L. (2007). Planning and Learning in Environments with Delayed Feedback. In: Kok, J.N., Koronacki, J., Mantaras, R.L.d., Matwin, S., Mladenič, D., Skowron, A. (eds) Machine Learning: ECML 2007. Lecture Notes in Computer Science, vol 4701. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74958-5_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74957-8
Online ISBN: 978-3-540-74958-5