
New Error Bounds for Approximations from Projected Linear Equations

  • Conference paper
Recent Advances in Reinforcement Learning (EWRL 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5323)


Abstract

We consider linear fixed point equations and their approximation by projection on a low dimensional subspace. We derive new bounds on the approximation error of the solution, which are expressed in terms of low dimensional matrices and can be computed by simulation. When the fixed point mapping is a contraction, as is typically the case in Markov decision processes (MDPs), one of our bounds is always sharper than the standard worst-case bounds, and another is often sharper. Our bounds also apply to the non-contraction case, including policy evaluation in MDPs with nonstandard projections that enhance exploration; to our knowledge, no error bounds were previously available for this case.
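To make the setting concrete, the following is a minimal numerical sketch of the problem class the abstract describes, not the paper's bounds or method. It solves a linear fixed point equation x = Ax + b exactly, then solves its projected counterpart Φr = Π(AΦr + b) with Π the Euclidean orthogonal projection onto the span of a basis Φ, and checks the standard worst-case bound ‖x* − Φr*‖ ≤ (1/√(1 − α²))‖x* − Πx*‖ for a contraction with modulus α. All dimensions, the random A, b, and Φ are invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 5  # illustrative dimensions: ambient space and subspace

# Random contraction A (spectral norm scaled below 1) and vector b.
A = rng.standard_normal((n, n))
A *= 0.9 / np.linalg.norm(A, 2)
b = rng.standard_normal(n)

# Exact fixed point of x = A x + b.
x_star = np.linalg.solve(np.eye(n) - A, b)

# Low dimensional basis Phi and orthogonal projection Pi onto its span.
Phi = rng.standard_normal((n, k))
Pi = Phi @ np.linalg.solve(Phi.T @ Phi, Phi.T)

# Projected equation Phi r = Pi (A Phi r + b) reduces to the
# k-by-k system (Phi^T (I - A) Phi) r = Phi^T b.
r_star = np.linalg.solve(Phi.T @ (np.eye(n) - A) @ Phi, Phi.T @ b)
x_approx = Phi @ r_star

# Standard worst-case bound for a contraction with modulus alpha:
# ||x* - Phi r*|| <= (1 / sqrt(1 - alpha^2)) * ||x* - Pi x*||.
alpha = np.linalg.norm(A, 2)
err = np.linalg.norm(x_star - x_approx)
bound = np.linalg.norm(x_star - Pi @ x_star) / np.sqrt(1 - alpha**2)
print(err, bound)
```

The paper's contribution is bounds that replace the right-hand side above with sharper quantities expressible via low dimensional matrices, and that remain valid even when ΠA is not a contraction.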




Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yu, H., Bertsekas, D.P. (2008). New Error Bounds for Approximations from Projected Linear Equations. In: Girgin, S., Loth, M., Munos, R., Preux, P., Ryabko, D. (eds) Recent Advances in Reinforcement Learning. EWRL 2008. Lecture Notes in Computer Science(), vol 5323. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89722-4_20


  • DOI: https://doi.org/10.1007/978-3-540-89722-4_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-89721-7

  • Online ISBN: 978-3-540-89722-4

  • eBook Packages: Computer Science (R0)
