Abstract
We propose a stochastic approximation-based method with randomisation of samples for policy evaluation using the least squares temporal difference (LSTD) algorithm. Our method yields an O(d) improvement in complexity over regular LSTD, where d is the dimension of the data. We provide convergence rate results for the proposed method, both in high probability and in expectation. Moreover, we establish that using our scheme in place of LSTD does not impact the rate of convergence of the approximate value function to the true value function. This result, coupled with the low complexity of our method, makes it attractive for implementation in big data settings, where d is large. Further, we analyse a similar low-complexity alternative for least squares regression and provide finite-time bounds there. We demonstrate the practicality of our method for LSTD empirically by combining it with the LSPI algorithm in a traffic signal control application.
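The following Python sketch (ours, not taken from the paper) illustrates the randomised-sampling idea behind the abstract: rather than forming and inverting the d x d LSTD matrix, each iteration draws one transition uniformly at random from the batch and performs a TD(0)-style fixed-point update at O(d) cost, followed by Polyak-Ruppert averaging of the iterates. The function name, step-size schedule, and all identifiers are illustrative assumptions; the paper's analysis relies on specific step-size constants.

```python
import numpy as np

def fast_lstd_sa(phi, phi_next, rewards, beta=0.95, n_iters=10000, c=1.0, seed=0):
    """Minimal sketch: O(d)-per-iteration LSTD solver via stochastic approximation.

    phi      : (T, d) array of features of states s_t
    phi_next : (T, d) array of features of next states s_{t+1}
    rewards  : (T,)   array of observed rewards
    beta     : discount factor
    """
    rng = np.random.default_rng(seed)
    T, d = phi.shape
    theta = np.zeros(d)
    theta_avg = np.zeros(d)
    for k in range(1, n_iters + 1):
        i = rng.integers(T)                            # sample one transition uniformly
        td_error = rewards[i] + beta * phi_next[i] @ theta - phi[i] @ theta
        theta = theta + (c / k) * td_error * phi[i]    # O(d) fixed-point update
        theta_avg += (theta - theta_avg) / k           # Polyak-Ruppert iterate averaging
    return theta_avg
```

For contrast, a direct LSTD solve costs O(d^2) per sample to accumulate the matrix and O(d^3) to invert it, whereas each update above only touches the d-dimensional feature vectors of a single randomly chosen transition.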
Keywords
- Queue Length
- Markov Decision Process
- Stochastic Approximation
- Policy Iteration
- Stochastic Gradient Descent
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Prashanth, L.A., Korda, N., Munos, R. (2014). Fast LSTD Using Stochastic Approximation: Finite Time Analysis and Application to Traffic Control. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2014. Lecture Notes in Computer Science, vol 8725. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44851-9_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44850-2
Online ISBN: 978-3-662-44851-9
eBook Packages: Computer Science, Computer Science (R0)