Proximal algorithms and temporal difference methods for solving fixed point problems


Abstract

In this paper we consider large fixed point problems and their solution with proximal algorithms. We show that for linear problems there is a close connection between proximal iterations, which are prominent in numerical analysis and optimization, and multistep methods of the temporal difference type, such as TD(\(\lambda \)), LSTD(\(\lambda \)), and LSPE(\(\lambda \)), which are central in simulation-based exact and approximate dynamic programming. One benefit of this connection is a new and simple way to accelerate the standard proximal algorithm by extrapolation towards a multistep iteration, which generically has a faster convergence rate. Another benefit is the potential for integration into the proximal algorithmic context of several new ideas that have emerged in the approximate dynamic programming context, including simulation-based implementations. Conversely, the analytical and algorithmic insights from proximal algorithms can be brought to bear on the analysis and enhancement of temporal difference methods. We also generalize our linear-case result to nonlinear problems that involve a contractive mapping, thus providing guaranteed and potentially substantial acceleration of the proximal and forward-backward splitting algorithms at no extra cost. Moreover, under certain monotonicity assumptions, we extend the connection with temporal difference methods to nonlinear problems through a linearization approach.
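As a minimal numerical sketch of the linear-case connection (our illustration, not code from the paper): for the fixed point problem \(x = Ax + b\) with a contractive \(A\), the script below compares the standard proximal iteration with its extrapolation by the factor \((c+1)/c\), which for linear problems coincides with the multistep mapping \(T^{(\lambda )}\), \(\lambda = c/(c+1)\). The proximal mapping is denoted \(P^{(c)}\), and all problem data are randomly generated.

```python
import numpy as np

rng = np.random.default_rng(0)
n, c = 20, 10.0
M = rng.standard_normal((n, n))
A = 0.9 * M / np.linalg.norm(M, 2)          # contraction: spectral norm 0.9
b = rng.standard_normal(n)
x_star = np.linalg.solve(np.eye(n) - A, b)  # exact fixed point of x = A x + b

# Proximal mapping P^(c): y solves y + c((I - A) y - b) = x,
# i.e., ((1 + c) I - c A) y = x + c b.
P_mat = (1 + c) * np.eye(n) - c * A

def prox(x):
    return np.linalg.solve(P_mat, x + c * b)

x_p = np.zeros(n)   # standard proximal iterates
x_e = np.zeros(n)   # extrapolated iterates
for _ in range(20):
    x_p = prox(x_p)
    x_e = x_e + (c + 1) / c * (prox(x_e) - x_e)   # extrapolation toward T^(lambda)

print("proximal error:    ", np.linalg.norm(x_p - x_star))
print("extrapolated error:", np.linalg.norm(x_e - x_star))
```

For such data the extrapolated iterates should converge faster, roughly by the factor \(\rho (A)^k\) after \(k\) steps, in line with the generic rate advantage mentioned above.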


Notes

  1. It is possible to scale the eigenvalues of Q to lie in the interval (0, 2] without changing the problem, by multiplying Q and b by a suitable positive scalar. This, however, requires some prior knowledge about the location of the eigenvalues of Q (a hypothetical sketch of this rescaling follows these notes).

  2. In approximate DP it is common to replace a fixed point equation of the form \(x=F(x)\) with the equation \(x=\Pi \big (F(x)\big )\), where \(\Pi \) denotes projection onto an approximation subspace. This approach comes under the general framework of Galerkin approximation, which is widely used in a variety of numerical computation contexts (see, e.g., the books by Krasnoselskii [39] and Fletcher [34], and the DP-oriented discussion in the paper [68]). A distinguishing feature of approximate DP applications is that F is a linear mapping and the equation \(x=\Pi \big (F(x)\big )\) is typically solved by simulation-based methods (a small sketch of the projected equation follows these notes).

  3. The precise nature of the problem that TD(\(\lambda \)) aims to solve was unclear for a long time. The paper by Tsitsiklis and Van Roy [61] showed that it aims to find a fixed point of \(T^{(\lambda )}\) or \(\Pi T^{(\lambda )}\), and gave a convergence analysis (also replicated in the book [7]). The paper by Bertsekas and Yu [10] (Section 5.3) generalized TD(\(\lambda \)), LSTD(\(\lambda \)), and LSPE(\(\lambda \)) to the linear system context of this paper.

  4. It is well known that the proximal iteration can be extrapolated by a factor of as much as two while maintaining convergence. This was first shown for the special case of a convex optimization problem in [13], and then for the general case of finding a zero of a monotone operator in [32]; see also the more refined analysis in [14], Section 2.3.1, which quantifies the effects of extrapolation for the case of a quadratic programming problem. However, we are not aware of any earlier proposal of a simple and general scheme for choosing an extrapolation factor that maintains convergence and simultaneously guarantees acceleration. Moreover, this extrapolation factor, \((c+1)/c\), may be much larger than two (a scalar worked example follows these notes).
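The following is a hypothetical sketch of the rescaling described in note 1, assuming Q is symmetric with positive eigenvalues; the parameter `lam_max` stands in for the prior eigenvalue knowledge mentioned there (here it is simply computed exactly for illustration).

```python
import numpy as np

def rescale(Q, b, lam_max=None):
    """Scale Q and b by gamma = 2 / lam_max so that the eigenvalues of
    gamma * Q lie in (0, 2]; the solution set of Q x = b is unchanged.
    lam_max should be an upper bound on the largest eigenvalue of Q."""
    if lam_max is None:
        # Illustration only: in practice a bound would be known a priori.
        lam_max = np.linalg.eigvalsh(Q).max()
    gamma = 2.0 / lam_max
    return gamma * Q, gamma * b
```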
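Below is a minimal sketch of the projected (Galerkin) equation of note 2 for a linear mapping \(F(x) = Ax + b\), assuming Euclidean projection onto the subspace spanned by the columns of a basis matrix Phi; approximate DP would typically use a weighted norm and simulation in place of the direct solve shown here. All data are randomly generated for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 200, 5
M = rng.standard_normal((n, n))
A = 0.9 * M / np.linalg.norm(M, 2)   # contractive linear mapping F(x) = A x + b
b = rng.standard_normal(n)
Phi = rng.standard_normal((n, m))    # basis of the approximation subspace S

# Projected equation Phi r = Pi(A Phi r + b): with Euclidean projection Pi,
# the orthogonality condition Phi^T (Phi r - A Phi r - b) = 0 gives an
# m x m system in r in place of the original n x n system.
r = np.linalg.solve(Phi.T @ (Phi - A @ Phi), Phi.T @ b)
x_hat = Phi @ r                      # Galerkin approximation of the fixed point

x_star = np.linalg.solve(np.eye(n) - A, b)
print("approximation error:", np.linalg.norm(x_hat - x_star))
```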
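As a scalar worked example of the extrapolation in note 4 (our computation, in the paper's notation): for \(T(x) = ax + b\) with \(0< a <1\), the proximal mapping is \(P^{(c)}(x) = (x + cb)/(1+c-ca)\), which contracts with modulus \(1/(1+c-ca)\). The extrapolated iterate \(x + \frac{c+1}{c}\big (P^{(c)}(x) - x\big )\) simplifies to \(\big (ax + (c+1)b\big )/(1+c-ca)\), which contracts with modulus \(a/(1+c-ca)\). Thus extrapolation improves the convergence rate of the proximal step by the factor \(a\), at essentially no added cost, and the extrapolation factor \((c+1)/c\) can far exceed two when \(c\) is small.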

References

  1. Busoniu, L., Babuska, R., De Schutter, B., Ernst, D.: Reinforcement Learning and Dynamic Programming Using Function Approximators. CRC Press, New York (2010)

  2. Bertsekas, D.P., Borkar, V.S., Nedić, A.: Improved temporal difference methods with linear function approximation. In: Si, J., Barto, A., Powell, W., Wunsch, D. (eds.) Learning and Approximate Dynamic Programming. IEEE Press, New York (2004)

  3. Boutsidis, C., Drineas, P., Magdon-Ismail, M.: Near-optimal column-based matrix reconstruction. SIAM J. Comput. 43, 687–717 (2014)

  4. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, New York (2011)

  5. Bertsekas, D.P., Ioffe, S.: Temporal differences-based policy iteration and applications in neuro-dynamic programming. Laboratory for Information and Decision Systems Report LIDS-P-2349, MIT (1996)

  6. Bertsekas, D.P., Tsitsiklis, J.N.: An analysis of stochastic shortest path problems. Math. Oper. Res. 16, 580–595 (1991)

  7. Bertsekas, D.P., Tsitsiklis, J.N.: Neuro-Dynamic Programming. Athena Scientific, Belmont, MA (1996)

  8. Beck, A., Teboulle, M.: Gradient-based algorithms with applications to signal-recovery problems. In: Eldar, Y., Palomar, D. (eds.) Convex Optimization in Signal Processing and Communications, pp. 42–88. Cambridge University Press, Cambridge (2010)

  9. Bertsekas, D.P., Yu, H.: Solution of large systems of equations using approximate dynamic programming methods. Laboratory for Information and Decision Systems Report LIDS-P-2754, MIT (2007)

  10. Bertsekas, D.P., Yu, H.: Projected equation methods for approximate solution of large linear systems. J. Comput. Appl. Math. 227, 27–50 (2009)

  11. Bertsekas, D.P., Yu, H.: Asynchronous distributed policy iteration in dynamic programming. In: Proceedings of Allerton Conference on Communication, Control and Computing, Allerton Park, IL, pp. 1368–1374 (2010)

  12. Bertsekas, D.P., Yu, H.: Q-learning and enhanced policy iteration in discounted dynamic programming. Math. Oper. Res. 37, 66–94 (2012)

  13. Bertsekas, D.P.: On the method of multipliers for convex programming. IEEE Trans. Autom. Control 20, 385–388 (1975)

  14. Bertsekas, D.P.: Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York (1982). Republished by Athena Scientific, Belmont, MA (1997)

  15. Bertsekas, D.P.: Temporal difference methods for general projected equations. IEEE Trans. Autom. Control 56, 2128–2139 (2011)

  16. Bertsekas, D.P.: Approximate policy iteration: a survey and some new methods. J. Control Theory Appl. 9, 310–335 (2011)

  17. Bertsekas, D.P.: Dynamic Programming and Optimal Control: Approximate Dynamic Programming, vol. II, 4th edn. Athena Scientific, Belmont, MA (2012)

  18. Bertsekas, D.P.: \(\lambda \)-policy iteration: a review and a new implementation. In: Lewis, F., Liu, D. (eds.) Reinforcement Learning and Approximate Dynamic Programming for Feedback Control. IEEE Press, New York (2012)

  19. Bertsekas, D.P.: Convex Optimization Algorithms. Athena Scientific, Belmont, MA (2015)

  20. Bertsekas, D.P.: Proximal algorithms and temporal differences for large linear systems: extrapolation, approximation, and simulation. Laboratory for Information and Decision Systems Report LIDS-P-3205, MIT (2016)

  21. Bertsekas, D.P.: Abstract Dynamic Programming, 2nd edn. Athena Scientific, Belmont, MA (2018). http://web.mit.edu/dimitrib/www/home.html

  22. Boyan, J.A.: Technical update: least-squares temporal difference learning. Mach. Learn. 49, 1–15 (2002)

  23. Bradtke, S.J., Barto, A.G.: Linear least-squares algorithms for temporal difference learning. Mach. Learn. 22, 33–57 (1996)

  24. Censor, Y., Herman, G.T., Jiang, M.: A note on the behavior of the randomized Kaczmarz algorithm of Strohmer and Vershynin. J. Fourier Anal. Appl. 15, 431–436 (2009)

  25. Curtiss, J.H.: A theoretical comparison of the efficiencies of two classical methods and a Monte Carlo method for computing one component of the solution of a set of linear algebraic equations. In: Proceedings of Symposium on Monte Carlo Methods, pp. 191–233 (1954)

  26. Curtiss, J.H.: Monte Carlo methods for the iteration of linear operators. Uspekhi Mat. Nauk 12, 149–174 (1957)

  27. Drineas, P., Kannan, R., Mahoney, M.W.: Fast Monte Carlo algorithms for matrices I: approximating matrix multiplication. SIAM J. Comput. 35, 132–157 (2006)

  28. Drineas, P., Kannan, R., Mahoney, M.W.: Fast Monte Carlo algorithms for matrices II: computing a low-rank approximation to a matrix. SIAM J. Comput. 36, 158–183 (2006)

  29. Drineas, P., Mahoney, M.W., Muthukrishnan, S.: Sampling algorithms for L2 regression and applications. In: Proceedings of the 17th Annual SODA, pp. 1127–1136 (2006)

  30. Drineas, P., Mahoney, M.W., Muthukrishnan, S.: Relative-error CUR matrix decompositions. SIAM J. Matrix Anal. Appl. 30, 844–881 (2008)

  31. Drineas, P., Mahoney, M.W., Muthukrishnan, S., Sarlos, T.: Faster least squares approximation. Numer. Math. 117, 219–249 (2011)

  32. Eckstein, J., Bertsekas, D.P.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55, 293–318 (1992)

  33. Facchinei, F., Pang, J.-S.: Finite-Dimensional Variational Inequalities and Complementarity Problems. Springer, New York (2003)

  34. Fletcher, C.A.J.: Computational Galerkin Methods. Springer, New York (1984)

  35. Forsythe, G.E., Leibler, R.A.: Matrix inversion by a Monte Carlo method. Mathematical Tables and Other Aids to Computation 4, 127–129 (1950)

  36. Gabillon, V., Ghavamzadeh, M., Scherrer, B.: Approximate dynamic programming finally performs well in the game of Tetris. In: Advances in Neural Information Processing Systems, pp. 1754–1762 (2013)

  37. Gabay, D.: Applications of the method of multipliers to variational inequalities. In: Fortin, M., Glowinski, R. (eds.) Augmented Lagrangian Methods: Applications to the Solution of Boundary-Value Problems. North-Holland, Amsterdam (1983)

  38. Halton, J.H.: A retrospective and prospective survey of the Monte Carlo method. SIAM Rev. 12, 1–63 (1970)

  39. Krasnoselskii, M.A., et al.: Approximate Solution of Operator Equations. Wolters-Noordhoff, Groningen (1972). Translated by D. Louvish

  40. Lagoudakis, M.G., Parr, R.: Least-squares policy iteration. J. Mach. Learn. Res. 4, 1107–1149 (2003)

  41. Leventhal, D., Lewis, A.S.: Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res. 35, 641–654 (2010)

  42. Lewis, F.L., Liu, D. (eds.): Reinforcement Learning and Approximate Dynamic Programming for Feedback Control. Wiley, Hoboken, NJ (2013)

  43. Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16, 964–979 (1979)

  44. Mnih, V., Kavukcuoglu, K., Silver, D., et al.: Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015)

  45. Martinet, B.: Régularisation d'inéquations variationnelles par approximations successives. Rev. Française Inf. Rech. Opér. 4, 154–158 (1970)

  46. Nedić, A., Bertsekas, D.P.: Least squares policy evaluation algorithms with linear function approximation. Discrete Event Dyn. Syst. Theory Appl. 13, 79–110 (2003)

  47. Parikh, N., Boyd, S.: Proximal algorithms. Found. Trends Optim. 1, 123–231 (2013)

  48. Powell, W.B.: Approximate Dynamic Programming: Solving the Curses of Dimensionality. Wiley, New York (2007)

  49. Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14, 877–898 (1976)

  50. Si, J., Barto, A., Powell, W., Wunsch, D. (eds.): Learning and Approximate Dynamic Programming. IEEE Press, New York (2004)

  51. Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., et al.: Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016)

  52. Scherrer, B., Ghavamzadeh, M., Gabillon, V., Lesner, B., Geist, M.: Approximate modified policy iteration and its application to the game of Tetris. J. Mach. Learn. Res. 16, 1629–1676 (2015)

  53. Samuel, A.L.: Some studies in machine learning using the game of checkers. IBM J. Res. Dev. 3, 210–229 (1959)

  54. Samuel, A.L.: Some studies in machine learning using the game of checkers. II—Recent progress. IBM J. Res. Dev. 11, 601–617 (1967)

  55. Scherrer, B.: Performance bounds for \(\lambda \)-policy iteration and application to the game of Tetris. J. Mach. Learn. Res. 14, 1181–1227 (2013)

  56. Strohmer, T., Vershynin, R.: A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl. 15, 262–278 (2009)

  57. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA (1998)

  58. Sutton, R.S.: Learning to predict by the methods of temporal differences. Mach. Learn. 3, 9–44 (1988)

  59. Szepesvari, C.: Algorithms for Reinforcement Learning. Morgan and Claypool Publishers, San Rafael (2010)

  60. Tesauro, G.J.: TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Comput. 6, 215–219 (1994)

  61. Tsitsiklis, J.N., Van Roy, B.: An analysis of temporal-difference learning with function approximation. IEEE Trans. Autom. Control 42, 674–690 (1997)

  62. Tseng, P.: Applications of a splitting algorithm to decomposition in convex programming and variational inequalities. SIAM J. Control Optim. 29, 119–138 (1991)

  63. Vrabie, D., Vamvoudakis, K.G., Lewis, F.L.: Optimal Adaptive Control and Differential Games by Reinforcement Learning Principles. The Institution of Engineering and Technology, London (2013)

  64. Wang, M., Bertsekas, D.P.: Stabilization of stochastic iterative methods for singular and nearly singular linear systems. Math. Oper. Res. 39, 1–30 (2013)

  65. Wang, M., Bertsekas, D.P.: Convergence of iterative simulation-based methods for singular linear systems. Stoch. Syst. 3, 39–96 (2014)

  66. Wang, M., Bertsekas, D.P.: Incremental constraint projection methods for variational inequalities. Math. Program. 150, 321–363 (2015)

  67. Yu, H., Bertsekas, D.P.: Convergence results for some temporal difference methods based on least squares. IEEE Trans. Autom. Control 54, 1515–1531 (2009)

  68. Yu, H., Bertsekas, D.P.: Error bounds for approximations from projected linear equations. Math. Oper. Res. 35, 306–329 (2010)

  69. Yu, H., Bertsekas, D.P.: Weighted Bellman equations and their applications in dynamic programming. Laboratory for Information and Decision Systems Report LIDS-P-2876, MIT (2012)

  70. Yu, H., Bertsekas, D.P.: Q-learning and policy iteration algorithms for stochastic shortest path problems. Ann. Oper. Res. 208, 95–132 (2013)

  71. Yu, H.: Convergence of least squares temporal difference methods under general conditions. In: Proceedings of the 27th ICML, Haifa, Israel (2010)

  72. Yu, H.: Least squares temporal difference methods: an analysis under general conditions. SIAM J. Control Optim. 50, 3310–3343 (2012)


Author information


Correspondence to Dimitri P. Bertsekas.


Cite this article

Bertsekas, D.P. Proximal algorithms and temporal difference methods for solving fixed point problems. Comput Optim Appl 70, 709–736 (2018). https://doi.org/10.1007/s10589-018-9990-5
