# On the limited memory BFGS method for large scale optimization

## Abstract

We study the numerical performance of a limited memory quasi-Newton method for large scale optimization, which we call the L-BFGS method. We compare its performance with that of the method developed by Buckley and LeNir (1985), which combines cycles of BFGS steps and conjugate direction steps. Our numerical tests indicate that the L-BFGS method is faster than the method of Buckley and LeNir, and is better able to use additional storage to accelerate convergence. We show that the L-BFGS method can be greatly accelerated by means of a simple scaling. We then compare the L-BFGS method with the partitioned quasi-Newton method of Griewank and Toint (1982a). The results show that, for some problems, the partitioned quasi-Newton method is clearly superior to the L-BFGS method. However, we find that for other problems the L-BFGS method is very competitive due to its low iteration cost. We also study the convergence properties of the L-BFGS method, and prove global convergence on uniformly convex problems.
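
The abstract does not spell out the iteration, so the following is only a minimal sketch of a limited memory BFGS step, not the implementation tested in the paper. It assumes the standard two-loop recursion over the last `m` correction pairs (Nocedal, 1980) and assumes that the "simple scaling" is the common choice of initial matrix γI with γ = sᵀy / yᵀy taken from the most recent pair. The names `lbfgs_direction` and `minimize_lbfgs` are hypothetical, and the backtracking line search is a simplification chosen to keep the example short.

```python
from collections import deque


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def lbfgs_direction(grad, pairs):
    """Return -H * grad using the two-loop recursion.

    `pairs` holds the stored (s, y) corrections, oldest first, where
    s = x_new - x_old and y = grad_new - grad_old.
    """
    q = list(grad)
    history = []
    # First loop: newest pair to oldest.
    for s, y in reversed(pairs):
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
        history.append((rho, alpha))
    # Assumed scaling: initial matrix gamma * I from the newest pair.
    if pairs:
        s, y = pairs[-1]
        gamma = dot(s, y) / dot(y, y)
    else:
        gamma = 1.0
    r = [gamma * qi for qi in q]
    # Second loop: oldest pair to newest.
    for (s, y), (rho, alpha) in zip(pairs, reversed(history)):
        beta = rho * dot(y, r)
        r = [ri + (alpha - beta) * si for ri, si in zip(r, s)]
    return [-ri for ri in r]


def minimize_lbfgs(f, grad_f, x0, m=5, tol=1e-8, max_iter=200):
    """Minimize f from x0, keeping at most m correction pairs."""
    x = list(x0)
    g = grad_f(x)
    pairs = deque(maxlen=m)  # the oldest pair is dropped automatically
    for _ in range(max_iter):
        if dot(g, g) ** 0.5 < tol:
            break
        d = lbfgs_direction(g, list(pairs))
        # Backtracking (Armijo) line search; a simplification for this sketch.
        t, fx, slope = 1.0, f(x), dot(g, d)
        while t > 1e-16 and f([xi + t * di for xi, di in zip(x, d)]) > fx + 1e-4 * t * slope:
            t *= 0.5
        x_new = [xi + t * di for xi, di in zip(x, d)]
        g_new = grad_f(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        if dot(s, y) > 1e-12:  # store only pairs with positive curvature
            pairs.append((s, y))
        x, g = x_new, g_new
    return x


# Usage on a uniformly convex quadratic whose minimizer is the all-ones vector.
def quad(x):
    return sum((i + 1) * (xi - 1.0) ** 2 for i, xi in enumerate(x))


def quad_grad(x):
    return [2.0 * (i + 1) * (xi - 1.0) for i, xi in enumerate(x)]


solution = minimize_lbfgs(quad, quad_grad, [0.0] * 10, m=5)
```

Storage and work per iteration grow linearly with `m` and the number of variables, which is the low iteration cost the abstract refers to. Raising `m` is the way such a method uses additional storage.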

## Key words

Large scale nonlinear optimization, limited memory methods, partitioned quasi-Newton method, conjugate gradient method

## References

- E.M.L. Beale, “Algorithms for very large nonlinear optimization problems,” in: M.J.D. Powell, ed., *Nonlinear Optimization 1981* (Academic Press, London, 1981) pp. 281–292.
- A. Buckley, “A combined conjugate gradient quasi-Newton minimization algorithm,” *Mathematical Programming* 15 (1978) 200–210.
- A. Buckley, “Update to TOMS Algorithm 630,” Rapports Techniques No. 91, Institut National de Recherche en Informatique et en Automatique, Domaine Voluceau, Rocquencourt, B.P. 105 (Le Chesnay, 1987).
- A. Buckley and A. LeNir, “QN-like variable storage conjugate gradients,” *Mathematical Programming* 27 (1983) 155–175.
- A. Buckley and A. LeNir, “BBVSCG—A variable storage algorithm for function minimization,” *ACM Transactions on Mathematical Software* 11/2 (1985) 103–119.
- R.H. Byrd and J. Nocedal, “A tool for the analysis of quasi-Newton methods with application to unconstrained minimization,” *SIAM Journal on Numerical Analysis* 26 (1989) 727–739.
- J.E. Dennis Jr. and R.B. Schnabel, *Numerical methods for unconstrained optimization and nonlinear equations* (Prentice-Hall, 1983).
- J.E. Dennis Jr. and R.B. Schnabel, “A view of unconstrained optimization,” in: G.L. Nemhauser, A.H.G. Rinnooy Kan and M.J. Todd, eds., *Handbooks in Operations Research and Management Science, Vol. 1, Optimization* (North-Holland, Amsterdam, 1989) pp. 1–72.
- R. Fletcher, *Practical Methods of Optimization, Vol. 1, Unconstrained Optimization* (Wiley, New York, 1980).
- J.C. Gilbert and C. Lemaréchal, “Some numerical experiments with variable storage quasi-Newton algorithms,” IIASA Working Paper WP-88, A-2361 (Laxenburg, 1988).
- P.E. Gill and W. Murray, “Conjugate-gradient methods for large-scale nonlinear optimization,” Technical Report SOL 79-15, Department of Operations Research, Stanford University (Stanford, CA, 1979).
- P.E. Gill, W. Murray and M.H. Wright, *Practical Optimization* (Academic Press, London, 1981).
- A. Griewank, “The global convergence of partitioned BFGS on semi-smooth problems with convex decompositions,” ANL/MCS-TM-105, Mathematics and Computer Science Division, Argonne National Laboratory (Argonne, IL, 1987).
- A. Griewank and Ph.L. Toint, “Partitioned variable metric updates for large structured optimization problems,” *Numerische Mathematik* 39 (1982a) 119–137.
- A. Griewank and Ph.L. Toint, “Local convergence analysis of partitioned quasi-Newton updates,” *Numerische Mathematik* 39 (1982b) 429–448.
- A. Griewank and Ph.L. Toint, “Numerical experiments with partially separable optimization problems,” in: D.F. Griffiths, ed., *Numerical Analysis: Proceedings Dundee 1983, Lecture Notes in Mathematics, Vol. 1066* (Springer, Berlin, 1984) pp. 203–220.
- D.C. Liu and J. Nocedal, “Test results of two limited memory methods for large scale optimization,” Technical Report NAM 04, Department of Electrical Engineering and Computer Science, Northwestern University (Evanston, IL, 1988).
- J.J. Moré, B.S. Garbow and K.E. Hillstrom, “Testing unconstrained optimization software,” *ACM Transactions on Mathematical Software* 7 (1981) 17–41.
- S.G. Nash, “Preconditioning of truncated-Newton methods,” *SIAM Journal on Scientific and Statistical Computing* 6 (1985) 599–616.
- L. Nazareth, “A relationship between the BFGS and conjugate gradient algorithms and its implications for new algorithms,” *SIAM Journal on Numerical Analysis* 16 (1979) 794–800.
- J. Nocedal, “Updating quasi-Newton matrices with limited storage,” *Mathematics of Computation* 35 (1980) 773–782.
- D.P. O'Leary, “A discrete Newton algorithm for minimizing a function of many variables,” *Mathematical Programming* 23 (1982) 20–33.
- J.D. Pearson, “Variable metric methods of minimization,” *Computer Journal* 12 (1969) 171–178.
- J.M. Perry, “A class of conjugate gradient algorithms with a two-step variable-metric memory,” Discussion Paper 269, Center for Mathematical Studies in Economics and Management Science, Northwestern University (Evanston, IL, 1977).
- M.J.D. Powell, “Some global convergence properties of a variable metric algorithm for minimization without exact line search,” in: R.W. Cottle and C.E. Lemke, eds., *Nonlinear Programming, SIAM-AMS Proceedings IX* (SIAM, Philadelphia, PA, 1976).
- M.J.D. Powell, “Restart procedures for the conjugate gradient method,” *Mathematical Programming* 12 (1977) 241–254.
- D.F. Shanno, “On the convergence of a new conjugate gradient algorithm,” *SIAM Journal on Numerical Analysis* 15 (1978a) 1247–1257.
- D.F. Shanno, “Conjugate gradient methods with inexact searches,” *Mathematics of Operations Research* 3 (1978b) 244–256.
- D.F. Shanno and K.H. Phua, “Matrix conditioning and nonlinear optimization,” *Mathematical Programming* 14 (1978) 149–160.
- D.F. Shanno and K.H. Phua, “Remark on algorithm 500: minimization of unconstrained multivariate functions,” *ACM Transactions on Mathematical Software* 6 (1980) 618–622.
- T. Steihaug, “The conjugate gradient method and trust regions in large scale optimization,” *SIAM Journal on Numerical Analysis* 20 (1983) 626–637.
- Ph.L. Toint, “Some numerical results using a sparse matrix updating formula in unconstrained optimization,” *Mathematics of Computation* 32 (1978) 839–851.
- Ph.L. Toint, “Towards an efficient sparsity exploiting Newton method for minimization,” in: I.S. Duff, ed., *Sparse Matrices and their Uses* (Academic Press, New York, 1981) pp. 57–87.
- Ph.L. Toint, “Test problems for partially separable optimization and results for the routine PSPMIN,” Report Nr 83/4, Department of Mathematics, Facultés Universitaires de Namur (Namur, 1983a).
- Ph.L. Toint, “VE08AD, a routine for partially separable optimization with bounded variables,” Harwell Subroutine Library, A.E.R.E. (UK, 1983b).
- Ph.L. Toint, “A view of nonlinear optimization in a large number of variables,” Report Nr 86/16, Department of Mathematics, Facultés Universitaires de Namur (Namur, 1986).