On the Acceleration of Forward-Backward Splitting via an Inexact Newton Method

  • Andreas Themelis
  • Masoud Ahookhosh
  • Panagiotis PatrinosEmail author


We propose a Forward-Backward Truncated-Newton method (FBTN) for minimizing the sum of two convex functions, one of which smooth. Unlike other proximal Newton methods, our approach does not involve the employment of variable metrics, but is rather based on a reformulation of the original problem as the unconstrained minimization of a continuously differentiable function, the forward-backward envelope (FBE). We introduce a generalized Hessian for the FBE that symmetrizes the generalized Jacobian of the nonlinear system of equations representing the optimality conditions for the problem. This enables the employment of conjugate gradient method (CG) for efficiently solving the resulting (regularized) linear systems, which can be done inexactly. The employment of CG prevents the computation of full (generalized) Jacobians, as it requires only (generalized) directional derivatives. The resulting algorithm is globally (subsequentially) convergent, Q-linearly under an error bound condition, and up to Q-superlinearly and Q-quadratically under regularity assumptions at the possibly non-isolated limit point.


Forward-backward splitting Linear Newton approximation Truncated-Newton method Backtracking linesearch Error bound Superlinear convergence 

AMS 2010 Subject Classification

49J52 49M15 90C06 90C25 90C30 



This work was supported by the Research Foundation Flanders (FWO) research projects G086518N and G086318N; KU Leuven internal funding StG/15/043; Fonds de la Recherche Scientifique—FNRS and the Fonds Wetenschappelijk Onderzoek—Vlaanderen under EOS Project no 30468160 (SeLMA).


  1. 1.
    Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Mathematical Programming 137(1), 91–129 (2013). DOI 10.1007/s10107-011-0484-9Google Scholar
  2. 2.
    Bauschke, H.H., Combettes, P.L.: Convex analysis and monotone operator theory in Hilbert spaces. CMS Books in Mathematics. Springer (2017). DOI 10.1007/978-3-319-48311-5Google Scholar
  3. 3.
    Bauschke, H.H., Noll, D., Phan, H.M.: Linear and strong convergence of algorithms involving averaged nonexpansive operators. Journal of Mathematical Analysis and Applications 421(1), 1–20 (2015)MathSciNetCrossRefGoogle Scholar
  4. 4.
    Beck, A.: First-Order Methods in Optimization. Society for Industrial and Applied Mathematics, Philadelphia, PA (2017). DOI 10.1137/1.9781611974997Google Scholar
  5. 5.
    Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences 2(1), 183–202 (2009). DOI 10.1137/080716542Google Scholar
  6. 6.
    Becker, S., Fadili, J.: A quasi-Newton proximal splitting method. In: Advances in Neural Information Processing Systems, pp. 2618–2626 (2012)Google Scholar
  7. 7.
    Bertsekas, D.P.: Constrained optimization and lagrange multiplier methods. Computer Science and Applied Mathematics, Boston: Academic Press, 1982 (1982)zbMATHGoogle Scholar
  8. 8.
    Bertsekas, D.P.: Convex Optimization Algorithms. Athena Scientific (2015)Google Scholar
  9. 9.
    Bhatia, R.: Matrix Analysis. Graduate Texts in Mathematics. Springer New York (1997)Google Scholar
  10. 10.
    Bochnak, J., Coste, M., Roy, M.F.: Real Algebraic Geometry. Ergebnisse der Mathematik und ihrer Grenzgebiete. 3. Folge / A Series of Modern Surveys in Mathematics. Springer Berlin Heidelberg (2013)Google Scholar
  11. 11.
    Bolte, J., Daniilidis, A., Lewis, A.: Tame functions are semismooth. Mathematical Programming 117(1), 5–19 (2009). DOI 10.1007/s10107-007-0166-9Google Scholar
  12. 12.
    Chen, G., Teboulle, M.: Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM Journal on Optimization 3(3), 538–543 (1993). DOI 10.1137/0803026Google Scholar
  13. 13.
    Chen, X., Fukushima, M.: Proximal quasi-Newton methods for nondifferentiable convex optimization. Mathematical Programming 85(2), 313–334 (1999). DOI 10.1007/s101070050059Google Scholar
  14. 14.
    Chen, X., Qi, H., Tseng, P.: Analysis of nonsmooth symmetric-matrix-valued functions with applications to semidefinite complementarity problems. SIAM Journal on Optimization 13(4), 960–985 (2003). DOI 10.1137/S1052623400380584Google Scholar
  15. 15.
    Clarke, F.H.: Optimization and Nonsmooth Analysis. Society for Industrial and Applied Mathematics (1990). DOI 10.1137/1.9781611971309Google Scholar
  16. 16.
    Combettes, P.L., Pesquet, J.C.: Proximal Splitting Methods in Signal Processing, pp. 185–212. Springer New York, New York, NY (2011). DOI 10.1007/978-1-4419-9569-8_10Google Scholar
  17. 17.
    Drusvyatskiy, D., Lewis, A.S.: Error bounds, quadratic growth, and linear convergence of proximal methods. Mathematics of Operations Research (2018)Google Scholar
  18. 18.
    Eldén, L.: Matrix Methods in Data Mining and Pattern Recognition. Society for Industrial and Applied Mathematics (2007). DOI 10.1137/1.9780898718867Google Scholar
  19. 19.
    Facchinei, F., Pang, J.S.: Finite-dimensional variational inequalities and complementarity problems, vol. II. Springer (2003)Google Scholar
  20. 20.
    Fazel, M.: Matrix rank minimization with applications. Ph.D. thesis, Stanford University (2002)Google Scholar
  21. 21.
    Fazel, M., Hindi, H., Boyd, S.P.: A rank minimization heuristic with application to minimum order system approximation. In: Proceedings of the 2001 American Control Conference, vol. 6, pp. 4734–4739 (2001). DOI 10.1109/ACC.2001.945730Google Scholar
  22. 22.
    Fazel, M., Hindi, H., Boyd, S.P.: Rank minimization and applications in system theory. In: Proceedings of the 2004 American Control Conference, vol. 4, pp. 3273–3278 vol.4 (2004). DOI 10.23919/ACC.2004.1384521Google Scholar
  23. 23.
    Fukushima, M.: Equivalent differentiable optimization problems and descent methods for asymmetric variational inequality problems. Mathematical Programming 53(1), 99–110 (1992). DOI 10.1007/BF01585696Google Scholar
  24. 24.
    Giselsson, P., Fält, M.: Envelope functions: Unifications and further properties. Journal of Optimization Theory and Applications (2018). DOI 10.1007/s10957-018-1328-zGoogle Scholar
  25. 25.
    Gowda, M.S.: Inverse and implicit function theorems for H-differentiable and semismooth functions. Optimization Methods and Software 19(5), 443–461 (2004). DOI 10.1080/10556780410001697668Google Scholar
  26. 26.
    Güler, O.: New proximal point algorithms for convex minimization. SIAM Journal on Optimization 2(4), 649–664 (1992). DOI 10.1137/0802032Google Scholar
  27. 27.
    Han, J., Sun, D.: Newton and quasi-Newton methods for normal maps with polyhedral sets. Journal of Optimization Theory and Applications 94(3), 659–676 (1997). DOI 10.1023/A:1022653001160Google Scholar
  28. 28.
    Hiriart-Urruty, J.B., Lemaréchal, C.: Fundamentals of Convex Analysis. Grundlehren Text Editions. Springer Berlin Heidelberg (2004)Google Scholar
  29. 29.
    Horn, R.A., Horn, R.A., Johnson, C.R.: Topics in Matrix Analysis. Cambridge University Press (1994)Google Scholar
  30. 30.
    Kanzow, C., Ferenczi, I., Fukushima, M.: On the local convergence of semismooth Newton methods for linear and nonlinear second-order cone programs without strict complementarity. SIAM Journal on Optimization 20(1), 297–320 (2009). DOI 10.1137/060657662Google Scholar
  31. 31.
    Lan, G., Lu, Z., Monteiro, R.D.C.: Primal-dual first-order methods with O(1∕ε) iteration-complexity for cone programming. Mathematical Programming 126(1), 1–29 (2011). DOI 10.1007/s10107-008-0261-6Google Scholar
  32. 32.
    Lee, J.D., Sun, Y., Saunders, M.: Proximal Newton-type methods for minimizing composite functions. SIAM Journal on Optimization 24(3), 1420–1443 (2014). DOI 10.1137/130921428Google Scholar
  33. 33.
    Lemaréchal, C., Sagastizábal, C.: Practical aspects of the Moreau-Yosida regularization: Theoretical preliminaries. SIAM Journal on Optimization 7(2), 367–385 (1997). DOI 10.1137/S1052623494267127Google Scholar
  34. 34.
    Lewis, A.S.: The convex analysis of unitarily invariant matrix functions. Journal of Convex Analysis 2(1), 173–183 (1995)MathSciNetzbMATHGoogle Scholar
  35. 35.
    Lewis, A.S.: Convex analysis on the Hermitian matrices. SIAM Journal on Optimization 6(1), 164–177 (1996). DOI 10.1137/0806009Google Scholar
  36. 36.
    Lewis, A.S.: Derivatives of spectral functions. Mathematics of Operations Research 21(3), 576–588 (1996)MathSciNetCrossRefGoogle Scholar
  37. 37.
    Lewis, A.S., Sendov, H.S.: Twice differentiable spectral functions. SIAM Journal on Matrix Analysis and Applications 23(2), 368–386 (2001). DOI 10.1137/S089547980036838XGoogle Scholar
  38. 38.
    Li, W., Peng, J.: Exact penalty functions for constrained minimization problems via regularized gap function for variational inequalities. Journal of Global Optimization 37(1), 85–94 (2007). DOI 10.1007/s10898-006-9038-8Google Scholar
  39. 39.
    Li, X., Sun, D., Toh, K.C.: On the efficient computation of a generalized Jacobian of the projector over the Birkhoff polytope. ArXiv e-prints (2017)Google Scholar
  40. 40.
    Lions Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM Journal on Numerical Analysis 16(6), 964–979 (1979). DOI 10.1137/0716071Google Scholar
  41. 41.
    Liu, Z., Vandenberghe, L.: Interior-point method for nuclear norm approximation with application to system identification. SIAM Journal on Matrix Analysis and Applications 31(3), 1235–1256 (2010). DOI 10.1137/090755436Google Scholar
  42. 42.
    Lu, Z.: Randomized block proximal damped Newton method for composite self-concordant minimization. SIAM Journal on Optimization 27(3), 1910–1942 (2017). DOI 10.1137/16M1082767Google Scholar
  43. 43.
    Luo, Z.Q., Tseng, P.: Error bounds and convergence analysis of feasible descent methods: a general approach. Annals of Operations Research 46(1), 157–178 (1993). DOI 10.1007/BF02096261Google Scholar
  44. 44.
    Maratos, N.: Exact penalty function algorithms for finite dimensional and control optimization problems (1978)Google Scholar
  45. 45.
    Martinet, B.: Brève communication. Régularisation d’inéquations variationnelles par approximations successives. Revue française d’informatique et de recherche opérationnelle. Série rouge 4(R3), 154–158 (1970)Google Scholar
  46. 46.
    Meng, F.: Moreau-Yosida regularization of Lagrangian-dual functions for a class of convex optimization problems. Journal of Global Optimization 44(3), 375 (2008). DOI 10.1007/s10898-008-9333-7Google Scholar
  47. 47.
    Meng, F., Sun, D., Zhao, G.: Semismoothness of solutions to generalized equations and the Moreau-Yosida regularization. Mathematical Programming 104(2), 561–581 (2005). DOI 10.1007/s10107-005-0629-9Google Scholar
  48. 48.
    Meng, F., Zhao, G., Goh, M., De Souza, R.: Lagrangian-dual functions and Moreau-Yosida regularization. SIAM Journal on Optimization 19(1), 39–61 (2008). DOI 10.1137/060673746Google Scholar
  49. 49.
    Mifflin, R.: Semismooth and semiconvex functions in constrained optimization. SIAM Journal on Control and Optimization 15(6), 959–972 (1977). DOI 10.1137/0315061Google Scholar
  50. 50.
    Mifflin, R., Qi, L., Sun, D.: Properties of the Moreau-Yosida regularization of a piecewise C 2 convex function. Mathematical Programming 84(2), 269–281 (1999). DOI 10.1007/s10107980029aGoogle Scholar
  51. 51.
    Moreau, J.J.: Proximité et dualité dans un espace hilbertien. Bulletin de la Société Mathématique de France 93, 273–299 (1965)MathSciNetCrossRefGoogle Scholar
  52. 52.
    Morita, T., Kanade, T.: A sequential factorization method for recovering shape and motion from image streams. IEEE Transactions on Pattern Analysis and Machine Intelligence 19(8), 858–867 (1997). DOI 10.1109/34.608289Google Scholar
  53. 53.
    Nesterov, Y.: Introductory lectures on convex optimization: A basic course, vol. 87. Springer (2003)Google Scholar
  54. 54.
    Nesterov, Y.: Gradient methods for minimizing composite functions. Mathematical Programming 140(1), 125–161 (2013). DOI 10.1007/s10107-012-0629-5Google Scholar
  55. 55.
    Pang, J.S.: Error bounds in mathematical programming. Mathematical Programming 79(1), 299–332 (1997). DOI 10.1007/BF02614322Google Scholar
  56. 56.
    Parikh, N., Boyd, S.: Proximal algorithms. Found. Trends Optim. 1(3), 127–239 (2014). DOI 10.1561/2400000003Google Scholar
  57. 57.
    Patrinos, P., Bemporad, A.: Proximal Newton methods for convex composite optimization. In: IEEE Conference on Decision and Control, pp. 2358–2363 (2013)Google Scholar
  58. 58.
    Patrinos, P., Sopasakis, P., Sarimveis, H.: A global piecewise smooth Newton method for fast large-scale model predictive control. Automatica 47(9), 2016–2022 (2011)MathSciNetCrossRefGoogle Scholar
  59. 59.
    Patrinos, P., Stella, L., Bemporad, A.: Forward-backward truncated Newton methods for convex composite optimization. ArXiv e-prints (2014)Google Scholar
  60. 60.
    Qi, L., Sun, J.: A nonsmooth version of Newton’s method. Mathematical Programming 58(1), 353–367 (1993). DOI 10.1007/BF01581275Google Scholar
  61. 61.
    Recht, B., Fazel, M., Parrilo, P.A.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review 52(3), 471–501 (2010). DOI 10.1137/070697835Google Scholar
  62. 62.
    Rennie, J.D.M., Srebro, N.: Fast maximum margin matrix factorization for collaborative prediction. In: Proceedings of the 22Nd International Conference on Machine Learning, ICML ’05, pp. 713–719. ACM, New York, NY, USA (2005). DOI 10.1145/1102351.1102441Google Scholar
  63. 63.
    Rockafellar, R.: Convex analysis (1970)Google Scholar
  64. 64.
    Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM Journal on Control and Optimization 14(5), 877–898 (1976). DOI 10.1137/0314056Google Scholar
  65. 65.
    Rockafellar, R.T., Wets, R.J.B.: Variational analysis, vol. 317. Springer Science & Business Media (2011)Google Scholar
  66. 66.
    Scholtes, S.: Piecewise Differentiable Functions, pp. 91–111. Springer New York, New York, NY (2012). DOI 10.1007/978-1-4614-4340-7_4Google Scholar
  67. 67.
    Sopasakis, P., Freris, N., Patrinos, P.: Accelerated reconstruction of a compressively sampled data stream. In: 2016 24th European Signal Processing Conference (EUSIPCO), pp. 1078–1082 (2016). DOI 10.1109/EUSIPCO.2016.7760414Google Scholar
  68. 68.
    Srebro, N.: Learning with matrix factorizations. Ph.D. thesis, Cambridge, MA, USA (2004)Google Scholar
  69. 69.
    Stella, L., Themelis, A., Patrinos, P.: Forward-backward quasi-Newton methods for nonsmooth optimization problems. Computational Optimization and Applications 67(3), 443–487 (2017). DOI 10.1007/s10589-017-9912-yGoogle Scholar
  70. 70.
    Stella, L., Themelis, A., Patrinos, P.: Newton-type alternating minimization algorithm for convex optimization. IEEE Transactions on Automatic Control (2018). DOI 10.1109/TAC.2018.2872203Google Scholar
  71. 71.
    Stella, L., Themelis, A., Sopasakis, P., Patrinos, P.: A simple and efficient algorithm for nonlinear model predictive control. In: 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pp. 1939–1944 (2017). DOI 10.1109/CDC.2017.8263933Google Scholar
  72. 72.
    Sun, D., Fukushima, M., Qi, L.: A computable generalized Hessian of the D-gap function and Newton-type methods for variational inequality problems. Complementarity and Variational Problems: State of the Art, MC Ferris and JS Pang (eds.), SIAM, Philadelphia, PA pp. 452–472 (1997)Google Scholar
  73. 73.
    Sun, D., Sun, J.: Semismooth matrix-valued functions. Mathematics of Operations Research 27(1), 150–169 (2002). DOI 10.1287/moor. Scholar
  74. 74.
    Themelis, A., Patrinos, P.: Douglas-Rachford splitting and ADMM for nonconvex optimization: tight convergence results. ArXiv e-prints (2017)Google Scholar
  75. 75.
    Themelis, A., Stella, L., Patrinos, P.: Forward-backward envelope for the sum of two nonconvex functions: Further properties and nonmonotone linesearch algorithms. SIAM Journal on Optimization 28(3), 2274–2303 (2018). DOI 10.1137/16M1080240Google Scholar
  76. 76.
    Tomasi, C., Kanade, T.: Shape and motion from image streams under orthography: a factorization method. International Journal of Computer Vision 9(2), 137–154 (1992). DOI 10.1007/BF00129684Google Scholar
  77. 77.
    Tseng, P.: On accelerated proximal gradient methods for convex-concave optimization. Tech. rep. (2008)Google Scholar
  78. 78.
    Ulbrich, M.: Optimization Methods in Banach Spaces, pp. 97–156. Springer Netherlands, Dordrecht (2009). DOI 10.1007/978-1-4020-8839-1_2Google Scholar
  79. 79.
    Yamashita, N., Taji, K., Fukushima, M.: Unconstrained optimization reformulations of variational inequality problems. Journal of Optimization Theory and Applications 92(3), 439–456 (1997). DOI 10.1023/A:1022660704427Google Scholar
  80. 80.
    Yang, Z.: A study on nonsymmetric matrix-valued functions. Master’s thesis, National University of Singapore (2009)Google Scholar
  81. 81.
    Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 68(1), 49–67 (2006)Google Scholar
  82. 82.
    Zhou, G., Qi, L.: On the convergence of an inexact Newton-type method. Oper. Res. Lett. 34(6), 647–652 (2006). DOI 10.1016/j.orl.2005.11.001Google Scholar
  83. 83.
    Zhou, G., Toh, K.C.: Superlinear convergence of a Newton-type algorithm for monotone equations. Journal of Optimization Theory and Applications 125(1), 205–221 (2005). DOI 10.1007/s10957-004-1721-7Google Scholar

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Andreas Themelis
    • 1
  • Masoud Ahookhosh
    • 1
  • Panagiotis Patrinos
    • 1
    Email author
  1. 1.Department of Electrical Engineering (ESAT-STADIUS)KU LeuvenLeuvenBelgium

Personalised recommendations