Convergence of the Exponentiated Gradient Method with Armijo Line Search

  • Yen-Huan Li
  • Volkan Cevher


Consider the problem of minimizing a convex differentiable function on the probability simplex, spectrahedron, or set of quantum density matrices. We prove that the exponentiated gradient method with Armijo line search always converges to the optimum if the sequence of iterates possesses a strictly positive limit point (element-wise in the vector case, and with respect to the Löwner partial ordering in the matrix case). To the best of our knowledge, this is the first convergence result for a mirror descent-type method that requires only differentiability. The proof exploits the self-concordant likeness of the log-partition function, which is of independent interest.
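The method analyzed in the abstract can be sketched in a few lines: a multiplicative (mirror-descent) update on the probability simplex, with the step size chosen by Armijo backtracking. The code below is an illustrative sketch only, not the authors' implementation; the function name `eg_armijo`, the parameters `eta0`, `beta`, `sigma`, and the stopping heuristic are our own choices.

```python
import numpy as np

def eg_armijo(f, grad_f, x0, eta0=1.0, beta=0.5, sigma=1e-4,
              max_iter=500, tol=1e-10):
    """Exponentiated gradient with Armijo backtracking on the simplex.

    Illustrative sketch of the vector case: each iteration applies the
    multiplicative update x * exp(-eta * grad), renormalizes onto the
    simplex, and shrinks eta until the Armijo sufficient-decrease test
    holds along the feasible direction y - x.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        fx = f(x)
        eta = eta0
        while True:
            # Multiplicative (mirror-descent) update, then renormalize.
            y = x * np.exp(-eta * g)
            y /= y.sum()
            # Armijo sufficient-decrease condition (guard against eta -> 0).
            if f(y) <= fx + sigma * g.dot(y - x) or eta < 1e-12:
                break
            eta *= beta
        if np.abs(y - x).max() < tol:
            return y
        x = y
    return x

# Toy instance: Euclidean projection of c onto the simplex,
# f(x) = 0.5 * ||x - c||^2; since c already lies on the simplex,
# the minimizer is c itself.
c = np.array([0.2, 0.5, 0.3])
x_star = eg_armijo(lambda x: 0.5 * np.dot(x - c, x - c),
                   lambda x: x - c,
                   np.full(3, 1.0 / 3.0))
```

Note that every iterate stays strictly positive whenever the starting point does, since the update is multiplicative; this matches the strictly-positive-limit-point assumption under which the paper proves convergence.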


Keywords

Exponentiated gradient method · Armijo line search · Self-concordant likeness · Peierls–Bogoliubov inequality





We thank Ya-Ping Hsieh for his comments. This work was supported by SNF grant 200021-146750 and ERC project time-data, grant 725594.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Laboratory for Information and Inference Systems, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
  2. Department of Computer Science and Information Engineering and Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei, Taiwan
