Mathematical Programming, Volume 152, Issue 1–2, pp 75–112

Conditional gradient algorithms for norm-regularized smooth convex optimization

  • Zaid Harchaoui
  • Anatoli Juditsky
  • Arkadi Nemirovski
Full Length Paper, Series A

Abstract

Motivated by some applications in signal processing and machine learning, we consider two convex optimization problems where, given a cone \(K\), a norm \(\Vert \cdot \Vert \) and a smooth convex function \(f\), we want either (1) to minimize the norm over the intersection of the cone and a level set of \(f\), or (2) to minimize over the cone the sum of \(f\) and a multiple of the norm. We focus on the case where (a) the dimension of the problem is too large to allow for interior point algorithms, and (b) \(\Vert \cdot \Vert \) is “too complicated” to allow for the computationally cheap Bregman projections required by first-order proximal gradient algorithms. On the other hand, we assume that it is relatively easy to minimize linear forms over the intersection of \(K\) and the unit \(\Vert \cdot \Vert \)-ball. Motivating examples are given by the nuclear norm with \(K\) being the entire space of matrices, or the positive semidefinite cone in the space of symmetric matrices, and the Total Variation norm on the space of 2D images. We discuss versions of the Conditional Gradient algorithm capable of handling our problems of interest, provide the related theoretical efficiency estimates, and outline some applications.
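The setting described in the abstract, where linear minimization over the feasible set is cheap but projection is not, is exactly the one exploited by the classical Conditional Gradient (Frank–Wolfe) iteration. As an illustration only, and not the authors' specific algorithms, here is a minimal sketch of the basic loop with the standard open-loop step size \(2/(t+2)\), instantiated with an \(\ell_1\)-ball linear minimization oracle as a stand-in for the more elaborate nuclear-norm or TV-norm oracles discussed in the paper; the names `fw_minimize` and `l1_lmo` are hypothetical:

```python
import numpy as np

def fw_minimize(grad_f, lmo, x0, n_iters=200):
    """Basic conditional gradient (Frank-Wolfe) loop.

    grad_f : gradient oracle of the smooth objective f
    lmo    : linear minimization oracle, returns argmin_{s in D} <g, s>
    x0     : feasible starting point
    Uses the standard open-loop step size 2/(t+2).
    """
    x = x0.copy()
    for t in range(n_iters):
        g = grad_f(x)
        s = lmo(g)                   # cheap call: minimize <g, .> over the domain
        gamma = 2.0 / (t + 2.0)     # open-loop step-size rule
        x = (1.0 - gamma) * x + gamma * s   # convex combination stays feasible
    return x

def l1_lmo(g, r=1.0):
    """LMO over the l1 ball of radius r: a signed, scaled coordinate vector."""
    i = np.argmax(np.abs(g))
    s = np.zeros_like(g)
    s[i] = -r * np.sign(g[i])
    return s
```

The point of the scheme is that each iteration needs only one gradient and one call to the linear minimization oracle; no projection onto the norm ball is ever computed, which is what makes it attractive when \(\Vert \cdot \Vert \) is the nuclear or TV norm.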


Copyright information

© Springer-Verlag Berlin Heidelberg and Mathematical Optimization Society 2014

Authors and Affiliations

  • Zaid Harchaoui: LJK, Inria, Saint-Ismier, France
  • Anatoli Juditsky: LJK, Université Grenoble Alpes, Grenoble Cedex 9, France
  • Arkadi Nemirovski: Georgia Institute of Technology, Atlanta, USA
