# Conditional gradient algorithms for norm-regularized smooth convex optimization

## Abstract

Motivated by applications in signal processing and machine learning, we consider two convex optimization problems in which, given a cone \(K\), a norm \(\Vert \cdot \Vert \) and a smooth convex function \(f\), we want either (1) to minimize the norm over the intersection of the cone and a level set of \(f\), or (2) to minimize over the cone the sum of \(f\) and a multiple of the norm. We focus on the case where (a) the dimension of the problem is too large to allow for interior point algorithms, and (b) \(\Vert \cdot \Vert \) is “too complicated” to allow for the computationally cheap Bregman projections required by first-order proximal gradient algorithms. On the other hand, we assume that it is relatively easy to minimize linear forms over the intersection of \(K\) and the unit \(\Vert \cdot \Vert \)-ball. Motivating examples are given by the nuclear norm, with \(K\) being either the entire space of matrices or the positive semidefinite cone in the space of symmetric matrices, and by the Total Variation norm on the space of 2D images. We discuss versions of the Conditional Gradient algorithm capable of handling our problems of interest, provide the related theoretical efficiency estimates, and outline some applications.
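To make the central assumption concrete, the following is a minimal sketch of the classical conditional gradient (Frank-Wolfe) iteration on a norm ball. It is not one of the variants developed in the paper. It only illustrates why a cheap linear minimization oracle over the unit ball can replace a projection. For simplicity, the sketch assumes that \(K\) is the whole space, that the norm is the \(\ell _1\) norm (chosen here only because its oracle is a one-line computation), and that \(f\) is a least-squares function. The names `lmo_l1_ball`, `conditional_gradient`, and the data `b` are hypothetical.

```python
def lmo_l1_ball(grad, radius):
    """Linear minimization oracle: argmin of <grad, s> over the l1-ball.

    The minimizer is a signed, scaled coordinate vector at the entry of
    largest absolute gradient value (illustrative choice of norm).
    """
    i = max(range(len(grad)), key=lambda k: abs(grad[k]))
    s = [0.0] * len(grad)
    if grad[i] != 0.0:
        s[i] = -radius if grad[i] > 0 else radius
    return s


def conditional_gradient(grad_f, x0, radius, n_iters=200):
    """Classical conditional gradient with the open-loop step 2 / (t + 2)."""
    x = list(x0)
    for t in range(n_iters):
        g = grad_f(x)
        s = lmo_l1_ball(g, radius)      # only a linear form is minimized
        gamma = 2.0 / (t + 2.0)
        # Convex combination keeps the iterate inside the ball.
        x = [(1.0 - gamma) * xi + gamma * si for xi, si in zip(x, s)]
    return x


# Hypothetical smooth convex objective: f(x) = 0.5 * ||x - b||_2^2.
b = [3.0, -1.0, 0.5]


def f(x):
    return 0.5 * sum((xi - bi) ** 2 for xi, bi in zip(x, b))


def grad_f(x):
    return [xi - bi for xi, bi in zip(x, b)]


x_hat = conditional_gradient(grad_f, [0.0, 0.0, 0.0], radius=1.0)
print(x_hat, f(x_hat))
```

Each iteration calls the oracle once and never projects onto the ball. For the nuclear norm, the analogous oracle would require only a leading singular vector pair rather than a full singular value decomposition.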
