Abstract
We consider convex optimization problems that are widely used as convex relaxations for low-rank matrix recovery problems. In particular, in several important problems such as phase retrieval and robust PCA, the underlying assumption is often that the optimal solution is rank-one. In this paper we consider a simple and natural sufficient condition on the objective so that the optimal solution to these relaxations is indeed unique and rank-one. Mainly, we show that under this condition, the standard Frank–Wolfe method with line-search (i.e., without any tuning of parameters whatsoever), which only requires a single rank-one SVD computation per iteration, finds an \(\epsilon \)-approximate solution in only \(O(\log {1/\epsilon })\) iterations (as opposed to the previous best known bound of \(O(1/\epsilon )\)), despite the fact that the objective is not strongly convex. We consider several variants of the basic method with improved complexities, as well as an extension motivated by robust PCA, and finally, an extension to nonsmooth problems.
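The method in question is standard Frank–Wolfe over the spectrahedron with exact line-search, where each iteration only needs a leading eigenvector computation. The following is a minimal numerical sketch, not the paper's implementation: the quadratic toy objective, the problem size, and the function name are illustrative assumptions, and the line-search formula is exact only because the toy objective is a 1-smooth quadratic.

```python
import numpy as np

def frank_wolfe_spectrahedron(grad_f, n, iters=50, tol=1e-10):
    """Frank-Wolfe with line-search over {X psd, Tr(X) = 1}.

    grad_f is assumed to be the gradient of a 1-smooth convex objective,
    so the exact line-search step is gap / ||D||_F^2. Each iteration needs
    only one eigenvector (a rank-one eigen-computation).
    """
    X = np.eye(n) / n  # feasible initialization
    for _ in range(iters):
        G = grad_f(X)
        w, V = np.linalg.eigh(G)
        v = V[:, 0]                    # eigenvector of the smallest eigenvalue of G
        D = np.outer(v, v) - X         # Frank-Wolfe direction towards the vertex vv^T
        gap = -np.sum(G * D)           # duality gap: upper bound on approximation error
        if gap <= tol:
            break
        eta = min(1.0, gap / np.sum(D * D))  # exact line-search for the quadratic toy f
        X = X + eta * D
    return X

# toy instance: f(X) = 0.5 * ||X - 2 xx^T||_F^2 has the rank-one optimum xx^T,
# and its gradient at the optimum has the eigen-gap required by the paper's condition
rng = np.random.default_rng(0)
x = rng.standard_normal(5)
x /= np.linalg.norm(x)
C = 2.0 * np.outer(x, x)
X = frank_wolfe_spectrahedron(lambda X: X - C, n=5)
```

On this toy instance the iterates converge to the rank-one optimum \(\mathbf {x}\mathbf {x}^{\top }\); for a general \(\beta \)-smooth objective the step would be \(\min \{1, \text {gap}/(\beta \Vert D\Vert _F^2)\}\) or a one-dimensional line-search.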
Notes
Here we note that while some problems, such as phase retrieval, are usually formulated as optimization over matrices with complex entries, our results are applicable in a straightforward manner to optimization over the corresponding spectrahedron \(\{{\mathbf {X}}\in {\mathbb {C}}^{n\times n} ~|~{\mathbf {X}}\succeq 0,~\text {Tr}({\mathbf {X}})=1\}\). However, for simplicity of presentation we focus on matrices with real entries.
In the close proximity of an optimal solution it is quite plausible that only low-rank SVD computations will be needed to compute the proximal step, see for instance our recent work [16].
Extending this discussion to the case in which these eigenvalues are only approximated up to sufficient precision is straightforward.
This quantity is known as the duality gap and it is indeed an upper-bound on the approximation error since \(f(\cdot )\) is convex, see [23].
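Concretely (a sketch in our notation, not a display from the paper), over the spectrahedron the duality gap admits a closed form via a single eigenvalue computation, and convexity gives the claimed upper bound:

```latex
g({\mathbf {X}}) \;=\; \max_{{\mathbf {V}}\in \mathcal {S}_n}\ \langle \nabla f({\mathbf {X}}),\, {\mathbf {X}}-{\mathbf {V}}\rangle
\;=\; \langle \nabla f({\mathbf {X}}),\, {\mathbf {X}}\rangle \;-\; \lambda _{\min }\!\left( \nabla f({\mathbf {X}})\right) ,
\qquad
f({\mathbf {X}}) - f({\mathbf {X}}^*) \;\le \; \langle \nabla f({\mathbf {X}}),\, {\mathbf {X}}-{\mathbf {X}}^*\rangle \;\le \; g({\mathbf {X}}),
```

where the linear minimization is attained at the rank-one matrix \({\mathbf {v}}{\mathbf {v}}^{\top }\) for \({\mathbf {v}}\) an eigenvector of the smallest eigenvalue of \(\nabla f({\mathbf {X}})\).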
Here we make an implicit assumption that it is computationally efficient to compute Euclidean projections onto the set \(\mathcal {K}\).
Recall that according to the previous lemma the gradient vector is constant over the set of optimal solutions and thus, this is equivalent to assuming the eigen-gap holds for some optimal solution.
The bound on the approximation error is verified by computing the duality gap, which is an upper-bound on the approximation error w.r.t. the function value (see for instance [23]).
We note this is a common initialization for Frank–Wolfe, and actually is equivalent to initializing Frank–Wolfe with \(\tau \cdot {\mathbf {x}}{\mathbf {x}}^{\top }\), and running for one iteration with the classical step-size rule \(\eta _t = \frac{2}{t+1}\).
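To see the equivalence (a short derivation under the classical step-size rule, with \({\mathbf {v}}_1\) denoting the vertex returned by the first linear minimization):

```latex
\eta _1 = \frac{2}{1+1} = 1
\quad \Longrightarrow \quad
{\mathbf {X}}_1 = (1-\eta _1)\,\tau \,{\mathbf {x}}{\mathbf {x}}^{\top } + \eta _1\,{\mathbf {v}}_1{\mathbf {v}}_1^{\top } = {\mathbf {v}}_1{\mathbf {v}}_1^{\top },
```

so the first iterate discards the initial point entirely, regardless of \(\tau \).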
References
Allen-Zhu, Z., Hazan, E., Hu, W., Li, Y.: Linear convergence of a Frank-Wolfe type algorithm over trace-norm balls. Adv. Neural. Inf. Process. Syst. 30, 6192–6201 (2017)
Beck, A.: First-order Methods in Optimization. SIAM (2017)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imag. Sci. 2(1), 183–202 (2009)
Beck, A., Teboulle, M.: Smoothing and first order methods: a unified framework. SIAM J. Optim. 22(2), 557–580 (2012)
Bhojanapalli, S., Neyshabur, B., Srebro, N.: Global optimality of local search for low rank matrix recovery. In Advances in Neural Information Processing Systems, pp 3873–3881 (2016)
Candes, E.J., Eldar, Y.C., Strohmer, T., Voroninski, V.: Phase retrieval via matrix completion. SIAM Rev. 57(2), 225–251 (2015)
Candès, E.J., Li, X., Ma, Y., Wright, J.: Robust principal component analysis? J. ACM 58(3), 11 (2011)
Candès, E.J., Recht, B.: Exact matrix completion via convex optimization. Found. Comput. Math. 9(6), 717–772 (2009)
Chen, Y., Wainwright, M. J.: Fast low-rank estimation by projected gradient descent: general statistical and algorithmic guarantees. arXiv:1509.03025 (2015)
Demyanov, V.F., Rubinov, A.M.: Approximate Methods in Optimization Problems. Elsevier, Amsterdam (1970)
Ding, L., Fei, Y., Xu, Q., Yang, C.: Spectral Frank-Wolfe algorithm: strict complementarity and linear convergence. In International Conference on Machine Learning, pp 2535–2544. PMLR (2020)
Drusvyatskiy, D., Lewis, A.S.: Error bounds, quadratic growth, and linear convergence of proximal methods. Math. Oper. Res. 43(3), 919–948 (2018)
Freund, R.M., Grigas, P., Mazumder, R.: An extended Frank–Wolfe method with in-face directions, and its application to low-rank matrix completion. SIAM J. Optim. 27(1), 319–346 (2017)
Garber, D.: Faster projection-free convex optimization over the spectrahedron. In Advances in Neural Information Processing Systems, pp 874–882 (2016)
Garber, D.: Linear convergence of Frank-Wolfe for rank-one matrix recovery without strong convexity. (2019)
Garber, D.: On the convergence of projected-gradient methods with low-rank projections for smooth convex minimization over trace-norm balls and related problems. arXiv:1902.01644 (2019)
Garber, D., Hazan, E.: Faster rates for the Frank–Wolfe method over strongly-convex sets. In 32nd International Conference on Machine Learning, ICML 2015, (2015)
Garber, D., Hazan, E., Ma, T.: Online learning of eigenvectors. In Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, pp 560–568 (2015)
Garber, D., Kaplan, A.: Fast stochastic algorithms for low-rank and nonsmooth matrix problems. In The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019, Naha, Okinawa, Japan, pp 286–294 (2019)
Garber, D., Sabach, S., Kaplan, A.: Fast generalized conditional gradient method with applications to matrix recovery problems. arXiv:1802.05581 (2018)
Ge, R., Lee, J. D., Ma, T.: Matrix completion has no spurious local minimum. In Advances in Neural Information Processing Systems, pp 2973–2981, (2016)
Golub, G.H., Van Loan, C.F.: Matrix Computations. JHU Press, Baltimore (2012)
Jaggi, M.: Revisiting Frank–Wolfe: projection-free sparse convex optimization. In Proceedings of the 30th International Conference on Machine Learning, ICML (2013)
Jaggi, M., Sulovsky, M.: A simple algorithm for nuclear norm regularized problems. In Proceedings of the 27th International Conference on Machine Learning, ICML (2010)
Jain, P., Meka, R., Dhillon, I.: Guaranteed rank minimization via singular value projection. In Advances in Neural Information Processing Systems, pp 937–945 (2010)
Jain, P., Tewari, A., Kar, P.: On iterative hard thresholding methods for high-dimensional m-estimation. In Advances in Neural Information Processing Systems, pp 685–693, (2014)
Laue, S.: A hybrid algorithm for convex semidefinite optimization. In Proceedings of the 29th International Conference on Machine Learning, pp 1083–1090. Omnipress (2012)
Mu, C., Zhang, Y., Wright, J., Goldfarb, D.: Scalable robust matrix recovery: Frank–Wolfe meets proximal methods. SIAM J. Sci. Comput. 38(5), A3291–A3317 (2016)
Necoara, I., Nesterov, Y., Glineur, F.: Linear convergence of first order methods for non-strongly convex optimization. Math. Program. 175(1–2), 69–107 (2019)
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Springer, Berlin (2013)
Netrapalli, P., Jain, P., Sanghavi, S.: Phase retrieval using alternating minimization. In Advances in Neural Information Processing Systems, pp 2796–2804, (2013)
Netrapalli, P., Niranjan, U.N., Sanghavi, S., Anandkumar, A., Jain, P.: Non-convex robust PCA. In Advances in Neural Information Processing Systems, pp 1107–1115 (2014)
Recht, B.: A simpler approach to matrix completion. J. Mach. Learn. Res. 12, 3413–3430 (2011)
Richard, E., Savalle, P.-A., Vayatis, N.: Estimation of simultaneously sparse and low rank matrices. In Proceedings of the 29th International Conference on Machine Learning, (2012)
Tropp, J.A.: Convex recovery of a structured signal from independent random linear measurements. In Sampling Theory, a Renaissance, pp 67–101. Springer (2015)
Wright, J., Ganesh, A., Rao, S., Peng, Y., Ma, Y.: Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization. In Advances in neural information processing systems, pp 2080–2088, (2009)
Yi, X., Park, D., Chen, Y., Caramanis, C.: Fast algorithms for robust PCA via gradient descent. In Advances in Neural Information Processing Systems, pp 4152–4160 (2016)
Yurtsever, A., Udell, M., Tropp, J.A., Cevher, V.: Sketchy decisions: convex low-rank matrix optimization with optimal storage. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, AISTATS 2017, Fort Lauderdale, FL, USA, pp 1188–1196 (2017)
Zhou, Z., So, A.M.-C.: A unified approach to error bounds for structured convex optimization problems. Math. Program. 165(2), 689–728 (2017)
Acknowledgements
We would like to thank both of the anonymous referees whose many excellent comments and suggestions have significantly improved the presentation of this paper.
Appendix
A Proof of Lemma 2
The lemma is an adaptation of Lemma 3 in [16] (which considers optimization over trace-norm balls). We restate and prove a slightly more general version of the lemma.
Lemma 10
Let \(f:\mathbb {S}^n\rightarrow \mathbb {R}\) be \(\beta \)-smooth and convex. Let \({\mathbf {X}}^*\in \mathcal {S}_n\) be an optimal solution of rank r to the optimization problem \(\min _{{\mathbf {X}}\in \mathcal {S}_n}f({\mathbf {X}})\). Let \(\lambda _1,\dots ,\lambda _n\) denote the eigenvalues of \(\nabla {}f({\mathbf {X}}^*)\) in non-increasing order. Let \(\zeta \) be a non-negative scalar. It holds that
$$\begin{aligned} \text {rank}\left( \varPi _{(1+\zeta )\mathcal {S}_n}\left[ {\mathbf {X}}^*-\beta ^{-1}\nabla {}f({\mathbf {X}}^*)\right] \right) \le r \quad \Longleftrightarrow \quad \zeta \le r\beta ^{-1}(\lambda _{n-r}-\lambda _n), \end{aligned}$$
where \((1+\zeta )\mathcal {S}_n = \{(1+\zeta ){\mathbf {X}}~|~{\mathbf {X}}\in \mathcal {S}_n\}\), and \(\varPi _{(1+\zeta )\mathcal {S}_n}[\cdot ]\) denotes the Euclidean projection onto the convex set \((1+\zeta )\mathcal {S}_n\).
Proof
Let us write the eigen-decomposition of \({\mathbf {X}}^*\) as \({\mathbf {X}}^*=\sum _{i=1}^r\lambda _i^*{\mathbf {v}}_i{\mathbf {v}}_i^{\top }\). It follows from the optimality of \({\mathbf {X}}^*\) that for all \(i\in [r]\), \({\mathbf {v}}_i\) is also an eigenvector of \(\nabla {}f({\mathbf {X}}^*)\) which corresponds to the smallest eigenvalue \(\lambda _n\) (see Lemma 7 in [16]). Thus, if we let \(\rho _1,\dots ,\rho _n\) denote the eigenvalues (in non-increasing order) of \({\mathbf {Y}}:= {\mathbf {X}}^*-\beta ^{-1}\nabla {}f({\mathbf {X}}^*)\), it holds that
$$\begin{aligned} \forall i\in [r]:~\rho _i = \lambda _i^*-\beta ^{-1}\lambda _n, \qquad \forall i\in \{r+1,\dots ,n\}:~\rho _i = -\beta ^{-1}\lambda _{n-i+1}. \end{aligned}$$
Recall that \(\sum _{i=1}^r\lambda _i^* =1\) and \(\lambda _{r+1}^* = 0\).
It is well known that for any matrix \({\mathbf {M}}\in \mathbb {S}^n\) with eigen-decomposition \({\mathbf {M}}=\sum _{i=1}^n\sigma _i{\mathbf {u}}_i{\mathbf {u}}_i^{\top }\), the projection of \({\mathbf {M}}\) onto the set \((1+\zeta )\mathcal {S}_n\), for any \(\zeta \ge 0\), is given by
$$\begin{aligned} \varPi _{(1+\zeta )\mathcal {S}_n}[{\mathbf {M}}] = \sum _{i=1}^n\max \{0,~\sigma _i-\sigma \}{\mathbf {u}}_i{\mathbf {u}}_i^{\top }, \end{aligned}$$
where \(\sigma \in \mathbb {R}\) is the unique scalar such that \(\sum _{i=1}^n\max \{0,~\sigma _i-\sigma \} = 1+\zeta \).
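The threshold \(\sigma \) can be computed exactly by the standard sorting-based routine for Euclidean projection onto a (scaled) simplex, applied to the eigenvalues. A sketch (the function name and routine below are ours, not from the paper):

```python
import numpy as np

def project_scaled_spectrahedron(M, zeta=0.0):
    """Euclidean projection of a symmetric M onto (1+zeta)*S_n,
    i.e. {X psd, Tr(X) = 1+zeta}: project the eigenvalues onto the
    scaled simplex and recombine with the eigenvectors."""
    z = 1.0 + zeta
    s, U = np.linalg.eigh(M)
    s, U = s[::-1], U[:, ::-1]                 # eigenvalues in non-increasing order
    css = np.cumsum(s)
    k = np.arange(1, s.size + 1)
    # largest index rho with s_rho - (sum_{i<=rho} s_i - z)/rho > 0
    rho = np.nonzero(s - (css - z) / k > 0)[0][-1]
    sigma = (css[rho] - z) / (rho + 1)         # unique threshold: sum max{0, s_i - sigma} = z
    lam = np.maximum(s - sigma, 0.0)
    return (U * lam) @ U.T                     # U diag(lam) U^T
```

For instance, projecting \(\text {diag}(2,0,0)\) onto \(\mathcal {S}_3\) (\(\zeta =0\)) yields \(\text {diag}(1,0,0)\), and projecting the identity yields \((1+\zeta )\mathbf {I}/n\).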
Now, we can see that \(\text {rank}(\varPi _{(1+\zeta )\mathcal {S}_n}[{\mathbf {Y}}]) \le r\) if and only if \(\sigma \ge \rho _{r+1} = -\beta ^{-1}\lambda _{n-r}\). Thus, if \(\text {rank}(\varPi _{(1+\zeta )\mathcal {S}_n}[{\mathbf {Y}}]) \le r\) then it must hold that \(\sigma \ge -\beta ^{-1}\lambda _{n-r}\), which implies that
$$\begin{aligned} 1+\zeta = \sum _{i=1}^n\max \{0,~\rho _i-\sigma \} \le \sum _{i=1}^r\left( \rho _i+\beta ^{-1}\lambda _{n-r}\right) = 1 + r\beta ^{-1}(\lambda _{n-r}-\lambda _n). \end{aligned}$$ (37)
However, (37) can hold only if \(\zeta \le r\beta ^{-1}(\lambda _{n-r}-\lambda _n)\). Thus, we have \(\text {rank}(\varPi _{(1+\zeta )\mathcal {S}_n}[{\mathbf {Y}}]) \le r \Longrightarrow \zeta \le r\beta ^{-1}(\lambda _{n-r}-\lambda _n)\).
On the other hand, if \(\text {rank}(\varPi _{(1+\zeta )\mathcal {S}_n}[{\mathbf {Y}}]) > r\) then it must hold that \(\sigma < -\beta ^{-1}\lambda _{n-r}\) which, using the same arguments as above, implies that
$$\begin{aligned} 1+\zeta = \sum _{i=1}^n\max \{0,~\rho _i-\sigma \} \ge \sum _{i=1}^r\left( \rho _i-\sigma \right) > \sum _{i=1}^r\left( \rho _i+\beta ^{-1}\lambda _{n-r}\right) = 1 + r\beta ^{-1}(\lambda _{n-r}-\lambda _n). \end{aligned}$$ (38)
We see that (38) can hold only if \(\zeta > r\beta ^{-1}(\lambda _{n-r}-\lambda _n)\). Thus, we also have \(\text {rank}(\varPi _{(1+\zeta )\mathcal {S}_n}[{\mathbf {Y}}])> r \Longrightarrow \zeta > r\beta ^{-1}(\lambda _{n-r}-\lambda _n)\), and the lemma follows. \(\square \)
B Proof of Lemma 3
We first restate the lemma and then prove it.
Lemma 11
Let \(f:\mathbb {S}^n\rightarrow \mathbb {R}\) be \(\beta \)-smooth and convex. Suppose that Assumption 1 holds w.r.t. \(f(\cdot )\) with some parameter \(\delta >0\). Let \({\tilde{f}}:\mathbb {S}^n\rightarrow \mathbb {R}\) be differentiable and convex, and suppose that \(\sup _{{\mathbf {X}}\in \mathcal {S}_n}\Vert {\nabla {}f({\mathbf {X}}) - \nabla {\tilde{f}}({\mathbf {X}})}\Vert _F \le \nu \), for some \(\nu > 0\). Then, for \(\nu < \frac{1}{2}(1+\frac{2\beta }{\delta })^{-1}\delta \), Assumption 1 holds w.r.t. the function \({\tilde{f}}(\cdot )\) with parameter \({\tilde{\delta }} = \delta - 2\nu (1+\frac{2\beta }{\delta }) > 0\).
Proof
Let \({\mathbf {X}}^*\) and \({\tilde{{\mathbf {X}}}}^*\) denote minimizers of \(f(\cdot )\) and \({\tilde{f}}(\cdot )\) over \(\mathcal {S}_n\), respectively. Since Assumption 1 holds w.r.t. \(f(\cdot )\), using the quadratic growth result of Lemma 4 we have that
$$\begin{aligned} \frac{\delta }{2}\Vert {{\tilde{{\mathbf {X}}}}^*-{\mathbf {X}}^*}\Vert _F^2&\le f({\tilde{{\mathbf {X}}}}^*) - f({\mathbf {X}}^*) \overset{(a)}{\le } \langle \nabla {}f({\tilde{{\mathbf {X}}}}^*),~{\tilde{{\mathbf {X}}}}^*-{\mathbf {X}}^*\rangle \\&= \langle \nabla {\tilde{f}}({\tilde{{\mathbf {X}}}}^*),~{\tilde{{\mathbf {X}}}}^*-{\mathbf {X}}^*\rangle + \langle \nabla {}f({\tilde{{\mathbf {X}}}}^*)-\nabla {\tilde{f}}({\tilde{{\mathbf {X}}}}^*),~{\tilde{{\mathbf {X}}}}^*-{\mathbf {X}}^*\rangle \\&\overset{(b)}{\le } \langle \nabla {}f({\tilde{{\mathbf {X}}}}^*)-\nabla {\tilde{f}}({\tilde{{\mathbf {X}}}}^*),~{\tilde{{\mathbf {X}}}}^*-{\mathbf {X}}^*\rangle \overset{(c)}{\le } \nu \Vert {{\tilde{{\mathbf {X}}}}^*-{\mathbf {X}}^*}\Vert _F, \end{aligned}$$
where (a) follows from convexity of \(f(\cdot )\), (b) follows from optimality of \({\tilde{{\mathbf {X}}}}^*\) w.r.t. \({\tilde{f}}(\cdot )\), and (c) follows from the Cauchy-Schwarz inequality and the assumption of the lemma that \(\sup _{{\mathbf {X}}\in \mathcal {S}_n}\Vert {\nabla {}f({\mathbf {X}}) - \nabla {\tilde{f}}({\mathbf {X}})}\Vert _F \le \nu \).
Thus, we get that \(\Vert {{\tilde{{\mathbf {X}}}}^*-{\mathbf {X}}^*}\Vert _F \le \frac{2\nu }{\delta }\).
Using Weyl’s inequality for the eigenvalues we have that
$$\begin{aligned} \lambda _{n-1}(\nabla {\tilde{f}}({\tilde{{\mathbf {X}}}}^*)) - \lambda _{n}(\nabla {\tilde{f}}({\tilde{{\mathbf {X}}}}^*)) \ge \lambda _{n-1}(\nabla {}f({\tilde{{\mathbf {X}}}}^*)) - \lambda _{n}(\nabla {}f({\tilde{{\mathbf {X}}}}^*)) - 2\nu . \end{aligned}$$ (39)
Using the smoothness of \(f(\cdot )\) and the assumption \(\sup _{{\mathbf {X}}\in \mathcal {S}_n}\Vert {\nabla {}f({\mathbf {X}}) - \nabla {\tilde{f}}({\mathbf {X}})}\Vert _F \le \nu \), we have that
$$\begin{aligned} \lambda _{n-1}(\nabla {}f({\tilde{{\mathbf {X}}}}^*)) - \lambda _{n}(\nabla {}f({\tilde{{\mathbf {X}}}}^*)) \ge \lambda _{n-1}(\nabla {}f({\mathbf {X}}^*)) - \lambda _{n}(\nabla {}f({\mathbf {X}}^*)) - 2\beta \Vert {{\tilde{{\mathbf {X}}}}^*-{\mathbf {X}}^*}\Vert _F \ge \delta - \frac{4\beta \nu }{\delta }. \end{aligned}$$ (40)
Plugging (40) into (39) and rearranging we obtain
$$\begin{aligned} \lambda _{n-1}(\nabla {\tilde{f}}({\tilde{{\mathbf {X}}}}^*)) - \lambda _{n}(\nabla {\tilde{f}}({\tilde{{\mathbf {X}}}}^*)) \ge \delta - 2\nu \left( 1+\frac{2\beta }{\delta }\right) = {\tilde{\delta }}. \end{aligned}$$
Thus, Assumption 1 indeed holds w.r.t. \({\tilde{f}}(\cdot )\) whenever \(\nu < \frac{1}{2}(1+\frac{2\beta }{\delta })^{-1}\delta \). \(\square \)
Garber, D. Linear convergence of Frank–Wolfe for rank-one matrix recovery without strong convexity. Math. Program. 199, 87–121 (2023). https://doi.org/10.1007/s10107-022-01821-8
Keywords
- Conditional gradient method
- Frank–Wolfe algorithm
- Convex optimization
- Robust PCA
- Phase retrieval
- Low-rank matrix recovery
- Low-rank optimization
- Semidefinite programming
- Nuclear norm minimization