
New analysis and results for the Frank–Wolfe method

  • Full Length Paper
  • Mathematical Programming, Series A

Abstract

We present new results for the Frank–Wolfe method (also known as the conditional gradient method). We derive computational guarantees for arbitrary step-size sequences, which are then applied to various step-size rules, including simple averaging and constant step-sizes. We also develop step-size rules and computational guarantees that depend naturally on the warm-start quality of the initial (and subsequent) iterates. Our results include computational guarantees for both duality/bound gaps and the so-called FW gaps. Lastly, we present complexity bounds in the presence of approximate computation of gradients and/or linear optimization subproblem solutions.
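For readers who want a concrete picture of the method the abstract refers to, the following is a minimal, illustrative Python sketch of a generic Frank–Wolfe iteration with a pluggable step-size rule and the FW gap tracked at each iterate. It is not the authors' code: the 2/(k+2) default step-size, the function names, and the simplex example are assumptions made purely for illustration.

```python
import numpy as np

def frank_wolfe(grad, linear_oracle, x0, num_iters=100, step_size=None):
    """A minimal sketch of the Frank-Wolfe (conditional gradient) method.

    grad(x)          -- gradient of the objective at x
    linear_oracle(g) -- returns argmin_{s in P} <g, s> over the feasible set P
    step_size(k)     -- step-size rule; defaults to the common 2/(k+2) rule
    Also records the FW gap  g_k = <grad(x_k), x_k - s_k>, which upper-bounds
    the optimality gap for convex objectives.
    """
    if step_size is None:
        step_size = lambda k: 2.0 / (k + 2.0)
    x = np.asarray(x0, dtype=float)
    fw_gaps = []
    for k in range(num_iters):
        g = grad(x)
        s = linear_oracle(g)                 # linear optimization subproblem
        fw_gaps.append(float(g @ (x - s)))   # FW gap at iterate k
        alpha = step_size(k)
        x = x + alpha * (s - x)              # move toward the extreme point s
    return x, fw_gaps

# Hypothetical usage: minimize ||Ax - b||^2 / 2 over the unit simplex,
# where the linear oracle returns a vertex (a coordinate unit vector).
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
grad = lambda x: A.T @ (A @ x - b)
linear_oracle = lambda g: np.eye(len(g))[np.argmin(g)]
x_final, gaps = frank_wolfe(grad, linear_oracle, x0=np.ones(5) / 5)
```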



Author information

Corresponding author

Correspondence to Paul Grigas.

Additional information

R. M. Freund: This author’s research is supported by AFOSR Grant No. FA9550-11-1-0141 and the MIT-Chile-Pontificia Universidad Católica de Chile Seed Fund.

P. Grigas: This author’s research has been partially supported through NSF Graduate Research Fellowship No. 1122374 and the MIT-Chile-Pontificia Universidad Católica de Chile Seed Fund.

Appendix

Proposition 7.1

Let \(B_k^w\) and \(B_k^m\) be as defined in Sect. 2. Suppose that there exists an open set \(\hat{Q} \subseteq E\) containing \(Q\) such that \(\phi (x,\cdot )\) is differentiable on \(\hat{Q}\) for each fixed \(x \in P\), and that \(h(\cdot )\) has the minmax structure (4) on \(\hat{Q}\) and is differentiable on \(\hat{Q}\). Then it holds that:

$$\begin{aligned} B_k^w \ge B_k^m \ge h^*. \end{aligned}$$

Furthermore, it holds that \(B_k^w = B_k^m\) in the case when \(\phi (x,\cdot )\) is linear in the variable \(\lambda \).

Proof

It is simple to show that \(B_k^m \ge h^*\). At the current iterate \(\lambda _k \in Q\), define \(x_k \in \arg \min \limits _{x \in P}\phi (x, \lambda _k)\). Then from the definition of \(h(\lambda )\) and the concavity of \(\phi (x_k, \cdot )\) we have:

$$\begin{aligned} h(\lambda )&\le \phi (x_k, \lambda ) \le \phi (x_k, \lambda _k) + \nabla _\lambda \phi (x_k, \lambda _k)^T(\lambda - \lambda _k) \nonumber \\&= h(\lambda _k) + \nabla _\lambda \phi (x_k, \lambda _k)^T(\lambda - \lambda _k), \end{aligned}$$
(55)

whereby \(\nabla _\lambda \phi (x_k, \lambda _k)\) is a subgradient of \(h(\cdot )\) at \(\lambda _k\). It then follows from the differentiability of \(h(\cdot )\) that \(\nabla h(\lambda _k) = \nabla _\lambda \phi (x_k, \lambda _k)\), and this implies from (55) that:

$$\begin{aligned} \phi (x_k, \lambda ) \le h(\lambda _k) + \nabla h(\lambda _k)^T(\lambda - \lambda _k). \end{aligned}$$
(56)

Therefore we have:

$$\begin{aligned} B_k^m = f(x_k) = \max _{\lambda \in Q}\{\phi (x_k, \lambda )\} \le \max _{\lambda \in Q}\{h(\lambda _k) + \nabla h(\lambda _k)^T(\lambda - \lambda _k)\} = B_k^w. \end{aligned}$$

If \(\phi (x,\lambda )\) is linear in \(\lambda \), then the second inequality in (55) holds with equality, and hence so does (56); maximizing both sides of (56) over \(\lambda \in Q\) then yields \(B_k^m = B_k^w\). \(\square \)
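As a small illustration of the proposition (a hypothetical instance, not taken from the paper), take \(P = Q = [0,1]\) and \(\phi (x,\lambda ) = x^2 + \lambda (1-2x)\), which is convex in \(x\) and linear in \(\lambda \). Then \(h(\lambda ) = \min _{x \in [0,1]} \phi (x,\lambda ) = \lambda - \lambda ^2\) with \(h^* = \tfrac{1}{4}\), and at \(\lambda _k = \tfrac{1}{4}\) the minimizer is \(x_k = \tfrac{1}{4}\), giving

$$\begin{aligned} B_k^m = \max _{\lambda \in [0,1]}\left\{ \tfrac{1}{16} + \tfrac{1}{2}\lambda \right\} = \tfrac{9}{16}, \qquad B_k^w = \max _{\lambda \in [0,1]}\left\{ \tfrac{3}{16} + \tfrac{1}{2}\left( \lambda - \tfrac{1}{4}\right) \right\} = \tfrac{9}{16}, \end{aligned}$$

so that \(B_k^w = B_k^m \ge h^*\), as the proposition predicts when \(\phi \) is linear in \(\lambda \).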

Proposition 7.2

Let \(C_{h, Q}, \mathrm {Diam}_Q\), and \(L_{h,Q}\) be as defined in Sect. 2. Then it holds that \(C_{h, Q} \le L_{h,Q}(\mathrm {Diam}_Q)^2 \).

Proof

Since \(Q\) is convex, we have \(\lambda + \alpha (\tilde{\lambda }- \lambda ) \in Q\) for all \(\lambda , \tilde{\lambda } \in Q\) and for all \(\alpha \in [0,1]\). Since the gradient of \(h(\cdot )\) is Lipschitz, from the fundamental theorem of calculus we have:

$$\begin{aligned} h(\lambda + \alpha (\tilde{\lambda }- \lambda ))&= h(\lambda ) + \nabla h(\lambda )^T(\alpha (\tilde{\lambda }- \lambda )) \\&\quad + \int \limits _0^1\left[ \nabla h(\lambda + t \alpha (\tilde{\lambda }- \lambda )) - \nabla h (\lambda )\right] ^T(\alpha (\tilde{\lambda }- \lambda ))\, dt \\&\ge h(\lambda ) + \nabla h(\lambda )^T(\alpha (\tilde{\lambda }- \lambda )) \\&\quad - \int \limits _0^1 \Vert \nabla h(\lambda + t \alpha (\tilde{\lambda }- \lambda )) - \nabla h (\lambda )\Vert _*\, \alpha \Vert \tilde{\lambda }- \lambda \Vert \, dt \\&\ge h(\lambda ) + \nabla h(\lambda )^T(\alpha (\tilde{\lambda }- \lambda )) - \int \limits _0^1 L_{h, Q} \Vert t \alpha (\tilde{\lambda }- \lambda )\Vert \, \alpha \Vert \tilde{\lambda }- \lambda \Vert \, dt \\&= h(\lambda ) + \nabla h(\lambda )^T(\alpha (\tilde{\lambda }- \lambda )) - \frac{\alpha ^2}{2}L_{h, Q}\Vert \tilde{\lambda }- \lambda \Vert ^2 \\&\ge h(\lambda ) + \nabla h(\lambda )^T(\alpha (\tilde{\lambda }- \lambda )) - \frac{\alpha ^2}{2}L_{h, Q}(\mathrm {Diam}_Q)^2, \end{aligned}$$

whereby it follows that \(C_{h, Q} \le L_{h,Q}(\mathrm {Diam}_Q)^2 \). \(\square \)
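As a quick sanity check (an illustration under the assumption, consistent with the proof above, that \(C_{h,Q}\) is the smallest constant for which \(h(\lambda + \alpha (\tilde{\lambda }-\lambda )) \ge h(\lambda ) + \nabla h(\lambda )^T(\alpha (\tilde{\lambda }-\lambda )) - \tfrac{\alpha ^2}{2}C_{h,Q}\) holds for all \(\lambda , \tilde{\lambda }\in Q\) and \(\alpha \in [0,1]\)), take \(h(\lambda ) = -\tfrac{1}{2}\Vert \lambda \Vert _2^2\) on \(Q = \{\lambda : \Vert \lambda \Vert _2 \le 1\}\). Then \(L_{h,Q} = 1\), \(\mathrm {Diam}_Q = 2\), and the Taylor expansion is exact:

$$\begin{aligned} h(\lambda + \alpha (\tilde{\lambda }-\lambda )) = h(\lambda ) + \nabla h(\lambda )^T(\alpha (\tilde{\lambda }-\lambda )) - \frac{\alpha ^2}{2}\Vert \tilde{\lambda }-\lambda \Vert _2^2, \end{aligned}$$

so \(C_{h,Q} = \max _{\lambda ,\tilde{\lambda }\in Q}\Vert \tilde{\lambda }-\lambda \Vert _2^2 = 4 = L_{h,Q}(\mathrm {Diam}_Q)^2\), i.e., the bound of the proposition holds with equality in this instance.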

Proposition 7.3

For \(k\ge 0\) the following inequality holds:

$$\begin{aligned} \sum _{i=0}^k \frac{i+1}{i+2} \le \frac{(k+1)(k+2)}{k+4}. \end{aligned}$$

Proof

The inequality above holds with equality for \(k=0\). Proceeding by induction, suppose it holds for some \(k\ge 0\); then

$$\begin{aligned} \sum _{i=0}^{k+1} \frac{i+1}{i+2}&= \sum _{i=0}^{k} \frac{i+1}{i+2} + \frac{k+2}{k+3} \nonumber \\&\le \frac{(k+1)(k+2)}{k+4} + \frac{k+2}{k+3} \nonumber \\&= (k+2)\left[ \frac{k^2+5k+7}{k^2 + 7k +12}\right] . \end{aligned}$$
(57)

Now notice that

$$\begin{aligned} (k^2 +5k+7)(k+5) = k^3 + 10k^2 + 32k + 35 < k^3 + 10k^2 + 33k + 36 = (k^2 + 7k +12)(k+3), \end{aligned}$$

which shows that \(\frac{k^2+5k+7}{k^2+7k+12} < \frac{k+3}{k+5}\); combining this with (57) gives \(\sum _{i=0}^{k+1} \frac{i+1}{i+2} \le \frac{(k+2)(k+3)}{k+5}\), which completes the induction.\(\square \)
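For instance, at \(k=2\) the inequality reads

$$\begin{aligned} \frac{1}{2} + \frac{2}{3} + \frac{3}{4} = \frac{23}{12} \approx 1.917 \le \frac{3 \cdot 4}{6} = 2, \end{aligned}$$

and at \(k=0\) it holds with equality (\(\tfrac{1}{2} = \tfrac{1 \cdot 2}{4}\)), as noted in the proof.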

Proposition 7.4

For \(k\ge 1\) let \(\bar{\alpha }:= 1-\frac{1}{\root k \of {k+1}}\). Then the following inequalities hold:

  1. (i)

    \(\displaystyle \frac{\ln (k+1)}{k} \ge \bar{\alpha }\), and

  2. (ii)

    \((k+1)\bar{\alpha }\ge 1 \).

Proof

To prove (i), define \(f(t):= 1-e^{-t}\) and note that \(f(\cdot )\) is a concave function; the gradient inequality for \(f(\cdot )\) at \(t=0\) is

$$\begin{aligned} t \ge 1-e^{-t}. \end{aligned}$$

Substituting \(t=\frac{\ln (k+1)}{k} \) yields

$$\begin{aligned} \frac{\ln (k+1)}{k} = t \ge 1-e^{-t} = 1 - e^{-\frac{\ln (k+1)}{k}} = 1- \frac{1}{\root k \of {k+1}} = \bar{\alpha }. \end{aligned}$$

Note that (ii) holds for \(k = 1\), so assume now that \(k \ge 2\), in which case \(\ln (k+1) \ge \ln (3) > \ln (e) = 1\). To prove (ii) for \(k \ge 2\), substitute \(t = -\frac{\ln (k+1)}{k}\) into the gradient inequality above to obtain \(-\frac{\ln (k+1)}{k} \ge 1 - (k+1)^{\frac{1}{k}}\), which can be rearranged to:

$$\begin{aligned} (k+1)^{\frac{1}{k}} \ge 1 + \frac{\ln (k+1)}{k} \ge 1 + \frac{\ln (e)}{k} = 1 + \frac{1}{k} = \frac{k+1}{k}. \end{aligned}$$
(58)

Inverting (58) yields:

$$\begin{aligned} (k+1)^{-\frac{1}{k}} \le \frac{k}{k+1} = 1 - \frac{1}{k+1}. \end{aligned}$$
(59)

Finally, rearranging (59) and multiplying by \(k+1\) yields (ii).\(\square \)
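As a numerical illustration, at \(k=3\) we have \(\bar{\alpha } = 1 - (k+1)^{-1/k} = 1 - 4^{-1/3} \approx 0.370\), and both parts check out:

$$\begin{aligned} \frac{\ln (4)}{3} \approx 0.462 \ge 0.370 \approx \bar{\alpha }, \qquad (k+1)\bar{\alpha } \approx 4 \times 0.370 = 1.480 \ge 1. \end{aligned}$$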

Proposition 7.5

For any integers \(\ell , k\) with \(2 \le \ell \le k\), the following inequalities hold:

$$\begin{aligned} \ln \left( \frac{k+1}{\ell }\right) \le \sum _{i = \ell }^k\frac{1}{i} \le \ln \left( \frac{k}{\ell - 1}\right) , \end{aligned}$$
(60)

and

$$\begin{aligned} \frac{k - \ell + 1}{(k+1)\ell } \le \sum _{i = \ell }^k\frac{1}{i^2} \le \frac{k - \ell + 1}{k(\ell -1)}. \end{aligned}$$
(61)

Proof

(60) and (61) are specific instances of the following more general fact: if \(f(\cdot ): [1, \infty ) \rightarrow \mathbb {R}_+\) is a monotonically decreasing continuous function, then

$$\begin{aligned} \int _{\ell }^{k+1}f(t)dt \le \sum _{i = \ell }^kf(i) \le \int _{\ell - 1}^kf(t)dt. \end{aligned}$$
(62)

It is easy to verify that the integral expressions in (62) match the bounds in (60) and (61) for the specific choices of \(f(t) = \frac{1}{t}\) and \(f(t) = \frac{1}{t^2}\), respectively.\(\square \)
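For completeness, the integral computations behind that verification are:

$$\begin{aligned} \int _{\ell }^{k+1}\frac{dt}{t} = \ln \left( \frac{k+1}{\ell }\right) , \qquad \int _{\ell -1}^{k}\frac{dt}{t} = \ln \left( \frac{k}{\ell -1}\right) , \end{aligned}$$

and

$$\begin{aligned} \int _{\ell }^{k+1}\frac{dt}{t^2} = \frac{1}{\ell } - \frac{1}{k+1} = \frac{k-\ell +1}{(k+1)\ell }, \qquad \int _{\ell -1}^{k}\frac{dt}{t^2} = \frac{1}{\ell -1} - \frac{1}{k} = \frac{k-\ell +1}{k(\ell -1)}, \end{aligned}$$

which are exactly the outer bounds in (60) and (61).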


About this article


Cite this article

Freund, R.M., Grigas, P. New analysis and results for the Frank–Wolfe method. Math. Program. 155, 199–230 (2016). https://doi.org/10.1007/s10107-014-0841-6

