
Applying FISTA to optimization problems (with or) without minimizers

  • Full Length Paper
  • Series A
Mathematical Programming

Abstract

Beck and Teboulle’s FISTA method for finding a minimizer of the sum of two convex functions, one of which has a Lipschitz continuous gradient whereas the other may be nonsmooth, is arguably the most important optimization algorithm of the past decade. While research activity on FISTA has exploded ever since, the mathematically challenging case when the original optimization problem has no minimizer has found only limited attention. In this work, we systematically study FISTA and its variants. We present general results that are applicable, regardless of the existence of minimizers.
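
For orientation, the following is a minimal Python sketch of the FISTA iteration of [11]; the test problem, variable names, and parameter choices are illustrative only and are not taken from this paper, which in particular also analyzes the method when no minimizer exists.

```python
# Minimal FISTA sketch (cf. [11]): minimize f(x) + g(x), where f is smooth with a
# (1/gamma)-Lipschitz gradient and g is convex but possibly nonsmooth.
# The splitting, step size, and problem data below are illustrative assumptions.
import numpy as np

def fista(grad_f, prox_g, x0, gamma, num_iter=500):
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(num_iter):
        x_new = prox_g(y - gamma * grad_f(y), gamma)        # forward-backward step at y
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0    # classical inertial parameter
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)       # extrapolation
        x, t = x_new, t_new
    return x

if __name__ == "__main__":
    # Tiny LASSO-type instance: f(x) = 0.5*||Ax - b||^2, g(x) = lam*||x||_1.
    rng = np.random.default_rng(0)
    A, b, lam = rng.standard_normal((20, 50)), rng.standard_normal(20), 0.1
    gamma = 1.0 / np.linalg.norm(A, 2) ** 2                  # 1/L with L = ||A||_2^2
    grad_f = lambda x: A.T @ (A @ x - b)
    prox_g = lambda v, s: np.sign(v) * np.maximum(np.abs(v) - lam * s, 0.0)  # soft-thresholding
    x_hat = fista(grad_f, prox_g, np.zeros(50), gamma)
    print(0.5 * np.linalg.norm(A @ x_hat - b) ** 2 + lam * np.abs(x_hat).sum())
```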


Notes

  1. For a nonempty set C, \({\text {P}}_{C}\) denotes the projector associated with C.

  2. The set is closed and convex by [21, Corollary 4.2] and [24, Lemma 4].

References

  1. Attouch, H., Cabot, A.: Convergence rates of inertial forward-backward algorithms. SIAM J. Optim. 28, 849–874 (2018)


  2. Attouch, H., Cabot, A., Chbani, Z., Riahi, H.: Inertial forward-backward algorithms with perturbations: application to Tikhonov regularization. J. Optim. Theory Appl. 179, 1–36 (2018)


  3. Attouch, H., Chbani, Z., Peypouquet, J., Redont, P.: Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity. Math. Program. Ser. B 168, 123–175 (2018)


  4. Attouch, H., Chbani, Z., Riahi, H.: Rate of convergence of the Nesterov accelerated gradient method in the subcritical case \(\alpha \le 3\). ESAIM Control Optim. Calc. Var. (2019). https://doi.org/10.1051/cocv/2017083


  5. Attouch, H., Peypouquet, J.: The rate of convergence of Nesterov’s accelerated forward-backward method is actually faster than \(1/k^2\). SIAM J. Optim. 26, 1824–1834 (2016)


  6. Attouch, H., Peypouquet, J., Redont, P.: Fast convergence of an inertial gradient-like system with vanishing viscosity. arXiv:1507.04782 (2015)

  7. Aujol, J.-F., Dossal, C.: Stability of over-relaxations for the forward-backward algorithm, application to FISTA. SIAM J. Optim. 25, 2408–2433 (2015)


  8. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces, second edn. Springer, New York (2017)


  9. Beck, A.: First-Order Methods in Optimization. SIAM, Philadelphia (2017)


  10. Beck, A., Teboulle, M.: Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems. IEEE Trans. Image Process. 18, 2419–2434 (2009)


  11. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)


  12. Bello Cruz, J.Y., Nghia, T.T.A.: On the convergence of the forward-backward splitting method with linesearches. Optim. Methods Softw. 31, 1209–1238 (2016)


  13. Bredies, K.: A forward-backward splitting algorithm for the minimization of non-smooth convex functionals in Banach space. Inverse Probl. 25, 015005 (2009)


  14. Bruck, R.E., Reich, S.: Nonexpansive projections and resolvents of accretive operators in Banach spaces. Houst. J. Math. 3, 459–470 (1977)


  15. Chambolle, A., Dossal, C.: On the convergence of the iterates of the “fast iterative shrinkage/thresholding algorithm”. J. Optim. Theory Appl. 166, 968–982 (2015)


  16. Chambolle, A., Pock, T.: An introduction to continuous optimization for imaging. Acta Numer. 25, 161–319 (2016)


  17. Combettes, P.L.: Quasi-Fejérian analysis of some optimization algorithms. In: Butnariu, D., Censor, Y., Reich, S. (eds.) Inherently Parallel Algorithms in Feasibility and Optimization and Their Applications, vol. 8, pp. 115–152. North-Holland, Amsterdam (2001)


  18. Combettes, P.L., Glaudin, L.E.: Quasinonexpansive iterations on the affine hull of orbits: from Mann’s mean value algorithm to inertial methods. SIAM J. Optim. 27, 2356–2380 (2017)


  19. Combettes, P.L., Salzo, S., Villa, S.: Consistent learning by composite proximal thresholding. Math. Program. Ser. B 167, 99–127 (2018)


  20. Kaczor, W.J., Nowak, M.T.: Problems in Mathematical Analysis. I. Real Numbers, Sequences and Series. American Mathematical Society, Providence, RI (2000)


  21. Moursi, W.M.: The forward-backward algorithm and the normal problem. J. Optim. Theory Appl. 176, 605–624 (2018)


  22. Nesterov, Y.E.: A method for solving the convex programming problem with convergence rate \(O(1/k^{2})\). Dokl. Akad. Nauk 269, 543–547 (1983)


  23. Opial, Z.: Weak convergence of the sequence of successive approximations for nonexpansive mappings. Bull. Am. Math. Soc. 73, 591–597 (1967)


  24. Pazy, A.: Asymptotic behavior of contractions in Hilbert space. Israel J. Math. 9, 235–240 (1971)


  25. Rădulescu, T.-L., Rădulescu, V.D., Andreescu, T.: Problems in Real Analysis: Advanced Calculus on the Real Axis. Springer, New York (2009)


  26. Schmidt, M., Roux, N.L., Bach, F.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24, pp. 1458–1466. Curran Associates Inc., Red Hook (2011)


  27. Su, W., Boyd, S., Candès, E.J.: A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights. J. Mach. Learn. Res. 17, 1–43 (2016)


  28. Villa, S., Salzo, S., Baldassarre, L., Verri, A.: Accelerated and inexact forward-backward algorithms. SIAM J. Optim. 23, 1607–1633 (2013)



Acknowledgements

We thank two referees for their very careful reading and constructive comments. HHB and XW were partially supported by NSERC Discovery Grants while MNB was partially supported by a Mitacs Globalink Graduate Fellowship Award.

Author information

Correspondence to Heinz H. Bauschke.


Appendices

1.1 Appendix A

For the sake of completeness, we provide the following proof of Lemma 2.1 based on [20, Problem 3.2.43].

Proof of Lemma 2.1

Owing to the assumptions of Lemma 2.1, it suffices to establish that

(126)

Indeed, since \(\tau _{n}\rightarrow {+}\infty \), there exists \(N \in \mathbb {N}^{*}\) such that

(127)

Now set \(\xi _{n} :=\tau _{n+1}^{2}-\tau _{n}^{2}\) and \(\sigma _{n} :=\tau _{n+1}^{2}-\tau _{1}^{2}\). Then, on the one hand, since \((\tau _{n})_{n \in \mathbb {N}^{*}}\) is increasing and positive, we have \((\forall n \in \mathbb {N}^{*})\) \(\xi _{n} \geqslant 0\), and \((\sigma _{n})_{n \in \mathbb {N}^{*}}\) is therefore an increasing sequence in \(\mathbb {R}_{+}\). On the other hand, because \(\tau _{n}\rightarrow {+}\infty \), we have \(\sigma _{n} =\tau _{n+1}^{2}-\tau _{1}^{2} \rightarrow {+}\infty \). Altogether, since

(128)

by the fact that \((\sigma _{n})_{n \in \mathbb {N}^{*}}\) is increasing, it follows that the partial sums of \(\sum _{n \geqslant N}\xi _{n}/\sigma _{n}\) do not satisfy the Cauchy property. Hence, since \((\forall n \geqslant N)\) \(\xi _{n}/\sigma _{n} \geqslant 0\), we obtain

$$\begin{aligned} \sum _{n \geqslant N}\frac{\tau _{n+1}^{2}-\tau _{n}^{2}}{\tau _{n+1}^{2}-\tau _{1}^{2}} = \sum _{n \geqslant N}\frac{\xi _{n}}{\sigma _{n}} = {+}\infty . \end{aligned}$$
(129)

Consequently, in the light of (127),

(130)

and (126) follows. \(\square \)
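
The divergence asserted in (129) can also be checked numerically. A small sketch, assuming for concreteness the classical sequence \(\tau _{n+1}=\big (1+\sqrt{1+4\tau _{n}^{2}}\,\big )/2\) with \(\tau _{1}=1\); this particular increasing sequence with \(\tau _{n}\rightarrow {+}\infty \) is our choice for illustration only.

```python
# Partial sums of sum_n xi_n / sigma_n, with xi_n = tau_{n+1}^2 - tau_n^2 and
# sigma_n = tau_{n+1}^2 - tau_1^2, for the illustrative choice
# tau_{n+1} = (1 + sqrt(1 + 4*tau_n^2)) / 2, tau_1 = 1.
import math

def partial_sum(num_terms, tau1=1.0):
    tau, total = tau1, 0.0
    for _ in range(num_terms):
        tau_next = (1.0 + math.sqrt(1.0 + 4.0 * tau * tau)) / 2.0
        xi = tau_next ** 2 - tau ** 2        # xi_n
        sigma = tau_next ** 2 - tau1 ** 2    # sigma_n (> 0: the sequence is strictly increasing)
        total += xi / sigma
        tau = tau_next
    return total

for n in (10, 100, 1000, 10000):
    print(n, partial_sum(n))                 # the partial sums keep growing, consistent with (129)
```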

1.2 Appendix B

Proof of Lemma 2.2

Let us argue by contradiction. Towards this goal, assume that \(\varliminf \beta _{n} \in ] 0, {+}\infty ]\) and fix \( \beta \in ] 0 , \varliminf \beta _{n} [\). Then, there exists \(N \in \mathbb {N}^{*}\) such that \((\forall n \geqslant N)\) \(\beta _{n} \geqslant \beta \); hence, because \(\sum _{n \in \mathbb {N}^{*}}\alpha _{n} = {+}\infty \), we have \(\sum _{n \geqslant N}\alpha _{n} = {+}\infty \), and it follows that \(\sum _{n \geqslant N}\alpha _{n}\beta _{n} \geqslant \sum _{n \geqslant N} \beta \alpha _{n} = {+}\infty \), which violates our assumption. To sum up, \(\varliminf \beta _{n} = 0\). \(\square \)

1.3 Appendix C

The following self-contained proof of Lemma 2.3 follows [17, Lemma 3.1] in the case \(\chi =1\); however, we do not require the error sequence to be positive.

Proof of Lemma 2.3

  1. (i):

    Set \(\alpha :=\varliminf _{n}\alpha _{n} \in \left[ \inf _{n \in \mathbb {N}^{*}}\alpha _{n}, {+}\infty \right] \) and let \((\alpha _{k_{n}})_{n \in \mathbb {N}^{*}}\) be a subsequence of \((\alpha _{n})_{n \in \mathbb {N}^{*}}\) that converges to \(\alpha \). We first show that \(\alpha < {+}\infty \). Since \((\beta _{n})_{n \in \mathbb {N}^{*}}\) lies in \(\mathbb {R}_{+}\), it follows from (9) that \((\forall n \in \mathbb {N}^{*})\) \(\alpha _{n+1} \leqslant \alpha _{n} + \varepsilon _{n}\). Thus, \((\forall n \in \mathbb {N}^{*})\) \(\alpha _{n+1} \leqslant \alpha _{1} + \sum _{k=1}^{n}\varepsilon _{k}\); in particular, \((\forall n \in \mathbb {N}^{*})\) \(\alpha _{k_{n}} \leqslant \alpha _{1} + \sum _{k=1}^{k_{n}-1}\varepsilon _{k}\). Hence, since \(\alpha _{k_{n}}\rightarrow \alpha \) and \(\sum _{n \in \mathbb {N}^{*}}\varepsilon _{n}\) converges, it follows that \(\alpha \leqslant \alpha _{1} + \sum _{k \in \mathbb {N}}\varepsilon _{k } < {+}\infty \), as claimed. In turn, to establish the convergence of \((\alpha _{n})_{n \in \mathbb {N}^{*}}\), it suffices to verify that \(\varlimsup _{n}\alpha _{n} \leqslant \varliminf _{n} \alpha _{n}\). Towards this goal, let \(\delta \) be in \(]0, {+}\infty [\). Then, on the one hand, Cauchy’s criterion ensures the existence of \(k_{n_{0}} \in \mathbb {N}^{*}\) such that \(\alpha _{k_{n_{0} } } - \alpha \leqslant \delta /2 \) and that \((\forall m \geqslant k_{n_{0}})\) \(\sum _{k=k_{n_{0}}}^{m}\varepsilon _{k} \leqslant \delta /2\). On the other hand, because \((\beta _{n})_{n \in \mathbb {N}^{*}}\) lies in \(\mathbb {R}_{+}\), (9) implies that \((\forall n > k_{n_{0}})\) \(\alpha _{n} \leqslant \alpha _{k_{n_{0}}} + \sum _{k=k_{n_{0}}}^{n-1}\varepsilon _{k}\). Altogether, \((\forall n > k_{n_{0}})\) \(\alpha _{n} \leqslant \alpha + \delta /2 + \delta /2 = \alpha + \delta \), from which we deduce that \(\varlimsup _{n} \alpha _{n} \leqslant \alpha + \delta \). Consequently, since \(\delta \) is arbitrarily chosen in \(]0, {+}\infty [\), it follows that \(\varlimsup _{n}\alpha _{n} \leqslant \alpha = \varliminf _{n}\alpha _{n}\), and therefore, \((\alpha _{n})_{n \in \mathbb {N}^{*}}\) converges to \(\alpha \).

  2. (ii):

    We derive from (9) that \((\forall N \in \mathbb {N}^{*})\) \(\sum _{n=1}^{N}\beta _{n} \leqslant \alpha _{1} - \alpha _{N+1} + \sum _{n=1}^{N}\varepsilon _{n}\). Hence, since \(\sum _{n \in \mathbb {N}^{*}}\varepsilon _{n}\) is convergent and, by (i), \(\lim _{n}\alpha _{n} =\alpha \), letting \(N \rightarrow {+}\infty \) yields \(\sum _{n \in \mathbb {N}^{*}} \beta _{n} \leqslant \alpha _{1} - \alpha + \sum _{n \in \mathbb {N}^{*}}\varepsilon _{n } < {+}\infty \), as required.

\(\square \)

1.4 Appendix D

Proof of Lemma 2.4

Indeed, since

(131a)
(131b)

we readily obtain the conclusion. \(\square \)

1.5 Appendix E

Proof of Lemma 2.5

“\(\Rightarrow \)”: Since \((\alpha _{n})_{n \in \mathbb {N}^{*}}\) is a decreasing sequence in \(\mathbb {R}_{+}\) and \(\sum _{n \in \mathbb {N}^{*}}\alpha _{n} < {+}\infty \), it follows that \(n\alpha _{n}\rightarrow 0\) (see, e.g., [20, Problem 3.2.35]). Invoking the assumption that \(\sum _{n \in \mathbb {N}^{*}}\alpha _{n} < {+}\infty \) once more, we infer the desired conclusion from Lemma 2.4.

“\(\Leftarrow \)”: A consequence of Lemma 2.4. \(\square \)
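
The fact from [20, Problem 3.2.35] invoked above admits a short self-contained argument: if \((\alpha _{n})_{n \in \mathbb {N}^{*}}\) is decreasing, lies in \(\mathbb {R}_{+}\), and \(\sum _{n \in \mathbb {N}^{*}}\alpha _{n} < {+}\infty \), then, by monotonicity and the Cauchy criterion,

$$\begin{aligned} n\alpha _{2n} \leqslant \sum _{k=n+1}^{2n}\alpha _{k} \rightarrow 0, \end{aligned}$$

so \(2n\alpha _{2n}\rightarrow 0\); in turn, \((2n+1)\alpha _{2n+1} \leqslant (2n+1)\alpha _{2n} = 2n\alpha _{2n} + \alpha _{2n} \rightarrow 0\), and altogether \(n\alpha _{n}\rightarrow 0\).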

1.6 Appendix F

Proof of Lemma 3.1

This proof is similar to that of [11, Lemma 2.3] and is included for completeness; see also [15, Lemma 3.1]. Fix the points appearing in (18). On the one hand, by (A1) and (A3) in Assumption 1.1, \({\nabla {f} } \) is Lipschitz continuous with constant \(\gamma ^{-1}\); combining this with the Descent Lemma (see, e.g., [8, Lemma 2.64]) and the convexity of f, we infer that

(132a)
(132b)
(132c)

On the other hand, [8, Proposition 12.26] asserts that

(133a)
(133b)

Altogether, upon adding (132) and (133), it follows that

(134a)
(134b)
(134c)

which yields (18). \(\square \)
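
For reference, the two facts invoked in the proof read, up to notation, as follows. The Descent Lemma [8, Lemma 2.64] gives, for \({\nabla {f}}\) Lipschitz continuous with constant \(\gamma ^{-1}\),

$$\begin{aligned} (\forall x)(\forall y)\quad f(y) \leqslant f(x) + \langle {\nabla {f}}(x), y - x \rangle + \frac{1}{2\gamma }\Vert y - x\Vert ^{2}, \end{aligned}$$

while [8, Proposition 12.26] characterizes \(p = {\text {prox}}_{\gamma g}(x)\), for \(g\) proper, lower semicontinuous, and convex, via

$$\begin{aligned} (\forall y)\quad \langle y - p, x - p \rangle + \gamma g(p) \leqslant \gamma g(y). \end{aligned}$$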

1.7 Appendix G

Proof of (50)

Recall that \(\lim (\tau _n/n)=1/2\). In turn, because \((\forall n\in \mathbb {N}^{*})\) \(\tau _n^2=\tau _{n+1}^2-\tau _{n+1}\), it follows that

$$\begin{aligned} \frac{n(\tau _n-\tau _{n+1})}{\tau _{n+1}} =\frac{n(\tau _n^2-\tau _{n+1}^2)}{\tau _{n+1}(\tau _n+\tau _{n+1})} =\frac{-n\tau _{n+1}}{\tau _{n+1}(\tau _n+\tau _{n+1})} =\frac{-1}{\dfrac{\tau _n}{n}+\dfrac{\tau _{n+1}}{n+1}\cdot \dfrac{n+1}{n}} \rightarrow \frac{-1}{\frac{1}{2}+\frac{1}{2}}=-1 \end{aligned}$$
(135)

and therefore that

$$\begin{aligned} n\Bigg (\frac{\tau _n-1}{\tau _{n+1}}-1+\frac{3}{n}\Bigg ) =\frac{n(\tau _n-\tau _{n+1})}{\tau _{n+1}} -\frac{n+1}{\tau _{n+1}}\frac{n}{n+1}+3 \rightarrow -1-2+3=0. \end{aligned}$$
(136)

Hence, (50) holds. \(\square \)
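
The two limits used above can also be verified numerically. A small sketch, assuming the starting value \(\tau _{1}=1\); this choice is made purely for illustration.

```python
# Check that tau_n / n -> 1/2 and that n*((tau_n - 1)/tau_{n+1} - 1 + 3/n) -> 0,
# where tau_{n+1} is the positive root of t^2 - t = tau_n^2 and tau_1 = 1 (illustrative choice).
import math

tau = 1.0                                                     # tau_1
for n in range(1, 10 ** 6 + 1):
    tau_next = (1.0 + math.sqrt(1.0 + 4.0 * tau * tau)) / 2.0
    if n in (10, 1000, 10 ** 6):
        print(n, tau / n, n * ((tau - 1.0) / tau_next - 1.0 + 3.0 / n))
    tau = tau_next
```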


Cite this article

Bauschke, H.H., Bui, M.N. & Wang, X. Applying FISTA to optimization problems (with or) without minimizers. Math. Program. 184, 349–381 (2020). https://doi.org/10.1007/s10107-019-01415-x
