Abstract
Simple bilevel problems are optimization problems in which we want to find an optimal solution to an inner problem that minimizes an outer objective function. Such problems appear in many machine learning and signal processing applications as a way to eliminate undesirable solutions. In our work, we suggest a new approach that is designed for bilevel problems with simple outer functions, such as the \(l_1\) norm, which are not required to be either smooth or strongly convex. In our new ITerative Approximation and Level-set EXpansion (ITALEX) approach, we alternate between expanding the level-set of the outer function and approximately optimizing the inner problem over this level-set. We show that optimizing the inner function through first-order methods such as proximal gradient and generalized conditional gradient results in a feasibility convergence rate of \(O(1/k)\), a rate that until now was achieved by bilevel algorithms only for smooth and strongly convex outer functions. Moreover, we prove an \(O(1/\sqrt{k})\) rate of convergence for the outer function, in contrast to existing methods, which only provide asymptotic guarantees. We demonstrate this performance through numerical experiments.
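The alternating scheme described above can be sketched on a toy problem. This is an illustrative sketch only, not the paper's ITALEX algorithm: the outer function here is the Euclidean norm (chosen because its level sets have a closed-form projection, whereas the paper highlights the \(l_1\) norm), and the level-set expansion factor and inner step counts are arbitrary stand-ins.

```python
import numpy as np

# Toy instance: inner objective phi(x) = 0.5*||Ax - b||^2, outer function
# omega(x) = ||x||_2 (a stand-in so that the level set is a Euclidean ball).
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = A @ rng.standard_normal(5)          # b lies in the range of A

def project_ball(x, alpha):
    # Projection onto the level set Lev_omega(alpha) = {x : ||x||_2 <= alpha}.
    nrm = np.linalg.norm(x)
    return x if nrm <= alpha else x * (alpha / nrm)

L = np.linalg.norm(A, 2) ** 2           # Lipschitz constant of grad phi
x = np.zeros(5)
alpha = 0.1                             # initial level-set parameter
for k in range(200):
    # Inner phase: a few projected-gradient steps on phi over the level set.
    for _ in range(10):
        x = project_ball(x - A.T @ (A @ x - b) / L, alpha)
    # Outer phase: expand the level set (the 1.2 factor is illustrative;
    # ITALEX uses a principled expansion rule).
    alpha *= 1.2

print(np.linalg.norm(A @ x - b))        # inner residual, driven toward zero
```

The iterate stays feasible for the current level set throughout, while the expansion step gradually relaxes the outer constraint until the inner problem can be solved to high accuracy.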
Notes
Both IR-PG and MNG, as well as their convergence results, can easily be extended to the case where g is a proximal friendly function [5, chapter 6].
If then the condition should hold true for any \(\varDelta (\rho )\in (0,\infty )\)
In fact, it can be shown that they are satisfied by any PDA method as defined in [6].
References
Adly, S., Bourdin, L., Caubet, F.: On a decomposition formula for the proximal operator of the sum of two convex functions. J. Convex Anal. 26(3), 699–718 (2019)
Amini, M., Yousefian, F.: An iterative regularized incremental projected subgradient method for a class of bilevel optimization problems. In: 2019 American Control Conference (ACC), pp. 4069–4074 (2019)
Bach, F.: Duality between subgradient and conditional gradient methods. SIAM J. Optim. 25(1), 115–129 (2015)
Beck, A.: Introduction to Nonlinear Optimization: Theory, Algorithms, and Applications with MATLAB. Society for Industrial and Applied Mathematics, Philadelphia, PA (2014)
Beck, A.: First-Order Methods in Optimization. Society for Industrial and Applied Mathematics, Philadelphia, PA (2017)
Beck, A., Pauwels, E., Sabach, S.: Primal and dual predicted decrease approximation methods. Math. Program. 167(1), 37–73 (2018)
Beck, A., Sabach, S.: A first order method for finding minimal norm-like solutions of convex optimization problems. Math. Program. 147(1), 25–46 (2014)
Cabot, A.: Proximal point algorithm controlled by a slowly vanishing term: applications to hierarchical minimization. SIAM J. Optim. 15(2), 555–572 (2005)
Chen, S., Donoho, D.: Basis pursuit. In: Proceedings of the 28th Asilomar Conference on Signals, Systems and Computers, vol. 1, pp. 41–44 (1994)
Dutta, J., Pandit, T.: Algorithms for Simple Bilevel Programming, pp. 253–291. Springer International Publishing, Cham (2020)
Grant, M., Boyd, S.: CVX: Matlab software for disciplined convex programming, version 2.1. http://cvxr.com/cvx (2014)
Gurobi Optimization, L.: Gurobi optimizer reference manual (2021). http://www.gurobi.com
Hansen, P.C.: Regularization Tools version 4.0 for Matlab 7.3 (2007). http://www2.imm.dtu.dk/~pcha/Regutools/
Helou, E.S., Simões, L.E.: \(\epsilon \)-subgradient algorithms for bilevel convex optimization. Inverse Prob. 33(5), 055020 (2017)
Hoerl, A.E., Kennard, R.W.: Ridge regression: applications to nonorthogonal problems. Technometrics 12(1), 69–82 (1970)
Lewis, A.S., Pang, J.S.: Error bounds for convex inequality systems. In: Crouzeix, J.P., Martinez-Legaz, J.E., Volle, M. (eds.) Generalized Convexity, Generalized Monotonicity, chap. 3, pp. 75–110. Kluwer Academic Publishers, Alphen aan den Rijn (1998)
Nesterov, Y.: Lectures on Convex Optimization, 2nd edn. Springer International Publishing, Louvain-la-Neuve (2018)
Pedregosa, F., Negiar, G., Askari, A., Jaggi, M.: Linearly convergent Frank-Wolfe with backtracking line-search. In: International Conference on Artificial Intelligence and Statistics, pp. 1–10. PMLR (2020)
Sabach, S., Shtern, S.: A first order method for solving convex bilevel optimization problems. SIAM J. Optim. 27(2), 640–660 (2017)
Shehu, Y., Vuong, P.T., Zemkoho, A.: An inertial extrapolation method for convex simple bilevel optimization. Optim. Methods Softw. 36(1), 1–19 (2021)
Solodov, M.: An explicit descent method for bilevel convex optimization. J. Convex Anal. 14, 227–237 (2007)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. Royal Stat. Soc. Ser. B (Methodological) 58(1), 267–288 (1996)
Tikhonov, A.N.: Solutions of Ill-Posed Problems. Winston, Washington, D.C. (1977)
Wang, P.W., Lin, C.J.: Iteration complexity of feasible descent methods for convex optimization. J. Mach. Learn. Res. 15(45), 1523–1548 (2014)
Acknowledgements
We are grateful to Prof. Shoham Sabach for presenting us with this problem and for his helpful suggestions that greatly improved this work. We additionally want to thank the editor and the anonymous reviewers for their insightful comments, which have helped improve the presentation of this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A Proof of Lemma 4
Proof
(i)
\(H(\textbf{x},\textbf{z},\alpha )\) is an extended value function that is jointly convex in \(\textbf{x}\), \(\textbf{z}\), and \(\alpha \), as a sum of the convex function \(\varphi (\textbf{x})+\left\| {\textbf{x}-\textbf{z}}\right\| ^2\) and the indicator function of the convex set \({{\,\textrm{epi}\,}}(\omega )\). Since by definition \(h(\alpha )\) is a partial minimization of the convex function \(H(\textbf{x},\textbf{z},\alpha )\), by [5, Theorem 2.18], it is also convex.
(ii)
First, we will show that \(h\) is a nonincreasing function. Let \(\alpha _1\le \alpha _2\). Since \({{\,\textrm{Lev}\,}}_{\omega }(\alpha _1)\subseteq {{\,\textrm{Lev}\,}}_{\omega }(\alpha _2)\), it is clear that
$$\begin{aligned} {h(\alpha _2)=\min _{\begin{array}{c} \textbf{x}\in {\mathbb {R}}^n,\\ \textbf{z}\in {{\,\textrm{Lev}\,}}_{\omega }(\alpha _2) \end{array}}\{\varphi (\textbf{x})+\left\| {\textbf{x}-\textbf{z}}\right\| ^2\}\le \min _{\begin{array}{c} \textbf{x}\in {\mathbb {R}}^n,\\ \textbf{z}\in {{\,\textrm{Lev}\,}}_{\omega }(\alpha _1) \end{array}} \{\varphi (\textbf{x})+\left\| {\textbf{x}-\textbf{z}}\right\| ^2\}=h(\alpha _1).} \end{aligned}$$By the definition of \(\omega ^*\), for any \(\alpha \ge \omega ^*\) we have that \(h(\alpha )=\varphi ^*=h(\omega ^*)\), and for any \(\alpha <\omega ^*\), we have that \(h(\alpha )>\varphi ^*\). Therefore, it remains to show that for any \(\alpha _1<\alpha _2<\omega ^*\), \(h(\alpha _1)>h(\alpha _2)\). The choice of \(\alpha _2\) implies that there exists \(\lambda \in (0,1)\) such that \(\alpha _2=\lambda \alpha _1+(1-\lambda )\omega ^*\). Thus, by the convexity of \(h(\cdot )\) we have that
$$\begin{aligned} h(\alpha _2)\le \lambda h(\alpha _1)+(1-\lambda )h(\omega ^*)<h(\alpha _1), \end{aligned}$$where the final inequality follows from the fact that \(\alpha _1<\omega ^*\), and therefore \(h(\alpha _1)>\varphi ^*\), thus concluding the proof.
\(\square \)
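The two claims of Lemma 4 can be checked numerically on a small instance. The specific functions below are an assumed one-dimensional example (not taken from the paper): \(\varphi (x)=(x-2)^2\) and \(\omega (z)=|z|\), so that \(\omega ^*=2\) and \(\varphi ^*=0\); for fixed \(x\), the minimizing \(z\) in the level set is simply the projection of \(x\) onto \([-\alpha ,\alpha ]\).

```python
import numpy as np

# Numerical illustration of Lemma 4 on an assumed 1-D instance:
# phi(x) = (x - 2)^2 and omega(z) = |z|, so omega* = 2 and phi* = 0.
def h(alpha, xs=np.linspace(-3.0, 5.0, 4001)):
    # h(alpha) = min over x in R, z in Lev_omega(alpha) of phi(x) + (x - z)^2;
    # for fixed x the best z is the projection of x onto [-alpha, alpha].
    z = np.clip(xs, -alpha, alpha)
    return np.min((xs - 2.0) ** 2 + (xs - z) ** 2)

alphas = np.linspace(0.0, 3.0, 31)
vals = np.array([h(a) for a in alphas])
assert np.all(np.diff(vals) <= 1e-9)       # (ii): h is nonincreasing
assert np.all(np.diff(vals, 2) >= -1e-6)   # (i): discrete convexity check
assert h(2.5) < 1e-8                       # h(alpha) = phi* for alpha >= omega*
```

On this instance one can verify analytically that \(h(\alpha )=(2-\alpha )^2/2\) for \(\alpha <2\) and \(h(\alpha )=0\) otherwise, matching the convex, nonincreasing behavior the lemma asserts.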
Appendix B Proof of Lemma 5
Proof
First, denote \(a=\max \{\frac{2}{\eta },\xi _1\}\). We will prove the statement by induction. In the case \(p=1\), \(\xi _1\le \frac{a}{1}={\max \{\frac{2}{\eta },\xi _1\}}\) holds trivially. Now, assume that the statement holds for \(j=1,\ldots ,p\); we will prove that it also holds for \(p+1\). By the properties of the sequence and the induction assumption, we have
Assume to the contrary that \(\xi _{p+1}>\frac{a}{p+1}\), then by multiplying (B.1) by \(p(p+2)^2\) we obtain
By definition of a, the above inequality implies
and since \(p\ge 1\), we obtain a contradiction. Thus, \(\xi _{p+1}\le \frac{a}{p+1}\). \(\square \)
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Doron, L., Shtern, S. Methodology and first-order algorithms for solving nonsmooth and non-strongly convex bilevel optimization problems. Math. Program. 201, 521–558 (2023). https://doi.org/10.1007/s10107-022-01914-4