Methodology and first-order algorithms for solving nonsmooth and non-strongly convex bilevel optimization problems

  • Full Length Paper
  • Series A
  • Published in Mathematical Programming

Abstract

Simple bilevel problems are optimization problems in which we want to find an optimal solution to an inner problem that minimizes an outer objective function. Such problems appear in many machine learning and signal processing applications as a way to eliminate undesirable solutions. In this work, we suggest a new approach designed for bilevel problems with simple outer functions, such as the \(l_1\) norm, which are not required to be either smooth or strongly convex. In our new ITerative Approximation and Level-set EXpansion (ITALEX) approach, we alternate between expanding the level set of the outer function and approximately optimizing the inner problem over this level set. We show that optimizing the inner function through first-order methods such as proximal gradient and generalized conditional gradient results in a feasibility convergence rate of \(O(1/k)\), a rate that until now was achieved only by bilevel algorithms for smooth and strongly convex outer functions. Moreover, we prove an \(O(1/\sqrt{k})\) rate of convergence for the outer function, in contrast to existing methods, which provide only asymptotic guarantees. We demonstrate this performance through numerical experiments.
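
To make the alternation concrete, the following minimal sketch is our own illustration of the idea described above, not the authors' implementation: the function names, the fixed level-set expansion step, the iteration counts, and the toy least-squares inner objective with an \(l_1\)-norm outer function are all assumptions made for this example. Projection onto the level set of the \(l_1\) norm (an \(l_1\) ball) plays the role of the proximal step in the inner phase.

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection onto {x : ||x||_1 <= radius} (Duchi et al. 2008)."""
    if np.abs(v).sum() <= radius:
        return v
    u = np.sort(np.abs(v))[::-1]                  # sorted magnitudes, descending
    css = np.cumsum(u)
    ks = np.arange(1, v.size + 1)
    rho = np.nonzero(u * ks > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def italex_sketch(grad, lipschitz, level_proj, x0, delta, rounds, inner_iters):
    """Alternate between expanding the outer level set and approximately
    minimizing the inner objective over it (illustrative only)."""
    x, alpha = x0.copy(), 0.0
    for _ in range(rounds):
        alpha += delta                            # phase 1: expand Lev_omega(alpha)
        for _ in range(inner_iters):              # phase 2: approximate inner solve
            x = level_proj(x - grad(x) / lipschitz, alpha)
    return x, alpha

# Toy instance: inner objective 0.5*||Ax - b||^2, outer function ||x||_1.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
b = A @ rng.standard_normal(50)
grad = lambda x: A.T @ (A @ x - b)
L = np.linalg.norm(A, 2) ** 2                     # Lipschitz constant of the gradient
x, alpha = italex_sketch(grad, L, project_l1_ball, np.zeros(50), 0.5, 40, 25)
```

Each outer round first enlarges the feasible level set and then runs a fixed number of projected-gradient steps on the inner objective, mirroring the two phases described above.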


Notes

  1. Both IR-PG and MNG, as well as their convergence results, can easily be extended to the case where g is a proximal friendly function [5, chapter 6]; a concrete example of such a proximal operator is sketched after these notes.

  2. If then the condition should hold true for any \(\varDelta (\rho )\in (0,\infty )\).

  3. In fact, it can be shown that they are satisfied by any PDA method as defined in [6].
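
Regarding note 1: "proximal friendly" means the proximal operator of g is cheap to evaluate. As a standard example (our illustration, not taken from the paper), the proximal operator of \(g=\lambda \Vert \cdot \Vert _1\) is coordinatewise soft thresholding:

```python
import numpy as np

def prox_l1(v, lam):
    """prox_{lam*||.||_1}(v) = argmin_x lam*||x||_1 + 0.5*||x - v||^2,
    computed coordinatewise by soft thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

# Entries with magnitude below lam are zeroed; the rest shrink toward zero.
print(prox_l1(np.array([-2.0, -0.3, 0.1, 1.5]), lam=0.5))  # -> [-1.5, -0.0, 0.0, 1.0]
```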

References

  1. Adly, S., Bourdin, L., Caubet, F.: On a decomposition formula for the proximal operator of the sum of two convex functions. J. Convex Anal. 26(3), 699–718 (2019)

  2. Amini, M., Yousefian, F.: An iterative regularized incremental projected subgradient method for a class of bilevel optimization problems. In: 2019 American Control Conference (ACC), pp. 4069–4074 (2019)

  3. Bach, F.: Duality between subgradient and conditional gradient methods. SIAM J. Optim. 25(1), 115–129 (2015)

  4. Beck, A.: Introduction to Nonlinear Optimization: Theory, Algorithms, and Applications with MATLAB. Society for Industrial and Applied Mathematics, Philadelphia, PA (2014)

  5. Beck, A.: First-Order Methods in Optimization. Society for Industrial and Applied Mathematics, Philadelphia, PA (2017)

  6. Beck, A., Pauwels, E., Sabach, S.: Primal and dual predicted decrease approximation methods. Math. Program. 167(1), 37–73 (2018)

  7. Beck, A., Sabach, S.: A first order method for finding minimal norm-like solutions of convex optimization problems. Math. Program. 147(1), 25–46 (2014)

  8. Cabot, A.: Proximal point algorithm controlled by a slowly vanishing term: applications to hierarchical minimization. SIAM J. Optim. 15(2), 555–572 (2005)

  9. Chen, S., Donoho, D.: Basis pursuit. In: Proceedings of the 28th Asilomar Conference on Signals, Systems and Computers, vol. 1, pp. 41–44 (1994)

  10. Dutta, J., Pandit, T.: Algorithms for Simple Bilevel Programming, pp. 253–291. Springer International Publishing, Cham (2020)

  11. Grant, M., Boyd, S.: CVX: Matlab software for disciplined convex programming, version 2.1. http://cvxr.com/cvx (2014)

  12. Gurobi Optimization, LLC: Gurobi Optimizer Reference Manual (2021). http://www.gurobi.com

  13. Hansen, P.C.: Regularization Tools version 4.0 for MATLAB 7.3 (2007). http://www2.imm.dtu.dk/~pcha/Regutools/

  14. Helou, E.S., Simões, L.E.: \(\epsilon \)-subgradient algorithms for bilevel convex optimization. Inverse Prob. 33(5), 055020 (2017)

  15. Hoerl, A.E., Kennard, R.W.: Ridge regression: applications to nonorthogonal problems. Technometrics 12(1), 69–82 (1970)

  16. Lewis, A.S., Pang, J.S.: Error bounds for convex inequality systems. In: Crouzeix, J.P., Martínez-Legaz, J.E., Volle, M. (eds.) Generalized Convexity, Generalized Monotonicity, chap. 3, pp. 75–110. Kluwer Academic Publishers, Alphen aan den Rijn (1998)

  17. Nesterov, Y.: Lectures on Convex Optimization, 2nd edn. Springer International Publishing, Louvain-la-Neuve (2018)

  18. Pedregosa, F., Negiar, G., Askari, A., Jaggi, M.: Linearly convergent Frank-Wolfe with backtracking line-search. In: International Conference on Artificial Intelligence and Statistics, pp. 1–10. PMLR (2020)

  19. Sabach, S., Shtern, S.: A first order method for solving convex bilevel optimization problems. SIAM J. Optim. 27(2), 640–660 (2017)

  20. Shehu, Y., Vuong, P.T., Zemkoho, A.: An inertial extrapolation method for convex simple bilevel optimization. Optim. Methods Softw. 36(1), 1–19 (2021)

  21. Solodov, M.: An explicit descent method for bilevel convex optimization. J. Convex Anal. 14, 227–237 (2007)

  22. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. Royal Stat. Soc. Ser. B (Methodological) 58(1), 267–288 (1996)

  23. Tikhonov, A.N.: Solutions of Ill-Posed Problems. Winston, Washington, D.C. (1977)

  24. Wang, P.W., Lin, C.J.: Iteration complexity of feasible descent methods for convex optimization. J. Mach. Learn. Res. 15(45), 1523–1548 (2014)

Acknowledgements

We are grateful to Prof. Shoham Sabach for presenting us with this problem and for his helpful suggestions, which greatly improved this work. We additionally want to thank the editor and the anonymous reviewers for their insightful comments, which have helped improve the presentation of this paper.

Author information

Correspondence to Shimrit Shtern.

Appendices

Appendix A Proof of Lemma 4

Proof

  (i) \(H(\textbf{x},\textbf{z},\alpha )\) is an extended real-valued function that is jointly convex in \(\textbf{x}\), \(\textbf{z}\), and \(\alpha \), as the sum of the convex function \(\varphi (\textbf{x})+\left\| {\textbf{x}-\textbf{z}}\right\| ^2\) and the indicator function of the convex set \({{\,\textrm{epi}\,}}(\omega )\). Since, by definition, \(h(\alpha )\) is a partial minimization of the convex function \(H(\textbf{x},\textbf{z},\alpha )\), it is also convex by [5, Theorem 2.18].

  (ii) First, we show that \(h\) is nonincreasing. Let \(\alpha _1\le \alpha _2\). Since \({{\,\textrm{Lev}\,}}_{\omega }(\alpha _1)\subseteq {{\,\textrm{Lev}\,}}_{\omega }(\alpha _2)\), it is clear that

    $$\begin{aligned} {h(\alpha _2)=\min _{\begin{array}{c} \textbf{x}\in {\mathbb {R}}^n,\\ \textbf{z}\in {{\,\textrm{Lev}\,}}_{\omega }(\alpha _2) \end{array}}\{\varphi (\textbf{x})+\left\| {\textbf{x}-\textbf{z}}\right\| ^2\}\le \min _{\begin{array}{c} \textbf{x}\in {\mathbb {R}}^n,\\ \textbf{z}\in {{\,\textrm{Lev}\,}}_{\omega }(\alpha _1) \end{array}} \{\varphi (\textbf{x})+\left\| {\textbf{x}-\textbf{z}}\right\| ^2\}=h(\alpha _1).} \end{aligned}$$

    By the definition of \(\omega ^*\), for any \(\alpha \ge \omega ^*\) we have that \(h(\alpha )=\varphi ^*=h(\omega ^*)\), and for any \(\alpha <\omega ^*\) we have that \(h(\alpha )>\varphi ^*\). Therefore, it remains to show that for any \(\alpha _1<\alpha _2<\omega ^*\), \(h(\alpha _1)>h(\alpha _2)\). The choice of \(\alpha _2\) implies that there exists \(\lambda \in (0,1)\) such that \(\alpha _2=\lambda \alpha _1+(1-\lambda )\omega ^*\). Thus, by the convexity of \(h(\cdot )\) we have that

    $$\begin{aligned} h(\alpha _2)\le \lambda h(\alpha _1)+(1-\lambda )h(\omega ^*)<h(\alpha _1), \end{aligned}$$

    where the final inequality follows from the fact that \(\alpha _1<\omega ^*\), and therefore \(h(\alpha _1)>\varphi ^*\), thus concluding the proof.

\(\square \)
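
As a numerical sanity check of parts (i) and (ii), the sketch below evaluates \(h(\alpha )\) by brute-force grid search for an illustrative one-dimensional choice of \(\varphi \) and \(\omega \) (our choices, not taken from the paper) and verifies that \(h\) is nonincreasing and convex in \(\alpha \):

```python
import numpy as np

phi = np.abs                                  # illustrative choice of phi
omega = lambda z: (z - 1.0) ** 2              # illustrative omega, omega* = 0 at z = 1

xs = np.linspace(-3.0, 3.0, 601)
zs = np.linspace(-3.0, 3.0, 601)
X, Z = np.meshgrid(xs, zs, indexing="ij")
vals = phi(X) + (X - Z) ** 2                  # objective of the partial minimization

alphas = np.linspace(0.01, 2.0, 40)
h = np.array([vals[:, omega(zs) <= a].min() for a in alphas])

print("max increase:", np.diff(h).max())      # <= 0 up to grid error (nonincreasing)
print("min 2nd diff:", np.diff(h, 2).min())   # >= 0 up to grid error (convexity)
```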

Appendix B Proof of Lemma 5

Proof

First, denote \(a=\max \{\frac{2}{\eta },\xi _1\}\). We prove the statement by induction. For \(p=1\), \(\xi _1\le \frac{a}{1}={\max \{\frac{2}{\eta },\xi _1\}}\) holds trivially. Now assume that the statement holds for \(j=1,\ldots ,p\); we show that it also holds for \(p+1\). By the properties of the sequence and the induction hypothesis, we have

$$\begin{aligned} (1+\eta \xi _{p+1})\xi _{p+1}\le \xi _p\le \frac{a}{p} \end{aligned}$$
(B.1)

Assume to the contrary that \(\xi _{p+1}>\frac{a}{p+1}\). Since \(t\mapsto (1+\eta t)t\) is increasing for \(t>0\), substituting this bound into (B.1) and multiplying by \(\frac{p(p+1)^2}{a}\), we obtain

$$\begin{aligned} {p^2+p+\eta a p}< {p^2+2p+1}. \end{aligned}$$

By definition of a, the above inequality implies

$$\begin{aligned} 2p= \frac{2}{\eta } \eta p\le \eta a p < p+1, \end{aligned}$$

and since \(p\ge 1\), we obtain a contradiction. Thus, \(\xi _{p+1}\le \frac{a}{p+1}\). \(\square \)
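
The rate in Lemma 5 can likewise be checked numerically. Under our own illustrative parameter choices, the sketch below generates the extremal sequence satisfying the recursion with equality, taking \(\xi _{p+1}\) to be the positive root of \(\eta t^2+t-\xi _p=0\), and verifies the bound \(\xi _p\le \frac{a}{p}\):

```python
import numpy as np

eta, xi1, P = 0.3, 5.0, 10_000            # illustrative parameter choices
a = max(2.0 / eta, xi1)

xi = xi1
for p in range(1, P + 1):
    assert xi <= a / p + 1e-12, (p, xi, a / p)
    # tightest next term allowed by the recursion: (1 + eta*t)*t = xi
    xi = (-1.0 + np.sqrt(1.0 + 4.0 * eta * xi)) / (2.0 * eta)

print("bound xi_p <= a/p verified for p = 1 ..", P)
```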

About this article

Cite this article

Doron, L., Shtern, S. Methodology and first-order algorithms for solving nonsmooth and non-strongly convex bilevel optimization problems. Math. Program. 201, 521–558 (2023). https://doi.org/10.1007/s10107-022-01914-4
