
Post-Pareto Analysis and a New Algorithm for the Optimal Parameter Tuning of the Elastic Net

H. Bonnel · C. Schneider

Journal of Optimization Theory and Applications 183, 993–1027 (2019)

Abstract

The paper deals with optimal parameter tuning for the elastic net. This process is formulated as an optimization problem over a Pareto set: the Pareto set is associated with a convex multi-objective optimization problem and, based on the scalarization theorem, we give a parametric representation of it. The problem thus becomes a bilevel optimization problem in which the follower's response is unique (a strong Stackelberg game). We then apply this strategy to parameter tuning for the elastic net and propose a new algorithm, called Ensalg, that computes the optimal regularization path of the elastic net with respect to the sparsity-inducing term in the objective. In contrast to existing algorithms, our method also handles the so-called "many-at-a-time" case, in which more than one variable becomes zero, or leaves zero, at the same time. We demonstrate the effectiveness of the algorithm on examples involving real-world data.
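
As a concrete point of reference, the elastic net problem treated here is the minimization of \(J_{\alpha \beta }(x) = \frac{1}{2}\Vert Ax-b\Vert _2^2 + \frac{\alpha }{2}\Vert x\Vert _2^2 + \beta \Vert x\Vert _1\), the formulation used in the appendix. The following is a minimal numerical sketch of this objective together with a plain proximal-gradient (ISTA) solver standing in for the minimizer \(x(\alpha ,\beta )\); it is a generic illustration under these assumptions, not the Ensalg path algorithm of the paper, and all names in it are ours.

```python
import numpy as np

def elastic_net_objective(x, A, b, alpha, beta):
    """J_{alpha,beta}(x) = 1/2 ||Ax - b||_2^2 + alpha/2 ||x||_2^2 + beta ||x||_1."""
    r = A @ x - b
    return 0.5 * r @ r + 0.5 * alpha * x @ x + beta * np.abs(x).sum()

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1: componentwise soft thresholding.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def solve_elastic_net(A, b, alpha, beta, iters=20000):
    """Plain proximal-gradient (ISTA) iteration approximating x(alpha, beta)."""
    L = np.linalg.norm(A, 2) ** 2 + alpha   # Lipschitz constant of the smooth part
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b) + alpha * x
        x = soft_threshold(x - grad / L, beta / L)
    return x
```

The sketches accompanying the appendix reuse this kind of generic solver to check the optimality conditions and bounds derived there.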


Notes

  1. For a map \(\varPhi :A\rightarrow B\) and a subset \(S \subseteq A\), we denote by \(\varPhi (S) := \{ \varPhi (x) :x\in S\}\) the image of S under \(\varPhi \).

  2. This definition holds without the convexity assumption.

  3. \(\mathbb {R}^r_+=\{ (\alpha _1, \ldots , \alpha _r)\in \mathbb {R}^r :\alpha _i \ge 0 \text { for all } i\}\) and \( \mathbb {R}^r_{++}=\{(\alpha _1, \ldots , \alpha _r)\in \mathbb {R}^r :\alpha _i > 0 \text { for all } i\}\).

  4. \(f_i\) coercive means \(\lim _{\Vert x\Vert \rightarrow +\infty }f_i(x) = +\infty \).

  5. In these limit cases, in order to satisfy the hypothesis \(({\mathrm {H}}_{{\mathrm {we}}})\), i.e., to have uniqueness of the solution \(x(0, \beta )\), we have to assume that \(p \ge n\) and that A has full rank; hence, \(f_1\) is strictly convex.

  6. The results presented in this section work also for \(\alpha =0\) under the additional assumption that \(p\ge n\) and the matrix A has full rank.

  7. For a matrix \(M \in \mathbb {R}^{p\times q}\) and \(I = \{ i_1< \ldots < i_k \} \subseteq \{ 1,\ldots , p\}\), \(J = \{ j_1< \ldots < j_\ell \} \subseteq \{ 1,\ldots , q\}\), we denote

    $$\begin{aligned} M_{IJ} = (M_{ij})_{(i,j)\in I\times J} \in \mathbb {R}^{k\times \ell }\,. \end{aligned}$$

    When \(k = p\) (resp. \(\ell = q\)), we set

    $$\begin{aligned} M_{\cdot J} = M_{IJ} \quad (\text {resp. } M_{I\cdot }=M_{IJ}) \end{aligned}$$
  8. For a vector \(u=(u_1, \ldots , u_n)\in \mathbb {R}^n\), we denote \(\operatorname{sign}(u)=(\operatorname{sign}(u_1), \ldots , \operatorname{sign}(u_n))\).

  9. Notice that \(J_1\) and \(J_2\) are not necessarily disjoint.

  10. For a set \(I\subseteq \{1, \ldots , n\}\), we denote by \(I^c= \{1, \ldots , n\}\setminus I\) its complement.

  11. Notice that for all \(i\in I_m^s\), we have \(G_i^{m-1} \in \{ -1,\, 1\}\).

  12. Because the matrix \(R_{I_{m_0} I_{m_0}}\) is invertible.
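
The index and sign conventions of footnotes 7, 8, and 10 translate directly into NumPy; the following short sketch (with made-up data, purely illustrative) records the correspondence used informally in the later sketches.

```python
import numpy as np

M = np.arange(20.0).reshape(4, 5)    # M in R^{4 x 5}
I = [0, 2]                           # row index set I (0-based here; the text is 1-based)
J = [1, 3, 4]                        # column index set J

M_IJ = M[np.ix_(I, J)]               # the submatrix M_{IJ} of footnote 7
M_dotJ = M[:, J]                     # M_{.J}: all rows, columns J
M_Idot = M[I, :]                     # M_{I.}: rows I, all columns
I_c = sorted(set(range(M.shape[0])) - set(I))   # the complement I^c of footnote 10

u = np.array([-2.0, 0.0, 3.5])
print(np.sign(u))                    # componentwise sign(u) of footnote 8: [-1.  0.  1.]
```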

References

  1. Kuhn, H.W., Tucker, A.W.: Nonlinear programming. In: Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, pp. 481–492 (1951)

  2. Pareto, V.: Manuale di economia politica. Società Editrice Libraria (1906)

  3. Edgeworth, F.Y.: Mathematical Psychics: An Essay on the Application of Mathematics to the Moral Sciences. C.K. Paul & co, London (1881)


  4. Philip, J.: Algorithms for the vector maximization problem. Math. Program. 2(1), 207–229 (1972)


  5. Benson, H.P.: Optimization over the efficient set. J. Math. Anal. Appl. 98(2), 562–580 (1984)


  6. Dauer, J.P.: Optimization over the efficient set using an active constraint approach. Zeitschrift für Oper. Res. 35(3), 185–195 (1991)


  7. Craven, B.D.: Aspects of multicriteria optimization. In: Recent Developments in Mathematical Programming, pp. 93–100 (1991)

  8. Benson, H.P.: A finite, non-adjacent extreme point search algorithm for optimization over the efficient set. J. Optim. Theory Appl. 73(1), 47–64 (1992)


  9. Bolintinéanu, S.: Necessary conditions for nonlinear suboptimization over the weakly-efficient set. J. Optim. Theory Appl. 78(2), 579–598 (1993)


  10. Bolintinéanu, S.: Minimization of a quasi-concave function over an efficient set. Math. Program. 61(1–3), 89–110 (1993)


  11. Fülöp, J.: A cutting plane algorithm for linear optimization over the efficient set. In: Generalized Convexity, pp. 374–385 (1994)


  12. Dauer, J.P., Fosnaugh, T.A.: Optimization over the efficient set. J. Global Optim. 7(3), 261–277 (1995)


  13. An, L.T.H., Tao, P.D., Muu, L.D.: Numerical solution for optimization over the efficient set by D.C. optimization algorithms. Oper. Res. Lett. 19(3), 117–128 (1996)


  14. Horst, R., Thoai, N.V.: Maximizing a concave function over the efficient or weakly-efficient set. Eur. J. Oper. Res. 117(2), 239–252 (1999)


  15. Horst, R., Thoai, N.V., Yamamoto, Y., Zenke, D.: On optimization over the efficient set in linear multicriteria programming. J. Optim. Theory Appl. 134(3), 433–443 (2007)


  16. Kim, N.T.B., Ngoc, T.T.: Optimization over the efficient set of a bicriteria convex programming problem. Pac. J. Optim. 9(1), 103–115 (2013)


  17. Yamamoto, Y.: Optimization over the efficient set: overview. J. Global Optim. 22(1–4), 285–317 (2002)


  18. Bolintinéanu, S.: Optimality conditions for minimization over the (weakly or properly) efficient set. J. Math. Anal. Appl. 173(2), 523–541 (1993)


  19. Bonnel, H., Kaya, C.Y.: Optimization over the efficient set of multi-objective control problems. J. Optim. Theory Appl. 147(1), 93–112 (2010)


  20. Bonnel, H., Pham, N.S.: Nonsmooth optimization over the (weakly or properly) Pareto set of a linear-quadratic multi-objective control problem: explicit optimality conditions. J. Ind. Manage. Optim. 7(4), 789–809 (2011)


  21. Bonnel, H.: Post-Pareto analysis for multiobjective parabolic control systems. Ann. Acad. Romanian Sci. Ser. Math. Appl. 5(1–2), 13–34 (2013)


  22. Bonnel, H., Collonge, J.: Stochastic optimization over a Pareto set associated with a stochastic multi-objective optimization problem. J. Optim. Theory Appl. 162(2), 405–427 (2014)


  23. Bonnel, H., Collonge, J.: Optimization over the Pareto outcome set associated with a convex bi-objective optimization problem: theoretical results, deterministic algorithm and application to the stochastic case. J. Global Optim. 62(3), 481–505 (2015)


  24. Bonnel, H., Morgan, J.: Semivectorial bilevel optimization problem: penalty approach. J. Optim. Theory Appl. 131(3), 365–382 (2006)


  25. Bonnel, H.: Optimality conditions for the semivectorial bilevel optimization problem. Pac. J. Optim. 2(3), 447–468 (2006)


  26. Ankhili, Z., Mansouri, A.: An exact penalty on bilevel programs with linear vector optimization lower level. Eur. J. Oper. Res. 197(1), 36–41 (2009)


  27. Bonnel, H., Morgan, J.: Semivectorial bilevel convex optimal control problems: existence results. SIAM J. Control Optim. 50(6), 3224–3241 (2012)


  28. Eichfelder, G.: Multiobjective bilevel optimization. Math. Program. 123(2), 419–449 (2010)


  29. Zheng, Y., Wan, Z.: A solution method for semivectorial bilevel programming problem via penalty method. J. Appl. Math. Comput. 37(1–2), 207–219 (2011)


  30. Bonnel, H., Morgan, J.: Optimality conditions for semivectorial bilevel convex optimal control problems. In: Computational and Analytical Mathematics, pp. 45–78 (2013)


  31. Dempe, S., Gadhi, N., Zemkoho, A.B.: New optimality conditions for the semivectorial bilevel optimization problem. J. Optim. Theory Appl. 157(1), 54–74 (2013)


  32. Bonnel, H., Todjihoundé, L., Udrişte, C.: Semivectorial bilevel optimization on Riemannian manifolds. J. Optim. Theory Appl. 167(2), 464–486 (2015)


  33. Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. 67(2), 301–320 (2005)


  34. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. 58(1), 267–288 (1996)


  35. Giesen, J., Müller, J.K., Laue, S., Swiercy, S.: Approximating concavely parameterized optimization problems. In: Advances in Neural Information Processing Systems (NIPS), pp. 2114–2122 (2012)

  36. Giesen, J., Löhne, A., Laue, S., Schneider, C.: Using Benson's algorithm for regularization parameter tracking. Proc. AAAI Conf. Artif. Intell. 33(01), 3689–3696 (2019)


  37. Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Ann. Stat. 32(2), 407–499 (2004)


  38. Rosset, S., Zhu, J.: Piecewise linear regularized solution paths. Ann. Stat. 35(3), 1012–1030 (2007)


  39. Osborne, M.R., Presnell, B., Turlach, B.A.: A new approach to variable selection in least squares problems. IMA J. Numer. Anal. 20(3), 389–403 (2000)


  40. Mairal, J., Yu, B.: Complexity analysis of the lasso regularization path. In: International Conference on Machine Learning (ICML), pp. 353–360 (2012)

  41. Jahn, J.: Vector Optimization: Theory, Applications, and Extensions. Springer, Berlin (2011)


  42. Luc, D.T.: Theory of Vector Optimization. Springer, Berlin (1989)


  43. Miettinen, K.: Nonlinear Multiobjective Optimization. Springer, Berlin (1998)


  44. Murphy, K.P.: Machine Learning: A Probabilistic Perspective. MIT Press, Cambridge (2012)


  45. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)


  46. Murty, K.G.: Linear Complementarity, Linear and Nonlinear Programming. Internet edn (1997)

  47. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer, New York (2009)


  48. Peng, H., Long, F., Ding, C.: Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 27(8), 1226–1238 (2005)


  49. Stamey, T., Kabalin, J., McNeal, J., Johnstone, I., Freiha, F., Redwine, E., Yang, N.: Prostate specific antigen in the diagnosis and treatment of adenocarcinoma of the prostate. II. Radical prostatectomy treated patients. J. Urol. 141(5), 1076–1083 (1989)


  50. Rockafellar, R.T., Wets, R.J.B.: Variational Analysis. Springer, New York (1998)


  51. Bonnans, J.F., Shapiro, A.: Perturbation Analysis of Optimization Problems. Springer, New York (2000)



Acknowledgements

The second author acknowledges financial support from the Carl-Zeiss-Stiftung. The first author is grateful to Dr. V. Dragan from the Institute of Mathematics of the Romanian Academy for pointing him to the result known as the Schur complement. The authors are grateful to the referees for their comments and suggestions.

Author information


Corresponding author

Correspondence to Henri Bonnel.

Additional information

Communicated by Xiaoqi Yang.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A Appendix: Proofs of Theorem 3.1, Proposition 3.1, and Lemma 3.2

Proof of Theorem 3.1

  (a)

    A simple computation shows that

    $$\begin{aligned} \operatorname{grad}\left( x \mapsto \frac{1}{2} \Vert Ax-b\Vert _2^2 + \frac{\alpha }{2}\Vert x\Vert _2^2\right) ({\bar{x}}) = A^{\mathsf {T}}A{\bar{x}} - A^{\mathsf {T}}b + \alpha {\bar{x}}\,, \end{aligned}$$
    (60)

    and hence, the subdifferential of the convex function \(J_{\alpha \beta }\) at any \(x\in \mathbb {R}^n\), denoted \(\partial J_{\alpha \beta }(x)\), is given by the set of all vectors of the form

    $$\begin{aligned} \left( A^{\mathsf {T}}A + \alpha I_n\right) x - A^{\mathsf {T}}b + \beta \xi \,, \end{aligned}$$

    where \(\xi \in [-1,1]^n\) satisfies

    $$\begin{aligned} \xi _i \begin{cases} = \operatorname{sign}(x_i) &\text {if } x_i\ne 0\,,\\ \in [-1,1] &\text {if } x_i=0\,, \end{cases} \qquad i = 1,\ldots , n\,. \end{aligned}$$

    On the other hand, the optimality of \(x(\alpha , \beta )\) is equivalent to the relation

    $$\begin{aligned} 0 \in \partial J_{\alpha \beta }(x(\alpha , \beta ))\,, \end{aligned}$$

    and hence, \(\xi (\alpha , \beta )\) is the unique vector such that

    $$\begin{aligned} \left( A^{\mathsf {T}}A + \alpha I_n\right) x(\alpha , \beta ) - A^{\mathsf {T}}b + \beta \xi (\alpha , \beta )=0 \end{aligned}$$

    and

    $$\begin{aligned} \xi _i(\alpha , \beta ) \begin{cases} = \operatorname{sign}(x_i(\alpha ,\beta )) &\text {if } x_i(\alpha ,\beta ) \ne 0\,,\\ \in [-1,1] &\text {if } x_i(\alpha , \beta ) =0\,, \end{cases} \qquad \text {for } i \in \{ 1, \ldots , n \}\,. \end{aligned}$$

    Then, we obviously have relation (6). (A numerical check of this optimality condition is sketched after the proof.)

  (b)

    The proof of the converse is obvious.

  (d)

    If \(\beta = 0\), then

    $$\begin{aligned} \operatorname{grad}\left( J_{\alpha ,0} \right) (x(\alpha ,0)) = \operatorname{grad}\left( x \mapsto \frac{1}{2} \Vert Ax-b\Vert _2^2 + \frac{\alpha }{2} \Vert x\Vert _2^2\right) (x(\alpha , 0)) = 0\,, \end{aligned}$$

    and hence, by (60) we obtain (8).

  (c)

    The continuity of the function \(x(\cdot , \cdot )\) on \(\mathbb {R}_{++}\times \mathbb {R}_+\) is a consequence of [50, Theorem 1.17] (see also [51, Proposition 4.4]). Indeed, it is easy to see that [50, Theorem 1.17] also holds if we replace \(\mathbb {R}^m\) with an open subset U of \(\mathbb {R}^m\). Thus, with \(m = 2\), put \(U := \mathbb {R}_{++} \times \mathbb {R}\). Using the notation of [50, Theorem 1.17], for \(u := (\alpha , \beta ) \in U \subseteq \mathbb {R}^2\) and \(x\in \mathbb {R}^n\), we denote

    $$\begin{aligned} f(x,u) := J_{\alpha \beta }(x)\,, \quad p(u) := \inf _x f(x,u)\,, \quad P(u) := \mathop {\mathrm {arg\,min}}\limits _x f(x,u)\,. \end{aligned}$$

    The function \(f(\cdot , \cdot )\) is obviously continuous on \(\mathbb {R}^n\times U\) and hence lower semi-continuous. Using the equivalence of norms in \(\mathbb {R}^n\), it is easy to see that \((x,u) \mapsto f(x,u)\) is level-bounded in x uniformly in u (see [50, Definition 1.16]). So, all the hypotheses of [50, Theorem 1.17] are fulfilled. Therefore, since f is finite everywhere, we have \(\operatorname{dom}(p) = U\). Also, for each \(u \in U\), we have \(p(u) = \min _x f(x,u)\), and P(u) is nonempty and compact. Finally, by (c) of [50, Theorem 1.17], we obtain that p is continuous on U, and hence, (b) holds. Since for each \({\bar{u}} = (\alpha , \beta ) \in U\) with \(\beta \ge 0\) the function \(f(\cdot , {\bar{u}})\) is strictly convex on \(\mathbb {R}^n\), the set \(P({\bar{u}})\) is a singleton. Thus, for such \({\bar{u}}\) (with \(\beta \ge 0\)), conclusion (b) of [50, Theorem 1.17] implies that \(x(\cdot , \cdot ) :\mathbb {R}_{++}\times \mathbb {R}_+\rightarrow \mathbb {R}^n\) is continuous at \({\bar{u}}=(\alpha , \beta )\). Now, from (6), we have for each \((\alpha , \beta ) \in \mathbb {R}_{++}^2\)

    $$\begin{aligned} \xi (\alpha ,\beta ) = \beta ^{-1} \left[ \left( A^{\mathsf {T}}A + \alpha I_n\right) x(\alpha ,\beta ) - A^{\mathsf {T}}b \right] , \end{aligned}$$

    which proves the uniqueness of \(\xi (\alpha , \beta )\) and the continuity of the function \(\xi (\cdot , \cdot )\) on \(\mathbb {R}_{++}^2\).

\(\square \)
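
The optimality condition (6) together with the sign pattern (7) can be checked numerically. The sketch below uses illustrative random data and a plain ISTA iteration as a stand-in for \(x(\alpha ,\beta )\) (a generic solver, not the paper's algorithm); it recovers \(\xi (\alpha ,\beta )\) from (6) and verifies that \(\xi \in [-1,1]^n\) with \(\xi _i = \operatorname{sign}(x_i(\alpha ,\beta ))\) on the support.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 8))
b = rng.standard_normal(30)
alpha, beta, n = 0.5, 1.0, 8

# ISTA iterations approximating x(alpha, beta).
L = np.linalg.norm(A, 2) ** 2 + alpha
x = np.zeros(n)
for _ in range(20000):
    x = x - (A.T @ (A @ x - b) + alpha * x) / L
    x = np.sign(x) * np.maximum(np.abs(x) - beta / L, 0.0)

# Solve (6) for xi: (A^T A + alpha I) x - A^T b + beta xi = 0.
xi = (A.T @ b - (A.T @ A + alpha * np.eye(n)) @ x) / beta

assert np.all(np.abs(xi) <= 1.0 + 1e-6)                          # xi lives in [-1, 1]^n
support = np.abs(x) > 1e-8
assert np.allclose(xi[support], np.sign(x[support]), atol=1e-6)  # xi_i = sign(x_i) on the support
```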

Proof of Proposition 3.1

  (a)

    Let \(0 \le \lambda _1 \le \ldots \le \lambda _n\) be the eigenvalues of the positive semi-definite matrix \(A^{\mathsf {T}}A\). Then, for any \(\alpha >0\), the matrix \(A^{\mathsf {T}}A + \alpha I_n\) has the eigenvalues \(\lambda _1 + \alpha \le \ldots \le \lambda _n + \alpha \). Therefore, the eigenvalues of \(R(\alpha )\) are \(\frac{1}{\lambda _n + \alpha } \le \ldots \le \frac{1}{\lambda _1 + \alpha }\). Denote the matrix norm induced by the Euclidean norm also by \(\Vert \cdot \Vert _2\). Then, it is well known that \(\Vert R(\alpha ) \Vert _2 = \frac{1}{\lambda _1 + \alpha }\), and, from the Rayleigh quotient, we have that for any vector \(v\in \mathbb {R}^n\), \(v^{\mathsf {T}}R(\alpha ) v \ge \frac{1}{\lambda _n + \alpha } \Vert v\Vert _2^2\). Thus, multiplying relation (10) on the left by \(\xi ^{\mathsf {T}}(\alpha , \beta )\), we obtain

    $$\begin{aligned} \xi ^{\mathsf {T}}(\alpha ,\beta ) u(\alpha ) - \beta \, \xi ^{\mathsf {T}}(\alpha ,\beta ) R(\alpha ) \xi (\alpha , \beta ) &= \xi ^{\mathsf {T}}(\alpha ,\beta ) \, x(\alpha ,\beta )\\ &= \left\| x(\alpha , \beta ) \right\| _1. \end{aligned}$$

    Therefore, using the Cauchy–Schwarz inequality, we obtain

    $$\begin{aligned} \left\| x(\alpha ,\beta ) \right\| _1 &\le \left\| \xi (\alpha ,\beta ) \right\| _2 \cdot \left\| u(\alpha ) \right\| _2 - \beta \, \xi ^{\mathsf {T}}(\alpha ,\beta ) R(\alpha ) \xi (\alpha ,\beta )\\ &\le \left\| \xi (\alpha ,\beta ) \right\| _2 \cdot \left\| u(\alpha ) \right\| _2 - \frac{\beta }{\lambda _n + \alpha } \left\| \xi (\alpha ,\beta ) \right\| _2^2. \end{aligned}$$

    Since

    $$\begin{aligned} \left\| u(\alpha ) \right\| _2 = \left\| R(\alpha ) A^{\mathsf {T}}b \right\| _2 \le \left\| R(\alpha ) \right\| _2\cdot \left\| A^{\mathsf {T}}b \right\| _2 = \frac{1}{\lambda _1 + \alpha } \left\| A^{\mathsf {T}}b \right\| _2, \end{aligned}$$

    we obtain that

    $$\begin{aligned} 0 \le \left\| x(\alpha ,\beta ) \right\| _1 \le \frac{1}{\lambda _1 + \alpha } \left\| \xi (\alpha ,\beta ) \right\| _2 \cdot \left\| A^{\mathsf {T}}b \right\| _2 - \frac{\beta }{\lambda _n + \alpha } \left\| \xi (\alpha ,\beta ) \right\| _2^2. \end{aligned}$$
    (61)

    This implies that for all \((\alpha , \beta )\in \mathbb {R}^2_{++}\), we have

    $$\begin{aligned} \left\| \xi (\alpha ,\beta ) \right\| _2 \le \frac{\lambda _n + \alpha }{\beta \left( \lambda _1 + \alpha \right) } \left\| A^{\mathsf {T}}b \right\| _2. \end{aligned}$$
    (62)

    Notice that

    $$\begin{aligned} \left\| \xi (\alpha ,\beta ) \right\| _2 < 1 \quad \Longrightarrow \quad x(\alpha ,\beta ) =0\,. \end{aligned}$$

    On the other hand, for all \(\alpha > 0\) we have

    $$\begin{aligned} 1 \le \frac{\lambda _n + \alpha }{\lambda _1 + \alpha } \le \frac{\lambda _n}{\lambda _1}\,. \end{aligned}$$

    Therefore, from (62) we obtain that for all \(\alpha >0\) and \(\beta >\frac{\lambda _n}{\lambda _1}\Vert A^{\mathsf {T}}b\Vert _2\), we have \(\Vert \xi (\alpha ,\beta ) \Vert _2 < 1\), so part (a) is proved. (A numerical check of these bounds is sketched after the proof.)

  (b)

    Let

    $$\begin{aligned} \beta \in \left] \left\| A^{\mathsf {T}}b \right\| _2, \frac{\lambda _n}{\lambda _1} \left\| A^{\mathsf {T}}b \right\| _2 \right[. \end{aligned}$$

    This is equivalent to

    $$\begin{aligned} \frac{\lambda _1}{\lambda _n}< \frac{ \left\| A^{\mathsf {T}}b \right\| _2}{\beta } < 1\,. \end{aligned}$$

    The function \(\alpha \mapsto \psi (\alpha ) := \frac{\lambda _1 + \alpha }{\lambda _n + \alpha }\) is increasing on \(\mathbb {R}_{++}\), and hence, for all \(\alpha >0 \),

    $$\begin{aligned} \frac{\lambda _1}{\lambda _n}< \psi (\alpha ) < 1\,, \end{aligned}$$

    and for \(\alpha _0 = \frac{\lambda _n \Vert A^{\mathsf {T}}b \Vert _2 - \beta \lambda _1}{\beta - \Vert A^{\mathsf {T}}b \Vert _2}\), we have

    $$\begin{aligned} \psi (\alpha _0) = \frac{\left\| A^{\mathsf {T}}b \right\| _2}{\beta }\,. \end{aligned}$$

    Therefore, for all \(\alpha > \alpha _0\), we have \(\psi (\alpha ) > \frac{\Vert A^{\mathsf {T}}b \Vert _2}{\beta }\), and hence,

    $$\begin{aligned} \frac{(\lambda _n + \alpha )}{\beta \left( \lambda _1 + \alpha \right) } \left\| A^{\mathsf {T}}b \right\| _2 < 1\,, \end{aligned}$$

    which implies by (62) that \(\Vert \xi (\alpha ,\beta ) \Vert _2 < 1\). Thus, \(x(\alpha ,\beta ) = 0\).

  (c)

    The proof of part (c) follows from the fact that \(\Vert \xi (\alpha ,\beta ) \Vert _2 \le \sqrt{n}\) for all \((\alpha ,\beta ) \in \mathbb {R}^2_{++}\). Since the subtracted term in (61) is nonnegative, (61) implies that

    $$\begin{aligned} \left\| x(\alpha ,\beta ) \right\| _1 \le \frac{\sqrt{n}}{\lambda _1+\alpha } \left\| A^{\mathsf {T}}b \right\| _2\,, \end{aligned}$$

    and the last expression goes to 0 as \(\alpha \rightarrow +\infty \).

\(\square \)
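
All three parts of the proposition lend themselves to a direct numerical check. The sketch below (illustrative data; the exact minimizer is again replaced by a generic ISTA iteration, an assumption of ours) verifies the two spectral facts behind (61) and (62), the sparsity threshold \(\alpha _0\) of part (b), and the vanishing of \(\Vert x(\alpha ,\beta )\Vert _1\) from part (c).

```python
import numpy as np

def ista(A, b, alpha, beta, iters=20000):
    # Generic proximal-gradient stand-in for x(alpha, beta).
    L = np.linalg.norm(A, 2) ** 2 + alpha
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = x - (A.T @ (A @ x - b) + alpha * x) / L
        x = np.sign(x) * np.maximum(np.abs(x) - beta / L, 0.0)
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((40, 5))
b = rng.standard_normal(40)
lam = np.linalg.eigvalsh(A.T @ A)      # 0 <= lam_1 <= ... <= lam_n
lam1, lamn = lam[0], lam[-1]
c = np.linalg.norm(A.T @ b)

# Part (a): ||R(alpha)||_2 = 1/(lam_1 + alpha) and the Rayleigh bound.
alpha = 0.3
R = np.linalg.inv(A.T @ A + alpha * np.eye(5))
assert np.isclose(np.linalg.norm(R, 2), 1.0 / (lam1 + alpha))
v = rng.standard_normal(5)
assert v @ R @ v >= (v @ v) / (lamn + alpha) - 1e-12

# Part (b): for beta in ]c, (lam_n/lam_1) c[ and alpha > alpha_0, x(alpha, beta) = 0.
beta = 0.5 * (1.0 + lamn / lam1) * c
alpha0 = (lamn * c - beta * lam1) / (beta - c)
assert np.allclose(ista(A, b, alpha0 + 1.0, beta), 0.0)

# Part (c): ||x(alpha, beta)||_1 -> 0 as alpha -> +infinity.
norms = [np.abs(ista(A, b, a, 1.0)).sum() for a in (1.0, 10.0, 100.0, 1000.0)]
assert norms[-1] < 0.1 * norms[0]
```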

Proof of Lemma 3.2

  (i)

    We claim that

    $$\begin{aligned} J_1\cup J_2=\{ 1, \ldots , n\}\,. \end{aligned}$$
    (63)

    Indeed, if (63) does not hold, then the set \(J_1^c\cap J_2^c\) is nonempty. For each \(i\in J_1^c\cap J_2^c\), there exist decreasing sequences \((\gamma _k^{(i)}), (\delta _k^{(i)})\) tending to \({{\hat{\beta }}}\) such that \(x_i(\gamma _k^{(i)})\ne 0\) and \(|\xi _i(\delta _k^{(i)})|<1\), and hence, \(x_i(\delta _k^{(i)})=0\). By the continuity of \(x_i\) and \(\xi _i\) and by (7), we can find open intervals \({]}a_k^{(i)},b_k^{(i)}{[}\) and \({]}c_k^{(i)},d_k^{(i)}{[}\) around \(\gamma _k^{(i)}\) (resp. \(\delta _k^{(i)}\)) such that \(x_i(\beta )\ne 0\) for all \(\beta \in {]}a_k^{(i)},b_k^{(i)}{[}\) and \(x_i(\beta )= 0\) for all \(\beta \in {]}c_k^{(i)},d_k^{(i)}{[}\). Based on this property, we can easily find a sequence of mutually disjoint open intervals \((I_k = {]}\nu _k,\mu _k{[})_{k\ge 0}\) such that \(\mu _k \searrow {{\hat{\beta }}}\), \(\nu _k \searrow {{\hat{\beta }}}\) and for each \(i \in J_1^c\cap J_2^c\) and for each integer \(k\ge 0\) we have

    $$\begin{aligned} \left( x_i(\beta )\ne 0 \text { for all } \beta \in I_k \right) \quad \text {OR} \quad \left( x_i(\beta )= 0 \text { for all } \beta \in I_k \right) . \end{aligned}$$

    Moreover, we can assume that these intervals are maximal with this property. This implies that for each k, there exists \(i\in J_1^c\cap J_2^c\) such that

    $$\begin{aligned} \begin{aligned}&\text {if } x_i(\beta )\ne 0 \text { for all } \beta \in I_k, \text { then } x_i(\nu _k)=0\\&\qquad \qquad \qquad \qquad \text {OR}\\&\text {if } |\xi _i(\beta )| < 1 \text { for all } \beta \in I_k, \text { then } |\xi _i(\nu _k)| = 1\,. \end{aligned} \end{aligned}$$
    (64)

    Consider now the index sets \(L_k=\{ i \in J_1^c\cap J_2^c :x_i(\beta ) =0 \text { for all } \beta \in I_k\}\) and \(J^* =\{ 1\le i \le n :|\xi _i({{\hat{\beta }}})| <1\}\). So, for \(\beta >{{\hat{\beta }}}\) near \({{\hat{\beta }}}\) and for all \(i\in J^*\), \( |\xi _i( \beta )| <1\), and hence, \(x_i(\beta )=0\). Thus, for sufficiently large k and for all \(i\in M_k := J^*\cup L_k\), we have \(x_i(\beta ) = 0\) for all \(\beta \in I_k\). On the other hand, for all \(i\in M_k^c\) we must have \(| \xi _i(\beta )|=1\) for all \(\beta \in I_k\). In other words, \(\xi _{M_k^c}(\beta )\) is constant on \(I_k\), with each coordinate equal to 1 or \(-1\). Since for all \(\beta \in I_k\) we have \(x_{M_k}(\beta )=0\), using (10) we obtain

    $$\begin{aligned} 0&=u_{M_k}-\beta R_{M_k\cdot }\xi (\beta )\nonumber \\&=u_{M_k}-\beta \Big (R_{M_kM_k} \xi _{M_k}(\beta ) +R_{M_kM_k^c} \xi _{M_k^c}(\beta )\Big ) \nonumber \\&=u_{M_k}-\beta \Big (R_{M_kM_k} \xi _{M_k}(\beta ) +R_{M_kM_k^c}G^k_{M_k^c}\Big ), \end{aligned}$$
    (65)

    where \(G^k\in \mathbb {R}^n\) is the constant vector such that \(G^k_{M_k^c}:=\xi _{M_k^c}(\beta ) \) (for all \(\beta \in I_k\)). Multiplying Eq. (65) on the left by \(\frac{1}{\beta } (R_{M_kM_k})^{-1}\), we easily get

    $$\begin{aligned} \xi (\beta )=\frac{1}{\beta } F^k+G^k \qquad \text {for all } \beta \in I_k\,, \end{aligned}$$
    (66)

    where

    $$\begin{aligned} F^k_{M_k}&=(R_{M_kM_k}) ^{-1}u_{M_k} \\ G^k_{M_k}&=-(R_{M_kM_k}) ^{-1}R_{M_kM_k^c}G^k_{M_k^c}\\ F^k_{M_k^c}&=0\,. \end{aligned}$$

    From (66) and (10), we obtain

    $$\begin{aligned} x(\beta )=u-RF^k-\beta RG^k \qquad \text {for all } \beta \in I_k\,. \end{aligned}$$
    (67)

    Since \(M_k\subseteq \{1, \ldots ,n\}\), the number of distinct sets \(M_k\) is bounded above by \(2^n\). Hence, we can find a constant subsequence of the sequence \((M_k)_{k\ge 0}\), which, to simplify notation, we denote \((M_{k'})_{k'\ge 0}\). So, let M be such that \(M_{k'}=M\) for all \(k'\). Therefore, there exist F and G such that \(F^{k'}=F\), \(G^{k'}=G\) for all \(k'\). Finally, we have for all \(k'\)

    $$\begin{aligned} x(\beta )&=u-RF-\beta RG&\text {for all } \beta \in I_{k'}\\ \xi (\beta )&=\frac{1}{\beta } F+G&\text {for all } \beta \in I_{k'}\,. \end{aligned}$$

    By (64) and using the fact that the set \(J_1^c\cap J_2^c\) is finite, we can find a subsequence \((I_{k''})_{k''\ge 0}\) such that there exists \(i\in J_1^c\cap J_2^c\) verifying \(\Big (x_i(\beta )\ne 0\) for all \(\beta \in I_{k''}\) and \(x_i(\nu _{k''})=0\Big )\) OR \(\Big (|\xi _i(\beta )|<1\) for all \(\beta \in I_{k''}\) and \(| \xi _i(\nu _{k''})|=1\Big )\) for all \(k''\). By the continuity of \(x_i(\cdot )\) and \(\xi _i(\cdot )\), we get for all \(k''\)

    $$\begin{aligned}&\Big (u_i-R_{i\cdot }F-\nu _{k''} R_{i\cdot }G=0 \quad \text {with }u_i-R_{i\cdot }F\ne 0\Big ) \\&\quad \text { OR } \quad \Big (\Big |\frac{1}{\nu _{k''}} F_i+G_i\Big |=1\quad \text {with } F_i\ne 0\Big ) \end{aligned}$$

    which is impossible, since either alternative restricts the strictly decreasing sequence \((\nu _{k''})\) to at most finitely many values. This contradiction proves that (63) is satisfied.

  (ii)

    By (63) we have \(x_{J_1}(\beta )=0\) and \(\xi _{J_2}(\beta )=\xi _{J_2}({{\hat{\beta }}})\) for all \(\beta >{{\hat{\beta }}}\) near \({{\hat{\beta }}}\). Since \(J_1^c\subset J_2\) we also have \(\xi _{J_1^c}(\beta )=\xi _{J_1^c}({{\hat{\beta }}})\) for all \(\beta >{{\hat{\beta }}}\) near \({{\hat{\beta }}}\). From (10)

    $$\begin{aligned} 0&=x_{J_1}(\beta ) \\&= u_{J_1}-\beta R_{J_1\cdot }\xi (\beta )\\&= u_{J_1}-\beta \Big (R_{J_1J_1}\xi _{J_1}(\beta )+R_{J_1J_1^c}\xi _{J_1^c}(\beta )\Big )\\&=u_{J_1}-\beta \Big (R_{J_1J_1}\xi _{J_1}(\beta )+R_{J_1J_1^c}\xi _{J_1^c}({{\hat{\beta }}})\Big ) \end{aligned}$$

    for all \(\beta >{{\hat{\beta }}}\) near \({{\hat{\beta }}}\). It follows immediately that

    $$\begin{aligned} \xi _{J_1}(\beta )=\frac{1}{\beta }(R_{J_1J_1})^{-1}u_{J_1}-(R_{J_1J_1})^{-1}R_{J_1J_1^c}\xi _{J_1^c}({{\hat{\beta }}})\,, \end{aligned}$$

    and hence, defining F, \(G\in \mathbb {R}^n\) by

    $$\begin{aligned}&F_{J_1}=(R_{J_1J_1})^{-1}u_{J_1}\,, \quad G_{J_1}=-(R_{J_1J_1})^{-1}R_{J_1J_1^c}\xi _{J_1^c}({{\hat{\beta }}})\,, \quad F_{J_1^c}=0\,,\\&\quad G_{J_1^c} = \xi _{J_1^c}({{\hat{\beta }}})\,, \end{aligned}$$

    we obtain (32) and (33); a numerical illustration of these formulas follows the proof.

\(\square \)
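
The piecewise representation established in the proof, namely \(\xi (\beta ) = \frac{1}{\beta }F + G\) and \(x(\beta ) = u - RF - \beta RG\) between kinks of the regularization path, can also be observed numerically. In the sketch below (illustrative data; a generic ISTA solver stands in for the exact solution, with \(\alpha \) fixed), F and G are assembled from the zero set \(J_1\) exactly as in part (ii), and the affine formula is compared with the solver at a nearby value of \(\beta \).

```python
import numpy as np

def ista(A, b, alpha, beta, iters=20000):
    # Generic proximal-gradient stand-in for the exact solution x(beta), alpha fixed.
    L = np.linalg.norm(A, 2) ** 2 + alpha
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = x - (A.T @ (A @ x - b) + alpha * x) / L
        x = np.sign(x) * np.maximum(np.abs(x) - beta / L, 0.0)
    return x

rng = np.random.default_rng(3)
A = rng.standard_normal((50, 6))
b = rng.standard_normal(50)
alpha, n = 0.2, 6
R = np.linalg.inv(A.T @ A + alpha * np.eye(n))   # R = R(alpha)
u = R @ A.T @ b                                  # u = u(alpha), so that x(beta) = u - beta R xi(beta)

beta0 = 1.0
x0 = ista(A, b, alpha, beta0)
J1 = np.flatnonzero(np.abs(x0) <= 1e-8)          # indices with x_i(beta0) = 0
J1c = np.flatnonzero(np.abs(x0) > 1e-8)          # its complement

# F and G as in part (ii), with xi_{J1^c} = sign(x_{J1^c}) on the nonzero components.
F, G = np.zeros(n), np.zeros(n)
G[J1c] = np.sign(x0[J1c])
if len(J1):
    RJJ = R[np.ix_(J1, J1)]
    F[J1] = np.linalg.solve(RJJ, u[J1])
    G[J1] = -np.linalg.solve(RJJ, R[np.ix_(J1, J1c)] @ G[J1c])

# The affine formula holds until the zero set or a sign changes (the next kink),
# so we compare at beta0 and at a slightly larger beta.
for beta in (beta0, 1.02 * beta0):
    assert np.allclose(ista(A, b, alpha, beta), u - R @ F - beta * R @ G, atol=1e-5)
```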



Cite this article

Bonnel, H., Schneider, C. Post-Pareto Analysis and a New Algorithm for the Optimal Parameter Tuning of the Elastic Net. J Optim Theory Appl 183, 993–1027 (2019). https://doi.org/10.1007/s10957-019-01592-x

