Abstract
The paper deals with optimal parameter tuning for the elastic net. We formulate this tuning process as an optimization problem over a Pareto set. The Pareto set is associated with a convex multi-objective optimization problem, and, based on the scalarization theorem, we give a parametric representation of it. The problem thus becomes a bilevel optimization problem with a unique follower response (a strong Stackelberg game). We then apply this strategy to parameter tuning for the elastic net and propose a new algorithm, called Ensalg, to compute the optimal regularization path of the elastic net w.r.t. the sparsity-inducing term in the objective. In contrast to existing algorithms, our method also handles the so-called “many-at-a-time” case, in which more than one variable becomes zero, or nonzero, at the same time. We demonstrate the effectiveness of the algorithm on examples involving real-world data.
Notes
For a map \(\varPhi :A\rightarrow B\) and a subset \(S \subseteq A\), we denote by \(\varPhi (S) := \{ \varPhi (x) :x\in S\}\) the image of S under \(\varPhi \).
This definition holds without the convexity assumption.
\(\mathbb {R}^r_+=\{ (\alpha _1, \ldots , \alpha _r)\in \mathbb {R}^r :\alpha _i \ge 0 \text { for all } i\}\) and \( \mathbb {R}^r_{++}=\{(\alpha _1, \ldots , \alpha _r)\in \mathbb {R}^r :\alpha _i > 0 \text { for all } i\}\)
\(f_i\) coercive means \(\lim _{\Vert x\Vert \rightarrow +\infty }f_i(x) = +\infty \).
In these limit cases, in order to satisfy the hypothesis \(({\mathrm {H}}_{{\mathrm {we}}})\), i.e., to have uniqueness of the solution \(x(0, \beta )\), we have to assume that \(p \ge n\) and A has full rank; hence, \(f_1\) is strictly convex.
The results presented in this section work also for \(\alpha =0\) under the additional assumption that \(p\ge n\) and the matrix A has full rank.
For a matrix \(M \in \mathbb {R}^{p\times q}\) and \(I = \{ i_1< \ldots < i_k \} \subseteq \{ 1,\ldots , p\}\), \(J = \{ j_1< \ldots < j_\ell \} \subseteq \{ 1,\ldots , q\}\), we denote
$$\begin{aligned} M_{IJ} = (M_{ij})_{(i,j)\in I\times J} \in \mathbb {R}^{k\times \ell }\,. \end{aligned}$$When \(k = p\) (resp. \(\ell = q\)), we set
$$\begin{aligned} M_{\cdot J} = M_{IJ} \quad (\text {resp. } M_{I\cdot }=M_{IJ}) \end{aligned}$$For a vector \(u=(u_1, \ldots , u_n)\in \mathbb {R}^n\), we denote \({{\,\mathrm{{\mathrm {sign}}}\,}}(u)=({{\,\mathrm{{\mathrm {sign}}}\,}}(u_1), \ldots , {{\,\mathrm{{\mathrm {sign}}}\,}}(u_n)).\)
Notice that \(J_1\) and \(J_2\) are not necessarily disjoint.
For a set \(I\subseteq \{1, \ldots , n\}\), we denote by \(I^c= \{1, \ldots , n\}\setminus I\) its complement.
Notice that for all \(i\in I_m^s\), we have \(G_i^{m-1} \in \{ -1,\, 1\}\).
Because the matrix \(R_{ I_{m_0} I_{m_0}}\) is invertible
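The submatrix notation \(M_{IJ}\), \(M_{\cdot J}\), \(M_{I\cdot }\) introduced in the notes translates directly into NumPy indexing. A minimal sketch with hypothetical data (0-based indices here, 1-based in the text):

```python
import numpy as np

# Hypothetical 3x4 matrix M and index sets I, J (0-based).
M = np.arange(12).reshape(3, 4)
I = [0, 2]                      # rows i_1 < i_2
J = [1, 3]                      # columns j_1 < j_2

M_IJ = M[np.ix_(I, J)]          # the submatrix (M_ij) for (i, j) in I x J
M_dotJ = M[:, J]                # M_{.J}: all rows, columns J
M_Idot = M[I, :]                # M_{I.}: rows I, all columns

assert M_IJ.shape == (2, 2)
assert np.array_equal(M_IJ, np.array([[1, 3], [9, 11]]))

# The componentwise sign map used for vectors:
u = np.array([-2.5, 0.0, 4.0])
assert np.array_equal(np.sign(u), np.array([-1.0, 0.0, 1.0]))
```

`np.ix_` builds the outer product of the two index sets, which is exactly the \((i,j)\in I\times J\) convention above.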
References
Kuhn, H.W., Tucker, A.W.: Nonlinear programming. In: Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, pp. 481–492 (1951)
Pareto, V.: Manuale di economia politica. Società Editrice Libraria (1906)
Edgeworth, F.Y.: Mathematical Psychics: An Essay on the Application of Mathematics to the Moral Sciences. C.K. Paul & co, London (1881)
Philip, J.: Algorithms for the vector maximization problem. Math. Program. 2(1), 207–229 (1972)
Benson, H.P.: Optimization over the efficient set. J. Math. Anal. Appl. 98(2), 562–580 (1984)
Dauer, J.P.: Optimization over the efficient set using an active constraint approach. Zeitschrift für Oper. Res. 35(3), 185–195 (1991)
Craven, B.D.: Aspects of multicriteria optimization. In: Recent Developments in Mathematical Programming, pp. 93–100 (1991)
Benson, H.P.: A finite, non-adjacent extreme point search algorithm for optimization over the efficient set. J. Optim. Theory Appl. 73(1), 47–64 (1992)
Bolintinéanu, S.: Necessary conditions for nonlinear suboptimization over the weakly-efficient set. J. Optim. Theory Appl. 78(2), 579–598 (1993)
Bolintinéanu, S.: Minimization of a quasi-concave function over an efficient set. Math. Program. 61(1–3), 89–110 (1993)
Fülöp, J.: A cutting plane algorithm for linear optimization over the efficient set. In: Generalized Convexity, pp. 374–385 (1994)
Dauer, J.P., Fosnaugh, T.A.: Optimization over the efficient set. J. Global Optim. 7(3), 261–277 (1995)
An, L.T.H., Tao, P.D., Muu, L.D.: Numerical solution for optimization over the efficient set by D.C. optimization algorithms. Oper. Res. Lett. 19(3), 117–128 (1996)
Horst, R., Thoai, N.V.: Maximizing a concave function over the efficient or weakly-efficient set. Eur. J. Oper. Res. 117(2), 239–252 (1999)
Horst, R., Thoai, N.V., Yamamoto, Y., Zenke, D.: On optimization over the efficient set in linear multicriteria programming. J. Optim. Theory Appl. 134(3), 433–443 (2007)
Kim, N.T.B., Ngoc, T.T.: Optimization over the efficient set of a bicriteria convex programming problem. Pac. J. Optim. 9(1), 103–115 (2013)
Yamamoto, Y.: Optimization over the efficient set: overview. J. Global Optim. 22(1–4), 285–317 (2002)
Bolintinéanu, S.: Optimality conditions for minimization over the (weakly or properly) efficient set. J. Math. Anal. Appl. 173(2), 523–541 (1993)
Bonnel, H., Kaya, C.Y.: Optimization over the efficient set of multi-objective control problems. J. Optim. Theory Appl. 147(1), 93–112 (2010)
Bonnel, H., Pham, N.S.: Nonsmooth optimization over the (weakly or properly) Pareto set of a linear-quadratic multi-objective control problem: explicit optimality conditions. J. Ind. Manage. Optim. 7(4), 789–809 (2011)
Bonnel, H.: Post-Pareto analysis for multiobjective parabolic control systems. Ann. Acad. Romanian Sci. Ser. Math. Appl. 5(1–2), 13–34 (2013)
Bonnel, H., Collonge, J.: Stochastic optimization over a pareto set associated with a stochastic multi-objective optimization problem. J. Optim. Theory Appl. 162(2), 405–427 (2014)
Bonnel, H., Collonge, J.: Optimization over the Pareto outcome set associated with a convex bi-objective optimization problem: theoretical results, deterministic algorithm and application to the stochastic case. J. Global Optim. 62(3), 481–505 (2015)
Bonnel, H., Morgan, J.: Semivectorial bilevel optimization problem: penalty approach. J. Optim. Theory Appl. 131(3), 365–382 (2006)
Bonnel, H.: Optimality conditions for the semivectorial bilevel optimization problem. Pac. J. Optim. 2(3), 447–468 (2006)
Ankhili, Z., Mansouri, A.: An exact penalty on bilevel programs with linear vector optimization lower level. Eur. J. Oper. Res. 197(1), 36–41 (2009)
Bonnel, H., Morgan, J.: Semivectorial bilevel convex optimal control problems: existence results. SIAM J. Control Optim. 50(6), 3224–3241 (2012)
Eichfelder, G.: Multiobjective bilevel optimization. Math. Program. 123(2), 419–449 (2010)
Zheng, Y., Wan, Z.: A solution method for semivectorial bilevel programming problem via penalty method. J. Appl. Math. Comput. 37(1–2), 207–219 (2011)
Bonnel, H., Morgan, J.: Optimality conditions for semivectorial bilevel convex optimal control problems. In: Computational and Analytical Mathematics, pp. 45–78 (2013)
Dempe, S., Gadhi, N., Zemkoho, A.B.: New optimality conditions for the semivectorial bilevel optimization problem. J. Optim. Theory Appl. 157(1), 54–74 (2013)
Bonnel, H., Todjihoundé, L., Udrişte, C.: Semivectorial bilevel optimization on Riemannian manifolds. J. Optim. Theory Appl. 167(2), 464–486 (2015)
Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. 67(2), 301–320 (2005)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. 58(1), 267–288 (1996)
Giesen, J., Müller, J.K., Laue, S., Swiercy, S.: Approximating concavely parameterized optimization problems. In: Advances in Neural Information Processing Systems (NIPS), pp. 2114–2122 (2012)
Giesen, J., Löhne, A., Laue, S., Schneider, C.: Using Benson's algorithm for regularization parameter tracking. Proc. AAAI Conf. Artif. Intell. 33(01), 3689–3696 (2019)
Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Ann. Stat. 32(2), 407–499 (2004)
Rosset, S., Zhu, J.: Piecewise linear regularized solution paths. Ann. Stat. 35(3), 1012–1030 (2007)
Osborne, M.R., Presnell, B., Turlach, B.A.: A new approach to variable selection in least squares problems. IMA J. Numer. Anal. 20(3), 389–403 (2000)
Mairal, J., Yu, B.: Complexity analysis of the lasso regularization path. In: International Conference on Machine Learning (ICML), pp. 353–360 (2012)
Jahn, J.: Vector Optimization: Theory, Applications, and Extensions. Springer, Berlin (2011)
Luc, D.T.: Theory of Vector Optimization. Springer, Berlin (1989)
Miettinen, K.: Nonlinear Multiobjective Optimization. Springer, Berlin (1998)
Murphy, K.P.: Machine Learning: A Probabilistic Perspective. MIT Press, Cambridge (2012)
Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
Murty, K.G.: Linear Complementarity, Linear and Nonlinear Programming. Internet edn. (1997)
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer, New York (2009)
Peng, H., Long, F., Ding, C.: Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 27(8), 1226–1238 (2005)
Stamey, T., Kabalin, J., McNeal, J., Johnstone, I., Freiha, F., Redwine, E., Yang, N.: Prostate specific antigen in the diagnosis and treatment of adenocarcinoma of the prostate. II. Radical prostatectomy treated patients. J. Urol. 141(5), 1076–1083 (1989)
Rockafellar, R.T., Wets, R.J.B.: Variational Analysis. Springer, New York (1998)
Bonnans, J.F., Shapiro, A.: Perturbation Analysis of Optimization Problems. Springer, New York (2000)
Acknowledgements
The second author acknowledges financial support by Carl-Zeiss-Stiftung. The first author is grateful to Dr. V. Dragan from the Institute of Mathematics of the Romanian Academy for informing him about the result known as the "Schur complement". The authors are grateful to the referees for their comments and suggestions.
Additional information
Communicated by Xiaoqi Yang.
A Appendix: Proofs of Theorem 3.1, Proposition 3.1, and Lemma 3.2
Proof of Theorem 3.1
-
(a)
A simple computation shows that
$$\begin{aligned} {{\,\mathrm{{\mathrm {grad}}}\,}}\left( x \mapsto \frac{1}{2} \Vert Ax-b\Vert _2^2 + \frac{\alpha }{2}\Vert x\Vert _2^2\right) ({\bar{x}}) = A^{\mathsf {T}}A{\bar{x}} - A^{\mathsf {T}}b + \alpha {\bar{x}}\,, \end{aligned}$$(60)and hence, the subdifferential of the convex function \(J_{\alpha \beta }\) at any \(x\in \mathbb {R}^n\), denoted \(\partial J_{\alpha \beta }(x)\), is given by the set of all vectors of the form
$$\begin{aligned} \left( A^{\mathsf {T}}A + \alpha I_n\right) x - A^{\mathsf {T}}b + \beta \xi \,, \end{aligned}$$where \(\xi \in [-1,1]^n\) verifies
$$\begin{aligned} \xi _i {\left\{ \begin{array}{ll} = {{\,\mathrm{{\mathrm {sign}}}\,}}(x_i) &{}\text {if } x_i\ne 0\,,\\ \in [-1,1] &{}\text {if } x_i=0\,, \end{array}\right. } \qquad i = 1,\ldots , n\,. \end{aligned}$$On the other hand, the optimality of \(x(\alpha , \beta )\) is equivalent to the relation
$$\begin{aligned} 0 \in \partial J_{\alpha \beta }(x(\alpha , \beta ))\,, \end{aligned}$$and hence, \(\xi (\alpha , \beta )\) is the unique vector such that
$$\begin{aligned} \left( A^{\mathsf {T}}A + \alpha I_n\right) x(\alpha , \beta ) - A^{\mathsf {T}}b + \beta \xi (\alpha , \beta )=0 \end{aligned}$$and
$$\begin{aligned} \xi _i(\alpha , \beta ) {\left\{ \begin{array}{ll} = {{\,\mathrm{{\mathrm {sign}}}\,}}(x_i(\alpha ,\beta )) &{}\text {if } x_i(\alpha ,\beta ) \ne 0\,,\\ \in [-1,1] &{}\text {if } x_i(\alpha , \beta ) =0\,, \end{array}\right. } \qquad \text {for } i \in \{ 1, \ldots , n \}\,. \end{aligned}$$Relation (6) then follows.
-
(b)
The proof of the converse is obvious.
-
(d)
If \(\beta = 0\), then
$$\begin{aligned} {{\,\mathrm{{\mathrm {grad}}}\,}}\left( J_{\alpha ,0} \right) (x(\alpha ,0)) = {{\,\mathrm{{\mathrm {grad}}}\,}}\left( x \mapsto \frac{1}{2} \Vert Ax-b\Vert _2^2 + \frac{\alpha }{2} \Vert x\Vert _2^2\right) (x(\alpha , 0)) = 0\,, \end{aligned}$$ -
(c)
The continuity of the function \(x(\cdot , \cdot )\) on \(\mathbb {R}_{++}\times \mathbb {R}_+\) is a consequence of [50, Theorem 1.17] (see also [51, Proposition 4.4]). Indeed, it is easy to see that [50, Theorem 1.17] holds also if we replace \(\mathbb {R}^m\) with an open subset U of \(\mathbb {R}^m\). Thus, with \(m = 2\), put \(U := \mathbb {R}_{++} \times \mathbb {R}\). Using the notations of [50, Theorem 1.17], for \(u := (\alpha , \beta ) \in U \subseteq \mathbb {R}^2\) and \(x\in \mathbb {R}^n\), we denote
$$\begin{aligned} f(x,u) := J_{\alpha \beta }(x)\,, \quad p(u) := \inf _x f(x,u)\,, \quad P(u) := {{\,\mathrm{{\mathrm {arg\,min}}}\,}}_x f(x,u)\,. \end{aligned}$$The function \(f(\cdot , \cdot )\) obviously is continuous on \(\mathbb {R}^n\times U\) and hence lower semi-continuous. Using the equivalence of norms in \(\mathbb {R}^n\), it is easy to see that \((x,u) \mapsto f(x,u)\) is level-bounded in x uniformly in u (see [50, Definition 1.16]). So, all the hypotheses of [50, Theorem 1.17] are fulfilled. Therefore, since f is finite everywhere, we have \({{\,\mathrm{{\mathrm {dom}}}\,}}(p) = U\). Also, for each \(u \in U\), we have \(p(u) = \min _x f(x,u)\) and P(u) is nonempty and compact. Finally, by (c) of [50, Theorem 1.17], we obtain that p is continuous on U, and hence, (b) holds. Since for each \({\bar{u}} = (\alpha , \beta ) \in U\) with \(\beta \ge 0\), the function \(f(\cdot , {\bar{u}})\) is strictly convex on \(\mathbb {R}^n\), we have that the set \(P({\bar{u}})\) is a singleton. Thus, for such \(\bar{u}\) (with \(\beta \ge 0\)), the conclusion (b) of [50, Theorem 1.17] implies that \(x(\cdot , \cdot ) :\mathbb {R}_{++}\times \mathbb {R}_+\rightarrow \mathbb {R}^n\) is continuous at \({\bar{u}}=(\alpha , \beta )\). Now, from (6) we have for each \((\alpha , \beta ) \in \mathbb {R}_{++}^2\)
$$\begin{aligned} \xi (\alpha ,\beta ) = \beta ^{-1} \left[ \left( A^{\mathsf {T}}A + \alpha I_n\right) x(\alpha ,\beta ) - A^{\mathsf {T}}b \right] , \end{aligned}$$which proves the uniqueness of \(\xi (\alpha , \beta )\) and the continuity of the function \(\xi (\cdot , \cdot )\) on \(\mathbb {R}_{++}^2\).
\(\square \)
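The stationarity condition (6) can be checked numerically: solve the elastic net problem with a generic proximal-gradient (ISTA) iteration, recover \(\xi (\alpha ,\beta )\) from the stationarity equation, and inspect its sign pattern. This is an illustrative sketch on random data, using a standard solver rather than the paper's algorithm:

```python
import numpy as np

def elastic_net_ista(A, b, alpha, beta, iters=5000):
    """Minimize 0.5*||Ax-b||^2 + (alpha/2)*||x||^2 + beta*||x||_1 by proximal gradient."""
    n = A.shape[1]
    L = np.linalg.norm(A, 2) ** 2 + alpha          # Lipschitz constant of the smooth part
    x = np.zeros(n)
    for _ in range(iters):
        g = A.T @ (A @ x - b) + alpha * x          # gradient of the smooth part, cf. (60)
        z = x - g / L
        x = np.sign(z) * np.maximum(np.abs(z) - beta / L, 0.0)  # soft-thresholding
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 8))
b = rng.standard_normal(20)
alpha, beta = 0.5, 1.0
x = elastic_net_ista(A, b, alpha, beta)

# Recover xi from the stationarity condition
# (A^T A + alpha I) x - A^T b + beta * xi = 0:
xi = (A.T @ b - (A.T @ A + alpha * np.eye(8)) @ x) / beta

assert np.all(np.abs(xi) <= 1 + 1e-6)              # xi lies in [-1, 1]^n
nz = np.abs(x) > 1e-8
assert np.allclose(xi[nz], np.sign(x[nz]), atol=1e-4)  # xi_i = sign(x_i) on the support
```

At the (unique) minimizer, \(\xi \) equals the sign of \(x_i(\alpha ,\beta )\) on the support and lies in \([-1,1]\) elsewhere, exactly as stated in (6).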
Proof of Proposition 3.1
-
(a)
Let \(0 \le \lambda _1 \le \ldots \le \lambda _n\) be the eigenvalues of the positive semi-definite matrix \(A^{\mathsf {T}}A\). Then, for any \(\alpha >0\), the matrix \(A^{\mathsf {T}}A + \alpha I_n\) has the eigenvalues \(\lambda _1 + \alpha \le \ldots \le \lambda _n + \alpha \). Therefore, the eigenvalues of \(R(\alpha )\) are \(\frac{1}{\lambda _n + \alpha } \le \ldots \le \frac{1}{\lambda _1 + \alpha }\). Denote the matrix norm induced by the Euclidean norm also by \(\Vert \cdot \Vert _2\). Then, it is well known that \(\Vert R(\alpha ) \Vert _2 = \frac{1}{\lambda _1 + \alpha }\), and, by the Rayleigh quotient inequality, we have that for any vector \(v\in \mathbb {R}^n\), \(v^{\mathsf {T}}R(\alpha ) v \ge \frac{1}{\lambda _n + \alpha } \Vert v\Vert _2^2\). Thus, multiplying relation (10) on the left by \(\xi ^{\mathsf {T}}(\alpha , \beta )\), we obtain
$$\begin{aligned} \xi ^{\mathsf {T}}(\alpha ,\beta ) u(\alpha ) - \beta \, \xi ^{\mathsf {T}}(\alpha ,\beta ) R(\alpha ) \xi (\alpha , \beta )= & {} \xi ^{\mathsf {T}}(\alpha ,\beta ) \cdot x(\alpha ,\beta )\\= & {} \left\| x(\alpha , \beta ) \right\| _1. \end{aligned}$$Therefore, using the Cauchy–Schwarz inequality, we obtain
$$\begin{aligned} \left\| x(\alpha ,\beta ) \right\| _1\le & {} \left\| \xi (\alpha ,\beta ) \right\| _2 \cdot \left\| u(\alpha ) \right\| _2 - \beta \, \xi ^{\mathsf {T}}(\alpha ,\beta ) R(\alpha ) \xi (\alpha ,\beta )\\\le & {} \left\| \xi (\alpha ,\beta ) \right\| _2 \cdot \left\| u(\alpha ) \right\| _2 - \frac{\beta }{\lambda _n + \alpha } \left\| \xi (\alpha ,\beta ) \right\| _2^2. \end{aligned}$$Since
$$\begin{aligned} \left\| u(\alpha ) \right\| _2 = \left\| R(\alpha ) A^{\mathsf {T}}b \right\| _2 \le \left\| R(\alpha ) \right\| _2\cdot \left\| A^{\mathsf {T}}b \right\| _2 = \frac{1}{\lambda _1 + \alpha } \left\| A^{\mathsf {T}}b \right\| _2, \end{aligned}$$we obtain that
$$\begin{aligned} 0 \le \left\| x(\alpha ,\beta ) \right\| _1 \le \frac{1}{\lambda _1 + \alpha } \left\| \xi (\alpha ,\beta ) \right\| _2 \cdot \left\| A^{\mathsf {T}}b \right\| _2 - \frac{\beta }{\lambda _n + \alpha } \left\| \xi (\alpha ,\beta ) \right\| _2^2. \end{aligned}$$(61)This implies that for all \((\alpha , \beta )\in \mathbb {R}^2_{++}\), we have
$$\begin{aligned} \left\| \xi (\alpha ,\beta ) \right\| _2 \le \frac{\lambda _n + \alpha }{\beta \left( \lambda _1 + \alpha \right) } \left\| A^{\mathsf {T}}b \right\| _2. \end{aligned}$$(62)Notice that
$$\begin{aligned} \left\| \xi (\alpha ,\beta ) \right\| _2 < 1 \quad \Longrightarrow \quad x(\alpha ,\beta ) =0\,. \end{aligned}$$On the other hand, for all \(\alpha > 0\) we have
$$\begin{aligned} 1 \le \frac{\lambda _n + \alpha }{\lambda _1 + \alpha } \le \frac{\lambda _n}{\lambda _1}\,. \end{aligned}$$Therefore, from (62) we obtain that for all \(\alpha >0\) and \(\beta >\frac{\lambda _n}{\lambda _1}\Vert A^{\mathsf {T}}b\Vert _2\), we have \(\Vert \xi (\alpha ,\beta ) \Vert _2 < 1\), so part (a) is proved.
-
(b)
Let
$$\begin{aligned} \beta \in \left] \left\| A^{\mathsf {T}}b \right\| _2, \frac{\lambda _n}{\lambda _1} \left\| A^{\mathsf {T}}b \right\| _2 \right[. \end{aligned}$$This is equivalent to
$$\begin{aligned} \frac{\lambda _1}{\lambda _n}< \frac{ \left\| A^{\mathsf {T}}b \right\| _2}{\beta } < 1\,. \end{aligned}$$The function \(\alpha \mapsto \psi (\alpha ) := \frac{\lambda _1 + \alpha }{\lambda _n + \alpha }\) is increasing on \(\mathbb {R}_{++}\), and hence, for all \(\alpha >0 \),
$$\begin{aligned} \frac{\lambda _1}{\lambda _n}< \psi (\alpha ) < 1\,, \end{aligned}$$and for \(\alpha _0 = \frac{\lambda _n \Vert A^{\mathsf {T}}b \Vert _2 - \beta \lambda _1}{\beta - \Vert A^{\mathsf {T}}b \Vert _2}\), we have
$$\begin{aligned} \psi (\alpha _0) = \frac{\left\| A^{\mathsf {T}}b \right\| _2}{\beta }\,. \end{aligned}$$Therefore, for all \(\alpha > \alpha _0\), we have \(\psi (\alpha ) > \frac{\Vert A^{\mathsf {T}}b \Vert _2}{\beta }\), and hence,
$$\begin{aligned} \frac{(\lambda _n + \alpha )}{\beta \left( \lambda _1 + \alpha \right) } \left\| A^{\mathsf {T}}b \right\| _2 < 1\,, \end{aligned}$$which implies by (62) that \(\Vert \xi (\alpha ,\beta ) \Vert _2 < 1\). Thus, \(x(\alpha ,\beta ) = 0\).
-
(c)
The proof of part (c) follows from the fact that \(\Vert \xi (\alpha ,\beta ) \Vert _2 \le \sqrt{n}\) for all \((\alpha ,\beta ) \in \mathbb {R}^2_{++}\), and hence, (61) implies that
$$\begin{aligned} \left\| x(\alpha ,\beta ) \right\| _1 \le \frac{1}{\lambda _1+\alpha } \left\| A^{\mathsf {T}}b \right\| _2 - \frac{\beta \sqrt{n}}{\lambda _n + \alpha } \end{aligned}$$and the last expression goes to 0 when \(\alpha \rightarrow +\infty \).
\(\square \)
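The threshold in part (a) admits a solver-free numerical check: if \(x(\alpha ,\beta )=0\), condition (6) forces \(\xi = A^{\mathsf {T}}b/\beta \), which is a valid subgradient of the \(\ell _1\) norm at the origin exactly when \(\Vert A^{\mathsf {T}}b\Vert _\infty \le \beta \). A small NumPy sketch on random data:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((10, 4))
b = rng.standard_normal(10)

lam = np.linalg.eigvalsh(A.T @ A)            # eigenvalues lambda_1 <= ... <= lambda_n
beta = (lam[-1] / lam[0]) * np.linalg.norm(A.T @ b)  # the threshold from part (a)

# If x(alpha, beta) = 0, condition (6) gives xi = A^T b / beta; x = 0 is optimal
# precisely when every |xi_i| <= 1, i.e., ||A^T b||_inf <= beta.
xi = A.T @ b / beta
assert np.max(np.abs(xi)) <= 1.0             # so x = 0 is indeed optimal at this beta
```

Since \(\lambda _n/\lambda _1 \ge 1\) and \(\Vert \cdot \Vert _2 \ge \Vert \cdot \Vert _\infty \), the assertion always holds: the threshold of part (a) is sufficient (though generally not tight) for a fully zero solution.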
Proof of Lemma 3.2
-
(i)
We claim that
$$\begin{aligned} J_1\cup J_2=\{ 1, \ldots , n\}\,. \end{aligned}$$(63)Indeed, if (63) does not hold, then the set \(J_1^c\cap J_2^c\) is nonempty. For each \(i\in J_1^c\cap J_2^c\), there exist decreasing sequences \((\gamma _k^{(i)}), (\delta _k^{(i)})\) tending to \({{\hat{\beta }}}\) such that \(x_i(\gamma _k^{(i)})\ne 0\) and \(|\xi _i(\delta _k^{(i)})|<1\), and hence, \(x_i(\delta _k^{(i)})=0\). By the continuity of \(x_i\) and \(\xi _i\) and by (7), we can find open intervals \({]}a_k^{(i)},b_k^{(i)}{[}\) and \({]}c_k^{(i)},d_k^{(i)}{[}\) around \(\gamma _k^{(i)}\) (resp. \(\delta _k^{(i)}\)) such that \(x_i(\beta )\ne 0\) for all \(\beta \in {]}a_k^{(i)},b_k^{(i)}{[}\) and \(x_i(\beta )= 0\) for all \(\beta \in {]}c_k^{(i)},d_k^{(i)}{[}\). Based on this property, we can easily find a sequence of mutually disjoint open intervals \((I_k = {]}\nu _k,\mu _k{[})_{k\ge 0}\) such that \(\mu _k \searrow {{\hat{\beta }}}\), \(\nu _k \searrow {{\hat{\beta }}}\) and for each \(i \in J_1^c\cap J_2^c\) and for each integer \(k\ge 0\) we have
$$\begin{aligned} \left( x_i(\beta )\ne 0 \text { for all } \beta \in I_k \right) \quad \text {OR} \quad \left( x_i(\beta )= 0 \text { for all } \beta \in I_k \right) . \end{aligned}$$Moreover, we can assume that these intervals are maximal with this property. This implies that for each k, there exists \(i\in J_1^c\cap J_2^c\) such that,
$$\begin{aligned} \begin{aligned}&\text {if } x_i(\beta )\ne 0 \text { for all } \beta \in I_k, \text { then } x_i(\nu _k)=0\\&\qquad \qquad \qquad \qquad \text {OR}\\&\text {if } |\xi _i(\beta )| < 1 \text { for all } \beta \in I_k, \text { then } |\xi _i(\nu _k)| = 1\,. \end{aligned} \end{aligned}$$(64)Consider now the index sets \(L_k=\{ i \in J_1^c\cap J_2^c :x_i(\beta ) =0 \text { for all } \beta \in I_k\}\), \(J^* =\{ 1\le i \le n :|\xi _i({{\hat{\beta }}})| <1\}\). So, for \(\beta >{{\hat{\beta }}}\) near \({{\hat{\beta }}}\) and for all \(i\in J^*\), \( |\xi _i( \beta )| <1\), and hence, \(x_i(\beta )=0\). Thus, for sufficiently large k and for all \(i\in M_k := J^*\cup L_k\), we have \(x_i(\beta ) = 0\) for all \(\beta \in I_k\). On the other hand, for all \(i\in M_k^c\) we must have \(| \xi _i(\beta )|=1\) for all \(\beta \in I_k\). In other words, \(\xi _{M_k^c}(\beta )\) is constant on \(I_k\) having the coordinates 1 or \(-1\). Since for all \(\beta \in I_k\) we have \(x_{M_k}(\beta )=0\), using (10) we obtain
$$\begin{aligned} 0&=u_{M_k}-\beta R_{M_k\cdot }\xi (\beta )\nonumber \\&=u_{M_k}-\beta \Big (R_{M_kM_k} \xi _{M_k}(\beta ) +R_{M_kM_k^c} \xi _{M_k^c}(\beta )\Big ) \nonumber \\&=u_{M_k}-\beta \Big (R_{M_kM_k} \xi _{M_k}(\beta ) +R_{M_kM_k^c}G^k_{M_k^c}\Big ), \end{aligned}$$(65)where \(G^k\in \mathbb {R}^n\) is the constant vector such that \(G^k_{M_k^c}:=\xi _{M_k^c}(\beta ) \) (for all \(\beta \in I_k\)). Multiplying Eq. (65) on the left with \(\frac{1}{\beta } (R_{M_kM_k}) ^{-1}\), we get easily
$$\begin{aligned} \xi (\beta )=\frac{1}{\beta } F^k+G^k \qquad \text {for all } \beta \in I_k\,, \end{aligned}$$(66)where
$$\begin{aligned} F^k_{M_k}&=(R_{M_kM_k}) ^{-1}u_{M_k} \\ G^k_{M_k}&=-(R_{M_kM_k}) ^{-1}R_{M_kM_k^c}G^k_{M_k^c}\\ F^k_{M_k^c}&=0\,. \end{aligned}$$Substituting (66) into (10), we obtain
$$\begin{aligned} x(\beta )=u-RF^k-\beta RG^k \qquad \text {for all } \beta \in I_k\,. \end{aligned}$$(67)Since \(M_k\subseteq \{1, \ldots ,n\}\), the number of distinct sets \(M_k\) is upper bounded by \(2^n\). Hence, we can find a constant subsequence of the sequence \((M_k)_{k\ge 0}\), which—to simplify notation—we denote \((M_{k'})_{k'\ge 0}\). So, let M be such that \(M_{k'}=M\) for all \(k'\). Therefore, there exist F and G such that \(F^{k'}=F\), \(G^{k'}=G\) for all \(k'\). Finally, we have for all \(k'\)
$$\begin{aligned} x(\beta )&=u-RF-\beta RG&\text {for all } \beta \in I_{k'}\\ \xi (\beta )&=\frac{1}{\beta } F+G&\text {for all } \beta \in I_{k'}\,. \end{aligned}$$By (64) and using the fact that the set \(J_1^c\cap J_2^c\) is finite, we can find a subsequence \((I_{k''})_{k''\ge 0}\) such that there exists \(i\in J_1^c\cap J_2^c\) verifying \(\Big (x_i(\beta )\ne 0\) for all \(\beta \in I_{k''}\) and \(x_i(\nu _{k''})=0\Big )\) OR \(\Big (|\xi _i(\beta )|<1\) for all \(\beta \in I_{k''}\) and \(| \xi _i(\nu _{k''})|=1\Big )\) for all \(k''\). By the continuity of \(x_i(\cdot )\) and \(\xi _i(\cdot )\), we get for all \(k''\)
$$\begin{aligned}&\Big (u_i-R_{i\cdot }F-\nu _{k''} R_{i\cdot }G=0 \quad \text {with }u_i-R_{i\cdot }F\ne 0\Big ) \\&\quad \text { OR } \quad \Big (\Big |\frac{1}{\nu _{k''}} F_i+G_i\Big |=1\quad \text {with } F_i\ne 0\Big ) \end{aligned}$$which is impossible. This contradiction proves that (63) is satisfied.
-
(ii)
By (63) we have \(x_{J_1}(\beta )=0\) and \(\xi _{J_2}(\beta )=\xi _{J_2}({{\hat{\beta }}})\) for all \(\beta >{{\hat{\beta }}}\) near \({{\hat{\beta }}}\). Since \(J_1^c\subset J_2\) we also have \(\xi _{J_1^c}(\beta )=\xi _{J_1^c}({{\hat{\beta }}})\) for all \(\beta >{{\hat{\beta }}}\) near \({{\hat{\beta }}}\). From (10)
$$\begin{aligned} 0&=x_{J_1}(\beta ) \\&= u_{J_1}-\beta R_{J_1\cdot }\xi (\beta )\\&= u_{J_1}-\beta \Big (R_{J_1J_1}\xi _{J_1}(\beta )+R_{J_1J_1^c}\xi _{J_1^c}(\beta )\Big )\\&=u_{J_1}-\beta \Big (R_{J_1J_1}\xi _{J_1}(\beta )+R_{J_1J_1^c}\xi _{J_1^c}({{\hat{\beta }}})\Big ) \end{aligned}$$for all \(\beta >{{\hat{\beta }}}\) near \({{\hat{\beta }}}\). It follows immediately that
$$\begin{aligned} \xi _{J_1}(\beta )=\frac{1}{\beta }(R_{J_1J_1})^{-1}u_{J_1}-(R_{J_1J_1})^{-1}R_{J_1J_1^c}\xi _{J_1^c}({{\hat{\beta }}})\,, \end{aligned}$$and hence, defining F, \(G\in \mathbb {R}^n\) by
$$\begin{aligned}&F_{J_1}=(R_{J_1J_1})^{-1}u_{J_1}\,, \quad G_{J_1}=-(R_{J_1J_1})^{-1}R_{J_1J_1^c}\xi _{J_1^c}({{\hat{\beta }}})\,, \quad F_{J_1^c}=0\,,\\&\quad G_{J_1^c} = \xi _{J_1^c}({{\hat{\beta }}})\,, \end{aligned}$$we obtain \(\xi (\beta )=\frac{1}{\beta }F+G\) for all \(\beta >{{\hat{\beta }}}\) near \({{\hat{\beta }}}\), which proves the claim.
\(\square \)
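The piecewise structure established in the lemma, namely that \(x(\beta )\) is affine in \(\beta \) on intervals where the active set stays fixed, can be observed numerically. A sketch using a generic proximal-gradient solver (not the paper's Ensalg) on random data:

```python
import numpy as np

def solve_en(A, b, alpha, beta, iters=20000):
    """Generic proximal-gradient solver for 0.5*||Ax-b||^2 + (alpha/2)*||x||^2 + beta*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2 + alpha            # step size from a Lipschitz bound
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        z = x - (A.T @ (A @ x - b) + alpha * x) / L
        x = np.sign(z) * np.maximum(np.abs(z) - beta / L, 0.0)  # soft-thresholding
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((30, 5))
b = rng.standard_normal(30)
alpha = 0.3

betas = np.array([0.10, 0.11, 0.12, 0.13])           # an equally spaced beta grid
X = np.stack([solve_en(A, b, alpha, t) for t in betas])

supports = {tuple(np.abs(x) > 1e-7) for x in X}
if len(supports) == 1:
    # Same active set on the whole grid, so x(beta) should be affine in beta:
    second_diff = X[2:] - 2 * X[1:-1] + X[:-2]       # vanishes for an affine map
    assert np.max(np.abs(second_diff)) < 1e-5
```

On an equally spaced grid, an affine path has vanishing second differences; a kink (change of active set) inside the grid would break the check, which is exactly the "breakpoint" behavior the path algorithm tracks.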
Cite this article
Bonnel, H., Schneider, C. Post-Pareto Analysis and a New Algorithm for the Optimal Parameter Tuning of the Elastic Net. J Optim Theory Appl 183, 993–1027 (2019). https://doi.org/10.1007/s10957-019-01592-x
Keywords
- Post-Pareto analysis
- Multi-objective optimization
- Bilevel optimization
- Linear regression
- Sparsity
- Elastic net
- Linear complementarity problem