Abstract
We study regularized regression problems where the regularizer is a proper, lower-semicontinuous, convex and partly smooth function relative to a Riemannian submanifold. This encompasses several popular examples including the Lasso, the group Lasso, the max and nuclear norms, as well as their composition with linear operators (e.g., total variation or fused Lasso). Our main sensitivity analysis result shows that the predictor moves locally stably along the same active submanifold as the observations undergo small perturbations. This plays a pivotal role in getting a closed-form expression for the divergence of the predictor w.r.t. observations. We also show that, for many regularizers, including polyhedral ones or the analysis group Lasso, this divergence formula holds Lebesgue a.e. When the perturbation is random (with an appropriate continuous distribution), this allows us to derive an unbiased estimator of the degrees of freedom and the prediction risk. Our results unify and go beyond those already known in the literature.
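To make the divergence formula concrete on the simplest instance — the Lasso with identity design, where the predictor reduces to componentwise soft-thresholding and the divergence is the size of the active set (Zou et al. 2007) — here is a minimal numerical sketch. It is not taken from the paper (function names are ours), and it only illustrates the orthogonal-design specialization, assuming NumPy:

```python
import numpy as np

def soft_threshold(y, lam):
    # Lasso predictor when the design is the identity:
    # componentwise soft-thresholding of the observations.
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def divergence_closed_form(y, lam):
    # Closed-form divergence of the predictor: the size of the active set.
    return int(np.sum(np.abs(y) > lam))

def divergence_numerical(y, lam, eps=1e-6):
    # Central finite differences; valid Lebesgue-a.e. in y (away from the
    # thresholds +/- lam, where the predictor is not differentiable).
    div = 0.0
    for i in range(y.size):
        e = np.zeros_like(y)
        e[i] = eps
        div += (soft_threshold(y + e, lam)[i]
                - soft_threshold(y - e, lam)[i]) / (2 * eps)
    return div

y = np.array([3.0, -0.5, 1.2, 0.1])
print(divergence_closed_form(y, 1.0))          # -> 2 (coordinates 0 and 2 active)
print(round(divergence_numerical(y, 1.0), 6))  # -> 2.0
```

Under Gaussian noise, plugging such a divergence into Stein's identity yields the unbiased degrees-of-freedom estimator discussed in the abstract.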
Notes
Strictly speaking, the minimization may have to be over a convex subset of \(\mathbb {R}^p\).
The meaning of sensitivity is different here from what is usually intended in statistical sensitivity and uncertainty analysis.
We write the same symbol as for the derivative, and rigorously speaking, this has to be understood to hold Lebesgue a.e.
To be understood here as a set-valued mapping.
Obviously, Lemma 2(ii) holds in such a case at the unique minimizer \({\widehat{\beta }}(y)\).
References
Absil, P. A., Mahony, R., Trumpf, J. (2013). An extrinsic look at the Riemannian Hessian. Geometric science of information. Lecture notes in computer science (Vol. 8085, pp. 361–368). Berlin: Springer.
Bach, F. (2008). Consistency of the group lasso and multiple kernel learning. Journal of Machine Learning Research, 9, 1179–1225.
Bach, F. (2010). Self-concordant analysis for logistic regression. Electronic Journal of Statistics, 4, 384–414.
Bakin, S. (1999). Adaptive regression and model selection in data mining problems. Thesis (Ph.D.)–Australian National University.
Bickel, P. J., Ritov, Y., Tsybakov, A. (2009). Simultaneous analysis of lasso and Dantzig selector. Annals of Statistics, 37(4), 1705–1732.
Bolte, J., Daniilidis, A., Lewis, A. S. (2011). Generic optimality conditions for semialgebraic convex programs. Mathematics of Operations Research, 36(1), 55–70.
Bonnans, J., Shapiro, A. (2000). Perturbation analysis of optimization problems., Springer Series in Operations Research. New York: Springer.
Brown, L. D. (1986). Fundamentals of statistical exponential families with applications in statistical decision theory. Institute of Mathematical Statistics lecture notes-monograph series (Vol. 9). Hayward: IMS.
Bühlmann, P., van de Geer, S. (2011). Statistics for high-dimensional data: Methods, theory and applications. Springer Series in Statistics. Berlin: Springer.
Bunea, F. (2008). Honest variable selection in linear and logistic regression models via \(\ell _1\) and \(\ell _1+\ell _2\) penalization. Electronic Journal of Statistics, 2, 1153–1194.
Candès, E., Plan, Y. (2009). Near-ideal model selection by \(\ell _1\) minimization. Annals of Statistics, 37(5A), 2145–2177.
Candès, E. J., Recht, B. (2009). Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6), 717–772.
Candès, E. J., Li, X., Ma, Y., Wright, J. (2011). Robust principal component analysis? Journal of the ACM, 58(3), 11:1–11:37.
Candès, E. J., Sing-Long, C. A., Trzasko, J. D. (2012). Unbiased risk estimates for singular value thresholding and spectral estimators. IEEE Transactions on Signal Processing, 61(19), 4643–4657.
Candès, E. J., Strohmer, T., Voroninski, V. (2013). Phaselift: Exact and stable signal recovery from magnitude measurements via convex programming. Communications on Pure and Applied Mathematics, 66(8), 1241–1274.
Chavel, I. (2006). Riemannian geometry: a modern introduction. Cambridge studies in advanced mathematics (2nd ed., Vol. 98). New York: Cambridge University Press.
Chen, S., Donoho, D., Saunders, M. (1999). Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20(1), 33–61.
Chen, X., Lin, Q., Kim, S., Carbonell, J. G., Xing, E. P. (2010). An efficient proximal-gradient method for general structured sparse learning. arXiv:1005.4717.
Combettes, P., Pesquet, J. (2007). A Douglas–Rachford splitting approach to nonsmooth convex variational signal recovery. IEEE Journal of Selected Topics in Signal Processing, 1(4), 564–574.
Coste, M. (1999). An introduction to o-minimal geometry. Technical report, Institut de Recherche Mathematiques de Rennes.
Coste, M. (2002). An introduction to semialgebraic geometry. Technical report, Institut de Recherche Mathematiques de Rennes.
Daniilidis, A., Hare, W., Malick, J. (2009). Geometrical interpretation of the predictor–corrector type algorithms in structured optimization problems. Optimization: A Journal of Mathematical Programming and Operations Research, 55(5–6), 482–503.
Daniilidis, A., Drusvyatskiy, D., Lewis, A. S. (2013). Orthogonal invariance and identifiability. arXiv:1304.1198.
DasGupta, A. (2008). Asymptotic theory of statistics and probability. Berlin: Springer.
Deledalle, C. A., Vaiter, S., Peyré, G., Fadili, M., Dossal, C. (2012). Risk estimation for matrix recovery with spectral regularization. In: ICML’12 workshop on sparsity, dictionaries and projections in machine learning and signal processing. arXiv:1205.1482.
Deledalle, C. A., Vaiter, S., Peyré, G., Fadili, J. M. (2014). Stein unbiased gradient estimator of the risk (SUGAR) for multiple parameter selection. SIAM Journal on Imaging Sciences, 7(4), 2448–2487.
Donoho, D. (2006). For most large underdetermined systems of linear equations the minimal \(\ell ^1\)-norm solution is also the sparsest solution. Communications on Pure and Applied Mathematics, 59(6), 797–829.
Dossal, C., Kachour, M., Fadili, M. J., Peyré, G., Chesneau, C. (2013). The degrees of freedom of penalized \(\ell _1\) minimization. Statistica Sinica, 23(2), 809–828.
Drusvyatskiy, D., Lewis, A. (2011). Generic nondegeneracy in convex optimization. Proceedings of the American Mathematical Society, 129, 2519–2527.
Drusvyatskiy, D., Ioffe, A., Lewis, A. (2015). Generic minimizing behavior in semi-algebraic optimization. SIAM Journal on Optimization (to appear).
Efron, B. (1986). How biased is the apparent error rate of a prediction rule? Journal of the American Statistical Association, 81(394), 461–470.
Eldar, Y. C. (2009). Generalized SURE for exponential families: Applications to regularization. IEEE Transactions on Signal Processing, 57(2), 471–481.
Evans, L. C., Gariepy, R. F. (1992). Measure theory and fine properties of functions. Studies in advanced mathematics. Boca Raton: CRC Press.
Fazel, M., Hindi, H., Boyd, S. P. (2001). A rank minimization heuristic with application to minimum order system approximation. Proceedings of the American Control Conference IEEE, 6, 4734–4739.
van de Geer, S. A. (2008). High-dimensional generalized linear models and the lasso. Annals of Statistics, 36, 614–645.
Hansen, N. R., Sokol, A. (2014). Degrees of freedom for nonlinear least squares estimation. arXiv:1402.2997.
Hudson, H. (1978). A natural identity for exponential families with applications in multiparameter estimation. Annals of Statistics, 6(3), 473–484.
Hwang, J. T. (1982). Improving upon standard estimators in discrete exponential families with applications to poisson and negative binomial cases. Annals of Statistics, 10(3), 857–867.
Jacob, L., Obozinski, G., Vert, J. P. (2009). Group lasso with overlap and graph lasso. In: Danyluk, A. P., Bottou, L., Littman, M. L. (eds.) ICML’09, Vol. 382, p. 55.
Jégou, H., Furon, T., Fuchs, J. J. (2012). Anti-sparse coding for approximate nearest neighbor search. In: IEEE ICASSP, pp. 2029–2032.
Kakade, S., Shamir, O., Sindharan, K., Tewari, A. (2010). Learning exponential families in high-dimensions: Strong convexity and sparsity. In: Teh, Y. W., Titterington, D. M. (eds.) Proceedings of the thirteenth international conference on artificial intelligence and statistics (AISTATS-10), Vol. 9, pp. 381–388.
Kato, K. (2009). On the degrees of freedom in shrinkage estimation. Journal of Multivariate Analysis, 100(7), 1338–1352.
Lee, J. M. (2003). Introduction to smooth manifolds. Graduate texts in mathematics. New York: Springer.
Lemaréchal, C., Hiriart-Urruty, J. (1996). Convex analysis and minimization algorithms: Fundamentals (Vol. 305). Berlin: Springer.
Lemaréchal, C., Oustry, F., Sagastizábal, C. (2000). The \(\mathcal {U}\)-Lagrangian of a convex function. Transactions of the American Mathematical Society, 352(2), 711–729.
Lewis, A. (1995). The convex analysis of unitarily invariant matrix functions. Journal of Convex Analysis, 2, 173–183.
Lewis, A., Sendov, H. (2001). Twice differentiable spectral functions. SIAM Journal on Matrix Analysis and Applications, 23, 368–386.
Lewis, A. S. (2003a). Active sets, nonsmoothness, and sensitivity. SIAM Journal on Optimization, 13(3), 702–725.
Lewis, A. S. (2003b). The mathematics of eigenvalue optimization. Mathematical Programming, 97(1–2), 155–176.
Lewis, A. S., Zhang, S. (2013). Partial smoothness, tilt stability, and generalized hessians. SIAM Journal on Optimization, 23(1), 74–94.
Liang, J., Fadili, M. J., Peyré, G., Luke, R. (2014). Activity Identification and local linear convergence of Douglas–Rachford/ADMM under partial smoothness. arXiv:1412.6858.
Liu, H., Zhang, J. (2009). Estimation consistency of the group lasso and its applications. Journal of Machine Learning Research, 5, 376–383.
Lyubarskii, Y., Vershynin, R. (2010). Uncertainty principles and vector quantization. IEEE Transactions on Information Theory, 56(7), 3491–3501.
McCullagh, P., Nelder, J. A. (1989). Generalized Linear Models (2nd edn). Monographs on Statistics & Applied Probability. Boca Raton: Chapman & Hall/CRC.
Meier, L., van de Geer, S., Bühlmann, P. (2008). The group lasso for logistic regression. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(1), 51–71.
Meinshausen, N., Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Annals of Statistics, 34, 1436–1462.
Meyer, M., Woodroofe, M. (2000). On the degrees of freedom in shape-restricted regression. Annals of Statistics, 28(4), 1083–1104.
Miller, S. A., Malick, J. (2005). Newton methods for nonsmooth convex minimization: Connections among \(\mathcal {U}\)-Lagrangian, Riemannian Newton and SQP methods. Mathematical Programming, 104(2–3), 609–633.
Mordukhovich, B. (1992). Sensitivity analysis in nonsmooth optimization. In: Field, D. A. & Komkov, V. (eds.) Theoretical aspects of industrial design. SIAM volumes in applied mathematics (Vol. 58), Philadelphia, pp 32–46.
Negahban, S., Ravikumar, P., Wainwright, M. J., Yu, B. (2012). A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. Statistical Science, 27(4), 538–557.
Osborne, M., Presnell, B., Turlach, B. (2000). A new approach to variable selection in least squares problems. IMA Journal of Numerical Analysis, 20(3), 389–403.
Peyré, G., Fadili, J., Chesneau, C. (2011). Adaptive structured block sparsity via dyadic partitioning. In: EUSIPCO, Barcelona, Spain.
Ramani, S., Blu, T., Unser, M. (2008). Monte-Carlo SURE: A black-box optimization of regularization parameters for general denoising algorithms. IEEE Transactions on Image Processing, 17(9), 1540–1554.
Recht, B., Fazel, M., Parrilo, P. A. (2010). Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3), 471–501.
Rockafellar, R. T. (1996). Convex Analysis. Princeton Landmarks in Mathematics and Physics. Princeton: Princeton University Press.
Rudin, L., Osher, S., Fatemi, E. (1992). Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena, 60(1–4), 259–268.
Saad, Y., Schultz, M. H. (1986). GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM Journal on Scientific and Statistical Computing, 7(3), 856–869.
Solo, V., Ulfarsson, M. (2010). Threshold selection for group sparsity. In: IEEE ICASSP, pp. 3754–3757.
Stein, C. (1981). Estimation of the mean of a multivariate normal distribution. Annals of Statistics, 9(6), 1135–1151.
Studer, C., Yin, W., Baraniuk, R. G. (2012). Signal representations with minimum \(\ell _\infty \)-norm. In: 50th annual Allerton conference on communication, control, and computing.
Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society Series B Methodological, 58(1), 267–288.
Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., Knight, K. (2005). Sparsity and smoothness via the fused Lasso. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(1), 91–108.
Tibshirani, R. J., Taylor, J. (2012). Degrees of freedom in Lasso problems. Annals of Statistics, 40(2), 1198–1232.
Tikhonov, A. N., Arsenin, V. Y. (1977). Solutions of Ill-posed problems. New York: Halsted Press.
Vaiter, S., Deledalle, C., Peyré, G., Fadili, M. J., Dossal, C. (2012). Degrees of freedom of the group Lasso. In: ICML’12 Workshops, pp. 89–92.
Vaiter, S., Deledalle, C., Peyré, G., Dossal, C., Fadili, M. J. (2013). Local behavior of sparse analysis regularization: Applications to risk estimation. Applied and Computational Harmonic Analysis, 35(3), 433–451.
Vaiter, S., Peyré, G., Fadili, M. J. (2014). Model consistency of partly smooth regularizers. arXiv:1405.1004.
Vaiter, S., Golbabaee, M., Fadili, M. J., Peyré, G. (2015). Model selection with low complexity priors. Information and Inference: A Journal of the IMA (IMAIAI).
van den Dries, L. (1998). Tame topology and o-minimal structures. London Mathematical Society lecture note series (Vol. 248). New York: Cambridge University Press.
van den Dries, L., Miller, C. (1996). Geometric categories and o-minimal structures. Duke Mathematical Journal, 84, 497–540.
Vonesch, C., Ramani, S., Unser, M. (2008). Recursive risk estimation for non-linear image deconvolution with a wavelet-domain sparsity constraint. In: ICIP, IEEE, pp. 665–668.
Wei, F., Huang, J. (2010). Consistent group selection in high-dimensional linear regression. Bernoulli, 16(4), 1369–1384.
Wright, S. J. (1993). Identifiable surfaces in constrained optimization. SIAM Journal on Control and Optimization, 31(4), 1063–1079.
Yuan, M., Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1), 49–67.
Yuan, M., Lin, Y. (2007). Model selection and estimation in the Gaussian graphical model. Biometrika, 94(1), 19–35.
Zou, H., Hastie, T., Tibshirani, R. (2007). On the “degrees of freedom” of the Lasso. Annals of Statistics, 35(5), 2173–2192.
Acknowledgments
This work has been supported by the European Research Council (ERC project SIGMA-Vision) and Institut Universitaire de France.
Basic properties of o-minimal structures
In the following results, we collect some important stability properties of o-minimal structures. To be self-contained, we also provide proofs. To the best of our knowledge, these proofs, although simple, are either not reported in the literature or left as exercises in the authoritative references (van den Dries 1998; Coste 1999). Moreover, in most proofs, to show that a subset is definable, we could simply write the appropriate first-order formula (see Coste 1999, Page 12; van den Dries 1998, Chapter 1, Section 2) and conclude using (Coste 1999, Theorem 1.13). Here, for the sake of clarity and to avoid cryptic statements for the non-specialist, we translate the first-order formula into operations on the subsets involved, in particular projections, and invoke the above stability axioms of o-minimal structures. In the following, n denotes an arbitrary (finite) dimension, which is not necessarily the number of observations used previously in the paper.
Lemma 5
(Addition and multiplication) Let \(f : \Omega \subset \mathbb {R}^n \rightarrow \mathbb {R}^p\) and \(g : \Omega \subset \mathbb {R}^n \rightarrow \mathbb {R}^p\) be definable functions. Then their pointwise addition and multiplication are also definable.
Proof
Let \(h=f+g\), and set \(B = \big( \mathbb {R}^n \times \mathbb {R}^p \times {{\mathrm{gph}}}(f) \times {{\mathrm{gph}}}(g) \big) \cap S\),
where \(S= \{ (x,u,y,v,z,w) \;:\; x=y=z, u=v+w \} \) is obviously an algebraic (in fact linear) subset, hence definable by axiom 2. Axioms 1 and 2 then imply that B is also definable. Let \(\varPi _{3n+3p,n+p}: \mathbb {R}^{3n+3p} \rightarrow \mathbb {R}^{n+p}\) be the projection on the first \(n+p\) coordinates. We then have
\({{\mathrm{gph}}}(h) = \varPi _{3n+3p,n+p}(B)\), whence we deduce that h is definable by applying axiom 4 \(2n+2p\) times. Definability of the pointwise multiplication follows from the same proof, taking \(u=v \cdot w\) in S. \(\square \)
Lemma 6
(Inequalities in definable sets) Let \(f : \Omega \subset \mathbb {R}^n \rightarrow \mathbb {R}\) be a definable function. Then \( \{ x \in \Omega \;:\; f(x) > 0 \} \) is definable. The same holds when replacing > with <.
In other words, inequalities involving definable functions may be freely used when defining definable sets.
There are many possible proofs of this statement.
Proof
(1) Let \(B= \{ (x,y) \in \mathbb {R}^n \times \mathbb {R} \;:\; f(x)=y \} \cap (\Omega \times (0,+\infty ))\), which is definable thanks to axioms 1 and 3 and to the fact that the level sets of a definable function are also definable. Thus \( \{ x \in \Omega \;:\; f(x) > 0 \} = \varPi _{n+1,n}(B)\), and we conclude using again axiom 4. \(\square \)
Yet another (simpler) proof.
Proof
(2) It is sufficient to remark that \( \{ x \in \Omega \;:\; f(x) > 0 \} \) is the projection of the set \( \{ (x,t) \in \Omega \times \mathbb {R} \;:\; t^2f(x)-1 = 0 \} \), where the latter is definable owing to Lemma 5. \(\square \)
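To see the squaring trick at work on a one-dimensional toy case (an illustration we add for concreteness, not taken from the original proof), take \(\Omega = \mathbb {R}\) and \(f(x)=x\):

```latex
\{ x \in \mathbb{R} \;:\; x > 0 \}
  = \varPi_{2,1}\big( \{ (x,t) \in \mathbb{R}^2 \;:\; t^2 x - 1 = 0 \} \big),
```

since \(t^2 x = 1\) admits a real solution t exactly when \(x > 0\); the right-hand side is the projection of an algebraic set, hence definable.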
Lemma 7
(Derivative) Let \(f: I \rightarrow \mathbb {R}\) be a definable differentiable function on an open interval I of \(\mathbb {R}\). Then its derivative \(f': I \rightarrow \mathbb {R}\) is also definable.
Proof
Let \(g: (x,t) \in I \times \mathbb {R}\mapsto g(x,t) = f(x+t)-f(x)\). Note that g is a definable function on \(I \times \mathbb {R}\) by Lemma 5. We now write the graph of \(f'\) as \( {{\mathrm{gph}}}(f') = \{ (x,y) \in I \times \mathbb {R} \;:\; \forall \varepsilon > 0, \exists \delta > 0, \forall t, |t| < \delta \Rightarrow |g(x,t) - yt| \le \varepsilon |t| \} \).
Let \(C= \{ (x,y,v,t,\varepsilon ,\delta ) \in I \times \mathbb {R}^5 \;:\; ((x,t),v) \in {{\mathrm{gph}}}(g) \} \), which is definable since g is definable and using axiom 3. Let
The first part in B is semi-algebraic, hence definable thanks to axiom 2. Thus B is also definable using axiom 1. We can now write
where the projectors and completions translate the actions of the existential and universal quantifiers. Using again axioms 4 and 1, we conclude. \(\square \)
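As a sanity check of Lemma 7 on a toy case (an illustration we add; it is not part of the original proof), take \(I = \mathbb {R}\) and \(f(x)=x^2\):

```latex
g(x,t) = f(x+t) - f(x) = 2xt + t^2, \qquad
\mathrm{gph}(f') = \{ (x,y) \in \mathbb{R}^2 \;:\; y = 2x \}.
```

Indeed, \(|g(x,t) - 2x \cdot t| = t^2 \le \varepsilon |t|\) whenever \(|t| \le \varepsilon \), so the quantified condition selects exactly \(y = 2x\), the zero set of a polynomial, which is definable directly by axiom 2, in agreement with the lemma.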
With this result at hand, the following proposition is immediate.
Proposition 2
(Differential and Jacobian) Let \(f=(f_1,\ldots ,f_p): \Omega \rightarrow \mathbb {R}^p\) be a differentiable function on an open subset \(\Omega \) of \(\mathbb {R}^n\). If f is definable, then so are its differential mapping and its Jacobian. In particular, for each \(i=1,\ldots ,p\) and \(j=1,\ldots ,n\), the partial derivative \(\partial f_i/\partial x_j: \Omega \rightarrow \mathbb {R}\) is definable.
We provide below some results concerning the subdifferential.
Proposition 3
(Subdifferential) Suppose that f is a finite-valued convex definable function. Then for any \(x \in \mathbb {R}^n\), the subdifferential \(\partial f(x)\) is definable.
Proof
For every \(x \in \mathbb {R}^n\), the subdifferential \(\partial f(x)\) reads \( \partial f(x) = \{ \eta \in \mathbb {R}^n \;:\; f(x') \ge f(x) + \langle \eta ,\,x'-x\rangle , \; \forall x' \in \mathbb {R}^n \} \).
Let \(K = \{ (\eta ,x') \in \mathbb {R}^n \times \mathbb {R}^n \;:\; f(x') < f(x) + \langle \eta ,\,x'-x\rangle \} \). Hence, \(\partial f(x) = \mathbb {R}^n {\setminus } \varPi _{2n,n}(K)\). Since f is definable, the set K is also definable using Lemmas 5 and 6, whence definability of \(\partial f(x)\) follows using axiom 4. \(\square \)
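A worked one-dimensional example (added here for illustration) may help: take \(f(x)=|x|\) and \(x=0\). Then

```latex
K = \{ (\eta, x') \in \mathbb{R} \times \mathbb{R} \;:\; |x'| < \eta\, x' \},
```

and \(\eta \in \varPi _{2,1}(K)\) if and only if \(|\eta | > 1\) (choose \(x'\) of the same sign as \(\eta \)). Hence \(\partial f(0) = \mathbb {R}{\setminus }\varPi _{2,1}(K) = [-1,1]\), as expected.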
Lemma 8
Suppose that f is a finite-valued convex definable function. Then, the set
is definable.
Proof
Denote \(C = \{ (x,\eta ) \;:\; \eta \in {{\mathrm{ri}}}\partial f(x) \} \). Using the characterization of the relative interior of a convex set (Rockafellar 1996, Theorem 6.4), we rewrite C in the more convenient form
Let \(D = \mathbb {R}^n \times \mathbb {R}^n \times \mathbb {R}^n \times \mathbb {R}^n \times (1,+\infty ) \times \mathbb {R}^n\) and K defined as
Thus,
where the projectors and completions translate the actions of the existential and universal quantifiers. Using again axioms 4 and 1, we conclude. \(\square \)
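As an illustration of the characterization invoked in the proof (added here; not part of the original argument), take \(f(x)=|x|\) and \(x=0\), so that \(\partial f(0) = [-1,1]\). Then (Rockafellar 1996, Theorem 6.4) reads

```latex
\eta \in \mathrm{ri}\, \partial f(0)
  \iff \forall \eta' \in [-1,1],\ \exists \mu > 1 :\
       \mu \eta + (1-\mu) \eta' \in [-1,1].
```

For \(\eta = 1\) and \(\eta ' = -1\) this would require \(2\mu - 1 \le 1\), which fails for every \(\mu > 1\); for \(|\eta | < 1\), any \(\mu \) close enough to 1 works. Hence \(\mathrm{ri}\,\partial f(0) = (-1,1)\), as expected.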
Vaiter, S., Deledalle, C., Fadili, J. et al. The degrees of freedom of partly smooth regularizers. Ann Inst Stat Math 69, 791–832 (2017). https://doi.org/10.1007/s10463-016-0563-z