
The degrees of freedom of partly smooth regularizers


Abstract

We study regularized regression problems where the regularizer is a proper, lower-semicontinuous, convex and partly smooth function relative to a Riemannian submanifold. This encompasses several popular examples including the Lasso, the group Lasso, the max and nuclear norms, as well as their composition with linear operators (e.g., total variation or fused Lasso). Our main sensitivity analysis result shows that the predictor moves locally stably along the same active submanifold as the observations undergo small perturbations. This plays a pivotal role in deriving a closed-form expression for the divergence of the predictor with respect to the observations. We also show that, for many regularizers, including polyhedral ones or the analysis group Lasso, this divergence formula holds Lebesgue a.e. When the perturbation is random (with an appropriate continuous distribution), this allows us to derive an unbiased estimator of the degrees of freedom and of the prediction risk. Our results unify and go beyond those already known in the literature.
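To fix ideas, recall the classical notions the abstract refers to (Stein 1981; Efron 1986): for Gaussian observations \(y \sim \mathcal {N}(\mu , \sigma ^2 \mathrm {Id}_n)\) and a weakly differentiable predictor \(\widehat{\mu }(y)\), the degrees of freedom and their Stein unbiased estimate read

$$\begin{aligned} \mathrm {DF} = \sum _{i=1}^n \frac{\mathrm {cov}\big (\widehat{\mu }_i(y), y_i\big )}{\sigma ^2} = \mathbb {E}\big [\mathrm {div}\, \widehat{\mu }(y)\big ], \qquad \mathrm {div}\, \widehat{\mu }(y) = \sum _{i=1}^n \frac{\partial \widehat{\mu }_i}{\partial y_i}(y). \end{aligned}$$

The closed-form divergence formula mentioned above is precisely what makes the right-hand side computable for partly smooth regularizers.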


Notes

  1. Strictly speaking, the minimization may have to be over a convex subset of \(\mathbb {R}^p\).

  2. The meaning of sensitivity is different here from what is usually intended in statistical sensitivity and uncertainty analysis.

  3. We use the same symbol as for the derivative; rigorously speaking, this has to be understood to hold Lebesgue a.e.

  4. To be understood here as a set-valued mapping.

  5. Obviously, Lemma 2(ii) holds in such a case at the unique minimizer \({\widehat{\beta }}(y)\).

References

  • Absil, P. A., Mahony, R., Trumpf, J. (2013). An extrinsic look at the Riemannian Hessian. In Geometric science of information. Lecture notes in computer science (Vol. 8085, pp. 361–368). Berlin: Springer.

  • Bach, F. (2008). Consistency of the group lasso and multiple kernel learning. Journal of Machine Learning Research, 9, 1179–1225.


  • Bach, F. (2010). Self-concordant analysis for logistic regression. Electronic Journal of Statistics, 4, 384–414.


  • Bakin, S. (1999). Adaptive regression and model selection in data mining problems. Thesis (Ph.D.)–Australian National University.

  • Bickel, P. J., Ritov, Y., Tsybakov, A. (2009). Simultaneous analysis of lasso and Dantzig selector. Annals of Statistics, 37(4), 1705–1732.

  • Bolte, J., Daniilidis, A., Lewis, A. S. (2011). Generic optimality conditions for semialgebraic convex programs. Mathematics of Operations Research, 36(1), 55–70.

  • Bonnans, J., Shapiro, A. (2000). Perturbation analysis of optimization problems. Springer Series in Operations Research. New York: Springer.

  • Brown, L. D. (1986). Fundamentals of statistical exponential families with applications in statistical decision theory. Institute of Mathematical Statistics Lecture Notes—Monograph Series (Vol. 9). Hayward: IMS.


  • Bühlmann, P., van de Geer, S. (2011). Statistics for high-dimensional data: Methods, theory and applications. Springer Series in Statistics. Berlin: Springer.

  • Bunea, F. (2008). Honest variable selection in linear and logistic regression models via \(\ell _1\) and \(\ell _1+\ell _2\) penalization. Electronic Journal of Statistics, 2, 1153–1194.


  • Candès, E., Plan, Y. (2009). Near-ideal model selection by \(\ell _1\) minimization. Annals of Statistics, 37(5A), 2145–2177.

  • Candès, E. J., Recht, B. (2009). Exact matrix completion via convex optimization. Foundations of Computational mathematics, 9(6), 717–772.

  • Candès, E. J., Li, X., Ma, Y., Wright, J. (2011). Robust principal component analysis? Journal of the ACM, 58(3), 11:1–11:37.

  • Candès, E. J., Sing-Long, C. A., Trzasko, J. D. (2012). Unbiased risk estimates for singular value thresholding and spectral estimators. IEEE Transactions on Signal Processing, 61(19), 4643–4657.

  • Candès, E. J., Strohmer, T., Voroninski, V. (2013). Phaselift: Exact and stable signal recovery from magnitude measurements via convex programming. Communications on Pure and Applied Mathematics, 66(8), 1241–1274.

  • Chavel, I. (2006). Riemannian geometry: a modern introduction. Cambridge studies in advanced mathematics (2nd ed., Vol. 98). New York: Cambridge University Press.

  • Chen, S., Donoho, D., Saunders, M. (1999). Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20(1), 33–61.

  • Chen, X., Lin, Q., Kim, S., Carbonell, J. G., Xing, E. P. (2010). An efficient proximal-gradient method for general structured sparse learning. arXiv:1005.4717.

  • Combettes, P., Pesquet, J. (2007). A Douglas–Rachford splitting approach to nonsmooth convex variational signal recovery. IEEE Journal of Selected Topics in Signal Processing, 1(4), 564–574.

  • Coste, M. (1999). An introduction to o-minimal geometry. Technical report, Institut de Recherche Mathématique de Rennes.

  • Coste, M. (2002). An introduction to semialgebraic geometry. Technical report, Institut de Recherche Mathématique de Rennes.

  • Daniilidis, A., Hare, W., Malick, J. (2009). Geometrical interpretation of the predictor–corrector type algorithms in structured optimization problems. Optimization: A Journal of Mathematical Programming & Operations Research 55(5–6), 482–503.

  • Daniilidis, A., Drusvyatskiy, D., Lewis, A. S. (2013). Orthogonal invariance and identifiability. Technical report, arXiv:1304.1198.

  • DasGupta, A. (2008). Asymptotic theory of statistics and probability. Berlin: Springer.


  • Deledalle, C. A., Vaiter, S., Peyré, G., Fadili, M., Dossal, C. (2012). Risk estimation for matrix recovery with spectral regularization. In: ICML’12 workshop on sparsity, dictionaries and projections in machine learning and signal processing. arXiv:1205.1482.

  • Deledalle, C. A., Vaiter, S., Peyré, G., Fadili, J. M. (2014). Stein unbiased gradient estimator of the risk (SUGAR) for multiple parameter selection. SIAM Journal on Imaging Sciences, 7(4), 2448–2487.

  • Donoho, D. (2006). For most large underdetermined systems of linear equations the minimal \(\ell ^1\)-norm solution is also the sparsest solution. Communications on Pure and Applied Mathematics, 59(6), 797–829.


  • Dossal, C., Kachour, M., Fadili, M. J., Peyré, G., Chesneau, C. (2013). The degrees of freedom of penalized \(\ell _1\) minimization. Statistica Sinica, 23(2), 809–828.

  • Drusvyatskiy, D., Lewis, A. (2011). Generic nondegeneracy in convex optimization. Proceedings of the American Mathematical Society, 139, 2519–2527.

  • Drusvyatskiy, D., Ioffe, A., Lewis, A. (2015). Generic minimizing behavior in semialgebraic optimization. SIAM Journal on Optimization (to appear).

  • Efron, B. (1986). How biased is the apparent error rate of a prediction rule? Journal of the American Statistical Association, 81(394), 461–470.


  • Eldar, Y. C. (2009). Generalized SURE for exponential families: Applications to regularization. IEEE Transactions on Signal Processing, 57(2), 471–481.


  • Evans, L. C., Gariepy, R. F. (1992). Measure theory and fine properties of functions. Studies in advanced mathematics. Boca Raton: CRC Press.

  • Fazel, M., Hindi, H., Boyd, S. P. (2001). A rank minimization heuristic with application to minimum order system approximation. Proceedings of the American Control Conference IEEE, 6, 4734–4739.

  • van de Geer, S. A. (2008). High-dimensional generalized linear models and the lasso. Annals of Statistics, 36, 614–645.


  • Hansen, N. R., Sokol, A. (2014). Degrees of freedom for nonlinear least squares estimation. Technical report, arXiv:1402.2997.

  • Hudson, H. (1978). A natural identity for exponential families with applications in multiparameter estimation. Annals of Statistics, 6(3), 473–484.


  • Hwang, J. T. (1982). Improving upon standard estimators in discrete exponential families with applications to Poisson and negative binomial cases. Annals of Statistics, 10(3), 857–867.


  • Jacob, L., Obozinski, G., Vert, J. P. (2009). Group lasso with overlap and graph lasso. In: Danyluk, A. P., Bottou, L., Littman, M. L. (eds.) ICML’09, Vol. 382, p. 55.

  • Jégou, H., Furon, T., Fuchs, J. J. (2012). Anti-sparse coding for approximate nearest neighbor search. In: IEEE ICASSP, pp. 2029–2032.

  • Kakade, S., Shamir, O., Sridharan, K., Tewari, A. (2010). Learning exponential families in high-dimensions: Strong convexity and sparsity. In: Teh, Y. W., Titterington, D. M. (eds.) Proceedings of the thirteenth international conference on artificial intelligence and statistics (AISTATS-10), Vol. 9, pp. 381–388.

  • Kato, K. (2009). On the degrees of freedom in shrinkage estimation. Journal of Multivariate Analysis, 100(7), 1338–1352.


  • Lee, J. M. (2003). Introduction to smooth manifolds. Graduate texts in mathematics. New York: Springer.


  • Lemaréchal, C., Hiriart-Urruty, J. (1996). Convex analysis and minimization algorithms: Fundamentals (Vol. 305). Berlin: Springer.

  • Lemaréchal, C., Oustry, F., Sagastizábal, C. (2000). The \({\mathcal {U}}\)-Lagrangian of a convex function. Transactions of the American Mathematical Society, 352(2), 711–729.

  • Lewis, A. (1995). The convex analysis of unitarily invariant matrix functions. Journal of Convex Analysis, 2, 173–183.


  • Lewis, A., Sendov, H. (2001). Twice differentiable spectral functions. SIAM Journal on Matrix Analysis and Applications, 23, 368–386.

  • Lewis, A. S. (2003a). Active sets, nonsmoothness, and sensitivity. SIAM Journal on Optimization, 13(3), 702–725.


  • Lewis, A. S. (2003b). The mathematics of eigenvalue optimization. Mathematical Programming, 97(1–2), 155–176.


  • Lewis, A. S., Zhang, S. (2013). Partial smoothness, tilt stability, and generalized Hessians. SIAM Journal on Optimization, 23(1), 74–94.

  • Liang, J., Fadili, M. J., Peyré, G., Luke, R. (2014). Activity Identification and local linear convergence of Douglas–Rachford/ADMM under partial smoothness. arXiv:1412.6858.

  • Liu, H., Zhang, J. (2009). Estimation consistency of the group lasso and its applications. Journal of Machine Learning Research, 5, 376–383.

  • Lyubarskii, Y., Vershynin, R. (2010). Uncertainty principles and vector quantization. IEEE Transactions on Information Theory, 56(7), 3491–3501.

  • McCullagh, P., Nelder, J. A. (1989). Generalized linear models (2nd ed.). Monographs on Statistics & Applied Probability. Boca Raton: Chapman & Hall/CRC.

  • Meier, L., van de Geer, S., Bühlmann, P. (2008). The group lasso for logistic regression. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(1), 51–71.

  • Meinshausen, N., Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Annals of Statistics, 34, 1436–1462.

  • Meyer, M., Woodroofe, M. (2000). On the degrees of freedom in shape-restricted regression. Annals of Statistics, 28(4), 1083–1104.

  • Miller, S. A., Malick, J. (2005). Newton methods for nonsmooth convex minimization: connections among \({\mathcal {U}}\)-Lagrangian, Riemannian Newton and SQP methods. Mathematical Programming, 104(2–3), 609–633.

  • Mordukhovich, B. (1992). Sensitivity analysis in nonsmooth optimization. In: Field, D. A. & Komkov, V. (eds.) Theoretical aspects of industrial design. SIAM volumes in applied mathematics (Vol. 58), Philadelphia, pp 32–46.

  • Negahban, S., Ravikumar, P., Wainwright, M. J., Yu, B. (2012). A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. Statistical Science, 27(4), 538–557.

  • Osborne, M., Presnell, B., Turlach, B. (2000). A new approach to variable selection in least squares problems. IMA Journal of Numerical Analysis, 20(3), 389–403.

  • Peyré, G., Fadili, J., Chesneau, C. (2011). Adaptive structured block sparsity via dyadic partitioning. In: EUSIPCO, Barcelona, Spain.

  • Ramani, S., Blu, T., Unser, M. (2008). Monte-Carlo SURE: A black-box optimization of regularization parameters for general denoising algorithms. IEEE Transactions on Image Processing, 17(9), 1540–1554.

  • Recht, B., Fazel, M., Parrilo, P. A. (2010). Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3), 471–501.

  • Rockafellar, R. T. (1996). Convex Analysis. Princeton Landmarks in Mathematics and Physics. Princeton: Princeton University Press.


  • Rudin, L., Osher, S., Fatemi, E. (1992). Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena, 60(1–4), 259–268.

  • Saad, Y., Schultz, M. H. (1986). GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM Journal on Scientific and Statistical Computing, 7(3), 856–869.

  • Solo, V., Ulfarsson, M. (2010). Threshold selection for group sparsity. In: IEEE ICASSP, pp. 3754–3757.

  • Stein, C. (1981). Estimation of the mean of a multivariate normal distribution. Annals of Statistics, 9(6), 1135–1151.


  • Studer, C., Yin, W., Baraniuk, R. G. (2012). Signal representations with minimum \(\ell _\infty \)-norm. In: 50th annual Allerton conference on communication, control, and computing.

  • Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society Series B Methodological, 58(1), 267–288.


  • Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., Knight, K. (2005). Sparsity and smoothness via the fused Lasso. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(1), 91–108.

  • Tibshirani, R. J., Taylor, J. (2012). Degrees of freedom in Lasso problems. Annals of Statistics, 40(2), 1198–1232.

  • Tikhonov, A. N., Arsenin, V. Y. (1977). Solutions of ill-posed problems. New York: Halsted Press.

  • Vaiter, S., Deledalle, C., Peyré, G., Fadili, M. J., Dossal, C. (2012). Degrees of freedom of the group Lasso. In: ICML’12 Workshops, pp. 89–92.

  • Vaiter, S., Deledalle, C., Peyré, G., Dossal, C., Fadili, M. J. (2013). Local behavior of sparse analysis regularization: Applications to risk estimation. Applied and Computational Harmonic Analysis, 35(3), 433–451.

  • Vaiter, S., Peyré, G., Fadili, M. J. (2014). Model consistency of partly smooth regularizers. arXiv:1405.1004.

  • Vaiter, S., Golbabaee, M., Fadili, M. J., Peyré, G. (2015). Model selection with low complexity priors. Information and Inference: A Journal of the IMA.

  • van den Dries, L. (1998). Tame topology and o-minimal structures. London Mathematical Society Lecture Note Series (Vol. 248). New York: Cambridge University Press.


  • van den Dries, L., Miller, C. (1996). Geometric categories and o-minimal structures. Duke Mathematical Journal, 84, 497–540.

  • Vonesch, C., Ramani, S., Unser, M. (2008). Recursive risk estimation for non-linear image deconvolution with a wavelet-domain sparsity constraint. In: ICIP, IEEE, pp. 665–668.

  • Wei, F., Huang, J. (2010). Consistent group selection in high-dimensional linear regression. Bernoulli, 16(4), 1369–1384.

  • Wright, S. J. (1993). Identifiable surfaces in constrained optimization. SIAM Journal on Control and Optimization, 31(4), 1063–1079.


  • Yuan, M., Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1), 49–67.

  • Yuan, M., Lin, Y. (2007). Model selection and estimation in the Gaussian graphical model. Biometrika, 94(1), 19–35.

  • Zou, H., Hastie, T., Tibshirani, R. (2007). On the “degrees of freedom” of the Lasso. Annals of Statistics, 35(5), 2173–2192.


Acknowledgments

This work has been supported by the European Research Council (ERC project SIGMA-Vision) and Institut Universitaire de France.


Corresponding author

Correspondence to Jalal Fadili.

Basic properties of o-minimal structures


In the following results, we collect some important stability properties of o-minimal structures. To be self-contained, we also provide proofs. To the best of our knowledge, these proofs, although simple, are either not reported in the literature or left as exercises in the authoritative references (van den Dries 1998; Coste 1999). Moreover, in most proofs, to show that a subset is definable, we could just write the appropriate first-order formula (see Coste 1999, Page 12; van den Dries 1998, Chapter 1, Section 1.2), and conclude using (Coste 1999, Theorem 1.13). Here, for the sake of clarity and to avoid cryptic statements for the non-specialist, we translate the first-order formula into operations on the involved subsets, in particular projections, and invoke the above stability axioms of o-minimal structures. In the following, n denotes an arbitrary (finite) dimension which is not necessarily the number of observations used previously in the paper.
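Concretely, the translation between quantifiers and set operations used repeatedly below is the following: for a definable subset \(A \subset \mathbb {R}^{n+1}\), writing \(\varPi _{n+1,n}\) for the projection on the first n coordinates,

$$\begin{aligned} \{ x \in \mathbb {R}^n \;:\; \exists y \in \mathbb {R}, \ (x,y) \in A \} = \varPi _{n+1,n}(A), \qquad \{ x \in \mathbb {R}^n \;:\; \forall y \in \mathbb {R}, \ (x,y) \in A \} = \mathbb {R}^n {\setminus } \varPi _{n+1,n}\big (\mathbb {R}^{n+1} {\setminus } A\big ). \end{aligned}$$

Existential quantifiers are thus handled by axiom 4, and universal ones by combining axioms 1 and 4 through double complementation.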

Lemma 5

(Addition and multiplication) Let \(f : \Omega \subset \mathbb {R}^n \rightarrow \mathbb {R}^p\) and \(g : \Omega \subset \mathbb {R}^n \rightarrow \mathbb {R}^p\) be definable functions. Then their pointwise addition and multiplication are also definable.

Proof

Let \(h=f+g\), and

$$\begin{aligned} B=(\Omega \times \mathbb {R}^p\times \Omega \times \mathbb {R}^p\times \Omega \times \mathbb {R}^p) \cap (\Omega \times \mathbb {R}^p\times {{\mathrm{gph}}}(f) \times {{\mathrm{gph}}}(g)) \cap S, \end{aligned}$$

where \(S= \{ (x,u,y,v,z,w) \;:\; x=y=z, u=v+w \} \) is obviously an algebraic (in fact linear) subset, hence definable by axiom 2. Axioms 1 and 2 then imply that B is also definable. Let \(\varPi _{3n+3p,n+p}: \mathbb {R}^{3n+3p} \rightarrow \mathbb {R}^{n+p}\) be the projection on the first \(n+p\) coordinates. We then have

$$\begin{aligned} {{\mathrm{gph}}}(h) = \varPi _{3n+3p,n+p}(B) \end{aligned}$$

whence we deduce that h is definable by applying axiom 4 \(2(n+p)\) times. Definability of the pointwise multiplication follows along the same lines, taking \(u=v \cdot w\) (componentwise) in S. \(\square \)
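In first-order terms, the proof above simply encodes the formula

$$\begin{aligned} {{\mathrm{gph}}}(h) = \{ (x,u) \in \Omega \times \mathbb {R}^p \;:\; \exists v \in \mathbb {R}^p, \exists w \in \mathbb {R}^p, \ (x,v) \in {{\mathrm{gph}}}(f), \ (x,w) \in {{\mathrm{gph}}}(g), \ u = v + w \} , \end{aligned}$$

whose existential quantifiers are eliminated coordinate by coordinate through the projection \(\varPi _{3n+3p,n+p}\).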

Lemma 6

(Inequalities in definable sets) Let \(f : \Omega \subset \mathbb {R}^n \rightarrow \mathbb {R}\) be a definable function. Then \( \{ x \in \Omega \;:\; f(x) > 0 \} \) is definable. The same holds when replacing > with <.

Consequently, inequalities involving definable functions may be used freely when defining definable sets.

There are several possible proofs of this statement; we give two.

Proof

(1) Let \(B= \{ (x,y) \in \mathbb {R}^n\times \mathbb {R} \;:\; f(x)=y \} \cap (\Omega \times (0,+\infty ))\), which is definable thanks to axioms 1 and 3 and to the definability of \({{\mathrm{gph}}}(f)\). Thus

$$\begin{aligned} \{ x \in \Omega \;:\; f(x) > 0 \} = \{ x \in \Omega \;:\; \exists y, f(x) = y, y > 0 \} = \varPi _{n+1,n}(B), \end{aligned}$$

and we conclude using axiom 4 again. \(\square \)

Yet another (simpler) proof.

Proof

(2) It is sufficient to remark that \( \{ x \in \Omega \;:\; f(x) > 0 \} \) is the projection of the set \( \{ (x,t) \in \Omega \times \mathbb {R} \;:\; t^2f(x)-1 = 0 \} \), where the latter is definable owing to Lemma 5. \(\square \)
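Indeed, for \(x \in \Omega \),

$$\begin{aligned} f(x) > 0 \iff \exists t \in \mathbb {R}, \ t^2 f(x) - 1 = 0 , \end{aligned}$$

since one can take \(t = f(x)^{-1/2}\) for the direct implication, while conversely \(f(x) = 1/t^2 > 0\); the existential quantifier is then eliminated by axiom 4.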

Lemma 7

(Derivative) Let \(f: I \rightarrow \mathbb {R}\) be a definable differentiable function on an open interval I of \(\mathbb {R}\). Then its derivative \(f': I \rightarrow \mathbb {R}\) is also definable.

Proof

Let \(g: (x,t) \in I \times \mathbb {R}\mapsto g(x,t) = f(x+t)-f(x)\). Note that g is a definable function on \(I \times \mathbb {R}\) by Lemma 5. We now write the graph of \(f'\) as

$$\begin{aligned} {{\mathrm{gph}}}(f') = \{ (x,y) \in I \times \mathbb {R} \;:\; \forall \varepsilon> 0, \exists \delta > 0, \forall t \in \mathbb {R}, \ | t | < \delta \Rightarrow | g(x,t) - yt | < \varepsilon |t| \} . \end{aligned}$$

Let \(C= \{ (x,y,v,t,\varepsilon ,\delta ) \in I \times \mathbb {R}^5 \;:\; ((x,t),v) \in {{\mathrm{gph}}}(g) \} \), which is definable since g is definable and using axiom 3. Let

$$\begin{aligned} B = \{ (x,y,v,t,\varepsilon ,\delta ) \;:\; t^2< \delta ^2, (v-ty)^2 < \varepsilon ^2t^2 \} \cap C. \end{aligned}$$

The first part in B is semi-algebraic, hence definable thanks to axiom 2. Thus B is also definable using axiom 1. We can now write

$$\begin{aligned} {{\mathrm{gph}}}(f') = \mathbb {R}^3 {\setminus } \left( \varPi _{5,3}\left( \mathbb {R}^5 {\setminus } \varPi _{6,5}(B)\right) \right) \cap (I \times \mathbb {R}), \end{aligned}$$

where the projections and complementations translate the actions of the existential and universal quantifiers. Using axioms 4 and 1 again, we conclude. \(\square \)

With this result at hand, the following proposition is immediate.

Proposition 2

(Differential and Jacobian) Let \(f=(f_1,\ldots ,f_p): \Omega \rightarrow \mathbb {R}^p\) be a differentiable function on an open subset \(\Omega \) of \(\mathbb {R}^n\). If f is definable, then so are its differential mapping and its Jacobian. In particular, for each \(i=1,\ldots ,p\) and \(j=1,\ldots ,n\), the partial derivative \(\partial f_i/\partial x_j: \Omega \rightarrow \mathbb {R}\) is definable.
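For instance, denoting \(e_j\) the j-th vector of the canonical basis of \(\mathbb {R}^n\), it suffices to repeat the proof of Lemma 7 with \(g : (x,t) \mapsto f_i(x+t e_j) - f_i(x)\), which encodes

$$\begin{aligned} {{\mathrm{gph}}}\big (\partial f_i/\partial x_j\big ) = \{ (x,y) \in \Omega \times \mathbb {R} \;:\; \forall \varepsilon> 0, \exists \delta > 0, \forall t \in \mathbb {R}, \ |t| < \delta \Rightarrow |f_i(x+t e_j) - f_i(x) - y t| \le \varepsilon |t| \} , \end{aligned}$$

where \(\delta \) can always be taken small enough that \(x + t e_j \in \Omega \), since \(\Omega \) is open.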

We provide below some results concerning the subdifferential.

Proposition 3

(Subdifferential) Suppose that f is a finite-valued convex definable function. Then for any \(x \in \mathbb {R}^n\), the subdifferential \(\partial f(x)\) is definable.

Proof

For every \(x \in \mathbb {R}^n\), the subdifferential \(\partial f(x)\) reads

$$\begin{aligned} \partial f(x) = \{ \eta \in \mathbb {R}^n \;:\; f(x') \ge f(x) + \langle \eta ,\,x'-x\rangle \quad \forall x' \in \mathbb {R}^n \} . \end{aligned}$$

Let \(K = \{ (\eta ,x') \in \mathbb {R}^n \times \mathbb {R}^n \;:\; f(x') < f(x) + \langle \eta ,\,x'-x\rangle \} \). Hence, \(\partial f(x) = \mathbb {R}^n {\setminus } \varPi _{2n,n}(K)\). Since f is definable, the set K is also definable using Lemmas 5 and 6, whence definability of \(\partial f(x)\) follows using axiom 4. \(\square \)
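Equivalently, the proof encodes the first-order formula

$$\begin{aligned} \eta \in \partial f(x) \iff \lnot \big ( \exists x' \in \mathbb {R}^n, \ f(x') < f(x) + \langle \eta ,\,x'-x\rangle \big ) , \end{aligned}$$

where the universal quantifier in the definition of the subdifferential is traded for the negation of an existential one, the latter being eliminated by n applications of axiom 4 followed by one complementation (axiom 1).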

Lemma 8

Suppose that f is a finite-valued convex definable function. Then, the set

$$\begin{aligned} \{ (x,\eta ) \;:\; \eta \in {{\mathrm{ri}}}\partial f(x) \} \end{aligned}$$

is definable.

Proof

Denote \(C = \{ (x,\eta ) \;:\; \eta \in {{\mathrm{ri}}}\partial f(x) \} \). Using the characterization of the relative interior of a convex set (Rockafellar 1996, Theorem 6.4), we rewrite C in the more convenient form

$$\begin{aligned} C = \{(x,\eta ) \, : \,&\forall u \in \mathbb {R}^n, \ \big [ \forall z \in \mathbb {R}^n, \ f(z) - f(x) \ge \langle u,\,z-x\rangle \big ] \Rightarrow \\&\big [ \exists t > 1, \forall x' \in \mathbb {R}^n, \ f(x') - f(x) \ge \langle (1-t) u + t \eta ,\,x'-x\rangle \big ] \} . \end{aligned}$$

Let \(D = \mathbb {R}^n \times \mathbb {R}^n \times \mathbb {R}^n \times \mathbb {R}^n \times (1,+\infty ) \times \mathbb {R}^n\) and K defined as

$$\begin{aligned} K= & {} \{ (x,\eta ,u,z,t,x') \!\in \! D \;:\; f(z) - f(x) \geqslant \langle u,\,z-x\rangle ), f(x') - f(x)\nonumber \\&\geqslant \langle (1-t) u + t \eta ,\,x'-x\rangle \} . \end{aligned}$$

Thus,

$$\begin{aligned} C = \mathbb {R}^{2n} {\setminus } \varPi _{3n,2n} \left( \mathbb {R}^{3n} {\setminus } \varPi _{4n,3n} \left( \varPi _{4n+1,4n} \left( \mathbb {R}^{4n} \times (1,+\infty ) {\setminus } \varPi _{5n+1,4n+1} (K) \right) \right) \right) , \end{aligned}$$

where the projections and complementations translate the actions of the existential and universal quantifiers. Using axioms 4 and 1 again, we conclude. \(\square \)

About this article


Cite this article

Vaiter, S., Deledalle, C., Fadili, J. et al. The degrees of freedom of partly smooth regularizers. Ann Inst Stat Math 69, 791–832 (2017). https://doi.org/10.1007/s10463-016-0563-z
