Information criteria bias correction for group selection

Marquis, Bastien; Jansen, Maarten

doi:10.1007/s00362-021-01283-8


Regular Article
Published: 22 January 2022

Volume 63, pages 1387–1414, (2022)
Statistical Papers

Abstract

The main contribution of this paper is the extension to the group lasso of a Mallows’ Cp-like information criterion used for fine-tuning lasso selection in a high-dimensional, sparse regression model. Optimising an information criterion paired with the $\ell _1$-norm regularisation of the lasso leads to an overestimation of the model size. This is because the shrinkage induced by the $\ell _1$ regularisation reduces the effect of false positives, which makes the criterion too permissive towards them. The problem does not arise with $\ell _0$-norm regularisation, but that is a combinatorial problem, which is computationally infeasible in the high-dimensional setting. The strategy adopted in this paper is to select the non-zero variables with the $\ell _1$ method and to estimate their values with the $\ell _0$ method: the lasso is used for selection, followed by an orthogonal projection, i.e., debiasing after selection. This approach requires the information criterion to be adapted, in particular by including what is called a “mirror correction”, which leads to smaller models. A second contribution of the paper is methodological: the corrected information criterion is developed using random hard thresholds as a model for the selection process.
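The two-step strategy described in the abstract (lasso for selection, then an orthogonal projection onto the selected columns) can be illustrated with a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the function names `lasso_cd` and `select_then_refit`, the simulated data, and the fixed penalty `lam=15.0` are hypothetical, and the coordinate descent solver is a generic textbook choice rather than the authors' implementation. The sketch does not compute the information criterion or the mirror correction; it only shows the selection and debiasing steps for one fixed penalty.

```python
import math
import random


def soft_threshold(z, lam):
    """Soft-thresholding operator: shrink z towards zero by lam."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def lasso_cd(X, y, lam, n_sweeps=200):
    """Coordinate descent for 0.5 * ||y - X b||^2 + lam * ||b||_1.

    With lam = 0 the iteration converges to the least squares fit when the
    columns of X are linearly independent.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, kept up to date
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            # inner product of column j with the partial residual
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def select_then_refit(X, y, lam):
    """Lasso for selection, then least squares on the selected columns."""
    beta_lasso = lasso_cd(X, y, lam)
    support = [j for j, b in enumerate(beta_lasso) if b != 0.0]
    beta_refit = [0.0] * len(beta_lasso)
    if support:
        X_sel = [[row[j] for j in support] for row in X]
        # lam = 0 on the selected columns: projection without shrinkage
        for j, b in zip(support, lasso_cd(X_sel, y, 0.0)):
            beta_refit[j] = b
    return beta_lasso, beta_refit, support


def rss(X, y, beta):
    """Residual sum of squares of the fit X beta."""
    return sum((yi - sum(x * b for x, b in zip(row, beta))) ** 2
               for row, yi in zip(X, y))


# Simulated sparse regression: 2 active variables out of 10 (made-up values).
rng = random.Random(0)
n, p = 50, 10
beta_true = [3.0, -2.0] + [0.0] * (p - 2)
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [sum(x * b for x, b in zip(row, beta_true)) + rng.gauss(0.0, 0.5) for row in X]

beta_lasso, beta_refit, support = select_then_refit(X, y, lam=15.0)
print("selected:", support)
print("RSS lasso:", rss(X, y, beta_lasso), "RSS refit:", rss(X, y, beta_refit))
```

On the selected support the refitted coefficients are free of shrinkage, so their residual sum of squares cannot exceed that of the lasso fit. This is why a criterion tuned for the shrunken fit has to be adapted before it is applied to the refitted one.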



Author information

Authors and Affiliations

Université libre de Bruxelles, Boulevard du Triomphe, 1050, Bruxelles, Belgium
Bastien Marquis & Maarten Jansen


Corresponding author

Correspondence to Maarten Jansen.


Appendices

Appendix A: Proof of Proposition 1

Proof

The key point in the proof is to realise that the largest contributions to the approximation error come from the values in $\mu $ away from zero. Using the assumption of asymptotic sparsity, these contributions become less and less important.

Defining

$$\begin{aligned} {\bar{h}}(t; {\varvec{\mu }}) = \frac{1}{n} \sum ^n_{i = 1} h(t; \mu _i), \end{aligned}$$

we can write from (9),

$$\begin{aligned} m_k - {\widetilde{m}}_k= & {} E\left[ {\bar{h}}(T_{k,i}; {\varvec{\mu }}) - {\bar{h}}(T_{k,i}; {\varvec{0}})\right] = \frac{1}{n} \sum ^n_{i = 1} E\left[ h(T_{k,i}; \mu _i) - h(T_{k,i}; 0)\right] \\= & {} \frac{1}{n} \sum ^n_{i = 1} E\left[ 2 G_{\varepsilon }(T_{k,i}) - G_{\varepsilon }(T_{k,i} - \mu _i) - G_{\varepsilon }(T_{k,i} + \mu _i)\right] . \end{aligned}$$

The value of ${\bar{h}}(t; {\varvec{\mu }})$ is depicted as a function of t in Fig. 2. The individual contributions $h(t; \mu _i)$ for a typical sparse signal are plotted in Fig. 3.

We construct an upper bound for $h(t; \mu )$, consisting of three parts, depending on the value of $\mu $. First, we have a general upper bound

$$\begin{aligned} |h(t; \mu ) - h(t; 0)|\le & {} \max _t |2 G_{\varepsilon }(t) - G_{\varepsilon }(t - \mu ) - G_{\varepsilon }(t + \mu )| \\\le & {} 4 \max _x |G_{\varepsilon }(x)| = 4 |G_{\varepsilon }(\sigma )|, \end{aligned}$$

since, on the positive axis, $|G_{\varepsilon }(x)|$ is unimodal with its global maximum at $x = \sigma $.

Remark 1

The upper bound is pessimistic, since $\lim _{t \rightarrow \infty } |G_{\varepsilon }(t)| = 0$, so for every $\eta > 0$, there exists a $t^{*}$, so that for $t > t^{*}$, we find $ 2 |G_{\varepsilon }(t) - G_{\varepsilon }(t + \mu _i)| \le |2 G_{\varepsilon }(t)| < 2 \eta , $ and so $ |2 G_{\varepsilon }(t) - G_{\varepsilon }(t-\mu _i) - G_{\varepsilon }(t+\mu _i)| < 2|G_{\varepsilon }(\sigma )| + 2 \eta . $
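The general bound can be checked numerically in a special case. The sketch below assumes Gaussian errors and takes $G_{\varepsilon }(x) = -\sigma ^2 x \phi _{\sigma }(x)$. This form is an assumption: the definition of $G_{\varepsilon }$ is not reproduced in this part of the text, and the form is chosen only because it agrees with the singleton expression $2\sigma ^2 E(T\phi _{\sigma }(T))$ of Appendix B and with the maximum of $|G_{\varepsilon }|$ at $\sigma $. The value `SIGMA = 1.5` and the grids are arbitrary.

```python
import math

SIGMA = 1.5  # hypothetical noise standard deviation


def phi(x, sigma=SIGMA):
    """Density of a zero-mean normal variable with standard deviation sigma."""
    return math.exp(-x * x / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))


def G(x, sigma=SIGMA):
    """ASSUMED Gaussian-case form of G_eps; not quoted from the source."""
    return -sigma ** 2 * x * phi(x, sigma)


def h_diff(t, mu, sigma=SIGMA):
    """h(t; mu) - h(t; 0) = 2 G(t) - G(t - mu) - G(t + mu), as in the proof."""
    return 2.0 * G(t, sigma) - G(t - mu, sigma) - G(t + mu, sigma)


# |G| on a grid over the positive axis; the grid contains x = SIGMA exactly.
grid = [SIGMA * k / 100.0 for k in range(1, 1001)]
abs_G = [abs(G(x)) for x in grid]
argmax_x = grid[abs_G.index(max(abs_G))]

# Largest |h(t; mu) - h(t; 0)| over a grid of thresholds t and means mu.
bound = 4.0 * abs(G(SIGMA))
worst = max(abs(h_diff(SIGMA * a / 10.0, SIGMA * b / 10.0))
            for a in range(0, 101) for b in range(-100, 101))

print("argmax of |G| on the grid:", argmax_x)
print("largest |h(t; mu) - h(t; 0)|:", worst, "bound 4|G(sigma)|:", bound)
```

Under this assumed form the grid maximum of $|G_{\varepsilon }|$ sits at $\sigma $, and the observed maximum of $|h(t; \mu ) - h(t; 0)|$ stays below $4 |G_{\varepsilon }(\sigma )|$, in line with the remark that the general bound is pessimistic.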

The second part of the upper bound is for small values of $\mu $, as illustrated in Fig. 4. Let M be a constant, a priori depending on t, so that for $|\mu | \le t - \sigma $, we have that

$$\begin{aligned} h(t; \mu ) - h(t; 0) \le \left[ M - h(t; 0)\right] \cdot \left[ \frac{\mu }{t - \sigma }\right] ^2. \end{aligned} \tag{21}$$

This construction is possible since $h'(t; 0) = 0$.

Remark 2

Obviously, one can take $[M - h(t; 0)] / (t - \sigma )^2$ to be equal to $\max _{x \in \mathrm{I\!R}} 2 |G_{\varepsilon }''(x)|$, but that choice would lead to a pessimistic upper bound when t grows larger. We will keep $2 \max _{x \in \mathrm{I\!R}} |G_{\varepsilon }''(x)|$ as an upper bound when $[M - h(t; 0)] / (t - \sigma )^2$ is replaced by the random version $[M - h(t; 0)] / (T_{k,i} - \sigma )^2$.

For $t \ge \sigma $ and $\mu \ge \tau $, we have that $-G_{\varepsilon }(t + \tau + \mu ) \le -G_{\varepsilon }(t - \tau + \mu )$, so that $h(t + \tau ; \mu ) \le h(t; \mu - \tau )$, and thus

$$\begin{aligned} h(t + \tau ; \mu ) \le h(t; 0) + [M - h(t; 0)] \cdot \left[ \frac{\mu - \tau }{t - \sigma }\right] ^2. \end{aligned}$$

For t sufficiently large, $h(t; 0) - h(t + \tau ; 0)$ is small enough for any $\tau $, so that

$$\begin{aligned} \frac{h(t ;0) - h(t + \tau ; 0)}{M - h(t + \tau ; 0)} \le \left( \frac{\tau }{t + \tau - \sigma }\right) ^2. \end{aligned}$$

This is equivalent to

$$\begin{aligned} h(t; 0) \le h(t + \tau ; 0) + \left[ M - h(t + \tau ; 0)\right] \cdot \left[ \frac{\tau }{t + \tau - \sigma }\right] ^2. \end{aligned} \tag{22}$$

This implies that on $[\tau , t + \tau - \sigma ]$,

$$\begin{aligned} h(t; 0) + \left[ M - h(t; 0)\right] \cdot \left[ \frac{\mu - \tau }{t - \sigma }\right] ^2 \le h(t + \tau ; 0) + \left[ M - h(t + \tau ; 0)\right] \cdot \left[ \frac{\mu }{t + \tau - \sigma }\right] ^2 \end{aligned} \tag{23}$$

as indeed both quadratic forms have the same value, M, at $\mu = t + \tau - \sigma $, while for $\mu = \tau $ this reduces to (22). The right hand side of (23) has the same form as the right hand side in (21). As a result, for t sufficiently large, the constant M in (21) does not depend on t. By choosing a value for M larger than $4 |G_{\varepsilon }(\sigma )|$, the upper bound in (21) holds for any $\mu $. Taking into account Remark 2, we can write

$$\begin{aligned} h(t ; \mu ) - h(t; 0) \le q(t)\mu ^2, \end{aligned}$$

where

$$\begin{aligned} q(t) = \min \left( 2 \max _{x \in \mathrm{I\!R}} |G_{\varepsilon }''(x)|, \frac{M - h(t; 0)}{(t - \sigma )^2}\right) . \end{aligned}$$

For large values of $\mu $, however, we need a third and tighter upper bound. Because of the symmetry in $f_{\varepsilon }(x)$, we have for $\mu = 2 t$ that $h(2t; t) = -G_{\varepsilon }(3 t) - G_{\varepsilon }(-t) = -G_{\varepsilon }(3 t) + G_{\varepsilon }(t)$, and as 2t is far beyond the largest local maximum of $|h(\mu ; t)|$ as a function of $\mu $, it holds for $\mu > 2 t$ that $h(\mu ; t) > h(2 t; t)$, and so

$$\begin{aligned} |h(\mu ; t) - h(0; t)|= & {} h(0; t) - h(\mu ; t)< h(0; t) - h(2 t; t) \\= & {} |3 G_{\varepsilon }(t) - G_{\varepsilon }(3 t)| < 3 |G_{\varepsilon }(t)|. \end{aligned}$$
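The large-$\mu $ bound can also be inspected numerically. The sketch below assumes Gaussian errors and takes $G_{\varepsilon }(x) = -\sigma ^2 x \phi _{\sigma }(x)$. This form is an assumption that agrees with Appendix B, not a definition quoted from the text. The sketch keeps the argument order $h(t; \mu )$ used earlier in the proof, and it sets $h(t; \mu ) = -G_{\varepsilon }(t - \mu ) - G_{\varepsilon }(t + \mu )$, which reproduces the difference $2 G_{\varepsilon }(t) - G_{\varepsilon }(t - \mu ) - G_{\varepsilon }(t + \mu )$. The grid, with $t \ge 2\sigma $ and $\mu \ge 2t$, is an arbitrary choice standing in for "t sufficiently large".

```python
import math


def normal_density(x, sigma):
    """Density of a zero-mean normal variable with standard deviation sigma."""
    return math.exp(-x * x / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))


def G(x, sigma):
    """ASSUMED Gaussian-case form of G_eps; not quoted from the source."""
    return -sigma ** 2 * x * normal_density(x, sigma)


def h(t, mu, sigma):
    """h(t; mu) = -G(t - mu) - G(t + mu), so that h(t; 0) = -2 G(t)."""
    return -G(t - mu, sigma) - G(t + mu, sigma)


def check_large_mu_bound(sigma=1.0):
    """Largest ratio |h(t; mu) - h(t; 0)| / (3 |G(t)|) over a grid with
    t >= 2 sigma and mu >= 2 t; the bound in the text says it is below 1."""
    worst = 0.0
    for a in range(20, 61):          # t = 2.0 sigma, ..., 6.0 sigma
        t = sigma * a / 10.0
        for b in range(0, 101):      # mu = 2 t, ..., 2 t + 10 sigma
            mu = 2.0 * t + sigma * b / 10.0
            ratio = abs(h(t, mu, sigma) - h(t, 0.0, sigma)) / (3.0 * abs(G(t, sigma)))
            worst = max(worst, ratio)
    return worst


print("largest ratio on the grid:", check_large_mu_bound())
```

In floating-point arithmetic the ratio can round to exactly 1 when $G_{\varepsilon }(3t)$ is negligible next to $G_{\varepsilon }(t)$, so a numerical check should allow equality even though the inequality in the text is strict.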

The three parts of the analysis allow us to conclude that the approximation error of ${\widetilde{m}}_k$ is bounded by

$$\begin{aligned} |m_k - {\widetilde{m}}_k|\le & {} \frac{1}{n} \sum ^n_{i = 1} E\left| h(T_{k,i}; \mu _i) - h(T_{k,i}; 0)\right| \\\le & {} \frac{1}{n} \sum ^n_{i = 1} P(T_{k,i}> |\mu _i|/2) E\left( q(T_{k,i}) | T_{k,i} > |\mu _i|/2\right) \mu ^2_i \\&+ \frac{1}{n} \sum ^n_{i = 1} P(T_{k,i} \le |\mu _i|/2) 3 E\left( |G_{\varepsilon }(T_{k,i})|~| T_{k,i} \le |\mu _i|/2\right) . \end{aligned} \tag{24}$$

Moreover, it is easy to find a constant K so that $q(t) \le K/t$ for all values of t, and, because q(t) is a monotonically non-increasing function, we have

$$\begin{aligned} E\left( q(T_{k,i}) | T_{k,i} > |\mu _i|/2\right) \le E\left( q(T_{k,i})\right) \le K E\left( 1/T_{k,i}\right) \rightarrow 0 \text{ when } n \rightarrow \infty . \end{aligned}$$

Combining this with Assumption (11), we find for the first sum in (24),

$$\begin{aligned} \frac{1}{n} \sum ^n_{i = 1} P(T_{k,i}> |\mu _i|/2) E\left( q(T_{k,i}) | T_{k,i} > |\mu _i|/2\right) \mu ^2_i = o\left[ \mathrm {PE}(\widehat{{\varvec{\mu }}}_k)\right] . \end{aligned}$$

For the second sum in (24), we see that if $P(T_{k,i} \le |\mu _i|/2)$ does not tend to zero, then by Markov’s inequality, we have

$$\begin{aligned} P\left( \frac{1}{T_{k,i}}> \frac{2}{|\mu _i|}\right) \le E\left( \frac{1}{T_{k,i}}\right) \frac{|\mu _i|}{2} \Rightarrow |\mu _i| \ge \frac{2P\left( \frac{1}{T_{k,i}} > \frac{2}{|\mu _i|}\right) }{E\left( \frac{1}{T_{k,i}}\right) } \rightarrow \infty . \end{aligned}$$

With $|\mu _i| \rightarrow \infty $ and $E\left( 1/T_{k,i}\right) \rightarrow 0$, the value of $E\left( |G_{\varepsilon }(T_{k,i})|~|~T_{k,i} \le |\mu _i|/2\right) $ then tends to $E\left( |G_{\varepsilon }(T_{k,i})|\right) $ which in turn tends to zero since, under Assumption (A2), there exists a constant L so that $|G_{\varepsilon }(t)| \le L/t$. As a result, we have

$$\begin{aligned} \frac{1}{n} \sum ^n_{i = 1} P(T_{k,i} \le |\mu _i|/2) 3 E\left( |G_{\varepsilon }(T_{k,i})| | T_{k,i} \le |\mu _i|/2\right) = o\left[ \mathrm {PE}(\widehat{{\varvec{\mu }}}_k)\right] , \end{aligned}$$

thereby completing the proof. $\square $

Appendix B: Development of calculations for Eqs. 17 and 18

With $\varGamma $ the gamma function and $F_{\chi ^2_w}$ and $f_{\chi ^2_w}$ the cumulative distribution function and density of the $\chi ^2_w$ distribution, we find the following result for Eq. (17):

$$\begin{aligned} {\widetilde{m}}_l= & {} \frac{\sigma ^2}{r} \sum ^r_{j = 1} E\left( \int ^{\frac{T_{l,j}^2}{\sigma ^2}}_0 (1 - u w^{-1})f_{\chi ^2_w}(u)du\right) \\= & {} \frac{\sigma ^2}{r} \sum ^r_{j = 1} E\left( \int ^{\frac{T_{l,j}^2}{\sigma ^2}}_0 (1 - u w^{-1}) \frac{1}{2^{\frac{w}{2}} \varGamma (\frac{w}{2})} u^{\frac{w}{2} - 1} e^{-\frac{u}{2}} du\right) \\= & {} \frac{\sigma ^2}{r} \sum ^r_{j = 1} E\left( \int ^{\frac{T_{l,j}^2}{\sigma ^2}}_0 \frac{1}{2^{\frac{w}{2}}\varGamma (\frac{w}{2})} u^{\frac{w}{2} - 1} e^{-\frac{u}{2}} du - \frac{1}{w} \int ^{\frac{T_{l,j}^2}{\sigma ^2}}_0 u \frac{1}{2^{\frac{w}{2}}\varGamma (\frac{w}{2})} u^{\frac{w}{2} - 1} e^{-\frac{u}{2}} du\right) \\= & {} \frac{\sigma ^2}{r} \sum ^r_{j = 1} E\left( F_{\chi ^2_w}(T_{l,j}^2\sigma ^{-2}) - \frac{2 \varGamma (\frac{w}{2} + 1)}{w \varGamma (\frac{w}{2})} \int ^{\frac{T_{l,j}^2}{\sigma ^2}}_0 \frac{1}{2^{\frac{w}{2}+1}\varGamma (\frac{w}{2}+1)} u^{\frac{w}{2}} e^{-\frac{u}{2}} du\right) \\= & {} \frac{\sigma ^2}{r} \sum ^r_{j = 1} E\left( F_{\chi ^2_w}(T_{l,j}^2\sigma ^{-2}) - \frac{2 \varGamma (\frac{w}{2} + 1)}{w \varGamma (\frac{w}{2})} F_{\chi ^2_{w + 2}}(T_{l,j}^2\sigma ^{-2})\right) \\= & {} \frac{\sigma ^2}{r} \sum ^r_{j = 1} E\left( F_{\chi ^2_w}(T_{l,j}^2\sigma ^{-2}) - F_{\chi ^2_{w + 2}}(T_{l,j}^2\sigma ^{-2})\right) . \end{aligned}$$

Also, when the group size w is 1 (singletons), Eq. (17) reduces to Eq. (18):

$$\begin{aligned} {\widetilde{m}}_k= & {} {\widetilde{m}}_l = \frac{\sigma ^2}{r} \sum ^r_{j = 1} E\left( F_{\chi ^2_1}(T_{l,j}^2\sigma ^{-2}) - F_{\chi ^2_3}(T_{l,j}^2\sigma ^{-2})\right) \\= & {} \frac{\sigma ^2}{r} \sum ^r_{j = 1} E\left( \frac{1}{\varGamma (\frac{1}{2})} \int ^{\frac{T_{l,j}^2}{2\sigma ^2}}_0 u^{-\frac{1}{2}} e^{-u} du - \frac{1}{\varGamma (\frac{3}{2})} \int ^{\frac{T_{l,j}^2}{2\sigma ^2}}_0 u^{\frac{1}{2}} e^{-u} du\right) \\= & {} \frac{\sigma ^2}{r} \sum ^r_{j = 1} E\left( \frac{1}{\varGamma (\frac{1}{2})} \int ^{\frac{T_{l,j}}{\sigma }}_0 \left( \frac{u^2}{2}\right) ^{-\frac{1}{2}} e^{-\frac{u^2}{2}} u du - \frac{2}{\varGamma (\frac{1}{2})} \int ^{\frac{T_{l,j}}{\sigma }}_0 \left( \frac{u^2}{2}\right) ^{\frac{1}{2}} e^{-\frac{u^2}{2}} u du\right) \\= & {} \frac{\sigma ^2}{r} \sum ^r_{j = 1} E\left( \frac{1}{\varGamma (\frac{1}{2})} \int ^{\frac{T_{l,j}}{\sigma }}_0 \sqrt{2} e^{-\frac{u^2}{2}} du - \frac{2}{\varGamma (\frac{1}{2})} \int ^{\frac{T_{l,j}}{\sigma }}_0 \frac{1}{\sqrt{2}} u^2 e^{-\frac{u^2}{2}} du\right) \\= & {} \frac{\sigma ^2}{r} \sum ^r_{j = 1} E\left( \frac{2}{\sqrt{2}\varGamma (\frac{1}{2})} \int ^{\frac{T_{l,j}}{\sigma }}_0 (1-u^2) e^{-\frac{u^2}{2}} du\right) \\= & {} \frac{\sigma ^2}{r} \sum ^r_{j = 1} E\left( \frac{2}{\sqrt{2}\varGamma (\frac{1}{2})} \frac{T_{l,j}}{\sigma } e^{-\frac{T_{l,j}^2}{2\sigma ^2}}\right) \\= & {} 2\sigma ^2 r^{-1} \sum ^r_{j = 1} E\left( T_{l,j}\phi _{\sigma }(T_{l,j})\right) = 2\sigma ^2 n^{-1} \sum ^n_{i = 1} E\left( T_{k,i}\phi _{\sigma }(T_{k,i})\right) , \end{aligned}$$

as $e^{-\frac{u^2}{2}} - u^2 e^{-\frac{u^2}{2}}$ is the derivative of $u e^{-\frac{u^2}{2}}$ and $\varGamma (\frac{1}{2}) = \sqrt{\pi }$, $\phi _{\sigma }$ being the density of a zero-mean normal random variable with variance $\sigma ^2$.
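Both results of this appendix can be verified numerically for fixed values of the threshold. The sketch below is an illustration only: the function names, the Simpson rule, and the values `t = 1.3` and `sigma = 0.7` are arbitrary choices. It compares the integral $\int _0^x (1 - u w^{-1}) f_{\chi ^2_w}(u) du$ with the closed form $(x/2)^{w/2} e^{-x/2} / \varGamma (\frac{w}{2} + 1)$ of $F_{\chi ^2_w}(x) - F_{\chi ^2_{w + 2}}(x)$, which follows from the standard recurrence of the regularised incomplete gamma function, and then checks that for $w = 1$ and $x = t^2 \sigma ^{-2}$ this difference equals $2 t \phi _{\sigma }(t)$.

```python
import math


def truncated_moment(x, w, n=2000):
    """Simpson approximation of int_0^x (1 - u/w) f_{chi2_w}(u) du.

    The substitution u = v**2 removes the integrable singularity of the
    chi-square density at u = 0 when w = 1. n must be even.
    """
    norm = 2.0 ** (w / 2.0) * math.gamma(w / 2.0)

    def g(v):
        # integrand after substitution: (1 - v^2/w) f(v^2) * 2 v
        return (1.0 - v * v / w) * 2.0 * v ** (w - 1) * math.exp(-v * v / 2.0) / norm

    b = math.sqrt(x)
    step = b / n
    total = g(0.0) + g(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * g(i * step)
    return total * step / 3.0


def cdf_difference(x, w):
    """Closed form of F_{chi2_w}(x) - F_{chi2_{w+2}}(x)."""
    return (x / 2.0) ** (w / 2.0) * math.exp(-x / 2.0) / math.gamma(w / 2.0 + 1.0)


def normal_density(t, sigma):
    """Density phi_sigma of a zero-mean normal with variance sigma^2."""
    return math.exp(-t * t / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))


t, sigma = 1.3, 0.7            # arbitrary threshold and noise level
x = (t / sigma) ** 2

for w in (1, 2, 4, 7):         # integer group sizes
    print(w, truncated_moment(x, w), cdf_difference(x, w))

# Singleton case: F_1 - F_3 at t^2 / sigma^2 versus 2 t phi_sigma(t).
print(cdf_difference(x, 1), 2.0 * t * normal_density(t, sigma))
```

The check is deterministic and concerns only the integrand inside the expectation. Averaging over the random thresholds $T_{l,j}$ would require their distribution, which this sketch does not model.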

Appendix C: Illustration for image denoising

The first column of Fig. 5 shows a sample of five noise-free images, and the second column shows their noisy versions. Columns three and four show the denoised images obtained with Mallows’ Cp and the mirror-corrected Cp under unstructured selection, and columns five and six those obtained with the same two criteria under group selection.

About this article

Cite this article

Marquis, B., Jansen, M. Information criteria bias correction for group selection. Stat Papers 63, 1387–1414 (2022). https://doi.org/10.1007/s00362-021-01283-8


Received: 14 January 2021
Revised: 02 December 2021
Accepted: 07 December 2021
Published: 22 January 2022
Issue Date: October 2022
DOI: https://doi.org/10.1007/s00362-021-01283-8
