
Information criteria bias correction for group selection


Abstract

The main contribution of this paper is the extension to the group lasso of a Mallows’ Cp-like information criterion used to fine-tune the lasso selection in a high-dimensional, sparse regression model. Optimising an information criterion in combination with the \(\ell _1\)-norm regularisation of the lasso leads to an overestimation of the model size, because the shrinkage induced by the \(\ell _1\) penalty reduces the effects of false positives and thus makes the criterion too permissive towards them. The problem does not arise with \(\ell _0\)-norm regularisation, but that is a combinatorial problem which is computationally infeasible in the high-dimensional setting. The strategy adopted in this paper is to select the non-zero variables with the \(\ell _1\) method and to estimate their values as in the \(\ell _0\) approach: the lasso is used for selection, followed by an orthogonal projection, i.e., debiasing after selection. This approach requires the information criterion to be adapted, in particular by including what is called a “mirror correction”, leading to smaller models. A second contribution of the paper is methodological: the corrected information criterion is developed using random hard thresholds as a model for the selection process.
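To make the selection-then-projection strategy concrete, the following minimal sketch (an illustration only, with a hypothetical design X and response y, and using a plain rather than a group lasso) selects the support with an \(\ell _1\) fit and then debiases by an orthogonal projection of the response onto the selected columns; it does not implement the paper’s mirror-corrected criterion.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Illustrative data; X, y and the regularisation weight alpha are assumptions,
# not values from the paper.
rng = np.random.default_rng(0)
n, p, k = 100, 500, 5
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:k] = 3.0
y = X @ beta + rng.standard_normal(n)

# Step 1: l1 (lasso) shrinkage, used only to select the non-zero variables.
support = np.flatnonzero(Lasso(alpha=0.2).fit(X, y).coef_)

# Step 2: debiasing after selection -- orthogonal projection of y onto the
# selected columns, i.e. an unpenalised least-squares refit on the support.
beta_refit = np.zeros(p)
coef_ls, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
beta_refit[support] = coef_ls
```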


References

  • Akaike H (1973) Information theory and an extension of the maximum likelihood principle. In: Petrov B, Csáki F (eds) Second international symposium on information theory. Akadémiai Kiadó, Budapest, pp 267–281

  • Belloni A, Chernozhukov V (2013) Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2):521–547

  • Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 57:289–300

  • Claeskens G, Hjort NL (2008) Model selection and model averaging, 1st edn. Cambridge University Press, Cambridge

  • Das D, Chatterjee A, Lahiri SN (2020) Higher order refinements by bootstrap in lasso and other penalized regression methods. Tech. Rep. Indian Institute of Technology/Indian Statistical Institute/Washington University in St. Louis, Kanpur/Delhi/St. Louis. arXiv: 1909.06649

  • Daubechies I, Defrise M, De Mol C (2004) An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun Pure Appl Math 57:1413–1457

  • Donoho DL (1995) De-noising by soft-thresholding. IEEE Trans Inf Theory 41(3):613–627

  • Donoho DL (2006) For most large underdetermined systems of linear equations the minimal \(\ell _1\)-norm solution is also the sparsest solution. Commun Pure Appl Math 59:797–829

  • Donoho DL, Johnstone IM (1995) Adapting to unknown smoothness via wavelet shrinkage. J Am Stat Assoc 90(432):1200–1224

  • Efron B, Hastie TJ, Johnstone IM, Tibshirani RJ (2004) Least angle regression. Ann Stat 32(2):407–499 (With discussion)

  • Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96(456):1348–1360

  • Foygel Barber R, Candès E (2015) Controlling the false discovery rate via knockoffs. Ann Stat 43(5):2055–2085

  • Friedman J, Hastie T, Hofling H, Tibshirani R (2007) Pathwise coordinate optimization. Ann Appl Stat 1(2):302–332

  • Friedman J, Hastie T, Tibshirani R (2008) Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9(3):432–441

  • Fu W (1998) Penalized regressions: the bridge vs the lasso. J Comput Graph Stat 7(3):397–416

  • Huang J, Breheny P, Ma S (2012) A selective review of group selection in high-dimensional models. Stat Sci 27(4):481–499

  • Jansen M (2014) Information criteria for variable selection under sparsity. Biometrika 101(1):37–55

  • Jansen M (2015) Generalized cross validation in variable selection with and without shrinkage. J Stat Plan Inference 159:90–104

  • Javanmard A, Montanari A (2018) Debiasing the lasso: optimal sample size for Gaussian designs. Ann Stat 46(6A):2593–2622

  • Leadbetter MR, Lindgren G, Rootzén H (1983) Extremes and related properties of random sequences and processes. Springer series in statistics. Springer, New York

  • LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324

  • Mallows C (1973) Some comments on \({C}_p\). Technometrics 15:661–675

  • Meinshausen N, Bühlmann P (2006) High-dimensional graphs and variable selection with the lasso. Ann Stat 34(3):1436–1462

  • Stein C (1956) Inadmissibility of the usual estimator for the mean of a multivariate distribution. In: Third Berkeley symposium on mathematical statistics and probability. University of California Press, Berkeley, pp 197–206

  • Tibshirani R, Saunders M, Rosset S, Zhu J, Knight K (2005) Sparsity and smoothness via the fused lasso. J R Stat Soc Ser B 67(1):91–108

  • Tibshirani RJ (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58(1):267–288

  • Tibshirani RJ, Taylor JE (2012) Degrees of freedom in lasso problems. Ann Stat 40(2):1198–1232

  • Wainwright MJ (2009) Sharp thresholds for noisy and high-dimensional recovery of sparsity using \(\ell _1\)-constrained quadratic programming (lasso). IEEE Trans Inf Theory 55(5):2183–2202

  • Wang H, Leng C (2008) A note on adaptive group lasso. Comput Stat Data Anal 52(12):5277–5286

  • Yang Y (2005) Can the strengths of AIC and BIC be shared? Biometrika 92:937–950

  • Ye J (1998) On measuring and correcting the effects of data mining and model selection. J Am Stat Assoc 93:120–131

  • Yuan M, Lin Y (2006) Model selection and estimation in regression with grouped variables. J R Stat Soc Ser B 68:49–67

  • Zhang C (2010) Nearly unbiased variable selection under the minimax concave penalty. Ann Stat 38(2):894–942

  • Zhao P, Rocha G, Yu B (2009) The composite absolute penalties family for grouped and hierarchical variable selection. Ann Stat 37:3468–3497

  • Zhao P, Yu B (2006) On model selection consistency of lasso. J Mach Learn Res 7:2541–2563

  • Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101:1418–1429

  • Zou H, Hastie TJ, Tibshirani RJ (2007) On the degrees of freedom of the lasso. Ann Stat 35(5):2173–2192


Author information

Corresponding author

Correspondence to Maarten Jansen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Proof of Proposition 1

Proof

The key point in the proof is that the largest contributions to the approximation error come from the components of \(\mu \) away from zero. Under the assumption of asymptotic sparsity, these contributions become less and less important.

Fig. 2 Plot of \((-1/n) \sum ^n_{i=1} h(t; \mu _i)\) as a function of t, along with an approximation \(h(t; 0)\)

Defining

$$\begin{aligned} {\bar{h}}(t; {\varvec{\mu }}) = \frac{1}{n} \sum ^n_{i = 1} h(t; \mu _i), \end{aligned}$$

we can write from (9),

$$\begin{aligned} m_k - {\widetilde{m}}_k= & {} E\left[ {\bar{h}}(T_{k,i}; {\varvec{\mu }}) - {\bar{h}}(T_{k,i}; {\varvec{0}})\right] = \frac{1}{n} \sum ^n_{i = 1} E\left[ h(T_{k,i}; \mu _i) - h(T_{k,i}; 0)\right] \\= & {} \frac{1}{n} \sum ^n_{i = 1} E\left[ 2 G_{\varepsilon }(T_{k,i}) - G_{\varepsilon }(T_{k,i} - \mu _i) - G_{\varepsilon }(T_{k,i} + \mu _i)\right] . \end{aligned}$$

The value of \({\bar{h}}(t; {\varvec{\mu }})\) is depicted as a function of t in Fig. 2. The individual contributions \(h(t; \mu _i)\) for a typical sparse signal are plotted in Fig. 3.

Fig. 3 Plots of \(h(t; \mu _i)\), with \(h(t; 0)\) in bold

We construct an upper bound for \(h(t; \mu )\), consisting of three parts, depending on the value of \(\mu \). First, we have a general upper bound

$$\begin{aligned} |h(t; \mu ) - h(t; 0)|\le & {} \max _t |2 G_{\varepsilon }(t) - G_{\varepsilon }(t - \mu ) - G_{\varepsilon }(t + \mu )| \\\le & {} 4 \max _x |G_{\varepsilon }(x)| = 4 |G_{\varepsilon }(\sigma )|, \end{aligned}$$

as indeed, on the positive axis, \(|G_{\varepsilon }(x)|\) is unimodal with its global maximum at \(x = \sigma \).

Remark 1

The upper bound is pessimistic, since \(\lim _{t \rightarrow \infty } |G_{\varepsilon }(t)| = 0\), so for every \(\eta > 0\), there exists a \(t^{*}\), so that for \(t > t^{*}\), we find \( 2 |G_{\varepsilon }(t) - G_{\varepsilon }(t + \mu _i)| \le |2 G_{\varepsilon }(t)| < 2 \eta , \) and so \( |2 G_{\varepsilon }(t) - G_{\varepsilon }(t-\mu _i) - G_{\varepsilon }(t+\mu _i)| < 2|G_{\varepsilon }(\sigma )| + 2 \eta . \)

The second part of the upper bound is for small values of \(\mu \), as illustrated in Fig. 4. Let M be a constant, a priori depending on t, so that for \(|\mu | \le t - \sigma \), we have that

$$\begin{aligned} h(t; \mu ) - h(t; 0) \le \left[ M - h(t; 0)\right] \cdot \left[ \frac{\mu }{t - \sigma }\right] ^2. \end{aligned}$$
(21)

This construction is possible since \(h'(t; 0) = 0\).

Fig. 4 In bold line, plots of \(h(t; \mu _i)\) as a function of \(\mu _i\) for two values of t. The dotted lines represent the upper bounds

Remark 2

Obviously, one can take \([M - h(t; 0)] / (t - \sigma )^2\) to be equal to \(\max _{x \in \mathrm{I\!R}} 2 |G_{\varepsilon }''(x)|\), but that choice would lead to a pessimistic upper bound when t grows larger. We will keep \(2 \max _{x \in \mathrm{I\!R}} |G_{\varepsilon }''(x)|\) as an upper bound when \([M - h(t; 0)] / (t - \sigma )^2\) is replaced by the random version \([M - h(t; 0)] / (T_{k,i} - \sigma )^2\).

For \(t \ge \sigma \) and \(\mu \ge \tau \), we have that \(-G_{\varepsilon }(t + \tau + \mu ) \le -G_{\varepsilon }(t - \tau + \mu )\), and so that \(h(t + \tau ; \mu ) \le h(t; \mu - \tau )\), and thus

$$\begin{aligned} h(t + \tau ; \mu ) \le h(t; 0) + [M - h(t; 0)] \cdot \left[ \frac{\mu - \tau }{t - \sigma }\right] ^2. \end{aligned}$$

For t sufficiently large, \(h(t; 0) - h(t + \tau ; 0)\) is small enough for any \(\tau \), so that

$$\begin{aligned} \frac{h(t ;0) - h(t + \tau ; 0)}{M - h(t + \tau ; 0)} \le \left( \frac{\tau }{t + \tau - \sigma }\right) ^2. \end{aligned}$$

This is equivalent to

$$\begin{aligned} h(t; 0) \le h(t + \tau ; 0) + \left[ M - h(t + \tau ; 0)\right] \cdot \left[ \frac{\tau }{t + \tau - \sigma }\right] ^2. \end{aligned}$$
(22)

This implies that on \([\tau , t + \tau - \sigma ]\),

$$\begin{aligned} h(t; 0) + \left[ M - h(t + \tau ; 0)\right] \cdot \left[ \frac{\mu - \tau }{t - \sigma }\right] ^2\le & {} h(t + \tau ; 0) + \left[ M - h(t + \tau ; 0)\right] \nonumber \\&\times \left[ \frac{\mu }{t + \tau - \sigma }\right] ^2 \end{aligned}$$
(23)

as indeed both quadratic forms have the same value, M, at \(\mu = t + \tau - \sigma \), while for \(\mu = \tau \) this reduces to (22). The right hand side of (23) has the same form as the right hand side in (21). As a result, for t sufficiently large, the constant M in (21) does not depend on t. By choosing a value for M larger than \(4 |G_{\varepsilon }(\sigma )|\), the upper bound in (21) holds for any \(\mu \). Taking into account Remark 2, we can write

$$\begin{aligned} h(t ; \mu ) - h(t; 0) \le q(t)\mu ^2, \end{aligned}$$

where

$$\begin{aligned} q(t) = \min \left( 2 \max _{x \in \mathrm{I\!R}} |G_{\varepsilon }''(x)|, \frac{M - h(t; 0)}{(t - \sigma )^2}\right) . \end{aligned}$$

For large values of \(\mu \), however, we need a third and tighter upper bound. Because of the symmetry in \(f_{\varepsilon }(x)\), we have for \(\mu = 2 t\) that \(h(t; 2 t) = -G_{\varepsilon }(3 t) - G_{\varepsilon }(-t) = -G_{\varepsilon }(3 t) + G_{\varepsilon }(t)\), and as \(2 t\) lies far beyond the largest local maximum of \(|h(t; \mu )|\) as a function of \(\mu \), it holds for \(\mu > 2 t\) that \(h(t; \mu ) > h(t; 2 t)\), and so

$$\begin{aligned} |h(t; \mu ) - h(t; 0)|= & {} h(t; 0) - h(t; \mu )< h(t; 0) - h(t; 2 t) \\= & {} |3 G_{\varepsilon }(t) - G_{\varepsilon }(3 t)| < 3 |G_{\varepsilon }(t)|. \end{aligned}$$

The three parts of the analysis allow us to conclude that the approximation error of \({\widetilde{m}}_k\) is bounded by

$$\begin{aligned} |m_k - {\widetilde{m}}_k|\le & {} \frac{1}{n} \sum ^n_{i = 1} E\left| h(T_{k,i}; \mu _i) - h(T_{k,i}; 0)\right| \nonumber \\\le & {} \frac{1}{n} \sum ^n_{i = 1} P(T_{k,i}> |\mu _i|/2) E\left( q(T_{k,i}) | T_{k,i} > |\mu _i|/2\right) \mu ^2_i \nonumber \\&+ P(T_{k,i} \le |\mu _i|/2) 3 E\left( |G_{\varepsilon }(T_{k,i})|~| T_{k,i} \le |\mu _i|/2\right) . \end{aligned}$$
(24)

Moreover, it is easy to find a constant K such that \(q(t) \le K/t\) for all values of t, and also, because q(t) is a monotonically non-increasing function, we have

$$\begin{aligned} E\left( q(T_{k,i}) | T_{k,i} > |\mu _i|/2\right) \le E\left( q(T_{k,i})\right) \le K E\left( 1/T_{k,i}\right) \rightarrow 0 \text{ when } n \rightarrow \infty . \end{aligned}$$

Combining this with Assumption (11), we find for the first sum in (24),

$$\begin{aligned} \frac{1}{n} \sum ^n_{i = 1} P(T_{k,i}> |\mu _i|/2) E\left( q(T_{k,i}) | T_{k,i} > |\mu _i|/2\right) \mu ^2_i = o\left[ \mathrm {PE}(\widehat{{\varvec{\mu }}}_k)\right] . \end{aligned}$$

For the second sum in (24), we see that if \(P(T_{k,i} \le |\mu _i|/2)\) does not tend to zero, then by Markov’s inequality, we have

$$\begin{aligned} P\left( \frac{1}{T_{k,i}}> \frac{2}{|\mu _i|}\right) \le E\left( \frac{1}{T_{k,i}}\right) \frac{|\mu _i|}{2} \Rightarrow |\mu _i| \ge \frac{2P\left( \frac{1}{T_{k,i}} > \frac{2}{|\mu _i|}\right) }{E\left( \frac{1}{T_{k,i}}\right) } \rightarrow \infty . \end{aligned}$$

With \(|\mu _i| \rightarrow \infty \) and \(E\left( 1/T_{k,i}\right) \rightarrow 0\), the value of \(E\left( |G_{\varepsilon }(T_{k,i})|~|~T_{k,i} \le |\mu _i|/2\right) \) then tends to \(E\left( |G_{\varepsilon }(T_{k,i})|\right) \) which in turn tends to zero since, under Assumption (A2), there exists a constant L so that \(|G_{\varepsilon }(t)| \le L/t\). As a result, we have

$$\begin{aligned} \frac{1}{n} \sum ^n_{i = 1} P(T_{k,i} \le |\mu _i|/2) 3 E\left( |G_{\varepsilon }(T_{k,i})| | T_{k,i} \le |\mu _i|/2\right) = o\left[ \mathrm {PE}(\widehat{{\varvec{\mu }}}_k)\right] , \end{aligned}$$

thereby completing the proof. \(\square \)

Appendix B: Development of calculations for Eqs. 17 and 18

With \(\varGamma \) the gamma function and \(F_{\chi ^2_w}\) and \(f_{\chi ^2_w}\) the cumulative distribution function and density of the \(\chi ^2_w\) distribution, we find the following result for Eq. (17):

$$\begin{aligned} {\widetilde{m}}_l= & {} \frac{\sigma ^2}{r} \sum ^r_{j = 1} E\left( \int ^{\frac{T_{l,j}^2}{\sigma ^2}}_0 (1 - u w^{-1})f_{\chi ^2_w}(u)du\right) \\= & {} \frac{\sigma ^2}{r} \sum ^r_{j = 1} E\left( \int ^{\frac{T_{l,j}^2}{\sigma ^2}}_0 (1 - u w^{-1}) \frac{1}{2^{\frac{w}{2}} \varGamma (\frac{w}{2})} u^{\frac{w}{2} - 1} e^{-\frac{u}{2}} du\right) \\= & {} \frac{\sigma ^2}{r} \sum ^r_{j = 1} E\left( \int ^{\frac{T_{l,j}^2}{\sigma ^2}}_0 \frac{1}{2^{\frac{w}{2}}\varGamma (\frac{w}{2})} u^{\frac{w}{2} - 1} e^{-\frac{u}{2}} du - \frac{1}{w} \int ^{\frac{T_{l,j}^2}{\sigma ^2}}_0 u \frac{1}{2^{\frac{w}{2}}\varGamma (\frac{w}{2})} u^{\frac{w}{2} - 1} e^{-\frac{u}{2}} du\right) \\= & {} \frac{\sigma ^2}{r} \sum ^r_{j = 1} E\left( F_{\chi ^2_w}(T_{l,j}^2\sigma ^{-2}) - \frac{2 \varGamma (\frac{w}{2} + 1)}{w \varGamma (\frac{w}{2})} \int ^{\frac{T_{l,j}^2}{\sigma ^2}}_0 \frac{1}{2^{\frac{w}{2}+1}\varGamma (\frac{w}{2}+1)} u^{\frac{w}{2}} e^{-\frac{u}{2}} du\right) \\= & {} \frac{\sigma ^2}{r} \sum ^r_{j = 1} E\left( F_{\chi ^2_w}(T_{l,j}^2\sigma ^{-2}) - \frac{2 \varGamma (\frac{w}{2} + 1)}{w \varGamma (\frac{w}{2})} F_{\chi ^2_{w + 2}}(T_{l,j}^2\sigma ^{-2})\right) \\= & {} \frac{\sigma ^2}{r} \sum ^r_{j = 1} E\left( F_{\chi ^2_w}(T_{l,j}^2\sigma ^{-2}) - F_{\chi ^2_{w + 2}}(T_{l,j}^2\sigma ^{-2})\right) . \end{aligned}$$
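The key identity in this chain, \(\int _0^x (1 - u/w) f_{\chi ^2_w}(u)\,du = F_{\chi ^2_w}(x) - F_{\chi ^2_{w+2}}(x)\), can also be verified numerically. The short SciPy check below, with arbitrary illustrative values of w and x, is only a sanity check and not part of the derivation.

```python
import numpy as np
from scipy import integrate, stats

# Sanity check of int_0^x (1 - u/w) f_{chi2_w}(u) du = F_{chi2_w}(x) - F_{chi2_{w+2}}(x)
for w in (2, 3, 5, 10):
    for x in (0.5, 3.0, 12.0):
        lhs, _ = integrate.quad(lambda u: (1.0 - u / w) * stats.chi2.pdf(u, df=w), 0.0, x)
        rhs = stats.chi2.cdf(x, df=w) - stats.chi2.cdf(x, df=w + 2)
        assert np.isclose(lhs, rhs), (w, x, lhs, rhs)
```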

Also, when the group size w is 1 (singletons), Eq. (17) reduces to Eq. (18):

$$\begin{aligned} {\widetilde{m}}_k= & {} {\widetilde{m}}_l = \frac{\sigma ^2}{r} \sum ^r_{j = 1} E\left( F_{\chi ^2_1}(T_{l,j}^2\sigma ^{-2}) - F_{\chi ^2_3}(T_{l,j}^2\sigma ^{-2})\right) \\= & {} \frac{\sigma ^2}{r} \sum ^r_{j = 1} E\left( \frac{1}{\varGamma (\frac{1}{2})} \int ^{\frac{T_{l,j}^2}{2\sigma ^2}}_0 u^{-\frac{1}{2}} e^{-u} du - \frac{1}{\varGamma (\frac{3}{2})} \int ^{\frac{T_{l,j}^2}{2\sigma ^2}}_0 u^{\frac{1}{2}} e^{-u} du\right) \\= & {} \frac{\sigma ^2}{r} \sum ^r_{j = 1} E\left( \frac{1}{\varGamma (\frac{1}{2})} \int ^{\frac{T_{l,j}}{\sigma }}_0 \left( \frac{u^2}{2}\right) ^{-\frac{1}{2}} e^{-\frac{u^2}{2}} u du - \frac{2}{\varGamma (\frac{1}{2})} \int ^{\frac{T_{l,j}}{\sigma }}_0 \left( \frac{u^2}{2}\right) ^{\frac{1}{2}} e^{-\frac{u^2}{2}} u du\right) \\= & {} \frac{\sigma ^2}{r} \sum ^r_{j = 1} E\left( \frac{1}{\varGamma (\frac{1}{2})} \int ^{\frac{T_{l,j}}{\sigma }}_0 \sqrt{2} e^{-\frac{u^2}{2}} du - \frac{2}{\varGamma (\frac{1}{2})} \int ^{\frac{T_{l,j}}{\sigma }}_0 \frac{1}{\sqrt{2}} u^2 e^{-\frac{u^2}{2}} du\right) \\= & {} \frac{\sigma ^2}{r} \sum ^r_{j = 1} E\left( \frac{2}{\sqrt{2}\varGamma (\frac{1}{2})} \int ^{\frac{T_{l,j}}{\sigma }}_0 (1-u^2) e^{-\frac{u^2}{2}} du\right) \\= & {} \frac{\sigma ^2}{r} \sum ^r_{j = 1} E\left( \frac{2}{\sqrt{2}\varGamma (\frac{1}{2})} \frac{T_{l,j}}{\sigma } e^{-\frac{T_{l,j}^2}{2\sigma ^2}}\right) \\= & {} 2\sigma ^2 r^{-1} \sum ^r_{j = 1} E\left( T_{l,j}\phi _{\sigma }(T_{l,j})\right) = 2\sigma ^2 n^{-1} \sum ^n_{i = 1} E\left( T_{k,i}\phi _{\sigma }(T_{k,i})\right) , \end{aligned}$$

since \(e^{-\frac{u^2}{2}} - u^2 e^{-\frac{u^2}{2}}\) is the derivative of \(u e^{-\frac{u^2}{2}}\) and \(\varGamma (\frac{1}{2}) = \sqrt{\pi }\); here \(\phi _{\sigma }\) denotes the density of a zero-mean normal random variable with variance \(\sigma ^2\).
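The reduction for group size \(w = 1\) can likewise be checked numerically: \(\sigma ^2 \left[ F_{\chi ^2_1}(t^2/\sigma ^2) - F_{\chi ^2_3}(t^2/\sigma ^2)\right] \) should coincide with \(2 \sigma ^2\, t\, \phi _{\sigma }(t)\). The snippet below, with illustrative values of t and \(\sigma \), is again only a sanity check.

```python
import numpy as np
from scipy import stats

# Sanity check of sigma^2 [F_{chi2_1}(t^2/s^2) - F_{chi2_3}(t^2/s^2)] = 2 sigma^2 t phi_sigma(t)
sigma = 1.3
for t in (0.2, 1.0, 2.5, 4.0):
    lhs = sigma**2 * (stats.chi2.cdf(t**2 / sigma**2, df=1)
                      - stats.chi2.cdf(t**2 / sigma**2, df=3))
    rhs = 2.0 * sigma**2 * t * stats.norm.pdf(t, scale=sigma)
    assert np.isclose(lhs, rhs), (t, lhs, rhs)
```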

Appendix C: Illustration for image denoising

In its first column, Fig. 5 presents a sample of 5 noise-free images; the second column shows their noisy versions. The denoised images, obtained with Mallows’ Cp and with the mirror-corrected Cp in unstructured and group selections, are shown in the third to fourth and the fifth to sixth columns, respectively.

Fig. 5 Image denoising for 10 handwritten digits using Mallows’ Cp and the mirror-corrected Cp in unstructured and group selections
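As a rough, self-contained illustration of how a Cp-type criterion can steer such a denoising step, the sketch below applies soft thresholding to a noisy array and picks the threshold that minimises a plain Mallows’ Cp-type risk estimate. It is a deliberately simplified assumption: it omits the mirror correction and the group structure developed in the paper, and it uses a synthetic image rather than the handwritten digits of Fig. 5.

```python
import numpy as np

def soft_threshold(y, t):
    """Soft thresholding, the proximal operator of the l1 penalty."""
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

def cp_select(y, sigma, thresholds):
    """Return the threshold minimising a Mallows' Cp-type risk estimate.

    Cp(t) = ||y - mu_hat_t||^2 / sigma^2 - n + 2 * df_t, where df_t counts the
    coefficients surviving the threshold (the degrees of freedom of soft
    thresholding); no mirror correction is applied here.
    """
    n = y.size
    best_t, best_cp = thresholds[0], np.inf
    for t in thresholds:
        mu_hat = soft_threshold(y, t)
        cp = np.sum((y - mu_hat) ** 2) / sigma**2 - n + 2 * np.count_nonzero(mu_hat)
        if cp < best_cp:
            best_t, best_cp = t, cp
    return best_t

# Synthetic 16x16 "image" with a bright square, corrupted by Gaussian noise.
rng = np.random.default_rng(1)
clean = np.zeros((16, 16))
clean[4:12, 4:12] = 5.0
sigma = 1.0
noisy = clean + sigma * rng.standard_normal(clean.shape)

t_star = cp_select(noisy.ravel(), sigma, np.linspace(0.0, 4.0, 41))
denoised = soft_threshold(noisy, t_star).reshape(clean.shape)
```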

Cite this article

Marquis, B., Jansen, M. Information criteria bias correction for group selection. Stat Papers 63, 1387–1414 (2022). https://doi.org/10.1007/s00362-021-01283-8
