
Bootstrap for inference after model selection and model averaging for likelihood models

Metrika (2024)

Abstract

A one-step semiparametric bootstrap procedure is constructed to estimate the distribution of estimators after model selection and of model averaging estimators with data-dependent weights. The method is generally applicable to non-normal models. Misspecification is allowed for all candidate parametric models. The semiparametric bootstrap estimator is shown to be consistent on regions where the good and the bad candidate models are separated. Simulation studies show that the bootstrap procedure leads to short confidence intervals with good coverage.
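
To fix ideas before the formal treatment, the following minimal Python sketch illustrates the setting the paper addresses: interval estimation for a focus parameter after data-dependent, smooth-AIC model averaging. It is a naive full-refit pairs bootstrap in a Gaussian linear model, not the one-step semiparametric procedure developed in the paper, and the candidate models, weight form and focus are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: the second covariate is weak, so the AIC comparison
# between the nested candidate models is genuinely data-dependent.
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5, 0.1]) + rng.normal(size=n)

candidates = [[0, 1], [0, 1, 2]]  # column index sets of the candidate models

def fit_aic(y, X, cols):
    """Gaussian ML fit of the submodel with columns `cols`; returns (betahat, AIC)."""
    Xm = X[:, cols]
    beta = np.linalg.lstsq(Xm, y, rcond=None)[0]
    resid = y - Xm @ beta
    sigma2 = resid @ resid / len(y)
    loglik = -0.5 * len(y) * (np.log(2 * np.pi * sigma2) + 1)
    return beta, -2 * loglik + 2 * (len(cols) + 1)

def smooth_aic_estimate(y, X):
    """Model-averaged estimate of the focus (the coefficient of x1, at
    position 1 in every candidate) with exponential AIC weights."""
    fits = [fit_aic(y, X, cols) for cols in candidates]
    aic = np.array([a for _, a in fits])
    w = np.exp(-0.5 * (aic - aic.min()))
    w /= w.sum()
    return sum(wi * beta[1] for wi, (beta, _) in zip(w, fits))

theta_hat = smooth_aic_estimate(y, X)

# Pairs bootstrap: resample (y_i, x_i) jointly and redo the whole
# selection/averaging step in every replicate.
B = 999
boot = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)
    boot[b] = smooth_aic_estimate(y[idx], X[idx])

lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"estimate {theta_hat:.3f}, 95% percentile CI [{lo:.3f}, {hi:.3f}]")
```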


References

  • Aerts M, Claeskens G (2001) Bootstrap tests for misspecified models, with application to clustered binary data. Comput Stat Data Anal 36(3):383–401
  • Akaike H (1973) Information theory and an extension of the maximum likelihood principle. In: Petrov B, Csáki F (eds) Second international symposium on information theory. Akadémiai Kiadó, Budapest, pp 267–281
  • Bachoc F, Preinerstorfer D, Steinberger L (2020) Uniformly valid confidence intervals post-model-selection. Ann Stat 48(1):440–463
  • Belloni A, Chernozhukov V (2013) Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2):521–547
  • Berk R, Brown L, Buja A et al (2013) Valid post-selection inference. Ann Stat 41(2):802–837
  • Camponovo L (2015) On the validity of the pairs bootstrap for lasso estimators. Biometrika 102(4):981–987
  • Charkhi A, Claeskens G (2018) Asymptotic post-selection inference for the Akaike information criterion. Biometrika 105(3):645–664
  • Claeskens G (1999) Smoothing techniques and bootstrap methods for multiparameter likelihood models. Ph.D. thesis, Limburgs Universitair Centrum, Diepenbeek
  • Claeskens G, Hjort NL (2003) The focused information criterion. J Am Stat Assoc 98:900–916. With discussion and a rejoinder by the authors
  • Danilov D, Magnus JR (2004) On the harm that ignoring pretesting can cause. J Econom 122(1):27–46
  • Efron B (2014) Estimation and accuracy after model selection. J Am Stat Assoc 109(507):991–1007
  • Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96(456):1348–1360
  • Garcia-Angulo AC, Claeskens G (2023) Exact uniformly most powerful post-selection confidence distributions. Scand J Stat 50:358–382
  • Garcia-Angulo AC, Claeskens G (2023) Optimal finite sample post-selection confidence distributions in generalized linear models. J Stat Plan Inference 222:66–77
  • Giurcanu MC (2012) Bootstrapping in non-regular smooth function models. J Multivar Anal 111:78–93
  • Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference and prediction, 2nd edn. Springer, Berlin
  • Hjort NL, Claeskens G (2003) Frequentist model average estimators. J Am Stat Assoc 98(464):879–899
  • Hu F, Zidek JV (1995) A bootstrap based on the estimating equations of the linear model. Biometrika 82(2):263–275
  • Iverson HK, Randles RH (1989) The effects on convergence of substituting parameter estimates into U-statistics and other families of statistics. Probab Theory Relat Fields 81(3):453–471
  • Kabaila P (2009) The coverage properties of confidence regions after model selection. Int Stat Rev 77(3):405–414
  • Kabaila P, Welsh AH, Abeysekera W (2016) Model-averaged confidence intervals. Scand J Stat 43(1):35–48
  • Lee SMS, Wu Y (2018) A bootstrap recipe for post-model-selection inference under linear regression models. Biometrika 105(4):873–890
  • Leeb H, Pötscher BM (2008) Can one estimate the unconditional distribution of post-model-selection estimators? Econom Theory 24(2):338–376
  • Lehmann EL, Romano JP (2022) Bootstrap and subsampling methods. Springer, Cham, pp 863–918
  • Lu W, Goldberg Y, Fine JP (2012) On the robustness of the adaptive lasso to model misspecification. Biometrika 99(3):717–731
  • Pötscher BM (2009) Confidence sets based on sparse estimators are necessarily large. Sankhyā Indian J Stat Ser A 71(1):1–18
  • Rao RR (1962) Relations between weak and uniform convergence of measures with applications. Ann Math Stat 33(2):659–680
  • Rossouw JE, Du Plessis JP, Benadé AJ et al (1983) Coronary risk factor screening in three rural communities: the CORIS baseline study. S Afr Med J 64(12):430–436
  • Sin CY, White H (1996) Information criteria for selecting possibly misspecified parametric models. J Econom 71(1):207–225
  • Taylor J, Tibshirani R (2018) Post-selection inference for l1-penalized likelihood models. Can J Stat 46(1):41–61
  • Tian X, Taylor J (2018) Selective inference with a randomized response. Ann Stat 46(2):679–710
  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodol) 58(1):267–288
  • Tibshirani RJ, Rinaldo A, Tibshirani R et al (2018) Uniform asymptotic inference and the bootstrap after model selection. Ann Stat 46(3):1255–1287
  • Wang H, Zhou SZF (2013) Interval estimation by frequentist model averaging. Commun Stat Theory Methods 42(23):4342–4356
  • White H (1994) Estimation, inference and specification analysis. Cambridge University Press, Cambridge
  • Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101(476):1418–1429


Acknowledgements

Support from the Research Foundation Flanders and the KU Leuven Research Fund Project C16/20/002 is acknowledged. The resources and services used in this work were provided by the VSC (Flemish Supercomputer Center), funded by the Research Foundation - Flanders (FWO) and the Flemish Government.

Author information

Correspondence to Andrea C. Garcia-Angulo or Gerda Claeskens.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix. Proofs

1.1 Proof of Proposition 1

Proof

Let \(\Delta \Psi _{M,n} = \Psi _{M,n} - \Psi _{M_\textrm{full},n} = c \{\ell _{n}(\varvec{Y}_n;\hat{\theta }_{n}) - \ell _{n}(\varvec{Y}_n;\hat{\theta }_{M,n})\} - (\kappa _{M_\textrm{full},n}-\kappa _{M,n})\). By (A1)–(A3)(i,iii), \(\hat{\theta }_{M,n}-\tilde{\theta }_{M,n}={o}_{a.s.}(1)\). By (A3)(ii,iv) and Proposition 4.2(a) of Sin and White (1996), if \(\kappa _{M,n}= {o}_p(n)\), then \(\Delta \Psi _{M,n} = n \Delta _{M,n} + {o}_p(n)\). If additionally \(\liminf _{n \rightarrow \infty } \Delta _{M,n} > 0\), then, again by Proposition 4.2(a) of Sin and White (1996), \(\lim _{n \rightarrow \infty } P(\Delta \Psi _{M,n}>0)=1\) for \(M \in \mathcal {I}\). On the other hand, for all \(M' \notin \mathcal {I}\), \(\Delta \Psi _{M',n} \rightarrow 0\) as \(n \rightarrow \infty \), at rates that may differ across models. Therefore \(\lim _{n \rightarrow \infty } P(\Delta \Psi _{M,n}>\Delta \Psi _{M',n})=1\) for all \(M \in \mathcal {I}\) and \(M' \notin \mathcal {I}\), which implies \(\lim _{n \rightarrow \infty } P(\Psi _{M,n}>\Psi _{M',n})=1\). Given conditions (A4)(a)–(c) on the weight functions, \(\lim _{n \rightarrow \infty } P(W_{M,n}(\Psi _{M'',n}, M''\in \mathcal {M}) < W_{M',n}(\Psi _{M'',n}, M''\in \mathcal {M}))=1\) for all \(M \in \mathcal {I}\) and at least one \(M'\notin \mathcal {I}\). By condition (d), since \(\Psi _{M',n} \rightarrow _p \infty \) and \(\lim _{n \rightarrow \infty } P(\Psi _{M,n}>\Psi _{M',n})=1\), it follows that \(\widehat{W}_{M',n} \rightarrow _p 1\) for at least one \(M'\notin \mathcal {I}\). Finally, by condition (e), \(\widehat{W}_{M,n} =o_p(1)\) as \(n \rightarrow \infty \) for \(M \in \mathcal {I}\). This completes the proof.
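
For concreteness, one familiar family of weight functions compatible with the behaviour established above (our illustration, not the paper's definition of (A4)) is the exponential, Akaike-type family

$$\begin{aligned} \widehat{W}_{M,n} = \exp (-\Psi _{M,n}/2)\Big /\sum _{M''\in \mathcal {M}} \exp (-\Psi _{M'',n}/2). \end{aligned}$$

These weights are smooth, non-negative and sum to one, and since \(\widehat{W}_{M,n} \le \exp \{-(\Psi _{M,n}-\Psi _{M',n})/2\}\) for any \(M'\), the divergence \(\Psi _{M,n}-\Psi _{M',n} \rightarrow _p \infty \) for \(M \in \mathcal {I}\) and some \(M' \notin \mathcal {I}\) forces \(\widehat{W}_{M,n}=o_p(1)\), matching the conclusion of the proposition.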

1.2 Proof of Theorem 1

Proof

For an \((\mathcal {S},\tau _n)\)-convergent sequence of pseudo-true parameters \((\tilde{\theta }_n)\), we write

$$\begin{aligned} T(\varvec{Y}_n, \tilde{\theta }_{n})&= \sum _{M \in \mathcal {S}\cup \mathcal {I}} \widehat{W}_{M,n}\, n^{1/2} \{h(\hat{\theta }_{M,n}) - h(\tilde{\theta }_{\min ,n})\}\\&= \sum _{M \in \mathcal {S} } \widehat{W}_{M,n} \big [ n^{1/2} \{ h(\hat{\theta }_{M,n}) - h(\tilde{\theta }_{M,n}) \} + n^{1/2} \{ h(\tilde{\theta }_{M,n}) - h(\tilde{\theta }_{n})\} \big ] \\&\quad + \sum _{M \in \mathcal {I} } \widehat{W}_{M,n} \big [ n^{1/2} \{h(\hat{\theta }_{M,n}) - h(\tilde{\theta }_{M,n})\} + n^{1/2} \{ h(\tilde{\theta }_{M,n}) - h(\tilde{\theta }_{n}) \} \big ]. \end{aligned}$$

By Definition 1, for \(M \in \mathcal {S}\) it holds that \(|\tilde{\theta }_{n} - \tilde{\theta }_{M,n}|= {o}_p(n^{-1/2})\). Further, for \(M \in \mathcal {S}\cup \mathcal {I}\) it holds by Sin and White (1996, Prop 4.1(a)) that \(\hat{\theta }_{M,n} - \tilde{\theta }_{M,n} = O_p(n^{-1/2})\). This, combined with Proposition 1, which yields that \(\widehat{W}_{M,n}=o_p(1)\) for \(M\in \mathcal {I}\), and with Assumption (A6), guarantees that the sum over \(M\in \mathcal {I}\) is \(o_p(1)\). Therefore, using the results obtained so far,

$$\begin{aligned} T(\varvec{Y}_n, \tilde{\theta }_{n})= \sum _{M \in \mathcal {S} } \widehat{W}_{M,n} n^{1/2} \{ h(\hat{\theta }_{M,n}) - h(\tilde{\theta }_{M,n}) \} + {o}_p(1). \end{aligned}$$

For each \(M \in \mathcal {S}\), the strong consistency of \(\hat{\theta }_{M,n}\) is guaranteed under (A1)–(A3) (see White 1994, Th 3.6). Consider the \(|M|\times |M|\) matrix \(\Sigma _{M,n}(\tilde{\theta }_{M,n})=[J_{M,n} (\tilde{\theta }_{M,n})]^{-1} K_{M,n}(\tilde{\theta }_{M,n}) [J_{M,n}(\tilde{\theta }_{M,n})]^{-1}\). Since \(|\tilde{\theta }_{n} - \tilde{\theta }_{M,n}|={o}_p(n^{-1/2})\) for each \(M \in \mathcal {S}\), by (A3)(vi), \(E[\Sigma _{M,n}(\tilde{\theta }_{M,n}) - \Sigma _{M,n}(\Pi _M\tilde{\theta }_{n})] \rightarrow 0\). Due to the convergence of \(\tilde{\theta }_n\) to \(\tilde{\theta }_\infty \), the limit of the covariance matrix is \(\Sigma _M= J_M(\Pi _M \tilde{\theta }_\infty )^{-1} K_M(\Pi _M \tilde{\theta }_\infty )J_M(\Pi _M \tilde{\theta }_\infty )^{-1}\). Then, as \(n\) tends to infinity, there is convergence in distribution

$$\begin{aligned} n^{1/2} \Pi _M(\hat{\theta }_{M,n}- \tilde{\theta }_{M,n}) \overset{d}{\rightarrow }\ [\Sigma _M]^{1/2} \Pi _M Z, \end{aligned}$$

where \(Z\sim N_{p}(\varvec{0},I_{p})\). Moreover, using the form of the information criterion as in (5) and Taylor expansions of the log likelihood in model \(M\) around the full model's log likelihood, assumption (A3) implies that for all \(M\in \mathcal {S}\) there is joint convergence of all \(n^{1/2}\Pi _M(\hat{\theta }_{M,n}-\tilde{\theta }_{M,n})\) and all weights \(\widehat{W}_{M,n}\) to their corresponding limiting distributions. Consequently, for \(M\in \mathcal {S}\), \(\hat{\Lambda }_{M,n}\) can be expressed in the limit as a function of the same variable \(Z\), leading to the limit version of the weights \(\mathcal {W}_{M}(Z)\). Combining the above results, an application of the continuous mapping theorem leads to the result of Theorem 1.
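
Once the ingredients \(\Sigma _M\), \(\Pi _M\) and the limit weights \(\mathcal {W}_M\) are available, the limit law of Theorem 1 can be approximated by Monte Carlo. The sketch below is our illustration under stated assumptions: `Sigmas`, `Pis` and `weight_fn` are user-supplied stand-ins for these objects, and the focus \(h\) is taken to be the first coordinate of the parameter.

```python
import numpy as np

def simulate_limit_law(Sigmas, Pis, weight_fn, p, draws=100_000, seed=1):
    """Monte Carlo draws from the Theorem 1 limit
        sum_M  W_M(Z) * [Sigma_M^{1/2} Pi_M Z]_1,   Z ~ N_p(0, I_p),
    with h(theta) = theta_1 as an illustrative focus.

    Sigmas    : dict model -> (p, p) limit sandwich covariance Sigma_M
    Pis       : dict model -> (p, p) projection matrix Pi_M
    weight_fn : callable Z -> dict model -> limit weight W_M(Z)
    """
    rng = np.random.default_rng(seed)
    roots = {M: np.linalg.cholesky(Sigmas[M]) for M in Sigmas}  # Sigma_M^{1/2}
    out = np.empty(draws)
    for b in range(draws):
        Z = rng.normal(size=p)
        w = weight_fn(Z)
        out[b] = sum(w[M] * (roots[M] @ Pis[M] @ Z)[0] for M in Sigmas)
    return out  # e.g. np.percentile(out, [2.5, 97.5]) approximates limit quantiles
```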

1.3 Proof of Theorem 2

Proof

We use the notation \(E^*\), \(\text {Var}^*\) and \(\overset{d^*}{\rightarrow }\) for, respectively, the bootstrap expectation, variance and convergence in distribution, conditionally on \(\varvec{Y}_n\) and \(X\). The \(|M'|\times |M'|\) matrices \(K^*_{M',n}(\theta )\) and \(J^*_{M',n}(\theta )\) are defined in the same way as \(K_{M',n}(\theta )\) and \(J_{M',n}(\theta )\), though using the bootstrap data \((Y_{i^*},x_{i^*}^\top )\) with \(i^*\in S^*\), the set of \(n\) integers drawn randomly with replacement from \(\{1,\ldots ,n\}\). The sequence of pseudo-true parameters \((\tilde{\theta }_n)\) is \((\mathcal {S},\tau _n)\)-convergent.
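
In code, the bootstrap matrices just defined can be formed as follows; this is a hedged numpy sketch in which the per-observation scores and Hessians are assumed precomputed at the estimate, and the sign and \(1/n\) normalization follow the usual sandwich conventions rather than the paper's exact definitions.

```python
import numpy as np

def bootstrap_K_J(score_i, hess_i, rng):
    """One bootstrap replicate of K*_{M',n} and J*_{M',n} for a single model:
    draw the index set S* with replacement, then average outer products of
    the resampled scores and the negative resampled Hessians.

    score_i : (n, p) per-observation scores at theta-hat
    hess_i  : (n, p, p) per-observation Hessians at theta-hat
    """
    n = score_i.shape[0]
    idx = rng.integers(0, n, size=n)             # the resampled index set S*
    K_star = score_i[idx].T @ score_i[idx] / n   # mean outer product of scores
    J_star = -hess_i[idx].mean(axis=0)           # minus the mean Hessian
    return K_star, J_star
```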

Following the proof of Theorem 9.1 of Claeskens (1999) for the one-step semiparametric bootstrap, by (A3)(v) with some \(\delta >0\) and by the strong consistency of \(\hat{\theta }_{M',n}\) for each \(M' \in \mathcal {S}\), for any \(|M'|\)-dimensional vector \(\varvec{v}_{|M'|}\) of unit norm,

$$\begin{aligned} \sum _{i=1}^{n} E^*[|n^{-1/2} \varvec{v}^\top _{|M'|} \dot{\ell }_{M'}(Y_{i^*};\hat{\theta }_{M',n})|^{2+\delta } ] = {O}_p(n^{-\delta /2}). \end{aligned}$$
(12)

The resampling scheme ensures that \(E^*[K^*_{M',n}(\hat{\theta }_{M',n})] = K_{M',n}(\hat{\theta }_{M',n})\). Applying Theorem 2.9 of Iverson and Randles (1989) to each of the \(|M'|^2\) components of \(K_{M',n}(\hat{\theta }_{M',n})\), using assumptions (A3)(iv,vi), the strong consistency of \(\hat{\theta }_{M',n}\) and the fact that \(\hat{\theta }_{M',n}-\tilde{\theta }_{n} = {O}_p(n^{-1/2})\) (by Proposition 4.1 of Sin and White (1996) and the proof of Theorem 1), it follows that, conditional on \(\varvec{Y}_n\) and \(X\), \(E^*[K^*_{M',n}(\hat{\theta }_{M',n})]-K_{M',n}(\tilde{\theta }_{n})\) converges to zero as \(n \rightarrow \infty \). Using (12) with a \(|M'|\)-dimensional vector \(\varvec{v}_{|M'|}\), this implies Lyapunov's condition,

$$\begin{aligned} \frac{\sum _{i=1}^{n} E^*[|n^{-1/2} \varvec{v}_{|M'|}^\top \dot{\ell }_{M'}(Y_{i^*};\hat{\theta }_{M',n})|^{2+\delta } ]}{(\sum _{i=1}^{n} E^*[\{n^{-1/2} \varvec{v}_{|M'|}^\top \dot{\ell }_{M'}(Y_{i^*};\hat{\theta }_{M',n})\}^2 ])^{1+\delta /2}} \rightarrow 0. \end{aligned}$$

By the Cramér–Wold device,

$$\begin{aligned} n^{-1/2} \sum _{i=1}^{n} \dot{\ell }_{M'}(Y_{i^*};\hat{\theta }_{M',n}) \overset{d^*}{\rightarrow }\ N_{|M'|}(\varvec{0},K_{M'}(\tilde{\theta }_{\infty })). \end{aligned}$$

Also by Theorem 2.9 of Iverson and Randles (1989), \(E^*[J_{M',n}^*(\hat{\theta }_{M',n})] -J_{M',n}(\tilde{\theta }_n) \rightarrow 0\) and \(\text {Var}^*[J_{M',n}^*(\hat{\theta }_{M',n})]= {O}_p(n^{-1})\), so that \(J_{M',n}^*(\hat{\theta }_{M',n})-J_{M'}(\tilde{\theta }_\infty ) \rightarrow 0\) in bootstrap probability.

Combining the above results, by Slutsky’s theorem, as n tends to infinity,

$$\begin{aligned} n^{1/2} \bigg [\sum _{i=1}^{n} \ddot{\ell }_{M'}(Y_{i^*};\hat{\theta }_{M',n})\bigg ]^{-1} \sum _{i=1}^{n} \dot{\ell }_{M'}(Y_{i^*};\hat{\theta }_{M',n}) \overset{d^*}{\rightarrow }\ [\Sigma _{M'}]^{1/2} \Pi _{M'} Z, \end{aligned}$$

with \(Z \sim N_p( \varvec{0},I_p)\).
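
Up to sign conventions, the quantity in the last display is a single Newton step from \(\hat{\theta }_{M',n}\) computed on the resampled data. A hedged numpy sketch of such one-step bootstrap draws follows; it is our illustration, not the paper's exact algorithm, and `score_i` and `hess_i` are assumed per-observation derivatives evaluated at the estimate.

```python
import numpy as np

def one_step_bootstrap(theta_hat, score_i, hess_i, n_boot=999, seed=2):
    """One-step bootstrap draws for a single candidate model: resample the
    index set S* with replacement, then take one Newton step from theta_hat
    on the resampled data instead of refitting to convergence.

    score_i : (n, p) per-observation scores at theta_hat
    hess_i  : (n, p, p) per-observation Hessians at theta_hat
    """
    rng = np.random.default_rng(seed)
    n = score_i.shape[0]
    draws = np.empty((n_boot, theta_hat.size))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        g = score_i[idx].sum(axis=0)                  # resampled total score
        H = hess_i[idx].sum(axis=0)                   # resampled total Hessian
        draws[b] = theta_hat - np.linalg.solve(H, g)  # one Newton step
    return draws
```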

Since \(\overline{\mathcal {S}}\) is a subset of \(\overline{\mathcal {M}}\) that is closed under intersections, we define \(M_{\min }\) as the most parsimonious model in \(\overline{\mathcal {S}}\), that is, \(|M_{\min }|< |M'|\) for all \(M' \in \overline{\mathcal {S}}\setminus M_{\min }\). Because of this closedness under intersections, \(M_{\min }\) is a submodel of each \(M' \in \overline{\mathcal {S}}{\setminus } M_{\min }\). Then \(\hat{\theta }^{(M_{\min })}_{M',n} = \hat{\theta }_{M_{\min },n}\) and, by Sin and White (1996, Prop. 4.1), \(\hat{\theta }_{M_{\min },n}= \tilde{\theta }_{M_{\min },n} + {O}_p(n^{-1/2})\).

Now we study the bootstrap information criteria and the weights. The semiparametric sampling scheme is such that for any \(j^*\in S^*\), \(E^*[ \dot{\ell }_{M_\textrm{full}}(Y_{j^*};\hat{\theta }_{M',n})] = n^{-1}\dot{\ell }_{M_\textrm{full}}(\varvec{Y}_{n};\hat{\theta }_{M',n})\) and \(E^*[ \ddot{\ell }_{M_\textrm{full}}(Y_{j^*};\hat{\theta }_{M',n})] = n^{-1}\ddot{\ell }_{M_\textrm{full}}(\varvec{Y}_{n};\hat{\theta }_{M',n})\). Therefore, as \(n \rightarrow \infty \), the difference between the bootstrap information criterion \(\Psi ^*_{M',n}\) and \(\Psi _{M',n}\) tends to zero. Under (A3)(ii,iv) and (A4) for the weight functions, as \(n \rightarrow \infty \), the bootstrap weight \(\widehat{W}^*_{M',n}\) converges in bootstrap distribution to the random variable \(\mathcal {W}_{M'}(Z)\). An application of Pólya's theorem (e.g. Rao 1962, Lemma 3.2) yields that for \(M_{\min }\)

$$\begin{aligned} \sup _{x\in \mathbb {R}^q}\big |P( n^{1/2} \{\hat{\theta }^{* (M_{\min })}_{h,\textrm{avg},n} - h(\hat{\theta }_{M_{\min },n})\}\le x \mid \varvec{Y}_n ) -H_n(x) \big | \rightarrow 0. \end{aligned}$$

Finally, by Proposition 4.2(b) of Sin and White (1996), if \(\widetilde{\Psi }_{M,n}\) is an information criterion satisfying condition (A5) and \(\hat{\iota }_{M,n}\) is a weight that also satisfies condition (A4), then \(\hat{\iota }_{M,n}\) consistently selects and assigns the highest weight to \(M_{\min }\), so that \(\hat{\iota }_{M',n}=o_p(1)\) for \(M' \in \overline{\mathcal {S}}{\setminus } M_{\min }\). Theorem 2 follows by combining the above results.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Garcia-Angulo, A.C., Claeskens, G. Bootstrap for inference after model selection and model averaging for likelihood models. Metrika (2024). https://doi.org/10.1007/s00184-024-00956-2
