Abstract
A one-step semiparametric bootstrap procedure is constructed to estimate the distribution of estimators after model selection and of model averaging estimators with data-dependent weights. The method is generally applicable to non-normal models, and misspecification is allowed for all candidate parametric models. The semiparametric bootstrap estimator is shown to be consistent within specific regions in which the good and the bad candidate models are separated. Simulation studies illustrate that the bootstrap procedure leads to short confidence intervals with good coverage.
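To fix ideas, the general idea of bootstrapping after model selection can be sketched in a few lines. The following is a hedged toy illustration only (a pairs bootstrap with AIC-based selection in a simple linear model); it is not the paper's one-step semiparametric procedure, and all names and the data-generating setup are illustrative assumptions:

```python
import numpy as np

# Toy data: linear model with true slope 0.5 (an assumed setup).
rng = np.random.default_rng(42)
n = 200
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)

def fit_and_select(x, y):
    """Fit intercept-only vs intercept+slope OLS and pick by AIC.

    Returns the post-selection slope estimate (0 if the small model wins).
    """
    n = len(y)
    # Model 0: intercept only
    rss0 = np.sum((y - y.mean()) ** 2)
    # Model 1: intercept + slope
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss1 = np.sum((y - X @ beta) ** 2)
    # Gaussian-likelihood AIC up to an additive constant
    aic0 = n * np.log(rss0 / n) + 2 * 1
    aic1 = n * np.log(rss1 / n) + 2 * 2
    return beta[1] if aic1 < aic0 else 0.0

# Pairs bootstrap: resample (y_i, x_i) jointly, redo selection each time.
B = 500
boot = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)
    boot[b] = fit_and_select(x[idx], y[idx])

# Percentile interval for the post-selection slope
lo, hi = np.percentile(boot, [2.5, 97.5])
```

Note that redoing the selection step inside every resample is what distinguishes this from naively bootstrapping the selected model alone; the paper's semiparametric scheme refines this naive pairs-bootstrap sketch.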
References
Aerts M, Claeskens G (2001) Bootstrap tests for misspecified models, with application to clustered binary data. Comput Stat Data Anal 36(3):383–401
Akaike H (1973) Information theory and an extension of the maximum likelihood principle. In: Petrov B, Csáki F (eds) Second international symposium on information theory. Akadémiai Kiadó, Budapest, pp 267–281
Bachoc F, Preinerstorfer D, Steinberger L (2020) Uniformly valid confidence intervals post-model-selection. Ann Stat 48(1):440–463
Belloni A, Chernozhukov V (2013) Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2):521–547
Berk R, Brown L, Buja A et al (2013) Valid post-selection inference. Ann Stat 41(2):802–837
Camponovo L (2015) On the validity of the pairs bootstrap for lasso estimators. Biometrika 102(4):981–987
Charkhi A, Claeskens G (2018) Asymptotic post-selection inference for the Akaike information criterion. Biometrika 105(3):645–664
Claeskens G (1999) Smoothing techniques and bootstrap methods for multiparameter likelihood models. Ph.D. Thesis, Limburgs Universitair Centrum, Diepenbeek
Claeskens G, Hjort NL (2003) The focused information criterion. J Am Stat Assoc 98(464):900–916. With discussion and a rejoinder by the authors
Danilov D, Magnus JR (2004) On the harm that ignoring pretesting can cause. J Econom 122(1):27–46
Efron B (2014) Estimation and accuracy after model selection. J Am Stat Assoc 109(507):991–1007
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96(456):1348–1360
Garcia-Angulo A, Claeskens G (2023) Exact uniformly most powerful post-selection confidence distributions. Scand J Stat 50:358–382
Garcia-Angulo A, Claeskens G (2023) Optimal finite sample post-selection confidence distributions in generalized linear models. J Stat Plan Inference 222:66–77
Giurcanu MC (2012) Bootstrapping in non-regular smooth function models. J Multivar Anal 111:78–93
Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference and prediction, 2nd edn. Springer, Berlin
Hjort NL, Claeskens G (2003) Frequentist model average estimators. J Am Stat Assoc 98(464):879–899
Hu F, Zidek JV (1995) A bootstrap based on the estimating equations of the linear model. Biometrika 82(2):263–275
Iverson HK, Randles RH (1989) The effects on convergence of substituting parameter estimates into U-statistics and other families of statistics. Probab Theory Relat Fields 81(3):453–471
Kabaila P (2009) The coverage properties of confidence regions after model selection. Int Stat Rev 77(3):405–414
Kabaila P, Welsh AH, Abeysekera W (2016) Model-averaged confidence intervals. Scand J Stat 43(1):35–48
Lee SMS, Wu Y (2018) A bootstrap recipe for post-model-selection inference under linear regression models. Biometrika 105(4):873–890
Leeb H, Pötscher BM (2008) Can one estimate the unconditional distribution of post-model-selection estimators? Economet Theor 24(2):338–376
Lehmann EL, Romano JP (2022) Bootstrap and subsampling methods. Springer, Cham, pp 863–918
Lu W, Goldberg Y, Fine JP (2012) On the robustness of the adaptive lasso to model misspecification. Biometrika 99(3):717–731
Pötscher BM (2009) Confidence sets based on sparse estimators are necessarily large. Sankhyā Indian J Stat Ser A 71(1):1–18
Rao RR (1962) Relations between weak and uniform convergence of measures with applications. Ann Math Stat 33(2):659–680
Rossouw JE, Du Plessis JP, Benadé AJ et al (1983) Coronary risk factor screening in three rural communities. The CORIS baseline study. S Afr Med J Suid-Afrikaanse Tydskrif Vir Geneeskunde 64(12):430–436
Sin CY, White H (1996) Information criteria for selecting possibly misspecified parametric models. J Econom 71(1):207–225
Taylor J, Tibshirani R (2018) Post-selection inference for l1-penalized likelihood models. Can J Stat 46(1):41–61
Tian X, Taylor J (2018) Selective inference with a randomized response. Ann Stat 46(2):679–710
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodol) 58(1):267–288
Tibshirani RJ, Rinaldo A, Tibshirani R et al (2018) Uniform asymptotic inference and the bootstrap after model selection. Ann Stat 46(3):1255–1287
Wang H, Zhou SZF (2013) Interval estimation by frequentist model averaging. Commun Stat Theory Methods 42(23):4342–4356
White H (1994) Estimation, inference and specification analysis. Cambridge University Press, Cambridge
Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101(476):1418–1429
Acknowledgements
Support from the Research Foundation Flanders and the KU Leuven Research Fund Project C16/20/002 is acknowledged. The resources and services used in this work were provided by the VSC (Flemish Supercomputer Center), funded by the Research Foundation - Flanders (FWO) and the Flemish Government.
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Appendix. Proofs
1.1 Proof of Proposition 1
Proof
Let \(\Delta \Psi _{M,n} = \Psi _{M,n} - \Psi _{M_\textrm{full},n} = c \{\ell _{n}(\varvec{Y}_n;\hat{\theta }_{n}) - \ell _{n}(\varvec{Y}_n;\hat{\theta }_{M,n})\} - (\kappa _{M_\textrm{full},n}-\kappa _{M,n})\). By (A1)–(A3)(i,iii), \(\hat{\theta }_{M,n}-\tilde{\theta }_{M,n}={o}_{a.s.}(1)\). By (A3)(ii,iv) and Proposition 4.2(a) of Sin and White (1996), if \(\kappa _{M,n}= {o}_p(n)\), then \(\Delta \Psi _{M,n} = n \Delta _{M,n} + {o}_p(n)\). If, additionally, \(\liminf _{n \rightarrow \infty } \Delta _{M,n} > 0 \), then, also by Proposition 4.2(a) of Sin and White (1996), \(\lim _{n \rightarrow \infty } P(\Delta \Psi _{M,n}>0)=1\) for \(M \in \mathcal {I}\). On the other hand, for all \(M' \notin \mathcal {I}\), \(\Delta \Psi _{M',n} \rightarrow 0\) as \(n \rightarrow \infty \) at different rates. Therefore, \(\lim _{n \rightarrow \infty } P(\Delta \Psi _{M,n}>\Delta \Psi _{M',n})=1\) for all \(M \in \mathcal {I}\) and \(M' \notin \mathcal {I}\), which implies \(\lim _{n \rightarrow \infty } P(\Psi _{M,n}>\Psi _{M',n})=1\). Given conditions (A4)(a)–(c) on the weight functions, \(\lim _{n \rightarrow \infty } P(W_{M,n}(\Psi _{M'',n}, M''\in \mathcal {M}) < W_{M',n}(\Psi _{M'',n}, M''\in \mathcal {M}))=1\) for all \(M \in \mathcal {I} \) and at least one \(M'\notin \mathcal {I}\). By condition (d), as \(\Psi _{M',n} \rightarrow _p \infty \) and given that \(\lim _{n \rightarrow \infty } P(\Psi _{M,n}>\Psi _{M',n})=1\), \(\widehat{W}_{M',n} \rightarrow _p 1\) for at least one \(M'\notin \mathcal {I}\). Finally, by condition (e), \(\widehat{W}_{M,n} ={o}_p(1)\) as \(n \rightarrow \infty \) for \(M \in \mathcal {I}\). This completes the proof.
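The separation argument above can be checked numerically. The sketch below uses a toy Gaussian location problem with an AIC-type penalty (\(c=2\)); this setup is an assumption for illustration and is not taken from the paper. It shows the criterion difference \(\Delta \Psi _{M,n}\) growing at rate \(n\) for a misspecified candidate model, which is what drives \(\lim _{n \rightarrow \infty } P(\Delta \Psi _{M,n}>0)=1\):

```python
import numpy as np

# Toy setup (assumed): data ~ N(1, 1); the "bad" candidate model fixes
# the mean at 0, while the full model estimates it.  With c = 2,
# Delta Psi = 2 * (ll_full - ll_bad) - (kappa_full - kappa_bad),
# which here equals roughly n * ybar^2 - 2 and so grows linearly in n.
rng = np.random.default_rng(1)

def delta_psi(n):
    y = rng.normal(loc=1.0, scale=1.0, size=n)
    ll_full = -0.5 * np.sum((y - y.mean()) ** 2)  # mean estimated
    ll_bad = -0.5 * np.sum(y ** 2)                # mean fixed at 0
    penalty_diff = 2 * (1 - 0)                    # one extra parameter
    return 2 * (ll_full - ll_bad) - penalty_diff

d_small, d_large = delta_psi(200), delta_psi(2000)
# d_large / d_small is roughly 10, reflecting the O(n) growth rate
```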
1.2 Proof of Theorem 1
Proof
For a \((\mathcal {S},\tau _n)\) convergent sequence of pseudo-true parameters \((\tilde{\theta }_n)\), we write
By Definition 1, for \(M \in \mathcal {S}\) it holds that \(|\tilde{\theta }_{n} - \tilde{\theta }_{M,n}|= {o}_p(n^{-1/2})\). Further, for \(M \in \mathcal {S}\cup \mathcal {I}\) it holds by Sin and White (1996, Prop 4.1(a)) that \(\hat{\theta }_{M,n} - \tilde{\theta }_{M,n} = O_p(n^{-1/2})\). This, combined with Proposition 1, which yields that \(\widehat{W}_{M,n}={o}_p(1)\) for \(M\in \mathcal {I}\), and with Assumption (A6), guarantees that the sum over \(M\in \mathcal {I}\) is \(o_p(1)\). Therefore, using the results obtained so far,
For each \(M \in \mathcal {S}\), under (A1)–(A3) the strong consistency of \(\hat{\theta }_{M,n}\) is guaranteed (see White 1994, Th 3.6). For the \(|M|\times |M|\) matrix \(\Sigma _{M,n}(\tilde{\theta }_{M,n})=[J_{M,n} (\tilde{\theta }_{M,n})]^{-1} K_{M,n}(\tilde{\theta }_{M,n}) [J_{M,n}(\tilde{\theta }_{M,n})]^{-1}\), since \(|\tilde{\theta }_{n} - \tilde{\theta }_{M,n}|={o}_p(n^{-1/2})\) for each \(M \in \mathcal {S}\), it holds by (A3)(vi) that \(E[\Sigma _{M,n}(\tilde{\theta }_{M,n}) - \Sigma _{M,n}(\Pi _M\tilde{\theta }_{n})] \rightarrow 0 \). Due to the convergence of \(\tilde{\theta }_n\) to \(\tilde{\theta }_\infty \), the limit of the covariance matrix is \(\Sigma _M= J_M(\Pi _M \tilde{\theta }_\infty )^{ -1} K_M(\Pi _M \tilde{\theta }_\infty )J_M(\Pi _M \tilde{\theta }_\infty )^{ -1}\). Then, as \(n\) tends to infinity, there is convergence in distribution
where \(Z\sim N_{p}(\varvec{0},I_{p})\). Moreover, using the form of the information criterion as in (5) and Taylor expansions of the log likelihood in model \(M\) around the full model’s log likelihood, it follows from assumption (A3) that for all \(M\in \mathcal {S}\) there is joint convergence of all \(n^{1/2}\Pi _M(\hat{\theta }_{M,n}-\tilde{\theta }_{M,n})\) and all weights \(\widehat{W}_{M,n}\) to corresponding limiting distributions. Consequently, for \(M\in \mathcal {S}\), \(\hat{\Lambda }_{M,n}\) can be expressed in the limit as a function of the same variable \(Z\), leading to the limit version of the weights \(\mathcal {W}_{M}(Z)\). Combining the above results, an application of the continuous mapping theorem leads to the result of Theorem 1.
1.3 Proof of Theorem 2
Proof
We use the notation \(E^*\), Var\(^*\) and \(d^*\) to represent the bootstrap expectation, variance and convergence in distribution, conditionally on \(\varvec{Y}_n\) and X. The \(|M'|\times |M'|\) matrices \(K^*_{M',n}(\theta )\) and \(J^*_{M',n}(\theta )\) are defined in the same way as \(K_{M',n}(\theta )\) and \(J_{M',n}(\theta )\) though using the bootstrap data \((Y_{i^*},x_{i^*}^\top )\) with \(i^*\in S^*\), the set of n integers taken randomly with replacement from \(\{1,\ldots ,n\}\). The sequence of pseudo-true parameters \((\tilde{\theta }_n)\) is \((\mathcal {S},\tau _n)\) convergent.
Following the proof of Theorem 9.1 of Claeskens (1999) for the one-step semiparametric bootstrap, by (A3)(v), the choice of some \(\delta >0\) and the strong consistency of \(\hat{\theta }_{M',n}\) for each \(M' \in \mathcal {S}\), it holds for any \(|M'|\)-dimensional vector \(\varvec{v}_{|M'|}\) with norm equal to 1 that
The resampling scheme ensures that \(E^*[K^*_{M',n}(\hat{\theta }_{M',n})] = K_{M',n}(\hat{\theta }_{M',n})\). Applying Theorem 2.9 of Iverson and Randles (1989) to each of the \(|M'|^2\) components of \(K_{M',n}(\hat{\theta }_{M',n})\), by assumptions (A3)(iv,vi), the strong consistency of \(\hat{\theta }_{M',n}\) and the fact that \(\hat{\theta }_{M',n}-\tilde{\theta }_{n} = {O}_p(n^{-1/2})\) (using Proposition 4.1 of Sin and White (1996) and the proof of Theorem 1), it follows that, conditional on \(\varvec{Y}_n\) and \(X\), \(E^*[K^*_{M',n}(\hat{\theta }_{M',n})]-K_{M',n}(\tilde{\theta }_{n})\) converges to zero as \(n \rightarrow \infty \). Using (12) with a \(|M'|\)-dimensional vector \(\varvec{v}_{|M'|}\), this implies Lyapunov’s condition,
By the Cramér-Wold theorem then,
Also by Theorem 2.9 of Iverson and Randles (1989), \(E^*[J_{M',n}^*(\hat{\theta }_{M',n})] -J_{M',n}(\tilde{\theta }_n) \rightarrow 0 \) and Var\(^* [J_{M',n}^*(\hat{\theta }_{M',n})]= {O}_p(n^{-1})\) such that \(J_{M',n}^*(\hat{\theta }_{M',n})-J_{M'}(\tilde{\theta }_\infty ) \rightarrow 0 \) in bootstrap probability.
Combining the above results, by Slutsky’s theorem, as n tends to infinity,
with \(Z \sim N_p( \varvec{0},I_p)\).
Since \(\overline{\mathcal {S}}\) is a subset of \(\overline{\mathcal {M}}\) that is closed under intersections, we define \(M_{\min }\) as the most parsimonious model in \(\overline{\mathcal {S}}\), that is, \(|M_{\min }|< |M'|\) for all \(M' \in \overline{\mathcal {S}}\setminus M_{\min }\). Because of the closedness under intersections, \(M_{\min }\) is a submodel of each \(M' \in \overline{\mathcal {S}}{\setminus } M_{\min }\). Then, \(\hat{\theta }^{(M_{\min })}_{M',n} = \hat{\theta }_{M_{\min },n}\) and by Sin and White (1996, Prop. 4.1), \(\hat{\theta }_{M_{\min },n}= \tilde{\theta }_{M_{\min },n} + {O}_p(n^{-1/2})\).
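As a side illustration of the structural point above (with a toy model class assumed for this sketch only), closedness under intersections indeed forces the most parsimonious model to be a submodel of every other model in the class:

```python
# Toy illustration (assumed model class, not from the paper): models are
# encoded as index sets; the class below is closed under intersections.
models = [frozenset({1, 2}), frozenset({1, 3}), frozenset({1}),
          frozenset({1, 2, 3})]

# Closedness: every pairwise intersection stays within the class.
closed = all(a & b in models for a in models for b in models)

# The most parsimonious model, analogous to M_min in the proof.
M_min = min(models, key=len)

# Closedness under intersections makes M_min a submodel of every model.
is_submodel_of_all = all(M_min <= M for M in models)
```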
Now, we study the bootstrap information criteria and the weights. The semiparametric sampling scheme is such that for any \(j^*\in S^*\), \(E^*[ \dot{\ell }_{M_\textrm{full}}(Y_{j^*};\hat{\theta }_{M',n})] = n^{-1}\dot{\ell }_{M_\textrm{full}}(\varvec{Y}_{n};\hat{\theta }_{M',n})\) and \(E^*[ \ddot{\ell }_{M_\textrm{full}}(Y_{j^*};\hat{\theta }_{M',n})] = n^{-1}\ddot{\ell }_{M_\textrm{full}}(\varvec{Y}_{n};\hat{\theta }_{M',n})\). Therefore, the resulting difference tends to zero as \(n \rightarrow \infty \). Under (A3)(ii,iv) and (A4) for the weight functions, the bootstrap weight converges in bootstrap distribution to the random variable \(\mathcal {W}_{M'}(Z)\) as \(n \rightarrow \infty \). An application of Polya’s theorem (e.g. Rao 1962, Lemma 3.2) yields that for \(M_{\min }\)
Finally, by Proposition 4.2 (b) of Sin and White (1996), if \(\widetilde{\Psi }_{M,n}\) is an information criterion satisfying condition (A5) and \(\hat{\iota }_{M,n}\) is a weight that also satisfies condition (A4), then \(\hat{\iota }_{M,n}\) consistently selects and assigns higher weight to \(M_{\min }\) such that \(\hat{\iota }_{M',n}=o_p(1)\) for \(M' \in \overline{\mathcal {S}}{\setminus } M_{\min }\). Theorem 2 follows by combining the above results.
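The resampling identity \(E^*[K^*_{M',n}(\hat{\theta }_{M',n})] = K_{M',n}(\hat{\theta }_{M',n})\) used in the proof of Theorem 2 can be verified numerically. The following hedged sketch uses toy scores from a Gaussian mean model (an assumed setup for illustration only) and checks that the Monte Carlo bootstrap expectation of the average squared score matches its sample counterpart:

```python
import numpy as np

# Toy scores: for the N(theta, 1) model the score of observation i at
# theta_hat is (y_i - theta_hat).  The empirical K is then the average
# squared score (a 1x1 matrix here).
rng = np.random.default_rng(7)
n = 300
y = rng.normal(loc=0.3, size=n)
theta_hat = y.mean()
scores = y - theta_hat
K_hat = np.mean(scores ** 2)

# Under uniform resampling with replacement, E*[K*] equals K_hat exactly;
# we approximate E* by averaging over B bootstrap resamples.
B = 4000
K_star = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)
    K_star[b] = np.mean(scores[idx] ** 2)
# np.mean(K_star) should be close to K_hat, up to Monte Carlo error
```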
Cite this article
Garcia-Angulo, A.C., Claeskens, G. Bootstrap for inference after model selection and model averaging for likelihood models. Metrika (2024). https://doi.org/10.1007/s00184-024-00956-2