## Abstract

A one-step semiparametric bootstrap procedure is constructed to estimate the distribution of estimators after model selection and of model averaging estimators with data-dependent weights. The method is generally applicable to non-normal models. Misspecification is allowed for all candidate parametric models. The semiparametric bootstrap estimator is shown to be consistent within specific regions such that the good and the bad candidate models are separated. Simulation studies illustrate that the bootstrap procedure leads to short confidence intervals with good coverage.

## References

Aerts M, Claeskens G (2001) Bootstrap tests for misspecified models, with application to clustered binary data. Comput Stat Data Anal 36(3):383–401

Akaike H (1973) Information theory and an extension of the maximum likelihood principle. In: Petrov B, Csáki F (eds) Second international symposium on information theory. Akadémiai Kiadó, Budapest, pp 267–281

Bachoc F, Preinerstorfer D, Steinberger L (2020) Uniformly valid confidence intervals post-model-selection. Ann Stat 48(1):440–463

Belloni A, Chernozhukov V (2013) Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2):521–547

Berk R, Brown L, Buja A et al (2013) Valid post-selection inference. Ann Stat 41(2):802–837

Camponovo L (2015) On the validity of the pairs bootstrap for lasso estimators. Biometrika 102(4):981–987

Charkhi A, Claeskens G (2018) Asymptotic post-selection inference for the Akaike information criterion. Biometrika 105(3):645–664

Claeskens G (1999) Smoothing techniques and bootstrap methods for multiparameter likelihood models. Ph.D. Thesis, Limburgs Universitair Centrum, Diepenbeek

Claeskens G, Hjort N (2003) The focused information criterion. J Am Stat Assoc 98:900–916. With discussion and a rejoinder by the authors

Danilov D, Magnus JR (2004) On the harm that ignoring pretesting can cause. J Econom 122(1):27–46

Efron B (2014) Estimation and accuracy after model selection. J Am Stat Assoc 109(507):991–1007

Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96(456):1348–1360

Garcia-Angulo A, Claeskens G (2023) Exact uniformly most powerful post-selection confidence distributions. Scand J Stat 50:358–382

Garcia-Angulo A, Claeskens G (2023) Optimal finite sample post-selection confidence distributions in generalized linear models. J Stat Plan Inference 222:66–77

Giurcanu MC (2012) Bootstrapping in non-regular smooth function models. J Multivar Anal 111:78–93

Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference and prediction, 2nd edn. Springer, Berlin

Hjort NL, Claeskens G (2003) Frequentist model average estimators. J Am Stat Assoc 98(464):879–899

Hu F, Zidek JV (1995) A bootstrap based on the estimating equations of the linear model. Biometrika 82(2):263–275

Iverson HK, Randles RH (1989) The effects on convergence of substituting parameter estimates into U-statistics and other families of statistics. Probab Theory Relat Fields 81(3):453–471

Kabaila P (2009) The coverage properties of confidence regions after model selection. Int Stat Rev 77(3):405–414

Kabaila P, Welsh AH, Abeysekera W (2016) Model-averaged confidence intervals. Scand J Stat 43(1):35–48

Lee SMS, Wu Y (2018) A bootstrap recipe for post-model-selection inference under linear regression models. Biometrika 105(4):873–890

Leeb H, Pötscher BM (2008) Can one estimate the unconditional distribution of post-model-selection estimators? Economet Theor 24(2):338–376

Lehmann EL, Romano JP (2022) Bootstrap and subsampling methods. Springer, Cham, pp 863–918

Lu W, Goldberg Y, Fine JP (2012) On the robustness of the adaptive lasso to model misspecification. Biometrika 99(3):717–731

Pötscher BM (2009) Confidence sets based on sparse estimators are necessarily large. Sankhyā: Indian J Stat Ser A 71(1):1–18

Rao RR (1962) Relations between weak and uniform convergence of measures with applications. Ann Math Stat 33(2):659–680

Rossouw JE, Du Plessis JP, Benadé AJ et al (1983) Coronary risk factor screening in three rural communities. The CORIS baseline study. S Afr Med J Suid-Afrikaanse Tydskrif Vir Geneeskunde 64(12):430–436

Sin CY, White H (1996) Information criteria for selecting possibly misspecified parametric models. J Econom 71(1):207–225

Taylor J, Tibshirani R (2018) Post-selection inference for l1-penalized likelihood models. Can J Stat 46(1):41–61

Tian X, Taylor J (2018) Selective inference with a randomized response. Ann Stat 46(2):679–710

Tibshirani R (1996) Regression shrinkage and selection via the lasso. J Roy Stat Soc Ser B (Methodol) 58(1):267–288

Tibshirani RJ, Rinaldo A, Tibshirani R et al (2018) Uniform asymptotic inference and the bootstrap after model selection. Ann Stat 46(3):1255–1287

Wang H, Zhou SZF (2013) Interval estimation by frequentist model averaging. Commun Stat Theory Methods 42(23):4342–4356

White H (1994) Estimation, inference and specification analysis. Cambridge University Press, Cambridge

Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101(476):1418–1429

## Acknowledgements

Support from the Research Foundation Flanders and the KU Leuven Research Fund Project C16/20/002 is acknowledged. The resources and services used in this work were provided by the VSC (Flemish Supercomputer Center), funded by the Research Foundation - Flanders (FWO) and the Flemish Government.

## Ethics declarations

### Conflicts of interest

The authors declare that they have no conflict of interest.

## Appendix. Proofs

### 1.1 Proof of Proposition 1

### Proof

Let \(\Delta \Psi _{M,n} = \Psi _{M,n} - \Psi _{M_\textrm{full},n} = c \{\ell _{n}(\varvec{Y}_n;\hat{\theta }_{n}) - \ell _{n}(\varvec{Y}_n;\hat{\theta }_{M,n})\} - (\kappa _{M_\textrm{full},n}-\kappa _{M,n})\). By (A1)–(A3)(i,iii), \(\hat{\theta }_{M,n}-\tilde{\theta }_{M,n}={o}_{a.s.}(1)\). By (A3)(ii,iv) and Proposition 4.2(a) of Sin and White (1996), if \(\kappa _{M,n}= {o}_p(n)\), then \(\Delta \Psi _{M,n} = n \Delta _{M,n} + {o}_p(n)\). If, additionally, \(\liminf _{n \rightarrow \infty } \Delta _{M,n} > 0\), then, again by Proposition 4.2(a) of Sin and White (1996), \(\lim _{n \rightarrow \infty } P(\Delta \Psi _{M,n}>0)=1\) for \(M \in \mathcal {I}\). On the other hand, for all \(M' \notin \mathcal {I}\), \(\Delta \Psi _{M',n} \rightarrow 0\) as \(n \rightarrow \infty \), at different rates. Therefore, \(\lim _{n \rightarrow \infty } P(\Delta \Psi _{M,n}>\Delta \Psi _{M',n})=1\) for all \(M \in \mathcal {I}\) and \(M' \notin \mathcal {I}\), which implies \(\lim _{n \rightarrow \infty } P(\Psi _{M,n}>\Psi _{M',n})=1\). Given conditions (A4)(a)–(c) on the weight functions, \(\lim _{n \rightarrow \infty } P(W_{M,n}(\Psi _{M'',n}, M''\in \mathcal {M}) < W_{M',n}(\Psi _{M'',n}, M''\in \mathcal {M}))=1\) for all \(M \in \mathcal {I}\) and at least one \(M'\notin \mathcal {I}\). By condition (d), as \(\Psi _{M',n} \rightarrow _p \infty \) and given that \(\lim _{n \rightarrow \infty } P(\Psi _{M,n}>\Psi _{M',n})=1\), it follows that \(\widehat{W}_{M',n} \rightarrow _p 1\) for at least one \(M'\notin \mathcal {I}\). Finally, by condition (e), \(\widehat{W}_{M,n} =o_p(1)\) as \(n \rightarrow \infty \) for \(M \in \mathcal {I}\). This completes the proof.
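As a concrete illustration of weight functions of the kind the proposition assumes, the following Python sketch maps information-criterion values \(\Psi_{M,n}\) to normalised weights. This is a standard smooth-AIC-type choice, not necessarily the exact weight function used in the paper; it is positive, sums to one, and is decreasing in the criterion value, so models whose criterion diverges relative to the best candidate receive weight tending to zero, matching the \(o_p(1)\) behaviour of \(\widehat{W}_{M,n}\) for bad models.

```python
import numpy as np

def ic_weights(psi):
    """Smooth-AIC-type weights from information-criterion values.

    psi: array of criterion values, one per candidate model, where a
    larger value indicates a worse model (as for AIC = -2*loglik + 2|M|).
    Returns weights that are positive, sum to one, and decrease in psi.
    """
    psi = np.asarray(psi, dtype=float)
    w = np.exp(-(psi - psi.min()) / 2.0)  # shift by the minimum for numerical stability
    return w / w.sum()
```

For example, with criterion values `[10.0, 12.0, 20.0]` the first model receives the largest weight and the third a weight close to zero.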

### 1.2 Proof of Theorem 1

### Proof

For a \((\mathcal {S},\tau _n)\)-convergent sequence of pseudo-true parameters \((\tilde{\theta }_n)\), we write

By Definition 1, for \(M \in \mathcal {S}\) it holds that \(|\tilde{\theta }_{n} - \tilde{\theta }_{M,n}|= {o}_p(n^{-1/2})\). Further, for \(M \in \mathcal {S}\cup \mathcal {I}\) it holds by Sin and White (1996, Prop. 4.1(a)) that \(\hat{\theta }_{M,n} - \tilde{\theta }_{M,n} = O_p(n^{-1/2})\). This, combined with Proposition 1, which yields that \(\widehat{W}_{M,n}\) is \(o_p(1)\) for \(M\in \mathcal {I}\), and Assumption (A6), guarantees that the sum over \(M\in \mathcal {I}\) is \(o_p(1)\). Therefore, using the results obtained so far,

For each \(M \in \mathcal {S}\), the strong consistency of \(\hat{\theta }_{M,n}\) is guaranteed under (A1)–(A3) (see White 1994, Th. 3.6). For the \(|M|\times |M|\) matrix \(\Sigma _{M,n}(\tilde{\theta }_{M,n})=[J_{M,n} (\tilde{\theta }_{M,n})]^{-1} K_{M,n}(\tilde{\theta }_{M,n}) [J_{M,n}(\tilde{\theta }_{M,n})]^{-1}\), since \(|\tilde{\theta }_{n} - \tilde{\theta }_{M,n}|={o}_p(n^{-1/2})\) for each \(M \in \mathcal {S}\), it holds by (A3)(vi) that \(E[\Sigma _{M,n}(\tilde{\theta }_{M,n}) - \Sigma _{M,n}(\Pi _M\tilde{\theta }_{n})] \rightarrow 0\). Due to the convergence of \(\tilde{\theta }_n\) to \(\tilde{\theta }_\infty \), the limit of the covariance matrix is \(\Sigma _M= J_M(\Pi _M \tilde{\theta }_\infty )^{-1} K_M(\Pi _M \tilde{\theta }_\infty )J_M(\Pi _M \tilde{\theta }_\infty )^{-1}\). Then, as *n* tends to infinity, there is convergence in distribution

where \(Z\sim N_{p}(\varvec{0},I_{p})\). Moreover, using the form of the information criterion as in (5) and Taylor expansions of the log likelihood in model *M* around the full model’s log likelihood, it is clear by assumption (A3) that for all \(M\in \mathcal {S}\) there is joint convergence of all \(n^{1/2}\Pi _M(\hat{\theta }_{M,n}-\tilde{\theta }_{M,n})\) and all weights \(\widehat{W}_{M,n}\) to corresponding limiting distributions. Consequently, for \(M\in \mathcal {S}\), \(\hat{\Lambda }_{M,n}\) can be expressed in the limit as a function of the same variable *Z*, leading to the limit version of the weights \(\mathcal {W}_{M}(Z)\). Combining the above results, an application of the continuous mapping theorem leads to the result of Theorem 1.
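The limiting covariance above has the familiar robust "sandwich" form \(\Sigma_M = J_M^{-1} K_M J_M^{-1}\). As an illustration only (generic sandwich-estimator code with hypothetical array inputs, not the paper's own estimator), it can be computed empirically from per-observation score vectors and negative Hessians:

```python
import numpy as np

def sandwich_cov(scores, neg_hessians):
    """Empirical sandwich covariance J^{-1} K J^{-1}.

    scores:       (n, p) array of per-observation score vectors
    neg_hessians: (n, p, p) array of per-observation negative Hessians
    K is the averaged outer product of the scores (the 'meat');
    J is the averaged negative Hessian (the 'bread').
    """
    n = scores.shape[0]
    K = scores.T @ scores / n       # information (outer-product) part
    J = neg_hessians.mean(axis=0)   # averaged negative Hessian
    J_inv = np.linalg.inv(J)
    return J_inv @ K @ J_inv
```

When the model is correctly specified, \(J\) and \(K\) coincide asymptotically and the sandwich reduces to the usual inverse Fisher information; under misspecification, as allowed for here, the two parts differ and the full product is needed.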

### 1.3 Proof of Theorem 2

### Proof

We use the notation \(E^*\), Var\(^*\) and \(d^*\) to represent the bootstrap expectation, variance and convergence in distribution, conditionally on \(\varvec{Y}_n\) and *X*. The \(|M'|\times |M'|\) matrices \(K^*_{M',n}(\theta )\) and \(J^*_{M',n}(\theta )\) are defined in the same way as \(K_{M',n}(\theta )\) and \(J_{M',n}(\theta )\), but using the bootstrap data \((Y_{i^*},x_{i^*}^\top )\) with \(i^*\in S^*\), the set of *n* integers drawn randomly with replacement from \(\{1,\ldots ,n\}\). The sequence of pseudo-true parameters \((\tilde{\theta }_n)\) is \((\mathcal {S},\tau _n)\)-convergent.

Following the proof of Theorem 9.1 of Claeskens (1999) for the one-step semiparametric bootstrap, by (A3)(v), choosing \(\delta >0\) and using the strong consistency of \(\hat{\theta }_{M',n}\) for each \(M' \in \mathcal {S}\), for any \(|M'|\)-dimensional vector \(\varvec{v}_{|M'|}\) with norm equal to 1,

The resampling scheme ensures that \(E^*[K^*_{M',n}(\hat{\theta }_{M',n})] = K_{M',n}(\hat{\theta }_{M',n})\). Applying Theorem 2.9 of Iverson and Randles (1989) to each of the \(|M'|^2\) components of \(K_{M',n}(\hat{\theta }_{M',n})\), by assumptions (A3)(iv,vi), the strong consistency of \(\hat{\theta }_{M',n}\) and the fact that \(\hat{\theta }_{M',n}-\tilde{\theta }_{n} = {O}_p(n^{-1/2})\) (using Proposition 4.1 of Sin and White (1996) and the proof of Theorem 1), it follows that, conditional on \(\varvec{Y}_n\) and *X*, \(E^*[K_{M',n}(\hat{\theta }_{M',n})]-K_{M',n}(\tilde{\theta }_{n})\) converges to zero as \(n \rightarrow \infty \). Using (12) with a \(|M'|\)-dimensional vector \(\varvec{v}_{|M'|}\), this implies Lyapunov’s condition,

By the Cramér–Wold theorem then,

Also by Theorem 2.9 of Iverson and Randles (1989), \(E^*[J_{M',n}^*(\hat{\theta }_{M',n})] -J_{M',n}(\tilde{\theta }_n) \rightarrow 0 \) and Var\(^* [J_{M',n}^*(\hat{\theta }_{M',n})]= {O}_p(n^{-1})\) such that \(J_{M',n}^*(\hat{\theta }_{M',n})-J_{M'}(\tilde{\theta }_\infty ) \rightarrow 0 \) in bootstrap probability.

Combining the above results, by Slutsky’s theorem, as *n* tends to infinity,

with \(Z \sim N_p( \varvec{0},I_p)\).

Since \(\overline{\mathcal {S}}\) is a subset of \(\overline{\mathcal {M}}\) that is closed under intersections, we define \(M_{\min }\) as the most parsimonious model in \(\overline{\mathcal {S}}\), that is, \(|M_{\min }|< |M'|\) for all \(M' \in \overline{\mathcal {S}}\setminus M_{\min }\). Because of the closedness under intersections, \(M_{\min }\) is a submodel of each \(M' \in \overline{\mathcal {S}}{\setminus } M_{\min }\). Then, \(\hat{\theta }^{(M_{\min })}_{M',n} = \hat{\theta }_{M_{\min },n}\) and, by Sin and White (1996, Prop. 4.1), \(\hat{\theta }_{M_{\min },n}= \tilde{\theta }_{M_{\min },n} + {O}_p(n^{-1/2})\).

Now we study the bootstrap information criteria and the weights. The semiparametric sampling scheme is such that for any \(j^*\in S^*\), \(E^*[ \dot{\ell }_{M_\textrm{full}}(Y_{j^*};\hat{\theta }_{M',n})] = n^{-1}\dot{\ell }_{M_\textrm{full}}(\varvec{Y}_{n};\hat{\theta }_{M',n})\) and \(E^*[ \ddot{\ell }_{M_\textrm{full}}(Y_{j^*};\hat{\theta }_{M',n})] = n^{-1}\ddot{\ell }_{M_\textrm{full}}(\varvec{Y}_{n};\hat{\theta }_{M',n})\). Therefore, as \(n \rightarrow \infty \), tends to zero. Under (A3)(ii,iv) and (A4) for the weight functions, as \(n \rightarrow \infty \), converges in bootstrap distribution to the random variable \(\mathcal {W}_{M'}(Z)\). An application of Polya’s theorem (e.g. Rao 1962, Lemma 3.2) yields that for \(M_{\min }\)

Finally, by Proposition 4.2 (b) of Sin and White (1996), if \(\widetilde{\Psi }_{M,n}\) is an information criterion satisfying condition (A5) and \(\hat{\iota }_{M,n}\) is a weight that also satisfies condition (A4), then \(\hat{\iota }_{M,n}\) consistently selects and assigns higher weight to \(M_{\min }\) such that \(\hat{\iota }_{M',n}=o_p(1)\) for \(M' \in \overline{\mathcal {S}}{\setminus } M_{\min }\). Theorem 2 follows by combining the above results.
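The resampling scheme underlying \(S^*\) draws *n* pairs \((Y_{i^*}, x_{i^*}^\top)\) with replacement and re-applies the full estimation procedure to each bootstrap sample. A minimal sketch of such pairs resampling, assuming a user-supplied `estimator` callable (hypothetical; the paper's one-step update and weight recomputation are not reproduced here, though `estimator` could itself contain a selection or averaging step):

```python
import numpy as np

def pairs_bootstrap(y, X, estimator, B=200, seed=0):
    """Pairs bootstrap: resample rows (Y_i, x_i^T) with replacement
    and re-apply `estimator` to each resampled data set.

    y: (n,) response array; X: (n, p) covariate matrix;
    estimator: callable (y, X) -> estimate; B: bootstrap replicates.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    reps = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)  # the index set S*, drawn with replacement
        reps.append(estimator(y[idx], X[idx]))
    return np.asarray(reps)
```

The returned array of replicates can then be used, for instance, to form percentile confidence intervals, which is how the simulation studies assess interval length and coverage.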

## About this article

### Cite this article

Garcia-Angulo, A.C., Claeskens, G. Bootstrap for inference after model selection and model averaging for likelihood models.
*Metrika* (2024). https://doi.org/10.1007/s00184-024-00956-2
