Abstract
In this paper, a focused vector information criterion for model selection and model averaging is considered for the linear model with missing responses. Based on the focused information criterion of Hjort and Claeskens (J Am Stat Assoc 98:879–945, 2003) and the imputation idea, a frequentist model averaging estimator for a focused vector of a linear model is proposed, and the estimator is shown to be root-n consistent and asymptotically normal. In addition, the proposed focused vector information criterion is designed for a focused multidimensional parameter, which differs slightly from the conventional focused information criterion for a one-dimensional focused parameter. A model-averaging-based confidence interval estimation method and an estimator of the mean of the response are also proposed. A simulation study is conducted to investigate the finite-sample performance of the proposed estimator, and a real data example is presented to illustrate its application in practice.
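To fix ideas, the following sketch illustrates the general workflow the abstract describes: regression imputation of missing responses followed by a weighted average of submodel estimates of a focus parameter (here the mean response). This is a hedged illustration only, not the paper's exact procedure; the candidate submodels, the missingness rate, and the smooth-AIC-style weights (used here as a stand-in for the paper's FVIC-based weights) are all assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta = np.array([1.0, 0.5, 0.0])
y = X @ beta + rng.normal(size=n)
delta = rng.random(n) < 0.8            # missingness indicator: True = response observed
y_obs = np.where(delta, y, np.nan)

# Step 1: regression imputation -- fit the full model on complete cases,
# then fill in the missing responses with fitted values.
beta_full, *_ = np.linalg.lstsq(X[delta], y_obs[delta], rcond=None)
y_imp = np.where(delta, y_obs, X @ beta_full)

# Step 2: fit a set of nested candidate submodels on the imputed data and
# record each submodel's estimate of the focus parameter mu = E[Y].
submodels = [[0], [0, 1], [0, 1, 2]]   # hypothetical candidate index sets
estimates, rss = [], []
for S in submodels:
    b, *_ = np.linalg.lstsq(X[:, S], y_imp, rcond=None)
    fitted = X[:, S] @ b
    estimates.append(fitted.mean())
    rss.append(np.sum((y_imp - fitted) ** 2))

# Step 3: combine the submodel estimates with smooth-AIC weights
# (a simple stand-in for focused-criterion-based weights).
k = np.array([len(S) for S in submodels])
aic = n * np.log(np.array(rss) / n) + 2 * k
w = np.exp(-0.5 * (aic - aic.min()))
w /= w.sum()
mu_hat = float(np.dot(w, estimates))
```

The averaged estimate `mu_hat` interpolates between the candidate models rather than committing to a single selected model, which is the basic motivation for model averaging over post-selection inference.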
References
Akaike H (1973) Maximum likelihood identification of Gaussian autoregressive moving average models. Biometrika 60:255–265
Azar B (2002) Finding a solution for missing data. Monit Psychol 33:70
Bradic J, Fan J, Wang W (2011) Penalized composite quasi-likelihood for ultrahigh-dimensional variable selection. J Roy Stat Soc Ser B 73:325–349
Cavanaugh J, Shumway R (1998) An Akaike information criterion for model selection in the presence of incomplete data. J Stat Plan Inf 67:45–65
Claeskens G, Consentino F (2008) Variable selection with incomplete covariate data. Biometrics 64:1062–1069
Du J, Zhang ZZ, Xie TF (2012) Model averaging in quantile regression. Commun Stat Theory Methods (to appear)
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360
Fan JQ, Li RZ (2004) New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. J Am Stat Assoc 99:710–723
Hens N, Aerts M, Molenberghs G (2006) Model selection for incomplete and design-based samples. Stat Med 25:2502–2520
Huang JZ, Wu CO, Zhou L (2002) Varying-coefficient models and basis function approximations for the analysis of repeated measurements. Biometrika 89:111–128
Hjort NL, Claeskens G (2003) Frequentist model average estimators (with discussion). J Am Stat Assoc 98:879–945
Hjort NL, Claeskens G (2006) Focussed information criteria and model averaging for Cox's hazard regression model. J Am Stat Assoc 101:1449–1464
Jones MP (1996) Indicator and stratification methods for missing explanatory variables in multiple linear regression. J Am Stat Assoc 91:222–230
Liang H, Wang S, Carroll RJ (2007) Partially linear models with missing response variables and error-prone covariates. Biometrika 94:185–198
Little RJA, Rubin DB (1987) Statistical analysis with missing data. Wiley, New York
Leeb H, Pötscher BM (2003) The finite sample distribution of post-model-selection estimators and uniform versus non-uniform approximations. Econ Theory 19:100–142
Leeb H, Pötscher BM (2008) Can one estimate the unconditional distribution of post-model-selection estimators? Econ Theory 24:338–376
Leung G, Barron AR (2006) Information theory and mixing least-squares regressions. IEEE Trans Inf Theory 52:3396–3410
Meinshausen N, Bühlmann P (2006) High dimensional graphs and variable selection with the lasso. Ann Stat 34:1436–1462
Meinshausen N, Yu B (2009) Lasso-type recovery of sparse representations for high-dimensional data. Ann Stat 37:246–270
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464
Schomaker M, Wan ATK, Heumann C (2010) Frequentist model averaging with missing observations. Comput Stat Data Anal 54:3336–3347
Sun ZM, Zhang ZZ, Du J (2012) Semiparametric analysis of isotonic errors-in-variables regression models with missing response. Commun Stat Theory Methods 41:2034–2060
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J Roy Stat Soc Ser B 58:267–288
Van der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes: with applications to statistics. Springer, New York
Wang QH, Sun ZH (2007) Estimation in partially linear models with missing responses at random. J Multivar Anal 98:1470–1493
Wang H, Zhou SZF (2012) Interval estimation by frequentist model averaging. Commun Stat Theory Methods (forthcoming)
Xue LG (2009) Empirical likelihood for linear models with missing responses. J Multivar Anal 100:1353–1366
Yang Y (2001) Adaptive regression by mixing. J Am Stat Assoc 96:574–586
Yang YP, Xue LG, Cheng WH (2009) Empirical likelihood for a partially linear model with covariate data missing at random. J Stat Plan Inf 139:4143–4153
Zhang H, Wahba G, Lin Y, Voelker M, Ferris M, Klein R, Klein B (2004) Variable selection and model building via likelihood basis pursuit. J Am Stat Assoc 99:659–672
Zhang CH, Huang J (2008) The sparsity and bias of the lasso selection in high-dimensional linear regression. Ann Stat 36:1567–1594
Zhang XY, Liang H (2011) Focused information criterion and model averaging for generalized additive partial linear models. Ann Stat 39(1):174–200
Zhao PX, Xue LG (2010) Variable selection for semiparametric varying coefficient partially linear errors-in-variables models. J Multivar Anal 101(8):1872–1883
Acknowledgments
The author is grateful to anonymous referees for their careful reading and insightful comments on this paper. This work is supported by National Natural Science Foundation of China (No. 71101157); Program for Innovation Research in Central University of Finance and Economics; 2012 National Project of Statistical Research (2012LY138); Foundation of Academic Discipline Program at Central University of Finance and Economics; MOE (Ministry of Education in China) Project of Humanities and Social Sciences For Youth (10YJC790220); Fund of 211 Project at Central University of Finance and Economics.
Appendix: Proofs
The following Lemma 1 is needed to prove the theorems.
Lemma 1
Under condition C1, we have
Proof of Lemma 1
It is easily seen that
Note that
Lemma 1 then follows directly.
Proof of Theorem 1
Denote \(A_{n}=\frac{1}{n}\sum _{i=1}^{n}\Pi _SX_{i}X_{i}^{\top }\Pi _S^{\top }\). With a simple calculation, we obtain
where
Under Condition C1, it is not difficult to get
Thus, with Lemma 1 in hand, we have
The second-to-last equation follows from \((I-\Pi _S^{\top }\Pi _S)\left( \begin{array}{c} \ddot{ \beta }_0\\ 0 \end{array} \right) =0\). Thus, we have
Note that
By the central limit theorem and Slutsky's theorem, we have
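For reference, the form of Slutsky's theorem invoked here is the standard one: if a sequence converges in distribution and another converges in probability to a constant, they converge jointly, so

```latex
X_n \xrightarrow{\,d\,} X, \quad Y_n \xrightarrow{\,P\,} c
\;\Longrightarrow\;
X_n + Y_n \xrightarrow{\,d\,} X + c
\quad\text{and}\quad
Y_n X_n \xrightarrow{\,d\,} cX .
```

Here \(X_n\) plays the role of the centered and scaled average handled by the central limit theorem, while the remaining factors converge in probability to constants.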
A slight transformation of the last equation then completes the proof of Theorem 1.
Proof of Theorem 2
Theorem 2 follows from Theorem 1 by the delta method, together with Theorem 3 of Van der Vaart and Wellner (1996).
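For completeness, the delta method in its standard form (stated generically, not in the paper's specific notation) asserts that asymptotic normality is preserved under smooth transformations:

```latex
\sqrt{n}\,(\hat{\theta}-\theta) \xrightarrow{\,d\,} N(0,\Sigma)
\;\Longrightarrow\;
\sqrt{n}\,\bigl(g(\hat{\theta})-g(\theta)\bigr)
\xrightarrow{\,d\,}
N\!\bigl(0,\; \dot{g}(\theta)\,\Sigma\,\dot{g}(\theta)^{\top}\bigr),
```

for any map \(g\) that is differentiable at \(\theta\) with Jacobian \(\dot{g}(\theta)\).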
Proof of Theorem 3
It can be verified that
Since \(\hat{\eta }\stackrel{d}{\longrightarrow }\Delta \), we have
by the continuous mapping theorem, and then Theorem 3 follows directly.
Proof of Theorem 4
It is easily seen that
For \(H_{n3}\), we have
Note that
Plugging (3) into (2), we have
Since the term \(\frac{1}{n}\sum _{i=1}^{n}(1-\delta _{i})X^{\top }_i\stackrel{P}{\longrightarrow }(EX^{\top }-E\delta X^{\top })\), it is not difficult to get
This together with \(H_{n1}\) and \(H_{n2}\) implies that
Then, Theorem 4 follows from the central limit theorem. This completes the proof.
Cite this article
Sun, Z., Su, Z. & Ma, J. Focused vector information criterion model selection and model averaging regression with missing response. Metrika 77, 415–432 (2014). https://doi.org/10.1007/s00184-013-0446-8