Focused vector information criterion model selection and model averaging regression with missing response

Abstract

In this paper, a focused vector information criterion for model selection and model averaging is developed for the linear model with missing responses. Based on the focused information criterion of Hjort and Claeskens (J Am Stat Assoc 98:879–945, 2003) and an imputation idea, a frequentist model averaging estimator for a focused vector of a linear model is proposed and shown to be root-n consistent and asymptotically normal. In addition, the proposed focused vector information criterion is designed for a multidimensional focused parameter, which distinguishes it from the conventional focused information criterion for a one-dimensional focused parameter. A model-averaging-based confidence interval estimation method and an estimator of the mean of the response are also proposed. A simulation study investigates the performance of the proposed estimator with finite sample sizes, and a real data example illustrates its application in practice.
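To make the scheme concrete, here is a minimal numerical sketch, not the paper's implementation, of the imputation-then-average idea: fit the full model on complete cases, impute the missing responses, fit each candidate submodel to the imputed sample, and average the submodel fits. All names, the missingness rate, and the uniform weights are illustrative placeholders; the uniform weights stand in for the FVIC-based weights \(w(S|\hat{\eta })\) developed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 4

# Linear model with responses missing at random (delta_i = 1: observed).
X = rng.normal(size=(n, p))
beta0 = np.array([1.0, 0.5, 0.3, 0.0])
y = X @ beta0 + rng.normal(size=n)
delta = rng.binomial(1, 0.8, size=n).astype(bool)

# Step 1: complete-case least squares in the full model (beta_tilde).
beta_full, *_ = np.linalg.lstsq(X[delta], y[delta], rcond=None)

# Step 2: imputation: keep observed responses, fill in fitted values.
H = np.where(delta, y, X @ beta_full)

# Step 3: fit every candidate submodel S to the imputed sample.
submodels = [[0], [0, 1], [0, 1, 2], [0, 1, 2, 3]]
fits = []
for S in submodels:
    bS, *_ = np.linalg.lstsq(X[:, S], H, rcond=None)
    fits.append(X[:, S] @ bS)

# Step 4: average the submodel fits. Uniform weights are used here
# purely for illustration; the paper derives data-driven weights.
w = np.full(len(submodels), 1.0 / len(submodels))
mu_hat = sum(wi * fi for wi, fi in zip(w, fits))
print(mu_hat[:3])
```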

References

  • Akaike H (1973) Maximum likelihood identification of Gaussian autoregressive moving average models. Biometrika 60:255–265

  • Azar B (2002) Finding a solution for missing data. Monit Psychol 33:70

  • Bradic J, Fan J, Wang W (2011) Penalized composite quasi-likelihood for ultrahigh-dimensional variable selection. J Roy Stat Soc Ser B 73:325–349

  • Cavanaugh J, Shumway R (1998) An Akaike information criterion for model selection in the presence of incomplete data. J Stat Plan Inf 67:45–65

  • Claeskens G, Consentino F (2008) Variable selection with incomplete covariate data. Biometrics 64:1062–1069

  • Du J, Zhang ZZ, Xie TF (2012) Model averaging in quantile regression. Commun Stat Theory Methods, to appear

  • Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360

  • Fan JQ, Li RZ (2004) New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. J Am Stat Assoc 99:710–723

  • Hens N, Aerts M, Molenberghs G (2006) Model selection for incomplete and design-based samples. Stat Med 25:2502–2520

  • Huang JZ, Wu CO, Zhou L (2002) Varying-coefficient models and basis function approximations for the analysis of repeated measurements. Biometrika 89:111–128

  • Hjort NL, Claeskens G (2003) Frequentist model average estimators (with discussion). J Am Stat Assoc 98:879–945

  • Hjort NL, Claeskens G (2006) Focussed information criteria and model averaging for the Cox hazard regression model. J Am Stat Assoc 101:1449–1464

  • Jones MP (1996) Indicator and stratification methods for missing explanatory variables in multiple linear regression. J Am Stat Assoc 91:222–230

  • Liang H, Wang S, Carroll RJ (2007) Partially linear models with missing response variables and error-prone covariates. Biometrika 94:185–198

  • Little RJA, Rubin DB (1987) Statistical analysis with missing data. Wiley, New York

  • Leeb H, Pötscher BM (2003) The finite sample distribution of post-model-selection estimators and uniform versus non-uniform approximations. Econ Theory 19:100–142

  • Leeb H, Pötscher BM (2008) Can one estimate the unconditional distribution of post-model-selection estimators? Econ Theory 24:338–376

  • Leung G, Barron AR (2006) Information theory and mixing least-squares regressions. IEEE Trans Inf Theory 52:3396–3410

  • Meinshausen N, Bühlmann P (2006) High dimensional graphs and variable selection with the lasso. Ann Stat 34:1436–1462

  • Meinshausen N, Yu B (2009) Lasso-type recovery of sparse representations for high-dimensional data. Ann Stat 37:246–270

  • Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464

  • Schomaker M, Wan ATK, Heumann C (2010) Frequentist model averaging with missing observations. Comput Stat Data Anal 54:3336–3347

  • Sun ZM, Zhang ZZ, Du J (2012) Semiparametric analysis of isotonic errors-in-variables regression models with missing response. Commun Stat Theory Methods 41:2034–2060

  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. J Roy Stat Soc Ser B 58:267–288

  • Van der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes: with applications to statistics. Springer, New York

  • Wang QH, Sun ZH (2007) Estimation in partially linear models with missing responses at random. J Multivar Anal 98:1470–1493

  • Wang H, Zhou SZF (2012) Interval estimation by frequentist model averaging. Commun Stat Theory Methods, forthcoming

  • Xue LG (2009) Empirical likelihood for linear models with missing responses. J Multivar Anal 100:1353–1366

  • Yang Y (2001) Adaptive regression by mixing. J Am Stat Assoc 96:574–586

  • Yang YP, Xue LG, Cheng WH (2009) Empirical likelihood for a partially linear model with covariate data missing at random. J Stat Plan Inf 139:4143–4153

  • Zhang H, Wahba G, Lin Y, Voelker M, Ferris M, Klein R, Klein B (2004) Variable selection and model building via likelihood basis pursuit. J Am Stat Assoc 99:659–672

  • Zhang CH, Huang J (2008) The sparsity and bias of the lasso selection in high-dimensional linear regression. Ann Stat 36:1567–1594

  • Zhang XY, Liang H (2011) Focused information criterion and model averaging for generalized additive partial linear models. Ann Stat 39(1):174–200

  • Zhao PX, Xue LG (2010) Variable selection for semiparametric varying coefficient partially linear errors-in-variables models. J Multivar Anal 101(8):1872–1883

Acknowledgments

The author is grateful to the anonymous referees for their careful reading and insightful comments on this paper. This work is supported by the National Natural Science Foundation of China (No. 71101157); the Program for Innovation Research at the Central University of Finance and Economics; the 2012 National Project of Statistical Research (2012LY138); the Foundation of the Academic Discipline Program at the Central University of Finance and Economics; the MOE (Ministry of Education of China) Project of Humanities and Social Sciences for Youth (10YJC790220); and the Fund of the 211 Project at the Central University of Finance and Economics.

Author information
Correspondence to Zhimeng Sun.

Appendix: Proofs

The following Lemma 1 is needed to prove the theorems.

Lemma 1

Under condition C1, we have

$$\begin{aligned} \begin{array}{lll}\displaystyle \sqrt{n}(\tilde{\beta }_n-\beta _0)=\frac{\Sigma _0^{-1}}{\sqrt{n}}\sum \limits _{i=1}^n\delta _iX_i\varepsilon _i+o_p(1). \end{array} \end{aligned}$$

Proof of Lemma 1

It is easily seen that

$$\begin{aligned} \begin{array}{lll} \sqrt{n}(\tilde{\beta }_{n}-\beta _0)&= \displaystyle \left( \frac{1}{n}\sum \limits _{i=1}^n\delta _iX_iX_i^\top \right) ^{-1}\cdot \frac{1}{\sqrt{n}}\sum \limits _{i=1}^n\delta _iX_i\varepsilon _i. \end{array} \end{aligned}$$

Note that

$$\begin{aligned} \frac{1}{n}\sum \limits _{i=1}^n\delta _iX_iX_i^\top \stackrel{P}{\longrightarrow }\Sigma _0. \end{aligned}$$

Since \(\Sigma _0\) is nonsingular under condition C1, the continuous mapping theorem gives \(\left( \frac{1}{n}\sum \nolimits _{i=1}^n\delta _iX_iX_i^\top \right) ^{-1}\stackrel{P}{\longrightarrow }\Sigma _0^{-1}\), and Lemma 1 then follows directly.
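As a quick Monte Carlo sanity check on Lemma 1, the following simulation sketch assumes i.i.d. standard normal covariates and errors with responses missing completely at random, so that \(\Sigma _0=E[\delta XX^{\top }]=0.7I\) here; the complete-case estimator should then track the linear leading term up to an \(o_p(1)\) remainder.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, reps = 500, 3, 500
beta0 = np.array([1.0, -0.5, 0.2])
Sigma0_inv = np.linalg.inv(0.7 * np.eye(p))  # Sigma0 = E[delta X X^T]

gaps = []
for _ in range(reps):
    X = rng.normal(size=(n, p))
    eps = rng.normal(size=n)
    y = X @ beta0 + eps
    delta = rng.binomial(1, 0.7, size=n).astype(bool)

    # Complete-case least squares (beta_tilde in the lemma).
    beta_t, *_ = np.linalg.lstsq(X[delta], y[delta], rcond=None)
    lhs = np.sqrt(n) * (beta_t - beta0)

    # Leading term of Lemma 1: Sigma0^{-1} n^{-1/2} sum_i delta_i X_i eps_i.
    rhs = Sigma0_inv @ (X[delta].T @ eps[delta]) / np.sqrt(n)
    gaps.append(np.abs(lhs - rhs).max())

# The gap is the o_p(1) remainder; it shrinks as n grows.
print(np.mean(gaps))
```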

Proof of Theorem 1

Denote \(A_{n}=\frac{1}{n}\sum _{i=1}^{n}\Pi _SX_{i}X_{i}^{\top }\Pi _S^{\top }\). With a simple calculation, we obtain

$$\begin{aligned} \begin{array}{lll} \sqrt{n}(\hat{\beta }_{S}-\Pi _S\beta _0)&{}=&{}\displaystyle A_{n}^{-1}\cdot \frac{1}{\sqrt{n}}\sum \limits _{i=1}^{n}\Pi _SX_i\left( \hat{H}_{i}-X_i^{\top }\Pi _S^{\top }\Pi _S\beta _0\right) \\ &{}=&{}\displaystyle A_{n}^{-1}\cdot \frac{1}{\sqrt{n}}\sum \limits _{i=1}^{n}\Pi _SX_i\left( \delta _{i}\varepsilon _i-\delta _{i}X_i^{\top }(\tilde{\beta }_n-\beta _0)+X_i^{\top }\tilde{\beta }_n-X_i^{\top }\Pi _S^{\top }\Pi _S\beta _0\right) \\ &{}=&{}\displaystyle B_{n1}+B_{n2}+B_{n3},\end{array} \end{aligned}$$

where

$$\begin{aligned} \begin{array}{lll} B_{n1}&{}=&{}\displaystyle A_{n}^{-1}\Pi _S\frac{1}{\sqrt{n}}\sum \limits _{i=1}^{n}\delta _{i}X_i\varepsilon _i,\\ B_{n2}&{}=&{}\displaystyle \displaystyle A_{n}^{-1}\frac{1}{\sqrt{n}}\Pi _S\sum \limits _{i=1}^{n}(1-\delta _{i})X_iX_i^{\top }(\tilde{\beta }_n-\beta _0),\\ B_{n3}&{}=&{}\displaystyle \displaystyle A_{n}^{-1}\frac{1}{\sqrt{n}}\Pi _S\sum \limits _{i=1}^{n}X_iX_i^{\top }(I-\Pi _S^{\top }\Pi _S)\beta _0. \end{array} \end{aligned}$$

Under condition C1, it is straightforward to show that

$$\begin{aligned} A_{n}\stackrel{P}{\longrightarrow }\Pi _S\Sigma \Pi _S^{\top }. \end{aligned}$$

Thus, with Lemma 1 in hand, we have

$$\begin{aligned} B_{n2}&= \displaystyle A_{n}^{-1}\Pi _S\left( \frac{1}{n}\sum \limits _{i=1}^{n}(1-\delta _{i})X_iX_i^{\top }\right) \sqrt{n}(\tilde{\beta }_n-\beta _0)\\&= \displaystyle A_{n}^{-1}\Pi _S\left( \frac{1}{n}\sum \limits _{i=1}^{n}(1-\delta _{i})X_iX_i^{\top }\right) \frac{\Sigma _0^{-1}}{\sqrt{n}}\sum \limits _{i=1}^n\delta _iX_i\varepsilon _i+o_p(1).\\ B_{n3}&= \displaystyle A_{n}^{-1}\Pi _S\left( \frac{1}{n}\sum \limits _{i=1}^{n}X_iX_i^{\top }\right) (I-\Pi _S^{\top }\Pi _S)\sqrt{n}\beta _0\\&= \displaystyle A_{n}^{-1}\Pi _S\left( \frac{1}{n}\sum \limits _{i=1}^{n}X_iX_i^{\top }\right) (I-\Pi _S^{\top }\Pi _S)\sqrt{n}\left( \begin{array}{c} \ddot{ \beta }_0\\ \eta /\sqrt{n} \end{array} \right) \\&= \displaystyle A_{n}^{-1}\Pi _S\left( \frac{1}{n}\sum \limits _{i=1}^{n}X_iX_i^{\top }\right) (I-\Pi _S^{\top }\Pi _S)\left( \begin{array}{c} 0\\ \eta \end{array} \right) \\&= \displaystyle (\Pi _S\Sigma \Pi _S^{\top })^{-1}\Pi _S\Sigma (I-\Pi _S^{\top }\Pi _S)\left( \begin{array}{c} 0\\ \eta \end{array} \right) +o_p(1).\\ \end{aligned}$$

The second-to-last equality follows from \((I-\Pi _S^{\top }\Pi _S)\left( \begin{array}{c} \ddot{ \beta }_0\\ 0 \end{array} \right) =0\). Thus, we have

$$\begin{aligned} \begin{array}{lll} \sqrt{n}(\hat{\beta }_{S}-\Pi _S\beta _0)&{}=&{}\displaystyle A_{n}^{-1}\Pi _S\left( \Sigma _0+\left( \frac{1}{n}\sum \limits _{i=1}^{n}(1-\delta _{i})X_iX_i^{\top }\right) \right) \frac{\Sigma _0^{-1}}{\sqrt{n}}\sum \limits _{i=1}^{n}\delta _{i}X_i\varepsilon _i\\ &{}&{}\displaystyle +(\Pi _S\Sigma \Pi _S^{\top })^{-1}\Pi _S\Sigma (I-\Pi _S^{\top }\Pi _S)\left( \begin{array}{c} 0\\ \eta \end{array} \right) +o_p(1). \end{array} \end{aligned}$$

Note that

$$\begin{aligned} \frac{1}{n}\sum \limits _{i=1}^{n}(1-\delta _{i})X_iX_i^{\top }\stackrel{P}{\longrightarrow }\Sigma -\Sigma _0. \end{aligned}$$

By the central limit theorem and Slutsky's theorem, we have

$$\begin{aligned} \begin{array}{lll} \sqrt{n}(\hat{\beta }_{S}-\Pi _S\beta _0)&{}\stackrel{d}{\longrightarrow }&{}\displaystyle (\Pi _S\Sigma \Pi _S^{\top })^{-1}\Pi _S\Sigma \Sigma _0^{-1}G +(\Pi _S\Sigma \Pi _S^{\top })^{-1}\Pi _S\Sigma (I-\Pi _S^{\top }\Pi _S)\left( \begin{array}{c} 0\\ \eta \end{array} \right) . \end{array} \end{aligned}$$

A slight rearrangement of the last equation then completes the proof of Theorem 1.
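The proof leans on the algebra of the selection matrix \(\Pi _S\). The following sketch, with a hypothetical dimension and index set chosen purely for illustration, shows how \(\Pi _S\) can be built and why \((I-\Pi _S^{\top }\Pi _S)\) annihilates vectors supported on \(S\), the identity invoked above.

```python
import numpy as np

p = 4           # total number of regression coefficients
S = [0, 1, 3]   # indices kept by submodel S (illustrative choice)

# Pi_S is the |S| x p selection matrix mapping the full coefficient
# vector to the submodel coefficients.
Pi_S = np.zeros((len(S), p))
Pi_S[np.arange(len(S)), S] = 1.0

# Pi_S^T Pi_S zeroes out the dropped coordinates, so I - Pi_S^T Pi_S
# kills any vector whose support lies in S, which is exactly the
# identity (I - Pi_S^T Pi_S)(beta_dd_0, 0)^T = 0 used in the proof.
v = np.array([2.0, -1.0, 0.0, 0.5])  # zero outside S
print((np.eye(p) - Pi_S.T @ Pi_S) @ v)  # -> [0. 0. 0. 0.]
```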

Proof of Theorem 2

Theorem 2 follows from Theorem 1 by the delta method, together with Theorem 3 of Van der Vaart and Wellner (1996).
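Spelled out in the paper's notation, the delta-method step is the following (a sketch of the omitted step, assuming \(\hat{\phi }_S=\phi (\Pi ^{\top }_S\hat{\beta }_S)\) and differentiability of \(\phi \) at \(\beta _0\), consistent with the expansion used in the proof of Theorem 3):

$$\begin{aligned} \sqrt{n}(\hat{\phi }_S-\phi _0)=\phi ^{\prime }_{\beta _0}\sqrt{n}(\Pi ^{\top }_S\hat{\beta }_S-\beta _0)+o_p(1), \end{aligned}$$

and the limit law then follows from Theorem 1 (equivalently, from the expansion (3) below).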

Proof of Theorem 3

It can be verified that

$$\begin{aligned} \sqrt{n} (\hat{\phi }-\phi _0)&= \sum \limits _S w(S|\hat{\eta })\sqrt{n}(\hat{\phi }_S-\phi _0)\\&= \sum \limits _S w(S|\hat{\eta })\left\{ \phi ^{\prime }_{\beta _0}\Pi ^{\top }_S(\Pi _S\Sigma \Pi _S^{\top })^{-1} \Pi _S\Sigma \Sigma ^{-1}_0 G+\phi ^{\prime }_{\beta _0}A_S\left( \begin{array}{c} 0\\ \eta \end{array} \right) +o_p(1)\right\} \\&= \sum \limits _S w(S|\hat{\eta }) \phi ^{\prime }_{\beta _0}\Pi ^{\top }_S(\Pi _S\Sigma \Pi _S^{\top })^{-1} \Pi _S\Sigma \Sigma ^{-1}_0 G+\sum \limits _S w(S|\hat{\eta })\phi ^{\prime }_{\beta _0}A_S\left( \begin{array}{c} 0\\ \eta \end{array} \right) +o_p(1). \end{aligned}$$

Since \(\hat{\eta }\stackrel{d}{\longrightarrow }\Delta \), we have

$$\begin{aligned} w(S|\hat{\eta })\stackrel{d}{\longrightarrow }w(S|\Delta ) \end{aligned}$$

by the continuous mapping theorem, and then Theorem 3 follows directly.
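The weight function \(w(S|\hat{\eta })\) is left abstract in the argument above; what the continuous-mapping step requires is only that it be continuous in \(\hat{\eta }\). As one concrete possibility for illustration, not necessarily the paper's FVIC weights, exponential smoothing of per-submodel criterion values has this continuity:

```python
import numpy as np

def smoothed_weights(criterion_values, kappa=1.0):
    """Turn per-submodel criterion values (smaller = better) into
    averaging weights proportional to exp(-kappa * C_S / 2).

    The map is continuous in the criterion values, which is the
    property the continuous mapping theorem step relies on."""
    c = np.asarray(criterion_values, dtype=float)
    z = np.exp(-kappa * (c - c.min()) / 2.0)  # shift for stability
    return z / z.sum()

# Illustrative criterion values for four candidate submodels.
print(smoothed_weights([5.2, 3.1, 3.4, 6.0]))
```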

Proof of Theorem 4

It is easily seen that

$$\begin{aligned} \begin{array}{lll} \sqrt{n}(\hat{\theta }-\theta )&{}=&{}\displaystyle \frac{1}{\sqrt{n}} \sum \limits _{i=1}^{n}\{\delta _iY_i+(1-\delta _i)X^{\top }_i\hat{\beta }_{sfvic}-\theta \}\\ &{}=&{}\displaystyle \displaystyle \frac{1}{\sqrt{n}}\sum \limits _{i=1}^{n}\delta _{i}\varepsilon _i +\frac{1}{\sqrt{n}}\sum \limits _{i=1}^{n}(X^{\top }_i\beta _0-\theta )+\frac{1}{\sqrt{n}}\sum \limits _{i=1}^{n}(1-\delta _{i})X^{\top }_i(\hat{\beta }_{sfvic}-\beta _0)\\ &{}\doteq &{}\displaystyle H_{n1}+H_{n2}+H_{n3}.\\ \end{array} \end{aligned}$$

For \(H_{n3}\), we have

$$\begin{aligned} H_{n3}&= \displaystyle \frac{1}{n}\sum \limits _{i=1}^{n}(1-\delta _{i})X^{\top }_i\sqrt{n}(\hat{\beta }_{sfvic}-\beta _0)\nonumber \\&= \displaystyle \frac{1}{n}\sum \limits _{i=1}^{n}(1-\delta _{i})X^{\top }_i\sum \limits _S w(S|\hat{\eta })\sqrt{n}(\Pi ^{\top }_{S}\hat{\beta }_{S}-\beta _0). \end{aligned}$$
(2)

Note that

$$\begin{aligned} \begin{array}{lll} \sqrt{n}(\Pi ^{\top }_{S}\hat{\beta }_{S}-\beta _0)&{}=&{}\displaystyle \Pi _S(\Pi _S\Sigma \Pi _S^{\top })^{-1}\Pi _S\Sigma \Sigma _0^{-1}\cdot \frac{1}{\sqrt{n}}\sum \limits _{i=1}^{n}\delta _{i}X_i\varepsilon _i +A_S\left( \begin{array}{c} 0\\ \eta \end{array} \right) +o_p(1). \end{array} \nonumber \\ \end{aligned}$$
(3)

Plugging (3) into (2), we have

$$\begin{aligned} \begin{array}{lll} H_{n3} &{}=&{}\displaystyle \frac{1}{n}\sum \limits _{i=1}^{n}(1-\delta _{i})X^{\top }_i\sum \limits _S w(S|\hat{\eta }) R_S\Sigma \Sigma _0^{-1}\cdot \frac{1}{\sqrt{n}}\sum \limits _{i=1}^{n}\delta _{i}X_i\varepsilon _i\\ &{}&{}\displaystyle +\frac{1}{n}\sum \limits _{i=1}^{n}(1-\delta _{i})X^{\top }_i\sum \limits _S w(S|\hat{\eta })A_S\left( \begin{array}{c} 0\\ \eta \end{array} \right) +o_p(1). \end{array} \end{aligned}$$

Since \(\frac{1}{n}\sum _{i=1}^{n}(1-\delta _{i})X^{\top }_i\stackrel{P}{\longrightarrow }EX^{\top }-E\delta X^{\top }\), it follows that

$$\begin{aligned} \begin{array}{lll} H_{n3} &{}=&{}\displaystyle (EX^{\top }-E\delta X^{\top })\sum \limits _S w(S|\Delta ) R_S\Sigma \Sigma _0^{-1}\cdot \frac{1}{\sqrt{n}}\sum \limits _{i=1}^{n}\delta _{i}X_i\varepsilon _i\\ &{}&{}\displaystyle +(EX^{\top }-E\delta X^{\top })\sum \limits _S w(S|\Delta )A_S\left( \begin{array}{c} 0\\ \eta \end{array} \right) +o_p(1). \end{array} \end{aligned}$$

This together with \(H_{n1}\) and \(H_{n2}\) implies that

$$\begin{aligned} \begin{array}{lll} \sqrt{n}(\hat{\theta }-\theta ) &{}=&{}\displaystyle \frac{1}{\sqrt{n}}\sum \limits _{i=1}^{n}[\delta _{i}-(EX^{\top }-E\delta X^{\top })\sum \limits _S w(S|\Delta ) R_S\Sigma \Sigma _0^{-1}\delta _{i}X_i]\varepsilon _i \\ &{}&{}\displaystyle +\frac{1}{\sqrt{n}}\sum \limits _{i=1}^{n}(X^{\top }_i\beta _0-\theta ) +(EX^{\top }-E\delta X^{\top })\sum \limits _S w(S|\Delta )A_S\left( \begin{array}{c} 0\\ \eta \end{array}\right) . \end{array} \end{aligned}$$

Theorem 4 then follows from the central limit theorem. This completes the proof.
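Computationally, the estimator \(\hat{\theta }\) analyzed above simply averages the observed responses together with model-averaged imputations of the missing ones. A minimal sketch follows; the argument name beta_avg is a hypothetical stand-in for \(\hat{\beta }_{sfvic}\).

```python
import numpy as np

def theta_hat(y, X, delta, beta_avg):
    """Mean-response estimator from the proof of Theorem 4: keep the
    observed Y_i, replace each missing one by X_i^T beta_avg, average."""
    delta = np.asarray(delta, dtype=bool)
    filled = np.where(delta, y, X @ beta_avg)
    return float(filled.mean())

# Tiny illustration with made-up numbers (nan marks a missing response).
y = np.array([1.2, np.nan, 0.7, np.nan])
X = np.array([[1.0, 0.0], [0.5, 1.0], [0.2, 0.3], [1.5, -0.4]])
print(theta_hat(y, X, delta=np.array([1, 0, 1, 0]), beta_avg=np.array([0.8, 0.1])))
```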

Cite this article

Sun, Z., Su, Z. & Ma, J. Focused vector information criterion model selection and model averaging regression with missing response. Metrika 77, 415–432 (2014). https://doi.org/10.1007/s00184-013-0446-8
