
Robust model selection with covariables missing at random

Annals of the Institute of Statistical Mathematics

Abstract

Let \(f_{Y|X,Z}(y|x,z)\) be the conditional probability function of Y given (X, Z), where Y is a scalar response variable and (X, Z) is the covariable vector. This paper proposes a robust model selection criterion for \(f_{Y|X,Z}(y|x,z)\) when X is missing at random. The proposed method is developed based on a set of assumed models for the selection probability function. The consistency of the proposed model selection procedure, however, does not require these models to be correctly specified; it only requires that the true selection probability function be a function of these assumed selection probability functions. Under some regularity conditions, the model selection by the proposed method is proved to be consistent, and the estimator of the population parameter vector is shown to be consistent and asymptotically normal. A Monte Carlo study was conducted to evaluate the finite-sample performance of the proposal, and a real data analysis illustrates its practical application.
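To fix ideas, the inverse-probability-weighted selection criterion described above can be sketched numerically. The following is a hypothetical minimal illustration, not the authors' implementation: the logistic working model for the selection probability, the Gaussian candidate models, and the penalty `lam` are all assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Simulated data: Y depends on (X, Z); the covariable X is missing at random,
# with selection probability depending only on the always-observed (Y, Z).
Z = rng.normal(size=n)
X = 0.5 * Z + rng.normal(size=n)
Y = 1.0 + 2.0 * X - Z + rng.normal(size=n)

# True selection probability pi(y, z): logistic in (Y, Z) (an assumption here).
pi = 1.0 / (1.0 + np.exp(-(0.5 + 0.3 * Y - 0.2 * Z)))
delta = rng.binomial(1, pi)  # delta_i = 1 iff X_i is observed

# Fit a logistic working model for the selection probability in (Y, Z)
# with a few Newton-Raphson steps (no external libraries needed).
W = np.column_stack([np.ones(n), Y, Z])
alpha = np.zeros(3)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-W @ alpha))
    grad = W.T @ (delta - p)
    hess = -(W * (p * (1.0 - p))[:, None]).T @ W
    alpha -= np.linalg.solve(hess, grad)
p_hat = np.clip(1.0 / (1.0 + np.exp(-W @ alpha)), 1e-3, None)  # crude truncation

def ipw_loglik(design):
    """Maximized IPW-weighted Gaussian log-likelihood for one candidate model:
    complete cases only, each weighted by delta_i / p_hat_i."""
    m = delta == 1
    Dm, Ym, wm = design[m], Y[m], (delta / p_hat)[m]
    beta = np.linalg.solve((Dm * wm[:, None]).T @ Dm, (Dm * wm[:, None]).T @ Ym)
    resid = Ym - Dm @ beta
    sigma2 = (wm * resid**2).sum() / wm.sum()
    return -0.5 * (wm * (np.log(2 * np.pi * sigma2) + resid**2 / sigma2)).sum() / n

# Two candidate models for f(y|x,z); the second (true) one includes Z.
candidates = {"X only": np.column_stack([np.ones(n), X]),
              "X and Z": np.column_stack([np.ones(n), X, Z])}
lam = np.log(n) / np.sqrt(n)  # lam -> 0 while sqrt(n) * lam -> infinity
ic = {name: -ipw_loglik(D) + lam * D.shape[1] for name, D in candidates.items()}
best = min(ic, key=ic.get)
print(best)
```

The criterion trades off the weighted log-likelihood against a dimension penalty, so the correctly specified larger model wins here despite its extra parameter.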


References

  • Celeux, G., Forbes, F., Robert, C. P., Titterington, D. M. (2006). Deviance information criteria for missing data models. Bayesian Analysis, 1(4), 651–673.

  • Claeskens, G., Consentino, F. (2008). Variable selection with incomplete covariate data. Biometrics, 64(4), 1062–1069.

  • Claeskens, G., Hjort, N. L. (2003). The focused information criterion. Journal of the American Statistical Association, 98(464), 900–916.

  • Claeskens, G., Hjort, N. L. (2008). Model selection and model averaging. Cambridge University Press.

  • Fang, F., Shao, J. (2016). Model selection with nonignorable nonresponse. Biometrika, 103(4), 861–874.

  • Gelman, A., Van Mechelen, I., Verbeke, G., Heitjan, D. F., Meulders, M. (2005). Multiple imputation for model checking: Completed-data plots with missing and latent data. Biometrics, 61(1), 74–85.

  • Gourieroux, C., Monfort, A. (1995). Statistics and econometric models (Vol. 2). Cambridge University Press.

  • Hens, N., Aerts, M., Molenberghs, G. (2006). Model selection for incomplete and design-based samples. Statistics in Medicine, 25(14), 2502–2520.

  • Horvitz, D. G., Thompson, D. J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47(260), 663–685.

  • Ibrahim, J. G., Zhu, H., Tang, N. (2008). Model selection criteria for missing data problems using the EM algorithm. Journal of the American Statistical Association, 103(484), 1648–1658.

  • Jiang, J., Rao, J. S., Gu, Z., Nguyen, T. (2008). Fence methods for mixed model selection. The Annals of Statistics, 36(4), 1669–1692.

  • Jiang, J., Nguyen, T., Rao, J. S. (2015). The E-MS algorithm: Model selection with incomplete data. Journal of the American Statistical Association, 110(511), 1136–1147.

  • Little, R. J. A., Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). Wiley.

  • Mallows, C. L. (1973). Some comments on \(C_p\). Technometrics, 15(4), 661–675.

  • Newey, W. K., McFadden, D. (1994). Large sample estimation and hypothesis testing. Handbook of Econometrics, 4, 2111–2245.

  • Robins, J. M., Rotnitzky, A., Zhao, L. P. (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89(427), 846–866.

  • Rolling, C. A., Yang, Y. (2014). Model selection for estimating treatment effects. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(4), 749–769.

  • Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461–464.

  • Shao, Q., Yang, L. (2017). Oracally efficient estimation and consistent model selection for auto-regressive moving average time series with trend. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(2), 507–524.

  • Wang, Q., Rao, J. N. K. (2002a). Empirical likelihood-based inference under imputation for missing response data. The Annals of Statistics, 30(3), 896–924.

  • Wang, Q., Su, M., Wang, R. (2021). A beyond multiple robust approach for missing response problem. Computational Statistics & Data Analysis, 155, 107111.

  • Wei, Y., Wang, Q., Duan, X., Qin, J. (2021). Bias-corrected Kullback-Leibler distance criterion based model selection with covariables missing at random. Computational Statistics & Data Analysis, 160.

  • Zhang, X., Wang, H., Ma, Y., Carroll, R. J. (2017). Linear model selection when covariates contain errors. Journal of the American Statistical Association, 112(520), 1553–1561.

Acknowledgements

Wang’s research was supported by the National Natural Science Foundation of China (General Program 11871460, Key Program 11331011, and Innovative Research Group Project 61621003) and by a grant from the Key Lab of Random Complex Structure and Data Science, CAS.

Author information

Corresponding author

Correspondence to Qihua Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 78 KB)

Appendix

We present the proofs of the theorems below.

Proof of Theorem 1

To prove Theorem 1, we first establish the existence and consistency of \({{{\hat{\theta }}}}^{\text {IP}}_M\) defined in (5). By Property 24.1 in Gourieroux and Monfort (1995), the existence of \( {{{\hat{\theta }}}}^{\text {IP}}_M\) is guaranteed under (C.1), (C.2), (C.5) and (C.6). Recalling the definition of \({{\hat{D}}}_{\text {IP}}(M,\theta _M,{{\hat{\alpha }}}_n)\) in (4), by Theorem 2.1 in Newey and McFadden (1994) together with (C.1) and (C.2), it suffices to verify the following equation to prove the consistency of \({\hat{\theta }}^{\text {IP}}_M\):

$$\begin{aligned} \sup _{\theta _M \in \Theta _M}|{{\hat{D}}}_{\text {IP}}(M,\theta _M,\hat{\alpha }_n)-D(M,\theta _M)|=o_p(1). \end{aligned}$$

Note that,

$$\begin{aligned}&|{{\hat{D}}}_{\text {IP}}(M,\theta _M,{{\hat{\alpha }}}_n)-D(M,\theta _M)| \\&\quad \le |{{\hat{D}}}_{\text {IP}}(M,\theta _M,{{\hat{\alpha }}}_n)-{\tilde{D}}_{\text {IP}}(M,\theta _M,\alpha ^*)|+|{\tilde{D}}_{\text {IP}}(M,\theta _M,\alpha ^*)-D(M,\theta _M)|, \end{aligned}$$

where \({{\tilde{D}}}_{\text {IP}}(M,\theta _M,\alpha ^*) = n^{-1}\sum \limits _{i=1}^n\left\{ \frac{\delta _i}{p_{\alpha ^*}\left( \phi _{\pi }\left( {Y_i,Z_i};\alpha ^*\right) \right) }\log g_M(Y_i|X_i,Z_i;\theta _M)\right\} \), in which \(p_{\alpha ^*}(\phi _\pi (Y,Z;\alpha ^*))\) is defined in the first paragraph of Sect. 3. It thus suffices to prove

$$\begin{aligned} \sup _{\theta _M \in \Theta _M}|{{\hat{D}}}_{\text {IP}}(M,\theta _M,\hat{\alpha }_n)-{{\tilde{D}}}_{\text {IP}}(M,\theta _M,\alpha ^*)|=o_p(1), \end{aligned}$$
(11)

and

$$\begin{aligned} \sup _{\theta _M \in \Theta _M}|{\tilde{D}}_{\text {IP}}(M,\theta _M,\alpha ^*)-D(M,\theta _M)|=o_p(1). \end{aligned}$$
(12)

By Lemma 2.4 in Newey and McFadden (1994), under (C.1) and (C.2), (12) follows directly by noting

$$\begin{aligned} E\{\delta |\phi _\pi (y,z;\alpha ^*)\}=\pi (y,z). \end{aligned}$$
(13)

Note that,

$$\begin{aligned}&{{\hat{D}}}_{\text {IP}}(M,\theta _M,{{\hat{\alpha }}}_n)-{{\tilde{D}}}_{\text {IP}}(M,\theta _M,\alpha ^*)\nonumber \\&\quad =\frac{1}{n}\sum _{i=1}^n\delta _i \log g_M(Y_i|X_i,Z_i;\theta _M)\{{{\hat{q}}}_{{{\hat{\alpha }}}_n,b_n}\left( \phi _{\pi }\left( {Y_i,Z_i} ;{{{\hat{\alpha }}}_n}\right) \right) -{\hat{q}}_{\alpha ^*,b_n}(\phi _\pi (Y_i,Z_i;\alpha ^*))\}\nonumber \\&\qquad +\frac{1}{n}\sum _{i=1}^n\delta _i \log g_M(Y_i|X_i,Z_i;\theta _M)\{{\hat{q}}_{\alpha ^*,b_n}(\phi _\pi (Y_i,Z_i;\alpha ^*))-q_{\alpha ^*,b_n}(\phi _\pi (Y_i,Z_i;\alpha ^*))\}\nonumber \\&\qquad +\frac{1}{n}\sum _{i=1}^n\delta _i \log g_M(Y_i|X_i,Z_i;\theta _M)\{q_{\alpha ^*,b_n}(\phi _\pi (Y_i,Z_i;\alpha ^*))- q_{\alpha ^*}(\phi _\pi (Y_i,Z_i;\alpha ^*))\}\nonumber \\&\quad :=Q_{n1}+Q_{n2}+Q_{n3}, \end{aligned}$$
(14)

where \({\hat{q}}_{\alpha ,n}(u)=1/{\hat{p}}_{\alpha ,n}(u)\), \({\hat{q}}_{\alpha ,b_n}(u)=1/{{\hat{p}}}_{\alpha ,b_n}(u)={\hat{q}}_{\alpha ,n}(u){\hat{r}}_{\alpha ,n}(u)/{\hat{r}}_{\alpha ,b_n}(u)\), \(q_{\alpha }(u)=1/p_{\alpha }(u)\), \(q_{\alpha ,b_n}(u)=q_\alpha (u)r_\alpha (u)/r_{\alpha ,b_n}(u)\) and \( r_{\alpha ,b_n}(u)=\max \{r_{\alpha }(u),b_n\}\), in which \(p_\alpha (u)\) and \(r_\alpha (u)\) are defined in the first paragraph of Sect. 3. By (2) and (3), the definitions of \({\hat{q}}_{\alpha ,n}(u)\), \({\hat{q}}_{\alpha ,b_n}(u)\), \(q_\alpha (u)\) and \(q_{\alpha ,b_n}(u)\) are similar to those of \({\hat{a}}_{\gamma ,n}(u)\), \({\hat{a}}_{\gamma ,b_n}(u)\), \(a_\gamma (u)\) and \(a_{\gamma ,b_n}(u)\) in Wang et al. (2021), respectively. By (C.2), we have \(E\{\sup \limits _{\theta _M\in \Theta _M}\delta \log g_M(Y|X,Z;\theta _M)\}<\infty \). Together with conditions (C.3)–(C.7), this proves \(\sup \limits _{\theta _M \in \Theta _M}Q_{n1}=o_p(1)\) and \(\sup \limits _{\theta _M \in \Theta _M}Q_{n2}=o_p(1)\) by arguments similar to those of Lemmas S1 and S2 in Wang et al. (2021). Clearly,

$$\begin{aligned} |Q_{n3}|\le \frac{2}{n}\sum _{i=1}^n|\delta _i \log g_M(Y_i|X_i,Z_i;\theta _M)q_{\alpha ^*}(\phi _\pi (Y_i,Z_i;\alpha ^*))|I[r_{\alpha ^*}(\phi _\pi (Y_i,Z_i;\alpha ^*))<b_n]. \end{aligned}$$

By (C.2), (C.3) and (C.4), \(\forall \epsilon >0\), we then have

$$\begin{aligned}&P(|\sup _{\theta _M\in \Theta _M}Q_{n3}|>\epsilon )\\&\quad \le \frac{2}{\epsilon }E\{|\sup _{\theta _M \in \Theta _M}\delta \log g_M(Y|X,Z;\theta _M)q_{\alpha ^*}(\phi _\pi (Y,Z;\alpha ^*))|I[r_{\alpha ^*}(\phi _\pi (Y,Z;\alpha ^*))<b_n]\}\\&\quad \rightarrow 0. \end{aligned}$$

This yields

$$\begin{aligned}&\sup _{\theta _M \in \Theta _M}|{{\hat{D}}}_{\text {IP}}(M,\theta _M,\hat{\alpha }_n)-{{\tilde{D}}}_{\text {IP}}(M,\theta _M,\alpha ^*)|\\ {}&\quad \le \left| \sup _{\theta _M\in \Theta _M}Q_{n1}\right| +\left| \sup _{\theta _M\in \Theta _M}Q_{n2}\right| +\left| \sup _{\theta _M\in \Theta _M}Q_{n3}\right| =o_p(1). \end{aligned}$$

This completes the proof of (11).
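The truncation device \(q_{\alpha ,b_n}\) used in this proof caps the inverse weights whenever \(r_\alpha \) falls below the truncation level \(b_n\). A small numerical sketch of this mechanism, with hypothetical values not taken from the paper:

```python
import numpy as np

def truncated_inverse_weight(p, r, b_n):
    """q_{alpha,b_n}(u) = q_alpha(u) * r_alpha(u) / max{r_alpha(u), b_n}:
    equals the plain inverse weight 1/p when r >= b_n, and is damped
    toward zero when r falls below the truncation level b_n."""
    return (1.0 / p) * r / np.maximum(r, b_n)

p = np.array([0.5, 0.1, 0.01])    # selection probabilities (hypothetical)
r = np.array([0.4, 0.05, 0.002])  # corresponding r_alpha values (hypothetical)
print(truncated_inverse_weight(p, r, b_n=0.01))  # last weight damped to 20, not 100
```

The bias introduced by the truncation is exactly the \(Q_{n3}\) term bounded above; it vanishes asymptotically because \(b_n\rightarrow 0\).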

We now prove the asymptotic normality of \({\hat{\theta }}_M^{\text {IP}}\). By (14), we have

$$\begin{aligned} {{\hat{D}}}_{\text {IP}}(M,\theta _M,{{\hat{\alpha }}}_n) ={\tilde{D}}_{\text {IP}}(M,\theta _M,\alpha ^*)+Q_{n1}+Q_{n2}+Q_{n3}. \end{aligned}$$

Lemma 1 in the supplementary material proves that

$$\begin{aligned} Q_{n1}=&n^{-1}\sum _{i=1}^n\{\delta _i\log g_M(Y_i|X_i,Z_i;\theta _M)\}\{\partial q_{\alpha ^*}(\phi _\pi (Y_i,Z_i;\alpha ^*))/\partial \alpha \}({\hat{\alpha _n}}-\alpha ^*)\nonumber \\ {}&+o_p(n^{-1/2}). \end{aligned}$$
(15)

Further, according to Lemma 2 in the supplementary material, we have

$$\begin{aligned} Q_{n2}=&n^{-1}\sum _{i=1}^n\{1-\delta _iq_{\alpha ^*}(\phi _{\pi }({Y_i,Z_i};\alpha ^*))\}q_{\alpha ^*}(\phi _{\pi }({Y_i,Z_i};\alpha ^*))\nonumber \\&\times \delta _i\log g_M(Y_i|X_i,Z_i;\theta _M)+o_p(n^{-1/2}). \end{aligned}$$
(16)

For \(Q_{n3}\), by (C.2), (C.8) and Markov’s inequality, we have

$$\begin{aligned}&P(n^{1/2}|Q_{n3}|>\epsilon )\nonumber \\&\quad \le P\left( n^{-1/2}\sum _{i=1}^n2\left| C_iq_{\alpha ^*}(\phi _\pi (Y_i,Z_i;\alpha ^*))\right| I[r_{\alpha ^*}(\phi _\pi (Y_i,Z_i;\alpha ^*))<b_n]>\epsilon \right) \nonumber \\&\quad \le 2\epsilon ^{-1}E\{\sqrt{n}\left| Cq_{\alpha ^*}(\phi _\pi (Y,Z;\alpha ^*))\right| I[r_{\alpha ^*}(\phi _\pi (Y,Z;\alpha ^*))<b_n]\} \rightarrow 0, \end{aligned}$$
(17)

where \(C_i=\delta _i\log g_M(Y_i|X_i,Z_i;\theta _M)\). Hence \(Q_{n3}=o_p(n^{-1/2})\), and thus

$$\begin{aligned}&{{\hat{D}}}_{\text {IP}}(M,\theta _M,{\hat{\alpha _n}}) ={\tilde{D}}_{\text {IP}}(M,\theta _M,\alpha ^*)+Q_{n1}+Q_{n2}+o_p(n^{-1/2}). \end{aligned}$$
(18)

Let \(\Psi (u,v)\) be a general vector-valued or matrix-valued function, and we denote

$$\begin{aligned} \Psi ^{'}_{\{u\}}(u,v)=\frac{ \partial \Psi (u,v)}{\partial u}, \end{aligned}$$
$$\begin{aligned} K_M(\theta _M,{\hat{\alpha _n}})={{\hat{D}}}^{'}_{IP\{\theta _M\}}(M,\theta _M,{\hat{\alpha _n}}). \end{aligned}$$

Then, by (4), we have

$$\begin{aligned} K_M(\theta _M,{\hat{\alpha _n}})=n^{-1}\sum _{i=1}^n \left\{ \frac{\delta _i}{{{\hat{p}}}_{{{\hat{\alpha _n}}},b_n}\left( \phi _{\pi }\left( {Y_i,Z_i};{\hat{\alpha }_n}\right) \right) }t_{M,i}(\theta _M)\right\} , \end{aligned}$$

where \( t_{M,i}(\theta _M)=\frac{\partial \log g_M(Y_i|X_i,Z_i;\theta _M)}{\partial \theta _M}\) for \(i=1,2,\ldots ,n\). By the same arguments as those leading to (18), under (C.2)–(C.8), we obtain

$$\begin{aligned}&K_M({\hat{\theta }}^{\text {IP}}_M,{\hat{\alpha }}_n)=n^{-1}\sum _{i=1}^nq_{\alpha ^*}(\phi _\pi (Y_i,Z_i;\alpha ^*))\delta _it_{M,i}({\hat{\theta }}^{\text {IP}}_M)\nonumber \\&\quad +n^{-1}\sum _{i=1}^n\delta _it_{M,i}({\hat{\theta }}^{\text {IP}}_M)\{\partial q_{\alpha ^*}(\phi _\pi (Y_i,Z_i;\alpha ^*))/\partial \alpha \}({\hat{\alpha }}_n-\alpha ^*) \nonumber \\&\quad +n^{-1}\sum _{i=1}^n\{1-\delta _iq_{\alpha ^{*}}(\phi _{\pi }({Y_i,Z_i};\alpha ^{*}))\}q_{\alpha ^*}(\phi _{\pi }({Y_i,Z_i};\alpha ^*))\delta _i t_{M,i}({\hat{\theta }}^{\text {IP}}_M)+o_p(n^{-1/2}). \end{aligned}$$
(19)

From (5), it follows that \(K_M({\hat{\theta _M}}^{\text {IP}},\hat{\alpha }_n)=0\). Thus, applying Taylor expansion to \(K_M({\hat{\theta _M}}^{\text {IP}},{{\hat{\alpha }}}_n)\) around the point \((\theta ^*_{M},\alpha ^*)\), we have

$$\begin{aligned}&0=n^{-1}\sum _{i=1}^n\bigg \{q_{\alpha ^*}(\phi _\pi (Y_i,Z_i;\alpha ^*))\delta _it_{M,i}(\theta ^*_M) +[1-\delta _iq_{\alpha ^*}(\phi _{\pi }({Y_i,Z_i};\alpha ^*))]\nonumber \\&\quad \times q_{\alpha ^*}(\phi _{\pi }({Y_i,Z_i};\alpha ^*))\delta _i t_{M,i}(\theta ^*_M)\bigg \}+n^{-1}\sum _{i=1}^n\bigg \{q_{\alpha ^*}(\phi _\pi (Y_i,Z_i;\alpha ^*))\delta _it^{'}_{M,i}(\theta ^*_M) \nonumber \\&\quad +[1-\delta _iq_{\alpha ^*}(\phi _{\pi }({Y_i,Z_i};\alpha ^*))] q_{\alpha ^*}(\phi _{\pi }({Y_i,Z_i};\alpha ^*))\delta _i t^{'}_{M,i}(\theta ^*_M)\bigg \}({\hat{\theta _M}}^{\text {IP}}-\theta ^*_M) \nonumber \\ {}&\quad +n^{-1}\sum _{i=1}^n\bigg \{\delta _it_{M,i}(\theta ^*_M)\partial q_{\alpha ^*}(\phi _\pi (Y_i,Z_i;\alpha ^*))/\partial \alpha \bigg \}({\hat{\alpha _n}}-\alpha ^*) +o_p(n^{-1/2})\nonumber \\ {}&\quad :=H_{n1}(M,\theta ^*_M,\alpha ^*)+H_{n2}(M,\theta ^*_M,\alpha ^*)({\hat{\theta _M}}^{\text {IP}}-\theta ^*_M)+H_{n3}(M,\theta ^*_M,\alpha ^*)({\hat{\alpha _n}}-\alpha ^*) \nonumber \\ {}&\quad +o_p(n^{-1/2}), \end{aligned}$$
(20)

where \(\alpha ^*\) is the probability limit of \({\hat{\alpha _n}}\) and \(\theta ^*_{M}\) is given in Condition (C.1). Let

$$\begin{aligned} t_\pi (\alpha )=\frac{\delta -\phi _\pi (y,z;\alpha )}{\phi _\pi (y,z;\alpha )\{1-\phi _\pi (y,z;\alpha )\}}\cdot \phi ^{'}_{\pi {\{\alpha \}}}(y,z;\alpha ). \end{aligned}$$

By a standard argument, we obtain

$$\begin{aligned} \sqrt{n}({\hat{\alpha _n}}-\alpha ^*)=I^{-1}_{\alpha ^*}\frac{1}{\sqrt{n}}\sum _{i=1}^n t_{\pi ,i}(\alpha ^*)+o_p(1), \end{aligned}$$
(21)

where \(I_{\alpha ^*}=-E[t^{'}_{\pi \{\alpha ^T\}}(\alpha ^*)]\).

Similarly, by the law of large numbers, we obtain

$$\begin{aligned} H_{n2}(M,\theta ^*_M,\alpha ^*)=-I_{\theta ^*_{M}}+o_p(1),\ \ H_{n3}(M,\theta ^*_M,\alpha ^*)=-A_{M,\alpha ^*}+o_p(1), \end{aligned}$$
(22)

where

$$\begin{aligned}&I_{\theta ^*_{M}}=-E\big [\delta q_{\alpha ^*}(\phi _\pi (Y,Z;\alpha ^*))t^{'}_{M\{\theta ^T_M\}}(\theta ^*_{M})+\{1-\delta q_{\alpha ^*}(\phi _{\pi }(Y,Z;\alpha ^*))\}\\&\times q_{\alpha ^*}(\phi _{\pi }(Y,Z;\alpha ^*))\delta t^{'}_{M\{\theta ^T_M\}}(\theta ^*_{M})\big ],\\&\quad A_{M,\alpha ^*}=-E\left\{ \delta \frac{\partial q_{\alpha ^*}(\phi _\pi (Y,Z;\alpha ^*))}{\partial \alpha ^*}t_M(\theta ^*_M)\right\} . \end{aligned}$$

Combining (20) with (21) and (22), we obtain

$$\begin{aligned} \sqrt{n} ({\hat{\theta }}^{\text {IP}}_M-\theta ^*_{M})=\frac{1}{\sqrt{n}}\sum _{i=1}^n R_{M,i}+o_p(1), \end{aligned}$$

where

$$\begin{aligned}&R_{M,i}=I^{-1}_{\theta ^*_{M}}\bigg \{\delta _i q_{\alpha ^*}(\phi _\pi (Y_i,Z_i;\alpha ^*))t_{M,i}(\theta ^*_{M})+[1-\delta _iq_{\alpha ^*}(\phi _{\pi }({Y_i,Z_i};\alpha ^*))]\\&\quad \times q_{\alpha ^*}(\phi _{\pi }({Y_i,Z_i};\alpha ^*))\delta _i t_{M,i}(\theta ^*_M) -A_{M,\alpha }I^{-1}_{\alpha ^*}t_{\pi ,i}(\alpha ^*)\bigg \}. \end{aligned}$$

By the central limit theorem and the arguments above, \(\sqrt{n}({\hat{\theta }}^{\text {IP}}_M-\theta ^*_{M})\) is asymptotically normal with mean zero as \(n\rightarrow \infty \). This completes the proof of Theorem 1. \(\square \)
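The \(\sqrt{n}\)-consistency and asymptotic normality established in Theorem 1 can be illustrated numerically for a toy inverse-probability-weighted estimator with a known selection probability. This is an analogue for illustration only, not the estimator of the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def ipw_mean(n):
    """Horvitz-Thompson type estimate of E[X] = 0 when X is observed
    with known probability pi(Z) depending on a fully observed Z."""
    Z = rng.normal(size=n)
    X = Z + rng.normal(size=n)       # E[X] = 0
    pi = 1.0 / (1.0 + np.exp(-Z))    # known selection probability
    delta = rng.binomial(1, pi)
    return np.mean(delta * X / pi)

# Over repeated samples the standardized estimator should look N(0, 1):
# roughly 95% of the estimates fall within 1.96 empirical standard deviations.
est = np.array([ipw_mean(500) for _ in range(2000)])
coverage = np.mean(np.abs(est / est.std(ddof=1)) < 1.96)
print(round(est.mean(), 3), round(coverage, 3))
```

The Monte Carlo mean is near the true value 0 and the empirical coverage is near 0.95, as the normal approximation predicts.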

Proof of Theorem 2

To prove Theorem 2, it suffices to prove the following:

$$\begin{aligned} P\{{\text {IC}}_{\text {IP}}({\hat{M}}_{\text {IP}})={\text {IC}}_{\text {IP}}(M_{\text {opt}})\}\rightarrow 1 \ \ (n\rightarrow \infty ). \end{aligned}$$
(23)

Recalling the definition of \({\hat{M}}_{\text {IP}}\) in (7), \({\hat{M}}_{\text {IP}}\) is the minimizer of \({\text {IC}}_{\text {IP}}(M)\) over M, so \({\text {IC}}_{\text {IP}}({\hat{M}}_{\text {IP}})\le {\text {IC}}_{\text {IP}}(M_{\text {opt}})\). Thus, to prove (23), we need only show that

$$\begin{aligned} P\{{\text {IC}}_{\text {IP}}({\hat{M}}_{\text {IP}})\ge {\text {IC}}_{\text {IP}}(M_{\text {opt}})\}\rightarrow 1 \ \ (n\rightarrow \infty ). \end{aligned}$$
(24)

To prove (24), it suffices to show that

$$\begin{aligned} P\{{\text {IC}}_{\text {IP}}(M)\ge {\text {IC}}_{\text {IP}}(M_{\text {opt}})\}\rightarrow 1 \ \ (n\rightarrow \infty ), \end{aligned}$$
(25)

for each candidate model M. By the definition of \({\text {IC}}_{\text {IP}}(M)\) in (6), proving (25) is equivalent to proving that, as \(n\rightarrow \infty \),

$$\begin{aligned} P({\hat{D}}_{\text {IP}}(M_{\text {opt}},{{\hat{\theta }}}^{\text {IP}}_{M_{\text {opt}}},{\hat{\alpha _n}})-{\hat{D}}_{\text {IP}}(M,{\hat{\theta }}^{\text {IP}}_M,{\hat{\alpha _n}})+\lambda _n(d_M-d_{M_{\text {opt}}})\ge 0)\rightarrow 1. \end{aligned}$$
(26)

If \(M=M_{\text {opt}}\), (26) is clearly true, so we need only consider the case \(M\ne M_{\text {opt}}\). By Theorem 1, applying a Taylor expansion to \({{\hat{D}}}_{\text {IP}}(M,{\hat{\theta }}^{\text {IP}}_M,{\hat{\alpha }}_n)\), we have

$$\begin{aligned} {{\hat{D}}}_{\text {IP}}(M,{\hat{\theta }}^{\text {IP}}_M,{\hat{\alpha _n}})&={{\hat{D}}}_{\text {IP}}(M,\theta ^{*}_M,\alpha ^*)+ {{\hat{D}}}^{'}_{IP\{\theta ^T_M\}}(M,\theta ^{*}_M,\alpha ^*)({\hat{\theta }}^{\text {IP}}_M-\theta ^{*}_M)\\ {}&+{{\hat{D}}}^{'}_{IP\{\alpha ^T\}}(M,\theta ^{*}_M,\alpha ^*)({\hat{\alpha _n}}-\alpha ^*)+o_p(n^{-1/2}), \end{aligned}$$

where \({\hat{\theta }}^{\text {IP}}_{M_{\text {opt}}}\) is defined analogously to \({\hat{\theta }}^{\text {IP}}_M\). By (C.2), (C.4) and (C.5)(ii), together with the root-n consistency of \({\hat{\theta }}^{\text {IP}}_M\) and \({\hat{\alpha }}_n\), we have

$$\begin{aligned}&{{\hat{D}}}^{'}_{IP\{\theta ^T_M\}}(M,\theta ^{*}_M,\alpha ^*)({\hat{\theta }}^{\text {IP}}_M-\theta ^{*}_M)=O_p(n^{-1/2}),\\&\quad {{\hat{D}}}^{'}_{IP\{\alpha ^T\}}(M,\theta ^{*}_M,\alpha ^*)({\hat{\alpha _n}}-\alpha ^*)=O_p(n^{-1/2}). \end{aligned}$$

Thus, we have

$$\begin{aligned} {{\hat{D}}}_{\text {IP}}(M,{\hat{\theta }}^{\text {IP}}_M,{\hat{\alpha _n}})={{\hat{D}}}_{\text {IP}}(M,\theta ^{*}_M,\alpha ^*)+O_p(n^{-1/2}). \end{aligned}$$
(27)

Similarly, we have

$$\begin{aligned} {{\hat{D}}}_{\text {IP}}(M_{\text {opt}},{\hat{\theta }}^{\text {IP}}_{M_{\text {opt}}},{\hat{\alpha }}_n)={{\hat{D}}}_{\text {IP}}(M_{\text {opt}},\theta ^{*}_{M_{\text {opt}}},\alpha ^*)+O_p(n^{-1/2}), \end{aligned}$$
(28)

where \(\theta ^{*}_{M_{\text {opt}}}\) is defined analogously to \(\theta ^{*}_M\). Noting that \(\pi (y,z)\) is a function of \(\phi _\pi (y,z;\alpha ^*)\), and combining this with (13), (18) and the law of large numbers, we have

$$\begin{aligned} {\hat{D}}_{\text {IP}}(M,{\hat{\theta }}^{\text {IP}}_M,{\hat{\alpha _n}})=D(M,\theta ^*_{M})+O_p(n^{-1/2}). \end{aligned}$$
(29)

Clearly, (29) also holds for \(M=M_{\text {opt}}\); we then have

$$\begin{aligned}&{\hat{D}}_{\text {IP}}(M_{\text {opt}},{\hat{\theta }}^{\text {IP}}_{M_{\text {opt}}},{\hat{\alpha _n}})-{\hat{D}}_{\text {IP}}(M,{\hat{\theta }}^{\text {IP}}_M,{\hat{\alpha _n}})\nonumber \\&\quad =D(M_{\text {opt}},\theta ^*_{M_{\text {opt}}})-D(M,\theta ^*_{M})+O_p(n^{-1/2}). \end{aligned}$$
(30)

Recalling the definition of \(D(M,\theta _M)\) given below (1), \(D(M_{\text {opt}},\theta ^*_{M_{\text {opt}}})-D(M,\theta ^*_{M})\) is non-negative. Recall also that \(\lambda _n\) is a positive tuning parameter tending to zero as \(n\rightarrow \infty \). Following Fang and Shao (2016), we consider the following three cases to prove (26):

Case 1. M is an incorrect model and \(d_{M_{\text {opt}}}<d_M\). In this case, \({\hat{D}}_{\text {IP}}(M_{\text {opt}},{\hat{\theta }}^{\text {IP}}_{M_{\text {opt}}},{\hat{\alpha }}_n)-{\hat{D}}_{\text {IP}}(M,{\hat{\theta }}^{\text {IP}}_M,{\hat{\alpha }}_n)>0\) in probability, and hence (26) clearly holds.

Case 2. M is an incorrect model and \(d_{M_{\text {opt}}}\ge d_M\). As in Case 1, \({\hat{D}}_{\text {IP}}(M_{\text {opt}},{\hat{\theta }}^{\text {IP}}_{M_{\text {opt}}},{\hat{\alpha }}_n)-{\hat{D}}_{\text {IP}}(M,{\hat{\theta }}^{\text {IP}}_M,{\hat{\alpha }}_n)>0\) in probability, and (26) then holds since \(\lambda _n\rightarrow 0\).

Case 3. M is a correct model and \(d_{M_{\text {opt}}}< d_M\). In this case, \({\hat{D}}_{\text {IP}}(M,{\hat{\theta }}^{\text {IP}}_M,{\hat{\alpha }}_n)-{\hat{D}}_{\text {IP}}(M_{\text {opt}},{\hat{\theta }}^{\text {IP}}_{M_{\text {opt}}},{\hat{\alpha }}_n)=O_p(n^{-\frac{1}{2}})\), and hence (26) holds as long as \(\sqrt{n}\lambda _n\rightarrow \infty \).
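The two rate conditions used in the cases above, \(\lambda _n\rightarrow 0\) (Case 2) and \(\sqrt{n}\lambda _n\rightarrow \infty \) (Case 3), can be checked for concrete choices of \(\lambda _n\). For instance, \(\lambda _n=\log n/\sqrt{n}\) satisfies both, while \(\lambda _n=1/n\) fails the second; this is an illustration only, not a recommendation from the paper:

```python
import numpy as np

ns = np.array([1e2, 1e4, 1e6])
lam_ok = np.log(ns) / np.sqrt(ns)   # -> 0, with sqrt(n) * lam = log(n) -> infinity
lam_bad = 1.0 / ns                  # -> 0, but sqrt(n) * lam = 1/sqrt(n) -> 0
for n, a, b in zip(ns, lam_ok, lam_bad):
    print(int(n), a, np.sqrt(n) * a, np.sqrt(n) * b)
```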

This completes the proof of (26) and hence the proof of Theorem 2. \(\square \)

About this article

Cite this article

Liang, Z., Wang, Q. & Wei, Y. Robust model selection with covariables missing at random. Ann Inst Stat Math 74, 539–557 (2022). https://doi.org/10.1007/s10463-021-00806-2

