Abstract
Let \(f_{Y|X,Z}(y|x,z)\) be the conditional probability function of Y given (X, Z), where Y is a scalar response variable and (X, Z) is the covariable vector. This paper proposes a robust model selection criterion for \(f_{Y|X,Z}(y|x,z)\) when X is missing at random. The proposed method is developed based on a set of assumed models for the selection probability function. However, the consistency of model selection by our proposal does not require these models to be correctly specified; it only requires that the true selection probability function be a function of these assumed selection probability functions. Under some conditions, we prove that model selection by the proposed method is consistent and that the estimator of the population parameter vector is consistent and asymptotically normal. A Monte Carlo study was conducted to evaluate the finite-sample performance of our proposal, and a real data analysis illustrates its practical application.
References
Celeux, G., Forbes, F., Robert, C. P., Titterington, D. M. (2006). Deviance information criteria for missing data models. Bayesian Analysis, 1(4), 651–673.
Claeskens, G., Consentino, F. (2008). Variable selection with incomplete covariate data. Biometrics, 64(4), 1062–1069.
Claeskens, G., Hjort, N. L. (2003). The focused information criterion. Journal of the American Statistical Association, 98(464), 900–916.
Claeskens, G., Hjort, N. L. (2008). Model selection and model averaging. Cambridge University Press.
Fang, F., Shao, J. (2016). Model selection with nonignorable nonresponse. Biometrika, 103(4), 861–874.
Gelman, A., Van Mechelen, I., Verbeke, G., Heitjan, D. F., Meulders, M. (2005). Multiple imputation for model checking: Completed-data plots with missing and latent data. Biometrics, 61(1), 74–85.
Gourieroux, C., Monfort, A. (1995). Statistics and econometric models (Vol. 2). Cambridge University Press.
Hens, N., Aerts, M., Molenberghs, G. (2006). Model selection for incomplete and design-based samples. Statistics in Medicine, 25(14), 2502–2520.
Horvitz, D. G., Thompson, D. J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47(260), 663–685.
Ibrahim, J. G., Zhu, H., Tang, N. (2008). Model selection criteria for missing data problems using the EM algorithm. Journal of the American Statistical Association, 103(484), 1648–1658.
Jiang, J., Rao, J. S., Gu, Z., Nguyen, T. (2008). Fence methods for mixed model selection. The Annals of Statistics, 36(4), 1669–1692.
Jiang, J., Nguyen, T., Rao, J. S. (2015). The E-MS algorithm: Model selection with incomplete data. Journal of the American Statistical Association, 110(511), 1136–1147.
Little, R. J. A., Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). Wiley.
Mallows, C. L. (1973). Some comments on \(C_p\). Technometrics, 15(4), 661–675.
Newey, W. K., McFadden, D. (1994). Large sample estimation and hypothesis testing. Handbook of Econometrics, 4, 2111–2245.
Robins, J. M., Rotnitzky, A., Zhao, L. P. (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89(427), 846–866.
Rolling, C. A., Yang, Y. (2014). Model selection for estimating treatment effects. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(4), 749–769.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461–464.
Shao, Q., Yang, L. (2017). Oracally efficient estimation and consistent model selection for auto-regressive moving average time series with trend. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(2), 507–524.
Wang, Q., Rao, J. N. K. (2002a). Empirical likelihood-based inference under imputation for missing response data. The Annals of Statistics, 30(3), 896–924.
Wang, Q., Su, M., Wang, R. (2021). A beyond multiple robust approach for missing response problem. Computational Statistics & Data Analysis, 155, 107111.
Wei, Y., Wang, Q., Duan, X., Qin, J. (2021). Bias-corrected Kullback-Leibler distance criterion based model selection with covariables missing at random. Computational Statistics & Data Analysis, 160.
Zhang, X., Wang, H., Ma, Y., Carroll, R. J. (2017). Linear model selection when covariates contain errors. Journal of the American Statistical Association, 112(520), 1553–1561.
Acknowledgements
Wang’s research was supported by the National Natural Science Foundation of China (General program 11871460, Key program 11331011 and program for Innovative Research Group Project 61621003), and a grant from the Key Lab of Random Complex Structure and Data Science, CAS.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Appendix
We present the proofs of the theorems as follows.
Proof of Theorem 1
To prove Theorem 1, we first need to establish the existence and consistency of \({{{\hat{\theta }}}}^{\text {IP}}_M\) defined in (5). Based on Property 24.1 in Gourieroux and Monfort (1995), the existence of \( {{{\hat{\theta }}}}^{\text {IP}}_M\) is guaranteed under (C.1), (C.2), (C.5) and (C.6). Recalling the definition of \({{\hat{D}}}_{\text {IP}}(M,\theta _M,{{\hat{\alpha }}}_n)\) in (4), by Theorem 2.1 in Newey and McFadden (1994) together with (C.1) and (C.2), it suffices to prove the consistency of \({\hat{\theta }}^{\text {IP}}_M\) by verifying the following equation:
Note that,
where \({{\tilde{D}}}_{\text {IP}}(M,\theta _M,\alpha ^*) = n^{-1}\sum \limits _{i=1}^n\left\{ \frac{\delta _i}{p_{\alpha ^*}\left( \phi _{\pi }\left( {Y_i,Z_i};\alpha ^*\right) \right) }\log g_M(Y_i|X_i,Z_i;\theta _M)\right\} \), in which \(p_{\alpha ^*}(\phi _\pi (Y,Z;\alpha ^*))\) is defined in the first paragraph of Sect. 3. We need only to prove,
and
According to Lemma 2.4 in Newey and McFadden (1994), under (C.1) and (C.2), it is straightforward to prove (12) by noting
Note that,
where \({\hat{q}}_{\alpha ,n}(u)=1/{\hat{p}}_{\alpha ,n}(u)\), \({\hat{q}}_{\alpha ,b_n}(u)=1/{{\hat{p}}}_{\alpha ,b_n}(u)={\hat{q}}_{\alpha ,n}(u){\hat{r}}_{\alpha ,n}(u)/{\hat{r}}_{\alpha ,b_n}(u)\), \(q_{\alpha }(u)=1/p_{\alpha }(u)\), \(q_{\alpha ,b_n}(u)=q_\alpha (u)r_\alpha (u)/r_{\alpha ,b_n}(u)\) and \( r_{\alpha ,b_n}(u)=\max \{r_{\alpha }(u),b_n\}\), in which \(p_\alpha (u)\) and \(r_\alpha (u)\) are defined in the first paragraph of Sect. 3. By (2) and (3), the definitions of \({\hat{q}}_{\alpha ,n}(u)\), \({\hat{q}}_{\alpha ,b_n}(u)\), \(q_\alpha (u)\) and \(q_{\alpha ,b_n}(u)\) are analogous to those of \({\hat{a}}_{\gamma ,n}(v)\), \({\hat{a}}_{\gamma ,b_n}(v)\), \(a_\gamma (v)\) and \(a_{\gamma ,b_n}(v)\) in Wang et al. (2021), respectively. By (C.2), we have \(E\{\sup \limits _{\theta _M\in \Theta _M}\delta \log g_M(Y|X,Z;\theta _M)\}<\infty \). Together with conditions (C.3)–(C.7), this proves \(\sup \limits _{\theta _M \in \Theta _M}Q_{n1}=o_p(1)\) and \(\sup \limits _{\theta _M \in \Theta _M}Q_{n2}=o_p(1)\), respectively, by arguments similar to those of Lemmas S1 and S2 in Wang et al. (2021). Clearly,
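The weighted average \({{\tilde{D}}}_{\text {IP}}\) above is a Horvitz–Thompson-type (inverse-probability-weighted) mean, whose consistency can be checked numerically. A minimal sketch, assuming a logistic selection probability and the integrand \(h(Y,Z)=Y^2+Z\) (both choices are this sketch's, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
Y = rng.normal(size=n)
Z = rng.normal(size=n)
h = Y ** 2 + Z            # any integrable function of the observed (Y, Z)

# Selection probability depending only on (Y, Z): the covariate X
# (not needed for this check) would be missing at random.
p = 1.0 / (1.0 + np.exp(-(0.2 * Y - 0.3 * Z)))
delta = rng.binomial(1, p)

naive = h[delta == 1].mean()      # complete-case average: biased
ipw = np.mean(delta / p * h)      # Horvitz-Thompson weighting: consistent
true = 1.0                        # E[Y^2 + Z] = 1 for standard normals

print(abs(naive - true), abs(ipw - true))
```

The complete-case mean converges to \(E[p\,h]/E[p]\ne E[h]\), while the weighted mean targets \(E[h]\), which is why the criterion weights each complete case by \(\delta _i/p_i\).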
By (C.2), (C.3) and (C.4), \(\forall \epsilon >0\), we then have
This yields
This completes the proof of (11).
Now we prove the asymptotic normality of \(\hat{\theta }_M^{\text {IP}}\). By (14), we know that
Lemma 1 in the supplementary material proves that
Further, according to Lemma 2 in the supplementary material, we have
For \(Q_{n3}\), by (C.2), (C.8) and Markov’s inequality, we have
where \(C_i=\delta _i\log g_M(Y_i|X_i,Z_i;\theta _M)\). Hence \(Q_{n3}=o_p(n^{-1/2})\), and we have
Let \(\Psi (u,v)\) be a general vector-valued or matrix-valued function, and we denote
Then, by (4), we have
where \( t_{M,i}(\theta _M)=\frac{\partial \log g_M(Y_i|X_i,Z_i;\theta _M)}{\partial \theta _M}\) for \(i=1,2,\ldots ,n\). By the same technique as in (18), under (C.2)–(C.8), we obtain
From (5), it follows that \(K_M({\hat{\theta }}^{\text {IP}}_M,{\hat{\alpha }}_n)=0\). Thus, applying a Taylor expansion to \(K_M({\hat{\theta }}^{\text {IP}}_M,{\hat{\alpha }}_n)\) around the point \((\theta ^*_{M},\alpha ^*)\), we have
where \(\alpha ^*\) is the probability limit of \({\hat{\alpha }}_n\) and \(\theta ^*_{M}\) is given in Condition (C.1). Let
By a standard argument, one can obtain the following equation:
where \(I_{\alpha ^*}=-E\left[ \frac{\partial t_{\pi }(\alpha )}{\partial \alpha ^{T}}\Big |_{\alpha =\alpha ^*}\right] \).
Similarly, by the law of large numbers, we can obtain that
where
Thus, combining (20) with (21) and (22), we obtain
where
By the central limit theorem and the arguments above, \({\hat{\theta }}^{\text {IP}}_M\) is asymptotically normal with zero mean as \(n\rightarrow \infty \). This completes the proof of Theorem 1. \(\square \)
Proof of Theorem 2
Obviously, in order to prove Theorem 2, it suffices to prove the following equation:
Recalling the definition of \({\hat{M}}_{\text {IP}}\) in (7), \({\hat{M}}_{\text {IP}}\) minimizes \({\text {IC}}_{\text {IP}}(M)\) over M, so \({\text {IC}}_{\text {IP}}({\hat{M}}_{\text {IP}})\le {\text {IC}}_{\text {IP}}(M_{\text {opt}})\). Thus, to prove (23), we only need to prove that
To prove (24), it is enough to prove the following equation:
for each candidate model M. By the definition of \({\text {IC}}_{\text {IP}}(M)\) in (6), proving (25) is equivalent to proving the following equation as \(n\rightarrow \infty \):
If \(M=M_{\text {opt}}\), (26) clearly holds. Thus, we only consider the case \(M\ne M_{\text {opt}}\). By Theorem 1, applying a Taylor expansion to \({{\hat{D}}}_{\text {IP}}(M,{\hat{\theta }}^{\text {IP}}_M,{\hat{\alpha }}_n)\), we have
where \({\hat{\theta }}^{\text {IP}}_{M_{\text {opt}}}\) is defined analogously to \({\hat{\theta }}^{\text {IP}}_M\). By (C.2), (C.4) and (C.5)(ii), together with the root-\(n\) consistency of \({\hat{\theta }}^{\text {IP}}_M\) and \({\hat{\alpha }}_n\), we have
Thus, we have
Similarly, we have
where \(\theta ^{*}_{M_{\text {opt}}}\) is defined analogously to \(\theta ^{*}_M\). Noting that \(\pi (y,z)\) is a function of \(\phi _\pi (y,z;\alpha ^*)\), together with (13), (18) and the law of large numbers, we have
Since (29) also holds for \(M=M_{\text {opt}}\), we have
Recalling the definition of \(D(M,\theta _M)\) given below (1), \(D(M_{\text {opt}},\theta ^*_{M_{\text {opt}}})-D(M,\theta ^*_{M})\) is non-negative. Recall that \(\lambda _n\) is a positive tuning parameter tending to zero as \(n\rightarrow \infty \). Following Fang and Shao (2016), we consider the following three cases to prove (26):
Case 1. M is an incorrect model and \(d_{M_{\text {opt}}}<d_M\). In this case, \({\hat{D}}_{\text {IP}}(M_{\text {opt}},{\hat{\theta }}^{\text {IP}}_{M_{\text {opt}}},{\hat{\alpha }}_n)-{\hat{D}}_{\text {IP}}(M,{\hat{\theta }}^{\text {IP}}_M,{\hat{\alpha }}_n)>0\) in probability, and hence (26) clearly holds.
Case 2. M is an incorrect model but \(d_{M_{\text {opt}}}\ge d_M\). Similar to Case 1, we have \({\hat{D}}_{\text {IP}}(M_{\text {opt}},{\hat{\theta }}^{\text {IP}}_{M_{\text {opt}}},{\hat{\alpha }}_n)-{\hat{D}}_{\text {IP}}(M,{\hat{\theta }}^{\text {IP}}_M,{\hat{\alpha }}_n)>0\) in probability; (26) then holds by noting \(\lambda _n\rightarrow 0\).
Case 3. M is a correct model but \(d_{M_{\text {opt}}}< d_M\). In this case, \({\hat{D}}_{\text {IP}}(M,{\hat{\theta }}^{\text {IP}}_M,{\hat{\alpha }}_n)-{\hat{D}}_{\text {IP}}(M_{\text {opt}},{\hat{\theta }}^{\text {IP}}_{M_{\text {opt}}},{\hat{\alpha }}_n)=O_p(n^{-\frac{1}{2}})\), and hence (26) holds as long as \(\sqrt{n}\lambda _n\rightarrow \infty \).
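The two rate requirements that drive Cases 2 and 3 are compatible; one illustrative choice of the tuning parameter (an example for concreteness, not a rate prescribed by the paper) is

```latex
% One tuning rate satisfying both requirements (illustrative choice):
\lambda_n = \frac{\log n}{\sqrt{n}},
\qquad\text{so that}\qquad
\lambda_n \to 0 \quad\text{(Case 2)}
\quad\text{and}\quad
\sqrt{n}\,\lambda_n = \log n \to \infty \quad\text{(Case 3)}.
```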
This completes the proof of (26) and hence the proof of Theorem 2. \(\square \)
Cite this article
Liang, Z., Wang, Q. & Wei, Y. Robust model selection with covariables missing at random. Ann Inst Stat Math 74, 539–557 (2022). https://doi.org/10.1007/s10463-021-00806-2