Abstract
Let \(f_{Y|X,Z}(y|x,z)\) be the conditional probability function of Y given (X, Z), where Y is a scalar response variable and (X, Z) is the covariable vector. This paper proposes a robust model selection criterion for \(f_{Y|X,Z}(y|x,z)\) when X is missing at random. The proposed method is developed based on a set of assumed models for the selection probability function. However, the consistency of model selection by our proposal does not require these models to be correctly specified; it only requires that the true selection probability function be a function of these assumed selection probability functions. Under some conditions, we prove that model selection by the proposed method is consistent and that the estimator of the population parameter vector is consistent and asymptotically normal. A Monte Carlo study was conducted to evaluate the finite-sample performance of our proposal, and a real data analysis illustrates its practical application.
References
Celeux, G., Forbes, F., Robert, C. P., Titterington, D. M. (2006). Deviance information criteria for missing data models. Bayesian Analysis, 1(4), 651–673.
Claeskens, G., Consentino, F. (2008). Variable selection with incomplete covariate data. Biometrics, 64(4), 1062–1069.
Claeskens, G., Hjort, N. L. (2003). The focused information criterion. Journal of the American Statistical Association, 98(464), 900–916.
Claeskens, G., Hjort, N. L. (2008). Model selection and model averaging. Cambridge University Press.
Fang, F., Shao, J. (2016). Model selection with nonignorable nonresponse. Biometrika, 103(4), 861–874.
Gelman, A., Van Mechelen, I., Verbeke, G., Heitjan, D. F., Meulders, M. (2005). Multiple imputation for model checking: Completed-data plots with missing and latent data. Biometrics, 61(1), 74–85.
Gourieroux, C., Monfort, A. (1995). Statistics and econometric models (Vol. 2). Cambridge University Press.
Hens, N., Aerts, M., Molenberghs, G. (2006). Model selection for incomplete and design-based samples. Statistics in Medicine, 25(14), 2502–2520.
Horvitz, D. G., Thompson, D. J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47(260), 663–685.
Ibrahim, J. G., Zhu, H., Tang, N. (2008). Model selection criteria for missing data problems using the EM algorithm. Journal of the American Statistical Association, 103(484), 1648–1658.
Jiang, J., Rao, J. S., Gu, Z., Nguyen, T. (2008). Fence methods for mixed model selection. The Annals of Statistics, 36(4), 1669–1692.
Jiang, J., Nguyen, T., Rao, J. S. (2015). The E-MS algorithm: Model selection with incomplete data. Journal of the American Statistical Association, 110(511), 1136–1147.
Little, R. J. A., Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). Wiley.
Mallows, C. L. (1973). Some comments on \(C_p\). Technometrics, 15(4), 661–675.
Newey, W. K., McFadden, D. (1994). Large sample estimation and hypothesis testing. Handbook of Econometrics, 4, 2111–2245.
Robins, J. M., Rotnitzky, A., Zhao, L. P. (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89(427), 846–866.
Rolling, C. A., Yang, Y. (2014). Model selection for estimating treatment effects. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(4), 749–769.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461–464.
Shao, Q., Yang, L. (2017). Oracally efficient estimation and consistent model selection for auto-regressive moving average time series with trend. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(2), 507–524.
Wang, Q., Rao, J. N. K. (2002a). Empirical likelihood-based inference under imputation for missing response data. The Annals of Statistics, 30(3), 896–924.
Wang, Q., Su, M., Wang, R. (2021). A beyond multiple robust approach for missing response problem. Computational Statistics & Data Analysis, 155, 107111.
Wei, Y., Wang, Q., Duan, X., Qin, J. (2021). Bias-corrected Kullback-Leibler distance criterion based model selection with covariables missing at random. Computational Statistics & Data Analysis, 160.
Zhang, X., Wang, H., Ma, Y., Carroll, R. J. (2017). Linear model selection when covariates contain errors. Journal of the American Statistical Association, 112(520), 1553–1561.
Acknowledgements
Wang’s research was supported by the National Natural Science Foundation of China (General program 11871460, Key program 11331011 and program for Innovative Research Group Project 61621003), and a grant from the Key Lab of Random Complex Structure and Data Science, CAS.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Appendix
We present the proofs of the theorems as follows.
Proof of Theorem 1
To prove Theorem 1, we first need to establish the existence and consistency of \({{{\hat{\theta }}}}^{\text {IP}}_M\) defined in (5). Based on Property 24.1 in Gourieroux and Monfort (1995), the existence of \( {{{\hat{\theta }}}}^{\text {IP}}_M\) is guaranteed under (C.1), (C.2), (C.5) and (C.6). Recalling the definition of \({{\hat{D}}}_{\text {IP}}(M,\theta _M,{{\hat{\alpha }}}_n)\) in (4), by Theorem 2.1 in Newey and McFadden (1994) together with (C.1) and (C.2), it suffices to prove the consistency of \({\hat{\theta }}^{\text {IP}}_M\) by verifying the following equation:
Note that,
where \({{\tilde{D}}}_{\text {IP}}(M,\theta _M,\alpha ^*) = n^{-1}\sum \limits _{i=1}^n\left\{ \frac{\delta _i}{p_{\alpha ^*}\left( \phi _{\pi }\left( {Y_i,Z_i};\alpha ^*\right) \right) }\log g_M(Y_i|X_i,Z_i;\theta _M)\right\} \), in which \(p_{\alpha ^*}(\phi _\pi (Y,Z;\alpha ^*))\) is defined in the first paragraph of Sect. 3. We need only to prove,
and
According to Lemma 2.4 in Newey and McFadden (1994), under (C.1) and (C.2), it is straightforward to prove (12) by noting
Note that,
where \({\hat{q}}_{\alpha ,n}(u)=1/{\hat{p}}_{\alpha ,n}(u)\), \({\hat{q}}_{\alpha ,b_n}(u)=1/{{\hat{p}}}_{\alpha ,b_n}(u)={\hat{q}}_{\alpha ,n}(u){\hat{r}}_{\alpha ,n}(u)/{\hat{r}}_{\alpha ,b_n}(u)\), \(q_{\alpha }(u)=1/p_{\alpha }(u)\), \(q_{\alpha ,b_n}(u)=q_\alpha (u)r_\alpha (u)/r_{\alpha ,b_n}(u)\) and \( r_{\alpha ,b_n}(u)=\max \{r_{\alpha }(u),b_n\}\), in which \(p_\alpha (u)\) and \(r_\alpha (u)\) are defined in the first paragraph of Sect. 3. By (2) and (3), the definitions of \({\hat{q}}_{\alpha ,n}(u)\), \({\hat{q}}_{\alpha ,b_n}(u)\), \(q_\alpha (u)\) and \(q_{\alpha ,b_n}(u)\) are analogous to those of \({\hat{a}}_{\gamma ,n}(v)\), \({\hat{a}}_{\gamma ,b_n}(v)\), \(a_\gamma (v)\) and \(a_{\gamma ,b_n}(v)\) in Wang et al. (2021), respectively. By (C.2), we have \(E\{\sup \limits _{\theta _M\in \Theta _M}\delta \log g_M(Y|X,Z;\theta _M)\}<\infty \). Together with conditions (C.3)–(C.7), this proves \(\sup \limits _{\theta _M \in \Theta _M}Q_{n1}=o_p(1)\) and \(\sup \limits _{\theta _M \in \Theta _M}Q_{n2}=o_p(1)\), respectively, by arguments similar to those of Lemmas S1 and S2 in Wang et al. (2021). Clearly,
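The weighted average \({{\tilde{D}}}_{\text {IP}}\) above is a Horvitz–Thompson-type (inverse-probability-weighted) mean, whose consistency can be checked numerically. A minimal sketch, assuming a logistic selection probability and the integrand \(h(Y,Z)=Y^2+Z\) (both choices are this sketch's, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
Y = rng.normal(size=n)
Z = rng.normal(size=n)
h = Y ** 2 + Z            # any integrable function of the observed (Y, Z)

# Selection probability depending only on (Y, Z): the covariate X
# (not needed for this check) would be missing at random.
p = 1.0 / (1.0 + np.exp(-(0.2 * Y - 0.3 * Z)))
delta = rng.binomial(1, p)

naive = h[delta == 1].mean()      # complete-case average: biased
ipw = np.mean(delta / p * h)      # Horvitz-Thompson weighting: consistent
true = 1.0                        # E[Y^2 + Z] = 1 for standard normals

print(abs(naive - true), abs(ipw - true))
```

The complete-case mean converges to \(E[p\,h]/E[p]\ne E[h]\), while the weighted mean targets \(E[h]\), which is why the criterion weights each complete case by \(\delta _i/p_i\).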
By (C.2), (C.3) and (C.4), \(\forall \epsilon >0\), we then have
This yields
This completes the proof of (11).
Now we prove the asymptotic normality of \(\hat{\theta }_M^{\text {IP}}\). By (14), we know that
Lemma 1 in the supplementary material proves that
Further, according to Lemma 2 in the supplementary material, we have
For \(Q_{n3}\), by (C.2), (C.8) and Markov’s inequality, we have
where \(C_i=\delta _i\log g_M(Y_i|X_i,Z_i;\theta _M)\). Hence \(Q_{n3}=o_p(n^{-1/2})\), and we have
Let \(\Psi (u,v)\) be a general vector-valued or matrix-valued function, and we denote
Then, by (4), we have
where \( t_{M,i}(\theta _M)=\frac{\partial \log g_M(Y_i|X_i,Z_i;\theta _M)}{\partial \theta _M}\) for \(i=1,2,\ldots ,n\). By the same technique as in (18), under (C.2)–(C.8), we obtain
From (5), it follows that \(K_M({\hat{\theta }}^{\text {IP}}_M,{\hat{\alpha }}_n)=0\). Thus, applying a Taylor expansion to \(K_M({\hat{\theta }}^{\text {IP}}_M,{\hat{\alpha }}_n)\) around the point \((\theta ^*_{M},\alpha ^*)\), we have
where \(\alpha ^*\) is the probability limit of \({\hat{\alpha }}_n\) and \(\theta ^*_{M}\) is given in Condition (C.1). Let
By a standard argument, one can obtain the following equation:
where \(I_{\alpha ^*}=-E\left[ \frac{\partial t_{\pi }(\alpha )}{\partial \alpha ^{T}}\Big |_{\alpha =\alpha ^*}\right] \).
Similarly, by the law of large numbers, we can obtain that
where
Thus, combining (20) with (21) and (22), we obtain
where
By the central limit theorem and the arguments above, \({\hat{\theta }}^{\text {IP}}_M\) is asymptotically normal with zero mean as \(n\rightarrow \infty \). This completes the proof of Theorem 1. \(\square \)
Proof of Theorem 2
Obviously, in order to prove Theorem 2, it suffices to prove the following equation:
Recalling the definition of \({\hat{M}}_{\text {IP}}\) in (7), \({\hat{M}}_{\text {IP}}\) minimizes \({\text {IC}}_{\text {IP}}(M)\) over M, so \({\text {IC}}_{\text {IP}}({\hat{M}}_{\text {IP}})\le {\text {IC}}_{\text {IP}}(M_{\text {opt}})\). Thus, to prove (23), we only need to prove that
To prove (24), it is enough to prove the following equation:
for each candidate model M. By the definition of \({\text {IC}}_{\text {IP}}(M)\) in (6), proving (25) is equivalent to proving the following equation as \(n\rightarrow \infty \):
If \(M=M_{\text {opt}}\), (26) clearly holds. Thus, we only consider the case \(M\ne M_{\text {opt}}\). By Theorem 1, applying a Taylor expansion to \({{\hat{D}}}_{\text {IP}}(M,{\hat{\theta }}^{\text {IP}}_M,{\hat{\alpha }}_n)\), we have
where \({\hat{\theta }}^{\text {IP}}_{M_{\text {opt}}}\) is defined analogously to \({\hat{\theta }}^{\text {IP}}_M\). By (C.2), (C.4) and (C.5)(ii), together with the root-\(n\) consistency of \({\hat{\theta }}^{\text {IP}}_M\) and \({\hat{\alpha }}_n\), we have
Thus, we have
Similarly, we have
where \(\theta ^{*}_{M_{\text {opt}}}\) is defined analogously to \(\theta ^{*}_M\). Noting that \(\pi (y,z)\) is a function of \(\phi _\pi (y,z;\alpha ^*)\), together with (13), (18) and the law of large numbers, we have
Since (29) also holds for \(M=M_{\text {opt}}\), we have
Recalling the definition of \(D(M,\theta _M)\) given below (1), \(D(M_{\text {opt}},\theta ^*_{M_{\text {opt}}})-D(M,\theta ^*_{M})\) is non-negative. Recall that \(\lambda _n\) is a positive tuning parameter tending to zero as \(n\rightarrow \infty \). Following Fang and Shao (2016), we consider the following three cases to prove (26):
Case 1. M is an incorrect model and \(d_{M_{\text {opt}}}<d_M\). In this case, \({\hat{D}}_{\text {IP}}(M_{\text {opt}},{\hat{\theta }}^{\text {IP}}_{M_{\text {opt}}},{\hat{\alpha }}_n)-{\hat{D}}_{\text {IP}}(M,{\hat{\theta }}^{\text {IP}}_M,{\hat{\alpha }}_n)>0\) in probability, and hence (26) clearly holds.
Case 2. M is an incorrect model but \(d_{M_{\text {opt}}}\ge d_M\). Similar to Case 1, we have \({\hat{D}}_{\text {IP}}(M_{\text {opt}},{\hat{\theta }}^{\text {IP}}_{M_{\text {opt}}},{\hat{\alpha }}_n)-{\hat{D}}_{\text {IP}}(M,{\hat{\theta }}^{\text {IP}}_M,{\hat{\alpha }}_n)>0\) in probability; (26) then holds by noting \(\lambda _n\rightarrow 0\).
Case 3. M is a correct model but \(d_{M_{\text {opt}}}< d_M\). In this case, \({\hat{D}}_{\text {IP}}(M,{\hat{\theta }}^{\text {IP}}_M,{\hat{\alpha }}_n)-{\hat{D}}_{\text {IP}}(M_{\text {opt}},{\hat{\theta }}^{\text {IP}}_{M_{\text {opt}}},{\hat{\alpha }}_n)=O_p(n^{-\frac{1}{2}})\), and hence (26) holds as long as \(\sqrt{n}\lambda _n\rightarrow \infty \).
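The two rate requirements that drive Cases 2 and 3 are compatible; one illustrative choice of the tuning parameter (an example for concreteness, not a rate prescribed by the paper) is

```latex
% One tuning rate satisfying both requirements (illustrative choice):
\lambda_n = \frac{\log n}{\sqrt{n}},
\qquad\text{so that}\qquad
\lambda_n \to 0 \quad\text{(Case 2)}
\quad\text{and}\quad
\sqrt{n}\,\lambda_n = \log n \to \infty \quad\text{(Case 3)}.
```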
This completes the proof of (26) and hence the proof of Theorem 2. \(\square \)
Cite this article
Liang, Z., Wang, Q. & Wei, Y. Robust model selection with covariables missing at random. Ann Inst Stat Math 74, 539–557 (2022). https://doi.org/10.1007/s10463-021-00806-2