Abstract
We investigate semiparametric estimation of regression coefficients through generalized estimating equations with single-index models when some covariates are missing at random. Existing popular semiparametric estimators may run into difficulties when some selection probabilities are small or the dimension of the covariates is not low. We propose a new simple parameter estimator using a kernel-assisted estimator for the augmentation by a single-index model without using the inverse of selection probabilities. We show that under certain conditions the proposed estimator is as efficient as the existing methods based on standard kernel smoothing, which are often practically infeasible in the case of multiple covariates. A simulation study and a real data example are presented to illustrate the proposed method. The numerical results show that the proposed estimator avoids some numerical issues caused by estimated small selection probabilities that are needed in other estimators.
References
Bang, H., Robins, J. M. (2005). Doubly robust estimation in missing data and causal inference models. Biometrics, 61, 962–973.
Chen, H. Y. (2004). Nonparametric and semiparametric models for missing covariates in parametric regression. Journal of the American Statistical Association, 99, 1176–1189.
Fuchs, C. (1982). Maximum likelihood estimation and model selection in contingency tables with missing data. Journal of the American Statistical Association, 77, 270–278.
Han, P. (2014). Multiply robust estimation in regression analysis with missing data. Journal of the American Statistical Association, 109, 1159–1173.
Han, P. (2016). Combining inverse probability weighting and multiple imputation to improve robustness of estimation. Scandinavian Journal of Statistics, 43, 246–260.
Han, P., Wang, L. (2013). Estimation with missing data: Beyond double robustness. Biometrika, 100, 417–430.
Hartley, H., Hocking, R. (1971). The analysis of incomplete data. Biometrics, 27, 783–823.
Hsu, C.-H., Long, Q., Li, Y., Jacobs, E. (2014). A nonparametric multiple imputation approach for data with missing covariate values with application to colorectal adenoma data. Journal of Biopharmaceutical Statistics, 24, 634–648.
Ibrahim, J. G. (1990). Incomplete data in generalized linear models. Journal of the American Statistical Association, 85, 765–769.
Ibrahim, J. G., Chen, M.-H., Lipsitz, S. R. (2002). Bayesian methods for generalized linear models with covariates missing at random. Canadian Journal of Statistics, 30, 55–78.
Ibrahim, J. G., Chen, M.-H., Lipsitz, S. R., Herring, A. H. (2005). Missing-data methods for generalized linear models: A comparative review. Journal of the American Statistical Association, 100, 332–346.
Kang, J. D., Schafer, J. L. (2007). Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data. Statistical Science, 22, 523–539.
Little, R. J., Rubin, D. B. (2014). Statistical analysis with missing data. New Jersey: Wiley.
Reilly, M., Pepe, M. S. (1995). A mean score method for missing and auxiliary covariate data in regression models. Biometrika, 82, 299–314.
Robins, J. M., Ritov, Y. (1997). Toward a curse of dimensionality appropriate (CODA) asymptotic theory for semi-parametric models. Statistics in Medicine, 16, 285–319.
Robins, J. M., Rotnitzky, A., Zhao, L. P. (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89, 846–866.
Robins, J., Sued, M., Lei-Gomez, Q., Rotnitzky, A. (2007). Comment: Performance of double-robust estimators when “inverse probability” weights are highly variable. Statistical Science, 22, 544–559.
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63, 581–592.
Rubin, D. B. (2004). Multiple imputation for nonresponse in surveys. New Jersey: Wiley.
Schluchter, M. D., Jackson, K. L. (1989). Log-linear analysis of censored survival data with partially observed covariates. Journal of the American Statistical Association, 84, 42–52.
Sepanski, J., Knickerbocker, R., Carroll, R. (1994). A semiparametric correction for attenuation. Journal of the American Statistical Association, 89, 1366–1373.
Sinha, S., Saha, K. K., Wang, S. (2014). Semiparametric approach for non-monotone missing covariates in a parametric regression model. Biometrics, 70, 299–311.
Wang, C., Wang, S., Zhao, L.-P., Ou, S.-T. (1997). Weighted semiparametric estimation in regression analysis with missing covariate data. Journal of the American Statistical Association, 92, 512–525.
Wang, S., Wang, C. (2001). A note on kernel assisted estimators in missing covariate regression. Statistics & Probability Letters, 55, 439–449.
Zhou, Y., Wan, A. T. K., Wang, X. (2008). Estimating equations inference with missing data. Journal of the American Statistical Association, 103, 1187–1199.
Acknowledgements
The authors thank the Associate Editor and two referees for their helpful comments and suggestions that have led to much improvement of this paper. This research was supported in part by the Simons Foundation Mathematics and Physical Sciences—Collaboration Grants for Mathematicians Program Award No. 499650.
Appendix A
1.1 Regularity conditions
To establish the asymptotic theory in this work, we first assume the following general regularity conditions:
- (i) The smoothing parameter h satisfies \(nh^2 \rightarrow \infty \) and \(nh^{2r} \rightarrow 0\) as \(n \rightarrow \infty \).
- (ii) All the selection probabilities \(\pi _i\) are bounded away from zero.
- (iii) The selection probability function on the single index, \(\pi ^*({\gamma })\), has r continuous and bounded partial derivatives a.e.
- (iv) The density function f(u) of U and the conditional density function \(f_{U|R}(u)\) of U|R have r continuous and bounded partial derivatives a.e.
- (v) The conditional distributions \(f_{U|R=0}(u)\) and \(f_{U|R=1}(u)\) have the same support, and \(b(u) = f_{U|R=0}(u)/f_{U|R=1}(u)\) is bounded over the support.
- (vi) The conditional expectations \({\psi }(u| {\gamma }) = E(T| {Q}^\top {\gamma } = u)\) and \(E(TT^\top | {Q}^\top {\gamma })\) exist and have r continuous and bounded partial derivatives a.e.
- (vii) For the score T, \(E(TT^\top )\) and \(E\{(\partial /\partial {\beta })T\}\) exist and are positive definite, and \((\partial ^2/\partial {\beta } \partial {\beta }^\top )T\) exists and is continuous with respect to \( {\beta }\) a.e.
1.2 Proof of Lemma 1
Proof
The idea in the proof is similar to that in the proof of Lemma 1 in Wang and Wang (2001). Recall that \(u_i = Q_i^\top \gamma = y_i - \beta _Z^\top Z_i\) is the single index and that \(n_1\) is the number of complete cases. Let
Under the regularity conditions, we have \(E\{E_n(u)\} = O(h^r)\) and \(\mathrm{var}\{ E_n(u) \} = O\{(nh)^{-1}\}\) by the Taylor expansions. Then by the Chebyshev inequality, \(E_n(u) - E\{E_n(u)\} = O_p \{ (nh)^{-1/2} \}\), which implies \(E_n(u) = O_p\{ h^r + (nh)^{-1/2} \}\), and thus \(E_n(u_i) = O_p\{ h^r + (nh)^{-1/2} \}\). Similarly, we have \(W_{ni} - \psi _i V_{ni} = O_p \{ h^r + (nh)^{-1/2} \}\).
Define \(\delta _n = h^{2r} + (nh)^{-1}\). Under the SIM condition,
Let \(Q_i^* = R_i Q_i\), \(X_i^* = R_i X_i\) for \(i=1,\ldots ,n\) as the values of the complete cases. Then
where \(T^0_{i,k} = E_{Z_i|u_i,R_i=0}(T_{i,k})=\int T_{i,k} f(Z_i|u_i,R_i=0) \hbox {d}Z_i\) and b(u) is defined in regularity condition (v). The last step follows from the concentration of \(u_i\) around \(u_k\). Using the same idea, and writing \(\{\cdots \}\) to denote a repeat of the preceding term, we also have
Let
Then the summands with \(R_i=0\) in \(S_n\) are i.i.d. random variables conditional on all \((R,Q^*,X^*)\). Thus, we have
Then \(E(S_n) = O(h^r)\) and \(\mathrm{var}(S_n) = O(h^{2r} + (nh)^{-1})\) imply \(S_n = O_p(\eta _n)\). Back to (A.1), we have
\(\square \)
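The rate \(O_p\{ h^r + (nh)^{-1/2} \}\) established in Lemma 1 can be illustrated numerically. Below is a minimal sketch, not the estimator analyzed in the paper: a Nadaraya–Watson smoother with a Gaussian kernel (so \(r = 2\)) applied to simulated data; the target function \(\sin(2u)\), the bandwidth \(h = n^{-1/5}\), and all function names are illustrative assumptions.

```python
import numpy as np

def nw_estimate(u0, u, t, h):
    """Nadaraya-Watson estimate of E(T | U = u0) with a Gaussian kernel."""
    w = np.exp(-0.5 * ((u - u0) / h) ** 2)
    return np.sum(w * t) / np.sum(w)

def mean_abs_error(n, reps=200, seed=0):
    """Monte Carlo mean absolute error of the smoother at u0 = 0."""
    rng = np.random.default_rng(seed)
    h = n ** (-1 / 5)  # rate-optimal order for a second-order kernel (r = 2)
    errs = []
    for _ in range(reps):
        u = rng.uniform(-1.0, 1.0, n)
        t = np.sin(2.0 * u) + 0.3 * rng.standard_normal(n)
        errs.append(abs(nw_estimate(0.0, u, t, h) - np.sin(0.0)))
    return float(np.mean(errs))

e_small = mean_abs_error(100)
e_large = mean_abs_error(3200)
```

The error at the larger sample size should be noticeably smaller, consistent with the \(n^{-2/5}\) rate implied by this bandwidth choice.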
1.3 Proof of Lemma 2
Proof
(a) The proof is analogous to that of Lemma 1. The main difference is that here the summation runs over the complete cases only, so we condition on \(R_i = 1\). Then
where \(T^1_{i,k} = E_{Z_i|u_i,R_i=1}(T_{i,k})=\int T_{i,k} f(Z_i|u_i,R_i=1) \hbox {d}Z_i\). The rest of the proof follows in the same manner as in the proof of Lemma 1.
(b) Similarly to the proof of (a), we have
Applying the Hölder inequality to the sum of product terms in the second term below, we have
(c) The proof can be obtained analogously as in (b). \(\square \)
1.4 Proof of Theorem 1
Proof
Based on the conclusion of Lemma 1,
Since \(\varDelta _3( \beta ,{\hat{\psi }}( \gamma ))\) is asymptotically equivalent to a sum of i.i.d. random variables, \({\hat{\beta }}_A\) is asymptotically normally distributed and has the asymptotic covariance \({\varvec{\varSigma }_A} = {{\varvec{D}}}^{-1} {\varvec{\mathcal {M}}} {{\varvec{D}}}^{-1}\) with
\(\square \)
1.5 Proof of Theorem 2
Proof
We first consider the first part, \(\varDelta _1( \beta ,{\pi }(\hat{ \alpha }))\), of the estimating equation (11). By assumption, a correctly specified parametric model for the selection probabilities with parameter \( \alpha \) is given by
The log-likelihood is
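For a Bernoulli selection indicator \(R_i\) with success probability \(\pi _i(\alpha )\), the log-likelihood and its score take the standard form (a sketch, written to be consistent with \(P_n(\alpha )\) defined below):

```latex
\ell(\alpha) = \sum_{i=1}^{n}\Bigl[ R_i \log \pi_i(\alpha)
  + (1-R_i)\log\{1-\pi_i(\alpha)\} \Bigr],
\qquad
\frac{\partial \ell(\alpha)}{\partial \alpha}
  = \sum_{i=1}^{n}
    \frac{\pi_i'(\alpha)}{\pi_i(\alpha)\{1-\pi_i(\alpha)\}}
    \{R_i - \pi_i(\alpha)\}.
```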
The corresponding estimating equation for MLE \({\hat{\alpha }}\) is given by
Then we have
Moreover,
where \({{\varvec{F}}}( \alpha )=E \left\{ \frac{1}{\pi _1( \alpha )} \psi _1 \pi _1'( \alpha )^\top \right\} \), \({{\varvec{C}}}( \alpha )=E\left\{ \frac{\pi _1'( \alpha ) \pi _1'( \alpha )^\top }{\pi _1( \alpha ) \{1-\pi _1( \alpha )\}} \right\} \), and \(P_n( \alpha ) = n^{-1/2}\sum _{i=1}^n \frac{\pi _i'( \alpha )}{\pi _i( \alpha ) \{1-\pi _i( \alpha )\}} \{R_i - \pi _i( \alpha )\}\).
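Under standard regularity for maximum likelihood, these quantities enter through the familiar expansion (a sketch; \({{\varvec{C}}}(\alpha )\) is the Fisher information of the Bernoulli likelihood above):

```latex
\sqrt{n}\,(\hat{\alpha} - \alpha)
  = {\varvec{C}}^{-1}(\alpha)\, P_n(\alpha) + o_p(1),
\qquad
P_n(\alpha) \xrightarrow{\;d\;} N\bigl(0,\ {\varvec{C}}(\alpha)\bigr).
```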
We now consider the second part of the estimating equation. By Lemmas 1 and 2(a), we obtain that
Recall that the additional condition for Lemma 2(c) requires \(\pi _i =\pi _i^*(\gamma )\). This implies that \(T_i^0 = T_i^1 = E_{Z_i|u_i} (T_i)\), \(\psi _i^0(\gamma ) = \psi _i^1(\gamma ) = E_{Z_i|u_i}\{\psi _i(\gamma )\}\). Let \(T_i^* = E_{Z_i|u_i} (T_i)\), \(\psi _i^*(\gamma ) = E_{Z_i|u_i}\{\psi _i(\gamma )\}\). Then
Equation (A.2) and Lemma 2(c) imply that
Then
As in the proof for the first part \(\varDelta _1( \beta ,{\pi }(\hat{ \alpha }))\), we can show that
Finally we have
In summary, we have shown that \(\varDelta _2( \beta ,{\pi }(\hat{ \alpha }),{\hat{\psi }}( \gamma ))\) is asymptotically equivalent to \(\varDelta _2( \beta ,\pi ^*( \gamma ),\psi )\), which is a sum of i.i.d. terms. Hence, \({\hat{\beta }}_{\mathrm{PIP}A}\) is asymptotically equivalent to the solution of \(\varDelta _2( \beta ,\pi ^*( \gamma ),\psi )=0\), and is therefore asymptotically normal with asymptotic covariance
\(\square \)
1.6 Proof of Corollary 1
Proof
By the fact that
where \({{\varvec{F}}}( \alpha )\) and \({{\varvec{C}}}( \alpha )\) are given in the proof of Theorem 2, and by (A.1) in Wang et al. (1997), with an extension to a general parametric model, we have the asymptotic covariance for \({\hat{\beta }}_\mathrm{PIP}\) as
where \(\tilde{{\varvec{S}}}=E(T_1 T_1^\top /\pi _1)\). By Wang and Wang (2001),
is the asymptotic covariance matrix for \({\hat{\beta }}\) when \({\hat{\psi }}\) is based on a standard kernel smoother, where \(\tilde{{\varvec{S}}}^*=E(\psi _1 \psi _1^\top / \pi _1)\).
First we show that \( \varvec{\varSigma }_P \succeq \tilde{\varvec{\varSigma }} . \) By the construction of the covariances, we only need to show that \(\tilde{\varvec{S}}^*-{{\varvec{V}}} \succeq {{\varvec{F}}}( \alpha ) { \varvec{C}}^{-1}( \alpha ) {{\varvec{F}}}( \alpha )^\top \). Define \(\xi = \left( \sqrt{\frac{1-\pi _1}{\pi _1}} \psi _1, \frac{\pi _1'( \alpha )}{\sqrt{(1-\pi _1)\pi _1}} \right) ^\top \). Then we have
By the Schur complement condition of the matrix above, we have
Therefore, \(\tilde{{\varvec{S}}}^*-{{\varvec{V}}} \succeq {{\varvec{F}}}( \alpha ) {{\varvec{C}}}^{-1}( \alpha ) {{\varvec{F}}}( \alpha )^\top \), which implies that \({\varvec{\varSigma }}_P \succeq \tilde{\varvec{\varSigma }}\).
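The block structure behind this step can be written out explicitly (a sketch; writing \({{\varvec{V}}} = E(\psi _1\psi _1^\top )\) is an assumption here, chosen to match \(\tilde{\varvec{S}}^* = E(\psi _1\psi _1^\top /\pi _1)\)):

```latex
E(\xi \xi^\top)
= \begin{pmatrix}
    E\!\left\{\tfrac{1-\pi_1}{\pi_1}\,\psi_1\psi_1^\top\right\}
      & E\!\left\{\tfrac{1}{\pi_1}\,\psi_1 \pi_1'(\alpha)^\top\right\} \\[4pt]
    E\!\left\{\tfrac{1}{\pi_1}\,\pi_1'(\alpha)\psi_1^\top\right\}
      & E\!\left\{\tfrac{\pi_1'(\alpha)\pi_1'(\alpha)^\top}{\pi_1(1-\pi_1)}\right\}
  \end{pmatrix}
= \begin{pmatrix}
    \tilde{\varvec{S}}^* - {\varvec{V}} & {\varvec{F}}(\alpha) \\
    {\varvec{F}}(\alpha)^\top & {\varvec{C}}(\alpha)
  \end{pmatrix}
\succeq 0,
```

and since \({{\varvec{C}}}(\alpha ) \succ 0\), taking the Schur complement of \({{\varvec{C}}}(\alpha )\) yields \(\tilde{\varvec{S}}^* - {{\varvec{V}}} - {{\varvec{F}}}(\alpha ){{\varvec{C}}}^{-1}(\alpha ){{\varvec{F}}}(\alpha )^\top \succeq 0\).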
Next, we show that \( \tilde{\varvec{\varSigma }} = \varvec{\varSigma }_{A} = \varvec{\varSigma }_{PA} \) and thus the asymptotic equivalence between \({\hat{\beta }}_A\) and \({\hat{\beta }}_{\mathrm{PIP}A}\). Based on the results of Theorem 1, we can rewrite \(\varDelta _3( \beta ,{\hat{\psi }}( \gamma ))\) as
The condition \(E(Z_i|u_i) = Z_i\) implies that \(T_i^0 = T_i^1 = T_i\) and \(\psi _i^0(\gamma ) = \psi _i^1(\gamma ) = \psi _i(\gamma )\). By Theorem 2, both \(\varDelta _2( \beta ,{\pi }(\hat{ \alpha }),{\hat{\psi }}( \gamma ))\) and \(\varDelta _3( \beta ,{\hat{\psi }}( \gamma ))\) are asymptotically equivalent to \(\varDelta _2( \beta ,\pi ^*( \gamma ),\psi )\) and thus have the same asymptotic covariance matrix as
Recall the condition of Lemma 2(c) that \(\pi _i = \pi _i^*(\gamma )\). Then \({{\varvec{S}}} = \tilde{{\varvec{S}}}\), \({{\varvec{S}}}^* = \tilde{{\varvec{S}}}^*\). Thus, we finally have
\(\square \)
Sun, Z., Wang, S. Semiparametric estimation in regression with missing covariates using single-index models. Ann Inst Stat Math 71, 1201–1232 (2019). https://doi.org/10.1007/s10463-018-0672-y