Abstract
To obtain M-estimators of a response variable when the data are missing at random, one can construct three bias-corrected nonparametric estimating equations based on inverse probability weighting, mean imputation, and augmented inverse probability weighting. However, when the dimension of the covariates is not low, estimation efficiency suffers from the curse of dimensionality. To address this issue, we propose a two-stage estimation procedure that uses dimension-reduced kernel estimators in conjunction with the bias-corrected estimating equations. We show that the resulting three kernel-assisted estimating equations yield asymptotically equivalent M-estimators that achieve the desired properties. The finite-sample performance of the proposed estimators of the response mean, distribution function, and quantiles is studied through simulation, and an application to an HIV CD4 data set is also presented.
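For concreteness, the three bias-corrected approaches can be sketched numerically. The snippet below is a minimal illustration, not the paper's implementation: it takes the reduced covariate \(S = BX\) as given (in practice \(B\) is replaced by a sufficient-dimension-reduction estimate \(\hat{B}\)), uses Nadaraya–Watson smoothing with a Gaussian product kernel, and targets the response mean, i.e., \(\varphi(Y,\theta)=Y-\theta\); the function names and the bandwidth are our own choices.

```python
import numpy as np

def nw_weights(S, h):
    """Gaussian product-kernel weights K((S_i - S_j)/h) on the reduced covariates."""
    diff = S[:, None, :] - S[None, :, :]              # (n, n, q) pairwise differences
    return np.exp(-0.5 * np.sum((diff / h) ** 2, axis=2))

def mean_estimators(Y, S, delta, h=0.3):
    """IPW, mean-imputation, and AIPW estimates of E(Y) with Y missing at random.

    Y     : responses (entries with delta == 0 are never used),
    S     : reduced covariates S = B X, shape (n, q),
    delta : missingness indicator (1 = observed).
    All smoothing is done on the low-dimensional S, which is the point of the
    two-stage procedure: kernel estimation on S avoids the curse of
    dimensionality in the full covariate X.
    """
    K = nw_weights(S, h)
    dY = np.where(delta > 0, Y, 0.0)                  # mask missing responses
    pi_hat = (K @ delta) / K.sum(axis=1)              # estimate of P(delta = 1 | S)
    m_hat = (K @ dY) / np.maximum(K @ delta, 1e-12)   # estimate of E(Y | S, delta = 1)
    ipw = np.mean(dY / pi_hat)                        # inverse probability weighting
    imp = np.mean(dY + (1 - delta) * m_hat)           # mean imputation
    aipw = np.mean(dY / pi_hat - (delta - pi_hat) / pi_hat * m_hat)  # augmented IPW
    return ipw, imp, aipw
```

For a general monotone \(\varphi(Y,\theta)\), solving the estimating equation \(n^{-1}\sum_{i=1}^n g_l = 0\) would replace these closed-form means by a univariate root search in \(\theta\).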
References
Andrews, D. W. (1995). Nonparametric kernel estimation for semiparametric models. Econometric Theory, 11, 560–586.
Chen, X., Wan, A. T., Zhou, Y. (2015). Efficient quantile regression analysis with missing observations. Journal of the American Statistical Association, 110, 723–741.
Cheng, P. E. (1994). Nonparametric estimation of mean functionals with data missing at random. Journal of the American Statistical Association, 89, 81–87.
Cook, R. D. (1994). On the interpretation of regression plots. Journal of the American Statistical Association, 89, 177–189.
Cook, R. D., Weisberg, S. (1991). Discussion of “Sliced inverse regression for dimension reduction”. Journal of the American Statistical Association, 86, 28–33.
Deng, J., Wang, Q. (2017). Dimension reduction estimation for probability density with data missing at random when covariables are present. Journal of Statistical Planning and Inference, 181, 11–29.
Ding, X., Wang, Q. (2011). Fusion-refinement procedure for dimension reduction with missing response at random. Journal of the American Statistical Association, 106, 1193–1207.
Hammer, S. M., Katzenstein, D. A., Hughes, M. D., Gundacker, H., Schooley, R. T., Haubrich, R. H., et al. (1996). A trial comparing nucleoside monotherapy with combination therapy in HIV-infected adults with CD4 cell counts from 200 to 500 per cubic millimeter. The New England Journal of Medicine, 335, 1081–1089.
Hu, Z., Follmann, D. A., Wang, N. (2014). Estimation of mean response via effective balancing score. Biometrika, 101, 613–624.
Huber, P. J. (1981). Robust statistics. New York: Wiley.
Ibrahim, J. G., Chen, M. H., Lipsitz, S. R., Herring, A. H. (2005). Missing-data methods for generalized linear models: A comparative review. Journal of the American Statistical Association, 100, 332–346.
Kim, J. K., Shao, J. (2013). Statistical methods for handling incomplete data. London: Chapman and Hall/CRC.
Li, K. C. (1991). Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 86, 316–327.
Li, Y., Wang, Q., Zhu, L., Ding, X. (2017). Mean response estimation with missing response in the presence of high-dimensional covariates. Communications in Statistics-Theory and Methods, 46, 628–643.
Ma, Y., Zhu, L. (2012). A semiparametric approach to dimension reduction. Journal of the American Statistical Association, 107, 168–179.
Ma, Y., Zhu, L. (2013). A review on dimension reduction. International Statistical Review, 81, 134–150.
Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75, 237–249.
Qin, J., Lawless, J. (1994). Empirical likelihood and general estimating equations. The Annals of Statistics, 22, 300–325.
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63, 581–592.
Serfling, R. J. (1980). Approximation theorems of mathematical statistics. New York: Wiley.
Shao, J., Wang, L. (2016). Semiparametric inverse propensity weighting for nonignorable missing data. Biometrika, 103, 175–187.
Wang, D., Chen, S. X. (2009). Empirical likelihood for estimating equations with missing values. The Annals of Statistics, 37, 490–517.
Wang, L., Rotnitzky, A., Lin, X. (2010). Nonparametric regression with missing outcomes using weighted kernel estimating equations. Journal of the American Statistical Association, 105, 1135–1146.
Wang, Q. (2007). M-estimators based on inverse probability weighted estimating equations with response missing at random. Communications in Statistics-Theory and Methods, 36, 1091–1103.
Wooldridge, J. M. (2007). Inverse probability weighted estimation for general missing data problems. Journal of Econometrics, 141, 1281–1301.
Xia, Y., Tong, H., Li, W. K., Zhu, L. X. (2002). An adaptive estimation of dimension reduction space. Journal of the Royal Statistical Society: Series B, 64, 363–410.
Xue, L. (2009). Empirical likelihood confidence intervals for response mean with data missing at random. Scandinavian Journal of Statistics, 36, 671–685.
Zhang, B. (1995). M-estimation and quantile estimation in the presence of auxiliary information. Journal of Statistical Planning and Inference, 44, 77–94.
Zhu, L. P., Zhu, L. X., Ferre, L., Wang, T. (2010). Sufficient dimension reduction through discretization-expectation estimation. Biometrika, 97, 295–304.
Acknowledgements
We are grateful to the Editor, the Associate Editor and one anonymous referee for their insightful comments and suggestions on this article, which have led to significant improvements. This work was supported by the National Natural Science Foundation of China (11501208) and the Fundamental Research Funds for the Central Universities.
Appendix
- (C1) The true value \(\theta _0\) is the unique root of \(n^{-1}\sum _{i=1}^ng_l(Y_i,S_i,\delta _i,\theta )=0\), and \(n^{-1}\sum _{i=1}^ng_l(Y_i,S_i,\delta _i,\theta )\) is differentiable at \(\theta =\theta _0\) for \(l=1, 2, 3\) with \(\sum _{i=1}^n{\partial g_l(Y_i,S_i,\delta _i,\theta _0)}/{\partial \theta } \ne 0\).
- (C2) The function \(\varphi (Y,\theta )\) is monotone and continuous in \(\theta \) with \(E|\varphi (Y,\theta )|< \infty \); \(\partial \varphi (Y,\theta )/\partial \theta \) is continuous at \(\theta =\theta _0\) with \(E|\partial \varphi (Y,\theta _0)/\partial \theta |< \infty \); and \( E\{\varphi ^2(Y,\theta )|S\} < \infty \).
- (C3) The kernel \(K(\cdot )\) is bounded, has compact support, and is of order \(m \ge 2\), i.e., \(\int K(s_1,...,s_{d})ds_1 \cdots ds_{d}=1\), \(\int s_j^tK(s_1,...,s_{d})ds_1\cdots ds_{d}=0\), and \(\int s_j^mK(s_1,...,s_{d})ds_1\cdots ds_{d}\ne 0\) for any \(j =1,...,d\) and \(t=1,..., m-1\).
- (C4) The function \(\pi (S)\) and the S-density function f(S) have continuous and bounded partial derivatives with respect to S up to order m, and \(\pi (S)\) is bounded away from 0 and 1.
- (C5) The function \(m_{\varphi }(S, \theta )\) is twice continuously differentiable in a neighborhood of S and has bounded partial derivatives up to order m.
- (C6) As \(n\rightarrow \infty \), \(nh^{2d}\rightarrow \infty \), \(nh^{d}/\log n\rightarrow \infty \), \(nh^{2m}\rightarrow 0\), and the estimator \(\hat{B}\) obtained by SDR is a root-n consistent estimator of B.
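The moment conditions in (C3) can be checked numerically. The sketch below verifies them for a product Gaussian kernel with \(d=2\) and \(m=2\); note the Gaussian does not have the compact support that (C3) formally requires, so this only illustrates the moment conditions (a product Epanechnikov kernel would satisfy (C3) exactly).

```python
import numpy as np

# Numerical check of the order-m moment conditions in (C3) for a product
# Gaussian kernel, d = 2, m = 2.  Grid-based Riemann sums stand in for the
# integrals; truncation at |s_j| = 8 is numerically negligible.
grid = np.linspace(-8.0, 8.0, 401)
ds = grid[1] - grid[0]
S1, S2 = np.meshgrid(grid, grid)
K = np.exp(-0.5 * (S1 ** 2 + S2 ** 2)) / (2.0 * np.pi)

total = K.sum() * ds ** 2               # int K(s_1, s_2) ds        -> 1
first = (S1 * K).sum() * ds ** 2        # int s_j^t K ds            -> 0, t = 1, ..., m-1
second = (S1 ** 2 * K).sum() * ds ** 2  # int s_j^m K ds            -> nonzero for m = 2
```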
Proof of Theorem 1
For \(g_2(Y_i, \hat{S}_i,\delta _i,\theta )\), note that
where \(S_i=BX_i\) and \(\hat{S}_i=\hat{B}X_i\). Define \(G(S)=f(S)\pi (S)\) and
Let \(\varDelta _n(\hat{S}_i,S_i)=\hat{G}_n(\hat{S}_i)-G(S_i)\). Then,
where
Using the fact \(\delta \varphi (Y, \theta )\perp X|B X\), we can show that
As in Wang and Chen (2009), we can prove that
and \(A_{n2}=o_p(n^{-1/2})\). Using the arguments in Andrews (1995) together with \(\Vert \hat{B}-B\Vert =O_p(n^{-1/2})\), we obtain
such that \(A_{n3}=o_p(n^{-1/2})\). Thus, we have
This leads to
Furthermore, we have
For \(g_1(Y_i, \hat{S}_i,\delta _i,\theta )\), we have
Using arguments similar to those in Wang (2007), we can prove that
which leads to
and
For \(g_3(Y_i, \hat{S}_i,\delta _i,\theta )\), it can be seen that
Using arguments similar to those for \(g_1(Y_i, \hat{S}_i,\delta _i,\theta )\) and \(g_2(Y_i, \hat{S}_i,\delta _i,\theta )\), it can be proved that the last two terms on the right-hand side of the above equation are \(o_p(1)\). The proof is completed. \(\square \)
Proof of Theorem 2
By Taylor expansion, there exists \({\theta }_l^*\) between \(\hat{\theta }_l\) and \(\hat{\theta }_0\), \(l=1,2,3,\) such that
Since \(\sum _{i=1}^ng_l(Y_i,\hat{S}_i,\delta _i,\hat{\theta }_l)=0\), we have
As in the proof of Theorem 1, it can be proved that, as \(n \rightarrow \infty \),
The proof is completed. \(\square \)
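The displayed expansion in this proof can be sketched as follows. This is a standard reconstruction written with \(\theta_0\) as the expansion point, not necessarily the paper's exact display:

```latex
0=\frac{1}{n}\sum_{i=1}^{n}g_l(Y_i,\hat{S}_i,\delta_i,\hat{\theta}_l)
 =\frac{1}{n}\sum_{i=1}^{n}g_l(Y_i,\hat{S}_i,\delta_i,\theta_0)
 +\Bigg\{\frac{1}{n}\sum_{i=1}^{n}
   \frac{\partial g_l(Y_i,\hat{S}_i,\delta_i,\theta_l^{*})}{\partial\theta}\Bigg\}
   (\hat{\theta}_l-\theta_0),
```

so that \(\sqrt{n}(\hat{\theta}_l-\theta_0)\) equals \(-\{n^{-1}\sum_i \partial g_l(Y_i,\hat{S}_i,\delta_i,\theta_l^{*})/\partial\theta\}^{-1} n^{-1/2}\sum_i g_l(Y_i,\hat{S}_i,\delta_i,\theta_0)\), and the asymptotic distribution follows from Theorem 1.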
Lemma 1
Assume that \(P(\delta = 1|X)>0\) and \(P(Y=0|X)=0\). For any given y, it can be verified that
where \(\mathcal {S}\) denotes the central subspace (Cook 1994).
Proof of Lemma 1
Suppose that B is a basis of \(\mathcal {S}_{\delta Y|X}\), so that \(\delta Y \perp X | BX.\) Then we have \(\mathrm{Pr}(\delta Y=0|X)=\mathrm{Pr}(\delta Y=0|BX)\) and \(\mathrm{Pr}(\delta Y \le y |X)=\mathrm{Pr}(\delta Y \le y |BX)\). Note that \(\mathrm{Pr}(\delta Y \le y |X)=\mathrm{Pr}(\delta =1, Y \le y |X)+I(y \ge 0)\mathrm{Pr}(\delta =0|X)\) and, since \(P(Y=0|X)=0\), \(\mathrm{Pr}(\delta Y =0 |X)=\mathrm{Pr}(\delta =1, Y=0 |X)+\mathrm{Pr}(\delta =0|X)=\mathrm{Pr}(\delta =0 |X).\) We have
\(\square \)
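Combining the identities stated in the proof, the concluding display can be sketched as (a reconstruction using only the relations above):

```latex
\Pr(\delta=1,\,Y\le y\mid X)
 =\Pr(\delta Y\le y\mid X)-I(y\ge 0)\Pr(\delta Y=0\mid X)
 =\Pr(\delta Y\le y\mid BX)-I(y\ge 0)\Pr(\delta Y=0\mid BX),
```

which depends on X only through BX, so working with \(\mathcal {S}_{\delta Y|X}\) suffices for the joint missingness-response structure.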
Cite this article
Wang, L. Dimension reduction for kernel-assisted M-estimators with missing response at random. Ann Inst Stat Math 71, 889–910 (2019). https://doi.org/10.1007/s10463-018-0664-y