Abstract
The density function is a fundamental concept in data analysis. When a population consists of heterogeneous subjects, it is often of great interest to estimate the density functions of the subpopulations. If there are no missing values, nonparametric methods such as kernel smoothing may be applied to each subpopulation to estimate its density function. When subpopulation membership is missing, kernel smoothing estimates based only on subjects with observed membership are valid only under the missing completely at random (MCAR) assumption. In this paper, we propose new kernel smoothing estimates of the density functions that apply prediction models for the membership under the missing at random (MAR) assumption. The asymptotic properties of the new estimates are developed, and simulation studies and a real study in mental health illustrate their performance.
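Concretely, the proposed mean-score approach amounts to a weighted kernel density estimate. The sketch below is ours, not code from the paper: it assumes the weighted-ratio form \(\sum_i u_i K_h(t-T_i)/\sum_i u_i\) with weights \(u_i=D_iR_i+\widehat{d}_i(1-R_i)\) (cf. the appendix), where \(\widehat{d}_i\) would come from a fitted prediction model such as logistic regression; all function names are illustrative.

```python
import numpy as np

def gaussian_kernel(z):
    """Standard Gaussian kernel K(z)."""
    return np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)

def mean_score_kde(t, T, D, R, d_hat, h):
    """Kernel density estimate at t for the D=1 subpopulation when
    membership D is observed only where R=1.  Missing memberships are
    replaced by predicted probabilities d_hat (e.g. from a logistic
    model fitted on the verified subjects)."""
    u = np.where(R == 1, D, d_hat)        # u_i = D_i R_i + d_hat_i (1 - R_i)
    K = gaussian_kernel((t - T) / h) / h  # K_h(t - T_i)
    return np.sum(u * K) / np.sum(u)      # weighted kernel average
```

When every membership is observed (all \(R_i=1\)), this reduces to the ordinary kernel density estimate over the \(D_i=1\) subjects.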
Acknowledgements
This research was supported in part by NIH Grants R33 DA027521 and R01GM108337. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The authors would also like to thank Jeffrey M. Lyness, M.D. for providing the data used in Sect. 6.
Appendix
In this appendix, we give proofs of Theorems 1–4.
Proof of Theorem 1
We first show the asymptotic distribution of \(\widetilde{f}_{{\text {MS}}}(t;h)\) in Theorem 1(a). Let \(u_{i}=D_{i}R_{i}+d_{i}(1-R_{i})\) and \(f_{h}(t)=E\left[ K_{h}(t-T_{i})\mid D_{i}=1\right] \), as defined in (3.3). Based on (3.1), we have
For any given h and any point t, as \(n\rightarrow \infty \), the Weak Law of Large Numbers (WLLN) gives
By the Central Limit Theorem (CLT), we have
and
Applying Slutsky’s theorem and accounting for the correlation between (7.3) and (7.4), we have
where \(\sigma _{1}^{2}=\frac{1}{p^{2}}Var\left( {u}_{i}K_{h}(t-T_{i})-{u}_{i}f_{h}(t)\right) \). This proves the asymptotic distribution of \(\widetilde{f}_{{\text {MS}}}(t;h)\) in Theorem 1(a).
Replacing \(u_{i}=D_{i}R_{i}+d_{i}(1-R_{i})\) by \(u_{i}=d_{i}\) and repeating the argument used for Theorem 1(a) proves Theorem 1(b). \(\square \)
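The structure of this argument can be made explicit. The following is our own sketch, assuming \(\widetilde{f}_{{\text {MS}}}(t;h)\) takes the weighted-ratio form implied by (3.1) and by the normalization by \(p=E(u_{i})\) appearing in \(\sigma _{1}^{2}\):

```latex
\[
\sqrt{n}\left[\widetilde{f}_{\mathrm{MS}}(t;h)-f_{h}(t)\right]
  =\frac{\dfrac{1}{\sqrt{n}}\sum_{i=1}^{n}u_{i}\left[K_{h}(t-T_{i})-f_{h}(t)\right]}
        {\dfrac{1}{n}\sum_{i=1}^{n}u_{i}}.
\]
```

By the WLLN the denominator converges in probability to \(p=E(u_{i})\); by the CLT the numerator converges in distribution to \(N\left( 0,Var\left( u_{i}K_{h}(t-T_{i})-u_{i}f_{h}(t)\right) \right) \), since \(E\left[ u_{i}K_{h}(t-T_{i})\right] =pf_{h}(t)\) under MAR. Slutsky’s theorem then yields the limit \(N(0,\sigma _{1}^{2})\).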
Proof of Theorem 2
Let f(t) be the density function for the diseased population, and let \(z_{i}=\left( T_{i}-t\right) /h,\) then
Based on Theorem 1, we have
Since both \(f_{h}(t)\) and f(t) are defined for the diseased population, we have
Combining (7.5) and (7.6), we have \(Bias\left[ \widetilde{f}_{{\text {MS}}}(t;h)\right] =\frac{1}{2}h^{2}\mu _{2}(K)f^{\prime \prime } (t)+o(h^{2}).\)
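The bias step uses the familiar kernel expansion, which we write out here as a sketch, assuming K is a symmetric kernel with \(\int zK(z)\,dz=0\) and finite second moment \(\mu _{2}(K)=\int z^{2}K(z)\,dz\), and that f is twice continuously differentiable at t:

```latex
\[
f_{h}(t)=E\left[K_{h}(t-T_{i})\mid D_{i}=1\right]
        =\int K(z)\,f(t+hz)\,dz
        =f(t)+\tfrac{1}{2}h^{2}\mu_{2}(K)f''(t)+o(h^{2}),
\]
```

using the substitution \(z_{i}=(T_{i}-t)/h\) and a second-order Taylor expansion of f about t. Subtracting f(t) gives the stated bias.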
Next, we derive the variance of \(\widetilde{f}_{MS}(t;h).\) Let \(w(t)=E\big ( \pi _{i}d_{i}+d_{i}^{2}(1-\pi _{i})\mid T_{i}=t\big )\), \(z_{i}=\left( T_{i}-t\right) /h\) and g(t) be the population density function of T. Based on Theorem 1, the asymptotic variance for \(\widetilde{f} _{{\text {MS}}}(t)\) is
Hence, the asymptotic variance of \(\widetilde{f}_{{\text {MS}}}(t)\) is
Let \(w(t)=E\left( d_{i}^{2}\mid T_{i}=t\right) \). Based on Theorem 1 and a similar argument, the asymptotic variance of \(\widetilde{f}_{{\text {BG}}}(t)\) can be derived as follows:
\(\square \)
Proof of Theorem 3
We first show the asymptotic distribution of \(\widehat{f}_{{\text {MS}}}(t;h)\) in Theorem 3(a). Suppose we have a prediction model (3.8) whose parameters are estimated from (3.9). Let \(\widehat{{\beta }}\) be the estimate of \(\beta \). Based on the Taylor expansion of (3.9) at \(\beta \), we have
where \(\mathbf {I}=-E[\frac{\partial \Psi _{i}}{\partial {\beta }^{T}}]\). If \(\widehat{{\beta }}\) is estimated from the score equation, \(\mathbf {I}\) is the Fisher information matrix.
Since
the asymptotic distribution of \(\sqrt{n}\left[ \widetilde{f}_{{\text {MS}}} (t)-f_{h}(t)\right] \) is already given in Theorem 1(a). We will focus on deriving the asymptotic distribution of \(\sqrt{n}\left[ \widehat{f}_{{\text {MS}}}(t)-\widetilde{f}_{{\text {MS}}}(t)\right] \).
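The two steps can be combined into one display; this is only a restatement of the decomposition above together with the Taylor expansion of (3.9):

```latex
\[
\sqrt{n}\left[\widehat{f}_{\mathrm{MS}}(t)-f_{h}(t)\right]
=\underbrace{\sqrt{n}\left[\widehat{f}_{\mathrm{MS}}(t)-\widetilde{f}_{\mathrm{MS}}(t)\right]}
  _{\text{driven by }\sqrt{n}(\widehat{\beta}-\beta)
     =\mathbf{I}^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\Psi_{i}+o_{p}(1)}
+\underbrace{\sqrt{n}\left[\widetilde{f}_{\mathrm{MS}}(t)-f_{h}(t)\right]}
  _{\text{Theorem 1(a)}}.
\]
```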
Let \(\widehat{u}_{i}=D_{i}R_{i}+\widehat{d}_{i}(1-R_{i})=D_{i}R_{i}+g(x_{i};\widehat{{\beta }})(1-R_{i})\). Based on (3.10) and (3.1), by applying the WLLN, we have
The second term
where \(\mathbf {I}=-E[\frac{\partial \Psi _{i}}{\partial {\beta }^{T}}]\).
For the first term, by Slutsky’s theorem,
Thus,
It follows that
where \(\sigma _{3}^{2}=\frac{1}{p^{2}}Var\big ( {u}_{i}K_{h}(t-T_{i} )-f_{h}(t){u}_{i}+\big ( E\left( K_{h}(t-T_{i})\frac{\partial u_{i} }{\partial {\beta }^{T}}\right) -f_{h}(t)E\big ( \frac{\partial u_{i} }{\partial {\beta }^{T}}\big ) \big ) \mathbf {I}^{-1}\Psi _{i}\big ) \). Let \(\mathbf {c}=E\left[ K_{h}(t-T_{i})(1-R_{i})\frac{\partial g_{i} }{\partial {\beta }^{T}}({\beta })\right] \) and \(\mathbf {d} =E\left[ (1-R_{i})\frac{\partial g_{i}}{\partial {\beta }^{T} }({\beta })\right] \). Since \(\frac{\partial u_{i}}{\partial {\beta }^{T}}=(1-R_{i})\frac{\partial g_{i}}{\partial {\beta }^{T} }\), we have \(\sigma _{3}^{2}=\frac{1}{p^{2}}Var\left( {u}_{i}K_{h}(t-T_{i})-f_{h}(t){u}_{i}+\left( \mathbf {c}-f_{h}(t)\mathbf {d}\right) \mathbf {I}^{-1}\Psi _{i}\right) \).
Theorem 3(b) can be proved similarly by replacing \(u_{i}\) by \(d_{i}\) and \(\widehat{u}_{i}\) by \(\widehat{d}_{i}\) in the above arguments. \(\square \)
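A small simulation can illustrate the estimator of Theorem 3 at work. This is our own hedged sketch, not code from the paper: the data-generating design, the logistic prediction model (correctly specified by construction, since the normal-mixture design makes \(\text{logit}\,P(D=1\mid x,T)\) linear in x and T), and all variable names are assumptions made for illustration. Verification R depends on the observed T only, so membership is MAR but not MCAR, and the complete-case kernel estimate is biased while the mean-score estimate is not.

```python
import numpy as np

def expit(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(42)
n = 50_000

# --- simulate data with membership missing at random (MAR) ---
x = rng.normal(size=n)                        # covariate
d_true = rng.binomial(1, expit(x))            # group membership D
t = rng.normal(loc=2 * d_true - 1, scale=1.0) # screening score T (N(1,1) if D=1)
r = rng.binomial(1, expit(2 * t))             # verification R depends on T only: MAR

# --- fit logistic prediction model P(D=1 | x, T) on verified subjects ---
Z = np.column_stack([np.ones(n), x, t])       # design matrix with intercept
obs = r == 1
beta = np.zeros(3)
for _ in range(25):                           # Newton-Raphson iterations
    p = expit(Z[obs] @ beta)
    grad = Z[obs].T @ (d_true[obs] - p)
    hess = (Z[obs] * (p * (1 - p))[:, None]).T @ Z[obs]
    beta = beta + np.linalg.solve(hess, grad)
d_hat = expit(Z @ beta)                       # predicted membership probabilities

def kde(t0, pts, w, h):
    """Weighted Gaussian-kernel density estimate at t0."""
    k = np.exp(-0.5 * ((t0 - pts) / h) ** 2) / (h * np.sqrt(2 * np.pi))
    return np.sum(w * k) / np.sum(w)

h, t0 = 0.3, -1.0
true_f = np.exp(-0.5 * (t0 - 1.0) ** 2) / np.sqrt(2 * np.pi)  # N(1,1) density at t0

# complete-case estimate: verified diseased subjects only
cc_mask = obs & (d_true == 1)
cc = kde(t0, t[cc_mask], np.ones(cc_mask.sum()), h)
# mean-score estimate: weights u_i = D_i R_i + d_hat_i (1 - R_i)
u = np.where(obs, d_true, d_hat)
ms = kde(t0, t, u, h)
```

Because verification favors large T, the complete-case estimate underweights the left tail of the diseased density, while the mean-score weights correct for it.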
Proof of Theorem 4
Based on Theorem 3, the asymptotic bias of both \(\widehat{f}_{{\text {MS}}}(t;h)\) and \(\widehat{f}_{{\text {BG}}}(t;h)\) is \(f_{h}(t)-f(t)\). Thus, the bias result follows from the proof of Theorem 2.
The proof for the asymptotic variance also follows similarly to that of Theorem 2:
Let \(w(t)=E\left[ \pi _{i}d_{i}+d_{i}^{2}(1-\pi _{i})\mid T_{i}=t\right] \), \(w_{1}(t)=E\big \{ u_{i}\left( \mathbf {c}-f_{h}(t)\mathbf {d}\right) \mathbf {I}^{-1}\Psi _{i}\mid T_{i}=t\big \} \), and \(w_{2}(t)=E\left[ \left( \left( \mathbf {c}-f_{h}(t)\mathbf {d}\right) \mathbf {I}^{-1}\Psi _{i}\right) ^{2}\mid T_{i}=t\right] \). Based on Theorem 3, the asymptotic variance of \(\widehat{f}_{{\text {MS}}}(t)\) can be derived as follows:
Hence, the asymptotic variance of \(\widehat{f}_{{\text {MS}}}(t)\) is
Similarly, we can derive the asymptotic variance of \(\widehat{f}_{{\text {BG}}}(t)\) in (3.16). \(\square \)
He, H., Wang, W. & Tang, W. Prediction model-based kernel density estimation when group membership is subject to missing. AStA Adv Stat Anal 101, 267–288 (2017). https://doi.org/10.1007/s10182-016-0283-y